127. cilium-operator fails to schedule on controlplane/etcd role nodes in RKE2 clusters running rke2-cilium v1.18.0 or v1.18.1

张开发
2026/4/7 22:33:47, 15 min read


Situation

In a Rancher-provisioned RKE2 cluster running rke2-cilium v1.18.0 or v1.18.1, with separate controlplane/etcd and worker role nodes, the cilium-operator Pods remain in a Pending state during cluster provisioning. The Pods report FailedScheduling events with the message "0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/etcd: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling." As a result, cluster provisioning fails to progress.

    $ kubectl -n kube-system get pods -l app.kubernetes.io/name=cilium-operator
    NAME                               READY   STATUS
    cilium-operator-59fcfc5dbb-2b5jm   0/1     Pending
    cilium-operator-59fcfc5dbb-4cqm9   0/1     Pending

    $ kubectl -n kube-system describe pod cilium-operator-59fcfc5dbb-2b5jm
    [...]
    Events:
      Type     Reason            Age                  From               Message
      ----     ------            ----                 ----               -------
      Warning  FailedScheduling  11m                  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/etcd: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
      Warning  FailedScheduling  52s (x2 over 5m52s)  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/etcd: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

Resolution

To resolve the issue, upgrade to a later RKE2 release that runs rke2-cilium v1.18.2.

To work around the issue in affected versions, add the toleration below to the cilium-operator in the rke2-cilium chart:

    - key: node-role.kubernetes.io/etcd
      operator: Exists

To add this toleration:

1. Navigate to Cluster Management within the Rancher UI.
2. Click Edit Config for the affected cluster.
3. Under Cluster Configuration, click Add-on: Cilium.
4. Scroll down to the operator.tolerations block and add the node-role.kubernetes.io/etcd toleration:

    [...]
    operator:
      [...]
      tolerations:
      - key: node-role.kubernetes.io/etcd
        operator: Exists
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
      - key: node-role.kubernetes.io/master
        operator: Exists
      - key: node.kubernetes.io/not-ready
        operator: Exists
    [...]

5. Click Save to update rke2-cilium with the new toleration.

Cause

The behaviour is caused by a change in the default cilium-operator tolerations in the upstream Cilium Helm chart, reported upstream in cilium/41921. The issue is tracked in rke2/8974 and is resolved by including the required node-role.kubernetes.io/etcd toleration in the default values of the rke2-cilium chart v1.18.2.

Environment

- A Rancher-provisioned RKE2 cluster running rke2-cilium v1.18.0 or v1.18.1 (affected RKE2 versions are v1.30.14+rke2r4, v1.31.12+rke2r1 - v1.31.13+rke2r1, v1.32.8+rke2r1 - v1.32.9+rke2r1, v1.33.4+rke2r1 - v1.33.5+rke2r1, and v1.34.1+rke2r1).
- Separate controlplane/etcd and worker role nodes.
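
For clusters managed as YAML rather than through the UI form, the workaround described under Resolution might look roughly like the fragment below. This is a sketch under assumptions, not a confirmed procedure from the original article: it assumes the Add-on: Cilium values are stored under spec.rkeConfig.chartValues of the Rancher provisioning Cluster object, keyed by the chart name rke2-cilium. The toleration entries themselves are the ones listed in the article. Because Helm replaces list values wholesale rather than merging them, the existing tolerations need to be repeated alongside the new etcd entry; if the cluster's values contain further entries not shown in the article, those should be kept too.

```yaml
# Fragment of a Rancher provisioning Cluster object (assumed location of the
# rke2-cilium add-on values; verify against "Edit as YAML" for your cluster).
spec:
  rkeConfig:
    chartValues:
      rke2-cilium:
        operator:
          tolerations:
            # Added by the workaround: lets cilium-operator schedule onto
            # nodes carrying the etcd role taint.
            - key: node-role.kubernetes.io/etcd
              operator: Exists
            # Entries already shown in the article; repeated here because a
            # Helm list override replaces the chart's default list entirely.
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
            - key: node-role.kubernetes.io/master
              operator: Exists
            - key: node.kubernetes.io/not-ready
              operator: Exists
```

Once the change has rolled out, the tolerations on the running Deployment can be inspected with kubectl -n kube-system get deployment cilium-operator -o jsonpath='{.spec.template.spec.tolerations}', and the Pending Pods should then be scheduled.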
