harmony/docs/doc-remove-worker-flag.md

1.8 KiB
Raw Blame History

Remove Worker flag from OKD Control Planes

Context

On OKD user provisioned infrastructure the control plane nodes can have the flag node-role.kubernetes.io/worker which allows non critical workloads to be scheduled on the control-planes

Observed Symptoms

  • After adding HAProxy servers to the backend each back end appears down
  • Traffic is redirected to the control planes instead of workers
  • The pods router-default are incorrectly applied on the control planes rather than on the workers
  • Pods are being scheduled on the control planes causing cluster instability
  ss -tlnp | grep 80
  • shows process haproxy is listening at 0.0.0.0:80 on cps
  • same problem for port 443
  • In namespace rook-ceph certain pods are deploted on cps rather than on worker nodes

Cause

  • when intalling UPI, the roles (master, worker) are not managed by the Machine Config operator and the cps are made schedulable by default.

Diagnostic

check node labels:

   oc get nodes --show-labels | grep control-plane

Inspecter kubelet configuration:

cat /etc/systemd/system/kubelet.service

find the line:

   --node-labels=node-role.kubernetes.io/control-plane,node-role.kubernetes.io/master,node-role.kubernetes.io/worker

→ presence of label worker confirms the problem.

Verify the flag doesnt come from MCO

   oc get machineconfig | grep rendered-master

Solution: To make the control planes non schedulable you must patch the cluster scheduler resource

oc patch scheduler cluster --type merge -p '{"spec":{"mastersSchedulable":false}}'

after the patch is applied the workloads can be deplaced by draining the nodes

oc adm cordon <cp-node>
oc adm drain <cp-node> --ignore-daemonsets delete-emptydir-data