diff --git a/docs/doc-remove-worker-flag.md b/docs/doc-remove-worker-flag.md new file mode 100644 index 0000000..5f88812 --- /dev/null +++ b/docs/doc-remove-worker-flag.md 
@@ -0,0 +1,56 @@ +## **Remove Worker flag from OKD Control Planes** + +### **Context** +On OKD user provisioned infrastructure the control plane nodes can have the flag node-role.kubernetes.io/worker which allows non critical workloads to be scheduled on the control-planes + +### **Observed Symptoms** +- After adding HAProxy servers to the backend each back end appears down +- Traffic is redirected to the control planes instead of workers +- The pods router-default are incorrectly applied on the control planes rather than on the workers +- Pods are being scheduled on the control planes causing cluster instability + +``` + ss -tlnp | grep 80 +``` +- shows process haproxy is listening at 0.0.0.0:80 on cps +- same problem for port 443 +- In namespace rook-ceph certain pods are deploted on cps rather than on worker nodes + + ### **Cause** + - when intalling UPI, the roles (master, worker) are not managed by the Machine Config operator and the cps are made schedulable by default. + + ### **Diagnostic** +check node labels: +``` + oc get nodes --show-labels | grep control-plane +``` +Inspecter kubelet configuration: + +``` +cat /etc/systemd/system/kubelet.service +``` + +find the line: +``` + --node-labels=node-role.kubernetes.io/control-plane,node-role.kubernetes.io/master,node-role.kubernetes.io/worker +``` + → presence of label worker confirms the problem. + +Verify the flag doesnt come from MCO +``` + oc get machineconfig | grep rendered-master +``` + +**Solution:** +To make the control planes non schedulable you must patch the cluster scheduler resource + +``` +oc patch scheduler cluster --type merge -p '{"spec":{"mastersSchedulable":false}}' +``` +after the patch is applied the workloads can be deplaced by draining the nodes + +``` +oc adm cordon +oc adm drain --ignore-daemonsets –delete-emptydir-data +``` +
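Review bot: In the last code block of the Solution section, `oc adm drain --ignore-daemonsets –delete-emptydir-data` has an en dash where `--delete-emptydir-data` needs two hyphens, and neither `oc adm cordon` nor `oc adm drain` takes a node argument, so both commands fail when pasted. Suggest writing them as `oc adm cordon <node-name>` and `oc adm drain <node-name> --ignore-daemonsets --delete-emptydir-data`, with a sentence saying which nodes to run them on. The Diagnostic section treats the `node-role.kubernetes.io/worker` entry in `--node-labels` as the confirmation of the problem, but the Solution only sets `mastersSchedulable` to false and never says how to verify the result; add a closing check, for example rerunning `oc get nodes --show-labels | grep control-plane`, and state what output is expected. Under "Verify the flag doesnt come from MCO", the `oc get machineconfig | grep rendered-master` step only lists objects, so say what the reader should look for in them. In Observed Symptoms, `ss -tlnp | grep 80` also matches ports such as 8080; a tighter pattern like `grep ':80 '` avoids false hits. Spelling and style: "deploted", "intalling", "Inspecter", "doesnt" and "deplaced" need correcting, "cps" should be expanded on first use, the `### **Cause**` and `### **Diagnostic**` headings carry a leading space that the other headings do not, and `**Solution:**` is bold text where the other sections use `###` headings.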