harmony/docs/doc-remove-worker-flag.md

57 lines
1.8 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

## **Remove Worker flag from OKD Control Planes**
### **Context**
On OKD user provisioned infrastructure the control plane nodes can have the flag node-role.kubernetes.io/worker which allows non critical workloads to be scheduled on the control-planes
### **Observed Symptoms**
- After adding HAProxy servers to the backend each back end appears down
- Traffic is redirected to the control planes instead of workers
- The pods router-default are incorrectly applied on the control planes rather than on the workers
- Pods are being scheduled on the control planes causing cluster instability
```
ss -tlnp | grep 80
```
- shows process haproxy is listening at 0.0.0.0:80 on cps
- same problem for port 443
- In namespace rook-ceph certain pods are deploted on cps rather than on worker nodes
### **Cause**
- when intalling UPI, the roles (master, worker) are not managed by the Machine Config operator and the cps are made schedulable by default.
### **Diagnostic**
check node labels:
```
oc get nodes --show-labels | grep control-plane
```
Inspecter kubelet configuration:
```
cat /etc/systemd/system/kubelet.service
```
find the line:
```
--node-labels=node-role.kubernetes.io/control-plane,node-role.kubernetes.io/master,node-role.kubernetes.io/worker
```
→ presence of label worker confirms the problem.
Verify the flag doesnt come from MCO
```
oc get machineconfig | grep rendered-master
```
**Solution:**
To make the control planes non schedulable you must patch the cluster scheduler resource
```
oc patch scheduler cluster --type merge -p '{"spec":{"mastersSchedulable":false}}'
```
after the patch is applied the workloads can be deplaced by draining the nodes
```
oc adm cordon <cp-node>
oc adm drain <cp-node> --ignore-daemonsets delete-emptydir-data
```