BUG OKD "tcp server port" check is not enough when a node is half broken #163

Open
opened 2025-09-29 20:00:45 +00:00 by johnride · 0 comments
Owner

Seen in production:

The cp0 node dies in a cluster.

k get nodes shows the node as dead.

HAProxy is still sending traffic to the node on ports 80 and 443 (probably 22623 too), because only the API server readyz health check failed, so only port 6443 was correctly marked as down.

We need to figure out a better production configuration for this health check. When the server shuts down, or networking somehow becomes completely unavailable, the "tcp server port" check is enough. It is not enough when the node is half broken, as in this case.

The correct solution would probably be to perform a full HTTP request on all ports.
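
To illustrate the difference between the two kinds of check, here is a minimal Python sketch. It is not the harmony or HAProxy implementation: the function names `tcp_check` and `http_check`, the default path, and the "2xx or 3xx means healthy" rule are all assumptions made for illustration. The point is that a TCP connect can succeed on a port whose service is answering with errors, while an HTTP request catches that case.

```python
import http.client
import socket
import ssl


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Layer-4 check: passes as soon as the TCP handshake succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host: str, port: int, path: str = "/", use_tls: bool = False,
               timeout: float = 2.0) -> bool:
    """Layer-7 check: passes only if the service answers with a 2xx or 3xx status.

    The status rule is an assumption; a real probe may want a stricter one.
    """
    try:
        if use_tls:
            # Assumption: the probe only cares about health, not identity,
            # so certificate verification is disabled.
            ctx = ssl.create_default_context()
            ctx.check_hostname = False
            ctx.verify_mode = ssl.CERT_NONE
            conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=ctx)
        else:
            conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request("GET", path)
            status = conn.getresponse().status
        finally:
            conn.close()
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, TLS failures and malformed
        # responses all count as unhealthy.
        return False
```

With a half-broken node, `tcp_check` on ports 80 and 443 could keep returning True while `http_check` on the same ports returns False, which is the situation described above. Which path each port should be probed on is left open here, since the issue does not specify it.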

Reference: NationTech/harmony#163