# harmony-node-readiness-endpoint

A lightweight, standalone Rust service for Kubernetes node health checking.
Designed for bare-metal Kubernetes clusters with external load balancers (HAProxy, OPNsense, F5, etc.).

Exposes a simple HTTP endpoint (`/health`) on each node:

- `200 OK` — node is healthy and ready to receive traffic
- `503 Service Unavailable` — node should be removed from the load balancer pool
- `500 Internal Server Error` — misconfiguration (e.g. `NODE_NAME` not set)

This project is not dependent on Harmony, but is commonly used as part of Harmony bare-metal Kubernetes deployments.
## Why this project exists

In bare-metal environments, external load balancers often rely on pod-level or router-level checks that can lag behind the authoritative Kubernetes `Node.status.conditions[Ready]`. This service reads that condition directly, giving the load balancer an authoritative readiness signal with fast reaction time.
## Available checks

| Check name | Description | Status |
|---|---|---|
| `node_ready` | Queries `Node.status.conditions[Ready]` via the Kubernetes API | Implemented |
| `okd_router_1936` | Probes the OpenShift router `/healthz/ready` on port 1936 | Implemented |
| `filesystem_ro` | Detects read-only mounts via `/proc/mounts` | To be implemented |
| `kubelet` | Local probe to kubelet `/healthz` (port 10248) | To be implemented |
| `container_runtime` | Socket check + runtime status | To be implemented |
| `disk_pressure` | Threshold checks on key filesystems | To be implemented |
| `network` | DNS resolution + gateway connectivity | To be implemented |
| `custom_conditions` | Reacts to extra conditions (NPD, etc.) | To be implemented |
All checks are combined with logical AND — any single failure results in 503.
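The AND combination can be sketched in plain Rust. Note that `CheckResult`, its field names, and `overall_status` are illustrative stand-ins here, not the service's actual types:

```rust
// Illustrative sketch of combining check results with logical AND.
// `CheckResult` and `overall_status` are hypothetical names, not the real API.

#[allow(dead_code)]
struct CheckResult {
    name: &'static str,
    passed: bool,
}

/// Returns the HTTP status for a set of check results:
/// 200 only if every check passed, 503 otherwise.
fn overall_status(checks: &[CheckResult]) -> u16 {
    if checks.iter().all(|c| c.passed) { 200 } else { 503 }
}

fn main() {
    let all_ok = [
        CheckResult { name: "node_ready", passed: true },
        CheckResult { name: "okd_router_1936", passed: true },
    ];
    let one_bad = [
        CheckResult { name: "node_ready", passed: true },
        CheckResult { name: "okd_router_1936", passed: false },
    ];
    assert_eq!(overall_status(&all_ok), 200);
    assert_eq!(overall_status(&one_bad), 503);
    println!("ok");
}
```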
## Behavior

### node_ready check — fail-open design

The `node_ready` check queries the Kubernetes API server to read `Node.status.conditions[Ready]`.
Because this service runs on the node it is checking, there are scenarios where the API server is temporarily
unreachable (e.g. during a control-plane restart). To avoid incorrectly draining a healthy node in such cases,
the check is fail-open: it passes (reports ready) whenever the Kubernetes API is unavailable.
| Situation | Result | HTTP status |
|---|---|---|
| `Node.conditions[Ready] == True` | Pass | 200 |
| `Node.conditions[Ready] == False` | Fail | 503 |
| Ready condition absent | Fail | 503 |
| API server unreachable or timed out (1 s timeout) | Pass (assumes ready) | 200 |
| Kubernetes client initialization failed | Pass (assumes ready) | 200 |
| `NODE_NAME` env var not set | Hard error | 500 |
A warning is logged whenever the API is unavailable and the check falls back to assuming ready.
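The decision table above can be sketched as a small Rust match. `ApiResult`, `CheckOutcome`, and `decide_node_ready` are hypothetical names for illustration; the `NODE_NAME`-unset case is a hard 500 handled before any check runs, so it does not appear here:

```rust
// Illustrative sketch of the fail-open decision table.
// These enums and the function name are hypothetical, not the service's types.

#[derive(Debug, PartialEq)]
enum ApiResult {
    Ready(bool),     // Ready condition present, with its status
    ConditionAbsent, // node fetched, but no Ready condition
    Unreachable,     // API timed out, or client initialization failed
}

#[derive(Debug, PartialEq)]
enum CheckOutcome {
    Pass, // contributes toward 200
    Fail, // forces 503
}

fn decide_node_ready(api: ApiResult) -> CheckOutcome {
    match api {
        ApiResult::Ready(true) => CheckOutcome::Pass,
        ApiResult::Ready(false) | ApiResult::ConditionAbsent => CheckOutcome::Fail,
        // Fail-open: if the API cannot be reached, assume the node is ready
        // rather than draining a possibly healthy node from the pool.
        ApiResult::Unreachable => CheckOutcome::Pass,
    }
}

fn main() {
    assert_eq!(decide_node_ready(ApiResult::Ready(true)), CheckOutcome::Pass);
    assert_eq!(decide_node_ready(ApiResult::Ready(false)), CheckOutcome::Fail);
    assert_eq!(decide_node_ready(ApiResult::ConditionAbsent), CheckOutcome::Fail);
    assert_eq!(decide_node_ready(ApiResult::Unreachable), CheckOutcome::Pass);
    println!("ok");
}
```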
### okd_router_1936 check

Sends `GET http://127.0.0.1:1936/healthz/ready` with a 5-second timeout.
Returns pass on any 2xx response, fail otherwise.
### Unknown check names

Requesting an unknown check name (e.g. `check=bogus`) results in that check returning `passed: false`
with reason `"Unknown check: bogus"`, and the overall response is `503`.
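A minimal sketch of this dispatch behavior, assuming a hypothetical `run_check` function (the branches for known checks are placeholders; the real checks query the API or probe the router):

```rust
// Hypothetical dispatch sketch: an unrecognized name becomes a failed check
// carrying the documented reason string. `run_check` is illustrative only.

fn run_check(name: &str) -> (bool, Option<String>) {
    match name {
        // Placeholders — the real implementations perform the actual probes.
        "node_ready" | "okd_router_1936" => (true, None),
        other => (false, Some(format!("Unknown check: {}", other))),
    }
}

fn main() {
    let (passed, reason) = run_check("bogus");
    assert!(!passed);
    assert_eq!(reason.as_deref(), Some("Unknown check: bogus"));
    println!("ok");
}
```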
## How it works

### Node name discovery

The service reads the `NODE_NAME` environment variable, which must be injected via the Kubernetes Downward API:
```yaml
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```
### Kubernetes API authentication

- Uses standard in-cluster configuration — no external credentials needed.
- The ServiceAccount token and CA certificate are automatically mounted at
  `/var/run/secrets/kubernetes.io/serviceaccount/`.
- Requires only minimal RBAC: `get` and `list` on the `nodes` resource (see `deploy/resources.yaml`).
- Connect and write timeouts are set to 1 second to keep checks fast.
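The RBAC rule above might look like the following ClusterRole fragment. This is an illustrative sketch (the resource name is assumed); the authoritative definition lives in `deploy/resources.yaml`:

```yaml
# Illustrative sketch — see deploy/resources.yaml for the real definition.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-readiness-endpoint   # name is a placeholder
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list"]
```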
## Deploy

All Kubernetes resources (Namespace, ServiceAccount, ClusterRole, ClusterRoleBinding, and an OpenShift SCC RoleBinding for `hostnetwork`) are in a single file:

```shell
kubectl apply -f deploy/resources.yaml
kubectl apply -f deploy/daemonset.yaml
```

The DaemonSet uses `hostNetwork: true` and `hostPort: 25001`, so the endpoint is reachable directly on the node's IP at port 25001.
It tolerates all taints, ensuring it runs even on nodes marked unschedulable.
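A sketch of the relevant pod-spec fields (illustrative only; the container name is assumed, and the full manifest is `deploy/daemonset.yaml`):

```yaml
# Illustrative fragment — see deploy/daemonset.yaml for the full manifest.
spec:
  hostNetwork: true
  tolerations:
    - operator: Exists          # tolerate all taints
  containers:
    - name: readiness-endpoint  # name is a placeholder
      ports:
        - containerPort: 25001
          hostPort: 25001
      env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
```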
## Configure your external load balancer

Example for HAProxy / OPNsense:

- Check type: HTTP
- URI: `/health`
- Port: `25001` (configurable via the `LISTEN_PORT` env var)
- Interval: 5–10 s
- Rise: 2
- Fall: 3
- Expect: `2xx`
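For HAProxy specifically, the settings above might translate into a backend fragment like this (server names and addresses are placeholders):

```
backend k8s-nodes
    option httpchk GET /health
    http-check expect status 200
    default-server inter 5s rise 2 fall 3
    server node1 192.0.2.10:25001 check
    server node2 192.0.2.11:25001 check
```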
## Endpoint usage

### Query parameter

Use the `check` query parameter to select which checks to run (comma-separated).
When omitted, only `node_ready` runs.

| Request | Checks run |
|---|---|
| `GET /health` | `node_ready` |
| `GET /health?check=okd_router_1936` | `okd_router_1936` only |
| `GET /health?check=node_ready,okd_router_1936` | `node_ready` and `okd_router_1936` |

Note: specifying `check=` replaces the default. Include `node_ready` explicitly if you need it alongside other checks.
## Response format

```jsonc
{
  "status": "ready" | "not-ready",
  "checks": [
    {
      "name": "<check-name>",
      "passed": true | false,
      "reason": "<failure reason, omitted on success>",
      "duration_ms": 42
    }
  ],
  "total_duration_ms": 42
}
```
### Healthy node (default)

```
HTTP/1.1 200 OK

{
  "status": "ready",
  "checks": [{ "name": "node_ready", "passed": true, "duration_ms": 42 }],
  "total_duration_ms": 42
}
```
### Unhealthy node

```
HTTP/1.1 503 Service Unavailable

{
  "status": "not-ready",
  "checks": [
    { "name": "node_ready", "passed": false, "reason": "KubeletNotReady", "duration_ms": 35 }
  ],
  "total_duration_ms": 35
}
```
### API server unreachable (fail-open)

```
HTTP/1.1 200 OK

{
  "status": "ready",
  "checks": [{ "name": "node_ready", "passed": true, "duration_ms": 1001 }],
  "total_duration_ms": 1001
}
```

(A warning is logged: `Kubernetes API appears to be down … Assuming node is ready.`)
## Configuration

| Env var | Default | Description |
|---|---|---|
| `NODE_NAME` | required | Node name, injected via the Downward API |
| `LISTEN_PORT` | `25001` | TCP port the HTTP server binds to |
| `RUST_LOG` | — | Log level (e.g. `info`, `debug`) |
## Development

```shell
# Run locally
NODE_NAME=my-test-node cargo run

# Run tests
cargo test
```
Minimal, auditable, and built for production bare-metal Kubernetes environments.