# harmony-node-readiness-endpoint

A lightweight, standalone Rust service for Kubernetes node health checking.
Designed for bare-metal Kubernetes clusters with external load balancers (HAProxy, OPNsense, F5, etc.).
It exposes a simple, reliable HTTP endpoint (`/health`) on each node that returns:

- `200 OK` — node is healthy and ready to receive traffic
- `503 Service Unavailable` — node should be removed from the load balancer pool
This project is not dependent on Harmony, but is commonly used as part of Harmony bare-metal Kubernetes deployments.
## Why this project exists
In bare-metal environments, external load balancers often rely on pod-level or router-level checks that can lag behind the authoritative Kubernetes `Node.status.conditions[Ready]`.
This service exposes that authoritative status directly, providing a true source of truth with fast reaction time.
## Features & Roadmap
| Check | Description | Status | Check Name |
|---|---|---|---|
| Node readiness (API) | Queries `Node.status.conditions[Ready]` via Kubernetes API | Implemented | `node_ready` |
| OKD Router health | Probes OpenShift router `healthz` on port 1936 | Implemented | `okd_router_1936` |
| Filesystem readonly | Detects read-only mounts via `/proc/mounts` | To be implemented | `filesystem_ro` |
| Kubelet running | Local probe to kubelet `/healthz` (port 10248) | To be implemented | `kubelet` |
| CRI-O / container runtime health | Socket check + runtime status | To be implemented | `container_runtime` |
| Disk / inode pressure | Threshold checks on key filesystems | To be implemented | `disk_pressure` |
| Network reachability | DNS resolution + gateway connectivity | To be implemented | `network` |
| Custom NodeConditions | Reacts to extra conditions (NPD, etc.) | To be implemented | `custom_conditions` |
All checks are combined with logical AND — any failure results in `503`.
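The combination rule can be sketched in a few lines of Rust (a simplified illustration of the semantics, not the service's actual types):

```rust
// Simplified sketch: fold all check results into a single HTTP status.
// The real service's structs carry more fields (reason, duration_ms).
struct CheckResult {
    name: &'static str,
    passed: bool,
}

/// Logical AND across all checks: any failure yields 503, otherwise 200.
fn overall_status(checks: &[CheckResult]) -> u16 {
    if checks.iter().all(|c| c.passed) {
        200
    } else {
        503
    }
}

fn main() {
    let checks = [
        CheckResult { name: "node_ready", passed: true },
        CheckResult { name: "okd_router_1936", passed: false },
    ];
    // One failed check is enough to report 503.
    println!("{}", overall_status(&checks)); // prints "503"
}
```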
## How it works

### Node Name Discovery
The service automatically discovers its own node name using the Kubernetes Downward API:
```yaml
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
```
### Kubernetes API Authentication
- Uses standard in-cluster configuration (no external credentials needed).
- The ServiceAccount token and CA certificate are automatically mounted by Kubernetes at `/var/run/secrets/kubernetes.io/serviceaccount/`.
- The application (via `kube-rs` or your Harmony higher-level client) calls the equivalent of `Config::incluster_config()`.
- Requires only minimal RBAC: `get` permission on the `nodes` resource (see `deploy/rbac.yaml`).
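The RBAC in `deploy/rbac.yaml` is expected to amount to roughly the following (a minimal sketch; the ClusterRole name, binding name, and ServiceAccount namespace are illustrative assumptions, not taken from the repo):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: harmony-node-readiness-endpoint
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: harmony-node-readiness-endpoint
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: harmony-node-readiness-endpoint
subjects:
  - kind: ServiceAccount
    name: harmony-node-readiness-endpoint
    namespace: harmony-node-readiness
```

Nodes are cluster-scoped, so a ClusterRole (not a namespaced Role) is required even for read-only access.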
## Quick Start

### 1. Build and push
```shell
cargo build --release --bin harmony-node-readiness-endpoint
docker build -t your-registry/harmony-node-readiness-endpoint:v1.0.0 .
docker push your-registry/harmony-node-readiness-endpoint:v1.0.0
```
### 2. Deploy
```shell
kubectl apply -f deploy/namespace.yaml
kubectl apply -f deploy/rbac.yaml
kubectl apply -f deploy/daemonset.yaml
```
(The DaemonSet uses `hostPort: 25001` by default so the endpoint is reachable directly on the node's IP.)
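The port wiring in the DaemonSet pod spec amounts to something like the following sketch (container name and image tag are illustrative):

```yaml
containers:
  - name: harmony-node-readiness-endpoint
    image: your-registry/harmony-node-readiness-endpoint:v1.0.0
    ports:
      - containerPort: 25001
        hostPort: 25001
        protocol: TCP
```

With `hostPort`, the load balancer can probe `http://<node-ip>:25001/health` without any Service or Ingress in between.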
### 3. Configure your external load balancer

Example for HAProxy / OPNsense:
- Check type: HTTP
- URI: `/health`
- Port: `25001` (configurable via `LISTEN_PORT`)
- Interval: 5–10 s
- Rise: 2
- Fall: 3
- Expect: `2xx`
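For HAProxy specifically, those settings translate to roughly this backend definition (a sketch; the backend name, server names, and addresses are placeholders):

```haproxy
backend k8s-ingress
    # Probe the readiness endpoint rather than the traffic port itself.
    option httpchk GET /health
    http-check expect rstatus ^2[0-9][0-9]
    default-server inter 5s rise 2 fall 3
    server node1 192.0.2.10:443 check port 25001
    server node2 192.0.2.11:443 check port 25001
```

Note the `check port 25001`: traffic flows to the node's ingress port while health checks go to this service.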
## Health Endpoint Examples

### Query Parameter
Use the `check` query parameter to specify which checks to run. Multiple checks can be comma-separated.
| Request | Behavior |
|---|---|
| `GET /health` | Runs `node_ready` (default) |
| `GET /health?check=okd_router_1936` | Runs only the OKD router check |
| `GET /health?check=node_ready,okd_router_1936` | Runs both checks |
Note: When the `check` parameter is provided, only the specified checks run. You must explicitly include `node_ready` if you want it along with other checks.
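The selection semantics can be sketched as follows (an illustration only; `selected_checks` is a hypothetical helper, not the service's actual code):

```rust
/// Parse the `check` query parameter into the list of checks to run.
/// With no parameter, only the default `node_ready` check runs; when the
/// parameter is present, exactly the listed checks run and nothing is implied.
fn selected_checks(check_param: Option<&str>) -> Vec<String> {
    match check_param {
        None => vec!["node_ready".to_string()],
        Some(list) => list
            .split(',')
            .map(|s| s.trim().to_string())
            .filter(|s| !s.is_empty())
            .collect(),
    }
}

fn main() {
    // GET /health
    println!("{:?}", selected_checks(None));
    // GET /health?check=okd_router_1936 — node_ready is NOT implied
    println!("{:?}", selected_checks(Some("okd_router_1936")));
}
```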
### Response Format

Each check result includes:

- `name`: the check identifier
- `passed`: boolean indicating success or failure
- `reason`: (optional) failure reason if the check failed
- `duration_ms`: time taken to execute the check, in milliseconds
#### Healthy node (default check)

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "status": "ready",
  "checks": [
    {
      "name": "node_ready",
      "passed": true,
      "duration_ms": 42
    }
  ]
}
```
#### Healthy node (multiple checks)

```http
GET /health?check=node_ready,okd_router_1936

HTTP/1.1 200 OK
Content-Type: application/json

{
  "status": "ready",
  "checks": [
    {
      "name": "node_ready",
      "passed": true,
      "duration_ms": 38
    },
    {
      "name": "okd_router_1936",
      "passed": true,
      "duration_ms": 12
    }
  ]
}
```
#### Unhealthy node (one check failed)

```http
GET /health?check=node_ready,okd_router_1936

HTTP/1.1 503 Service Unavailable
Content-Type: application/json

{
  "status": "not-ready",
  "checks": [
    {
      "name": "node_ready",
      "passed": true,
      "duration_ms": 41
    },
    {
      "name": "okd_router_1936",
      "passed": false,
      "reason": "Failed to connect to OKD router: connection refused",
      "duration_ms": 5
    }
  ]
}
```
#### Unhealthy node (default check)

```http
HTTP/1.1 503 Service Unavailable
Content-Type: application/json

{
  "status": "not-ready",
  "checks": [
    {
      "name": "node_ready",
      "passed": false,
      "reason": "KubeletNotReady",
      "duration_ms": 35
    }
  ]
}
```
## Configuration (via DaemonSet env vars)
```yaml
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: LISTEN_PORT
    value: "25001"
```
Checks are selected via the `check` query parameter on the `/health` endpoint. See the usage examples above.
## Development
```shell
# Run locally (set NODE_NAME env var)
NODE_NAME=my-test-node cargo run
```
Minimal, auditable, and built for production bare-metal Kubernetes environments.