harmony-node-readiness-endpoint

A lightweight, standalone Rust service for Kubernetes node health checking.

Designed for bare-metal Kubernetes clusters with external load balancers (HAProxy, OPNsense, F5, etc.).

Exposes a simple HTTP endpoint (/health) on each node:

  • 200 OK — node is healthy and ready to receive traffic
  • 503 Service Unavailable — node should be removed from the load balancer pool
  • 500 Internal Server Error — misconfiguration (e.g. NODE_NAME not set)

This project is not dependent on Harmony, but is commonly used as part of Harmony bare-metal Kubernetes deployments.

Why this project exists

In bare-metal environments, external load balancers often rely on pod-level or router-level checks that can lag behind the authoritative Kubernetes Node.status.conditions[Ready].
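This service exposes that condition directly on each node, so the load balancer reacts to the authoritative node state instead of waiting for indirect signals to catch up.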

Available checks

| Check name | Description | Status |
|---|---|---|
| node_ready | Queries Node.status.conditions[Ready] via Kubernetes API | Implemented |
| okd_router_1936 | Probes OpenShift router /healthz/ready on port 1936 | Implemented |
| filesystem_ro | Detects read-only mounts via /proc/mounts | To be implemented |
| kubelet | Local probe to kubelet /healthz (port 10248) | To be implemented |
| container_runtime | Socket check + runtime status | To be implemented |
| disk_pressure | Threshold checks on key filesystems | To be implemented |
| network | DNS resolution + gateway connectivity | To be implemented |
| custom_conditions | Reacts to extra conditions (NPD, etc.) | To be implemented |

All checks are combined with logical AND — any single failure results in 503.
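
The AND rule can be sketched in a few lines of Rust. The `CheckResult` struct and the `overall_status` function are hypothetical names chosen for illustration; they are not taken from the service's source.

```rust
/// Outcome of a single check (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct CheckResult {
    pub name: String,
    pub passed: bool,
    pub reason: Option<String>,
}

/// Logical AND over all checks: 200 only if every check passed, otherwise 503.
/// Note that `all` returns true for an empty slice, so this sketch would
/// report 200 when no checks ran; the real service may handle that differently.
pub fn overall_status(checks: &[CheckResult]) -> u16 {
    if checks.iter().all(|c| c.passed) {
        200
    } else {
        503
    }
}
```

With one passing and one failing check, `overall_status` returns 503, which is what tells the load balancer to take the node out of the pool.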

Behavior

node_ready check — fail-open design

The node_ready check queries the Kubernetes API server to read Node.status.conditions[Ready]. Because this service runs on the node it is checking, there are scenarios where the API server is temporarily unreachable (e.g. during a control-plane restart). To avoid incorrectly draining a healthy node in such cases, the check is fail-open: it passes (reports ready) whenever the Kubernetes API is unavailable.

| Situation | Result | HTTP status |
|---|---|---|
| Node.conditions[Ready] == True | Pass | 200 |
| Node.conditions[Ready] == False | Fail | 503 |
| Ready condition absent | Fail | 503 |
| API server unreachable or timed out (1 s timeout) | Pass (assumes ready) | 200 |
| Kubernetes client initialization failed | Pass (assumes ready) | 200 |
| NODE_NAME env var not set | Hard error | 500 |

A warning is logged whenever the API is unavailable and the check falls back to assuming ready.
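
The decision table above maps naturally onto a single `match`. The following is a minimal sketch, assuming hypothetical `Observation` and `Verdict` enums; the actual service may model these cases differently.

```rust
/// What the check learned about the node (hypothetical enum for illustration).
#[derive(Debug, Clone, PartialEq)]
pub enum Observation {
    /// The Ready condition was found; `true` means its status is True.
    ReadyCondition(bool),
    /// The node object has no Ready condition.
    ConditionAbsent,
    /// API unreachable, timed out, or the client could not be initialized.
    ApiUnavailable,
    /// NODE_NAME was not set, so no node can be queried.
    NodeNameMissing,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Verdict {
    Pass,
    Fail,
    HardError,
}

/// Fail-open: an unavailable API counts as a pass, everything else is strict.
pub fn node_ready_verdict(obs: &Observation) -> Verdict {
    match obs {
        Observation::ReadyCondition(true) => Verdict::Pass,
        Observation::ReadyCondition(false) => Verdict::Fail,
        Observation::ConditionAbsent => Verdict::Fail,
        Observation::ApiUnavailable => Verdict::Pass,
        Observation::NodeNameMissing => Verdict::HardError,
    }
}

/// HTTP status for a verdict when node_ready is the only check.
pub fn http_status(verdict: &Verdict) -> u16 {
    match verdict {
        Verdict::Pass => 200,
        Verdict::Fail => 503,
        Verdict::HardError => 500,
    }
}
```

Keeping the fail-open case as its own variant makes the trade-off explicit: a node that is truly down while the API is also unreachable will still report 200 until the API is reachable again.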

okd_router_1936 check

Sends GET http://127.0.0.1:1936/healthz/ready with a 5-second timeout. Returns pass on any 2xx response, fail otherwise.

Unknown check names

Requesting an unknown check name (e.g. check=bogus) results in that check returning passed: false with reason "Unknown check: bogus", and the overall response is 503.
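
A dispatcher with this behavior might look like the sketch below. The `run_check` function and `CheckResult` struct are hypothetical, and the two known checks are stubbed to always pass so the example stays self-contained.

```rust
/// Outcome of a single check (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct CheckResult {
    pub name: String,
    pub passed: bool,
    pub reason: Option<String>,
}

/// Runs the named check. Unknown names fail instead of being silently ignored.
pub fn run_check(name: &str) -> CheckResult {
    match name {
        // Stubs: a real implementation would call the actual checks here.
        "node_ready" | "okd_router_1936" => CheckResult {
            name: name.to_string(),
            passed: true,
            reason: None,
        },
        other => CheckResult {
            name: other.to_string(),
            passed: false,
            reason: Some(format!("Unknown check: {other}")),
        },
    }
}
```

Failing on unknown names means a typo in the load balancer configuration surfaces as a 503 rather than as a check that never runs.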

How it works

Node name discovery

The service reads the NODE_NAME environment variable, which must be injected via the Kubernetes Downward API:

env:
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
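
On the Rust side, reading the variable could look roughly like this. The function names are hypothetical, and treating an empty value as unset is an assumption of this sketch, not documented behavior.

```rust
use std::env;

/// Validates a raw NODE_NAME value. The `Err` string is a message a handler
/// could return together with HTTP 500.
/// Assumption: an empty or whitespace-only value is treated like a missing one.
pub fn node_name_from(value: Option<String>) -> Result<String, String> {
    match value {
        Some(name) if !name.trim().is_empty() => Ok(name),
        _ => Err("NODE_NAME environment variable is not set".to_string()),
    }
}

/// Reads NODE_NAME from the process environment.
pub fn node_name() -> Result<String, String> {
    node_name_from(env::var("NODE_NAME").ok())
}
```

Splitting validation from the environment lookup keeps the logic testable without mutating process-wide state.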

Kubernetes API authentication

  • Uses standard in-cluster configuration — no external credentials needed.
  • The ServiceAccount token and CA certificate are automatically mounted at /var/run/secrets/kubernetes.io/serviceaccount/.
  • Requires only minimal RBAC: get and list on the nodes resource (see deploy/resources.yaml).
  • Connect and write timeouts are set to 1 second to keep checks fast.

Deploy

The supporting Kubernetes resources (Namespace, ServiceAccount, ClusterRole, ClusterRoleBinding, and an OpenShift SCC RoleBinding for hostnetwork) are in a single file, deploy/resources.yaml. The DaemonSet itself is in deploy/daemonset.yaml.

kubectl apply -f deploy/resources.yaml
kubectl apply -f deploy/daemonset.yaml

The DaemonSet uses hostNetwork: true and hostPort: 25001, so the endpoint is reachable directly on the node's IP at port 25001.
It tolerates all taints, ensuring it runs even on nodes marked unschedulable.

Configure your external load balancer

Example for HAProxy / OPNsense:

  • Check type: HTTP
  • URI: /health
  • Port: 25001 (configurable via LISTEN_PORT env var)
  • Interval: 5-10 s
  • Rise: 2
  • Fall: 3
  • Expect: 2xx
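
To see how the Rise and Fall settings interact with the endpoint, here is a simplified model of the load balancer's side, written in Rust. It is not HAProxy's actual implementation; `BackendState` is a hypothetical type that only captures the idea of consecutive-result thresholds.

```rust
/// Simplified model of a load balancer's view of one backend node.
pub struct BackendState {
    up: bool,
    /// Consecutive probe results that contradict the current state.
    streak: u32,
    rise: u32,
    fall: u32,
}

impl BackendState {
    /// Starts in the "up" state with the given thresholds.
    pub fn new(rise: u32, fall: u32) -> Self {
        BackendState { up: true, streak: 0, rise, fall }
    }

    /// Feeds one probe result and returns whether the backend is up afterwards.
    pub fn observe(&mut self, probe_ok: bool) -> bool {
        if probe_ok == self.up {
            // The result agrees with the current state, so the streak resets.
            self.streak = 0;
        } else {
            self.streak += 1;
            let needed = if self.up { self.fall } else { self.rise };
            if self.streak >= needed {
                self.up = !self.up;
                self.streak = 0;
            }
        }
        self.up
    }
}
```

In this model, with Fall set to 3 and a 5 s interval, a node that starts returning 503 is removed after three consecutive failed probes, which is on the order of 15 s. Lower values react faster but are more sensitive to a single dropped probe.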

Endpoint usage

Query parameter

Use the check query parameter to select which checks to run (comma-separated).
When omitted, only node_ready runs.

| Request | Checks run |
|---|---|
| GET /health | node_ready |
| GET /health?check=okd_router_1936 | okd_router_1936 only |
| GET /health?check=node_ready,okd_router_1936 | node_ready and okd_router_1936 |

Note: specifying check= replaces the default. Include node_ready explicitly if you need it alongside other checks.
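
The selection rule can be sketched as a small parser over the raw query string. `selected_checks` is a hypothetical helper; it does no percent-decoding, and how the real service treats an empty `check=` value is not documented here, so this sketch simply drops empty names.

```rust
/// Returns the check names to run for a raw query string such as
/// "check=node_ready,okd_router_1936". When no `check` parameter is present,
/// only node_ready runs.
pub fn selected_checks(query: Option<&str>) -> Vec<String> {
    let requested = query
        .into_iter()
        .flat_map(|q| q.split('&'))
        .find_map(|pair| pair.strip_prefix("check="));

    match requested {
        // An explicit list replaces the default entirely.
        Some(list) => list
            .split(',')
            .map(str::trim)
            .filter(|name| !name.is_empty()) // assumption: empty names are dropped
            .map(String::from)
            .collect(),
        None => vec!["node_ready".to_string()],
    }
}
```

Because the explicit list replaces the default, `selected_checks(Some("check=okd_router_1936"))` does not include node_ready.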

Response format

{
  "status": "ready" | "not-ready",
  "checks": [
    {
      "name": "<check-name>",
      "passed": true | false,
      "reason": "<failure reason, omitted on success>",
      "duration_ms": 42
    }
  ],
  "total_duration_ms": 42
}
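
As a rough illustration of this shape, the following std-only sketch renders a response body by hand. A real service would more likely derive serialization with a JSON library; `CheckResult`, `escape`, and `render` are hypothetical names, and the output is compact rather than pretty-printed.

```rust
/// One entry of the "checks" array (hypothetical type, for illustration only).
pub struct CheckResult {
    pub name: String,
    pub passed: bool,
    pub reason: Option<String>,
    pub duration_ms: u64,
}

/// Escapes the characters that must not appear raw inside a JSON string.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders the response body; "reason" is emitted only for failed checks.
pub fn render(checks: &[CheckResult], total_duration_ms: u64) -> String {
    let status = if checks.iter().all(|c| c.passed) { "ready" } else { "not-ready" };
    let items: Vec<String> = checks
        .iter()
        .map(|c| {
            let reason = match &c.reason {
                Some(r) if !c.passed => format!(",\"reason\":\"{}\"", escape(r)),
                _ => String::new(),
            };
            format!(
                "{{\"name\":\"{}\",\"passed\":{}{},\"duration_ms\":{}}}",
                escape(&c.name), c.passed, reason, c.duration_ms
            )
        })
        .collect();
    format!(
        "{{\"status\":\"{}\",\"checks\":[{}],\"total_duration_ms\":{}}}",
        status, items.join(","), total_duration_ms
    )
}
```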

Healthy node (default)

HTTP/1.1 200 OK

{
  "status": "ready",
  "checks": [{ "name": "node_ready", "passed": true, "duration_ms": 42 }],
  "total_duration_ms": 42
}

Unhealthy node

HTTP/1.1 503 Service Unavailable

{
  "status": "not-ready",
  "checks": [
    { "name": "node_ready", "passed": false, "reason": "KubeletNotReady", "duration_ms": 35 }
  ],
  "total_duration_ms": 35
}

API server unreachable (fail-open)

HTTP/1.1 200 OK

{
  "status": "ready",
  "checks": [{ "name": "node_ready", "passed": true, "duration_ms": 1001 }],
  "total_duration_ms": 1001
}

(A warning is logged: Kubernetes API appears to be down … Assuming node is ready.)

Configuration

| Env var | Default | Description |
|---|---|---|
| NODE_NAME | required | Node name, injected via Downward API |
| LISTEN_PORT | 25001 | TCP port the HTTP server binds to |
| RUST_LOG | | Log level (e.g. info, debug) |
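
Resolving LISTEN_PORT with its default could be sketched as follows. `listen_port_from` is a hypothetical helper, and returning an error for an unparsable value is an assumption; the service might instead fall back to the default.

```rust
use std::num::ParseIntError;

/// Default port from the configuration table above.
pub const DEFAULT_LISTEN_PORT: u16 = 25001;

/// Resolves the listen port from an optional raw LISTEN_PORT value.
/// Assumption: an invalid value is reported as an error rather than ignored.
pub fn listen_port_from(raw: Option<&str>) -> Result<u16, ParseIntError> {
    match raw {
        Some(value) => value.trim().parse::<u16>(),
        None => Ok(DEFAULT_LISTEN_PORT),
    }
}
```

A caller would typically pass `std::env::var("LISTEN_PORT").ok().as_deref()` and bind the HTTP server to the resulting port.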

Development

# Run locally
NODE_NAME=my-test-node cargo run

# Run tests
cargo test

Minimal, auditable, and built for production bare-metal Kubernetes environments.