# harmony-node-readiness-endpoint

A lightweight, standalone Rust service for Kubernetes node health checking.
Designed for bare-metal Kubernetes clusters with external load balancers (HAProxy, OPNsense, F5, etc.).

Exposes a simple HTTP endpoint (`/health`) on each node:

- `200 OK` — node is healthy and ready to receive traffic
- `503 Service Unavailable` — node should be removed from the load balancer pool
- `500 Internal Server Error` — misconfiguration (e.g. `NODE_NAME` not set)

This project is not dependent on Harmony, but is commonly used as part of Harmony bare-metal Kubernetes deployments.
## Why this project exists

In bare-metal environments, external load balancers often rely on pod-level or router-level checks that can lag behind the authoritative Kubernetes `Node.status.conditions[Ready]`. This service reads that condition directly, giving the load balancer an authoritative readiness signal with fast reaction time.
## Available checks

| Check name | Description | Status |
|---|---|---|
| `node_ready` | Queries `Node.status.conditions[Ready]` via the Kubernetes API | Implemented |
| `okd_router_1936` | Probes the OpenShift router `/healthz/ready` on port 1936 | Implemented |
| `filesystem_ro` | Detects read-only mounts via `/proc/mounts` | To be implemented |
| `kubelet` | Local probe to kubelet `/healthz` (port 10248) | To be implemented |
| `container_runtime` | Socket check + runtime status | To be implemented |
| `disk_pressure` | Threshold checks on key filesystems | To be implemented |
| `network` | DNS resolution + gateway connectivity | To be implemented |
| `custom_conditions` | Reacts to extra conditions (NPD, etc.) | To be implemented |
All checks are combined with logical AND — any single failure results in 503.
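The AND combination can be sketched in plain Rust. Note that `CheckResult`, its field names, and `overall_status` are illustrative stand-ins here, not the service's actual types:

```rust
// Illustrative sketch of combining check results with logical AND.
// `CheckResult` and `overall_status` are hypothetical names, not the real API.

#[allow(dead_code)]
struct CheckResult {
    name: &'static str,
    passed: bool,
}

/// Returns the HTTP status for a set of check results:
/// 200 only if every check passed, 503 otherwise.
fn overall_status(checks: &[CheckResult]) -> u16 {
    if checks.iter().all(|c| c.passed) { 200 } else { 503 }
}

fn main() {
    let all_ok = [
        CheckResult { name: "node_ready", passed: true },
        CheckResult { name: "okd_router_1936", passed: true },
    ];
    let one_bad = [
        CheckResult { name: "node_ready", passed: true },
        CheckResult { name: "okd_router_1936", passed: false },
    ];
    assert_eq!(overall_status(&all_ok), 200);
    assert_eq!(overall_status(&one_bad), 503);
    println!("ok");
}
```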
## Behavior

### node_ready check — fail-open design

The `node_ready` check queries the Kubernetes API server to read `Node.status.conditions[Ready]`.
Because this service runs on the node it is checking, there are scenarios where the API server is temporarily
unreachable (e.g. during a control-plane restart). To avoid incorrectly draining a healthy node in such cases,
the check is fail-open: it passes (reports ready) whenever the Kubernetes API is unavailable.
| Situation | Result | HTTP status |
|---|---|---|
| `Node.conditions[Ready] == True` | Pass | 200 |
| `Node.conditions[Ready] == False` | Fail | 503 |
| Ready condition absent | Fail | 503 |
| API server unreachable or timed out (1 s timeout) | Pass (assumes ready) | 200 |
| Kubernetes client initialization failed | Pass (assumes ready) | 200 |
| `NODE_NAME` env var not set | Hard error | 500 |
A warning is logged whenever the API is unavailable and the check falls back to assuming ready.
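The decision table above can be sketched as a small Rust match. `ApiResult`, `CheckOutcome`, and `decide_node_ready` are hypothetical names for illustration; the `NODE_NAME`-unset case is a hard 500 handled before any check runs, so it does not appear here:

```rust
// Illustrative sketch of the fail-open decision table.
// These enums and the function name are hypothetical, not the service's types.

#[derive(Debug, PartialEq)]
enum ApiResult {
    Ready(bool),     // Ready condition present, with its status
    ConditionAbsent, // node fetched, but no Ready condition
    Unreachable,     // API timed out, or client initialization failed
}

#[derive(Debug, PartialEq)]
enum CheckOutcome {
    Pass, // contributes toward 200
    Fail, // forces 503
}

fn decide_node_ready(api: ApiResult) -> CheckOutcome {
    match api {
        ApiResult::Ready(true) => CheckOutcome::Pass,
        ApiResult::Ready(false) | ApiResult::ConditionAbsent => CheckOutcome::Fail,
        // Fail-open: if the API cannot be reached, assume the node is ready
        // rather than draining a possibly healthy node from the pool.
        ApiResult::Unreachable => CheckOutcome::Pass,
    }
}

fn main() {
    assert_eq!(decide_node_ready(ApiResult::Ready(true)), CheckOutcome::Pass);
    assert_eq!(decide_node_ready(ApiResult::Ready(false)), CheckOutcome::Fail);
    assert_eq!(decide_node_ready(ApiResult::ConditionAbsent), CheckOutcome::Fail);
    assert_eq!(decide_node_ready(ApiResult::Unreachable), CheckOutcome::Pass);
    println!("ok");
}
```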
### okd_router_1936 check

Sends `GET http://127.0.0.1:1936/healthz/ready` with a 5-second timeout.
Returns pass on any 2xx response, fail otherwise.
### Unknown check names

Requesting an unknown check name (e.g. `check=bogus`) results in that check returning `passed: false`
with reason `"Unknown check: bogus"`, and the overall response is `503`.
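A minimal sketch of this dispatch behavior, assuming a hypothetical `run_check` function (the branches for known checks are placeholders; the real checks query the API or probe the router):

```rust
// Hypothetical dispatch sketch: an unrecognized name becomes a failed check
// carrying the documented reason string. `run_check` is illustrative only.

fn run_check(name: &str) -> (bool, Option<String>) {
    match name {
        // Placeholders — the real implementations perform the actual probes.
        "node_ready" | "okd_router_1936" => (true, None),
        other => (false, Some(format!("Unknown check: {}", other))),
    }
}

fn main() {
    let (passed, reason) = run_check("bogus");
    assert!(!passed);
    assert_eq!(reason.as_deref(), Some("Unknown check: bogus"));
    println!("ok");
}
```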
## How it works

### Node name discovery

The service reads the `NODE_NAME` environment variable, which must be injected via the Kubernetes Downward API:
```yaml
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```
### Kubernetes API authentication

- Uses standard in-cluster configuration — no external credentials needed.
- The ServiceAccount token and CA certificate are automatically mounted at
  `/var/run/secrets/kubernetes.io/serviceaccount/`.
- Requires only minimal RBAC: `get` and `list` on the `nodes` resource (see `deploy/resources.yaml`).
- Connect and write timeouts are set to 1 second to keep checks fast.
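The RBAC rule above might look like the following ClusterRole fragment. This is an illustrative sketch (the resource name is assumed); the authoritative definition lives in `deploy/resources.yaml`:

```yaml
# Illustrative sketch — see deploy/resources.yaml for the real definition.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-readiness-endpoint   # name is a placeholder
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list"]
```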
## Deploy

All Kubernetes resources (Namespace, ServiceAccount, ClusterRole, ClusterRoleBinding, and an OpenShift SCC RoleBinding for `hostnetwork`) are in a single file:

```shell
kubectl apply -f deploy/resources.yaml
kubectl apply -f deploy/daemonset.yaml
```

The DaemonSet uses `hostNetwork: true` and `hostPort: 25001`, so the endpoint is reachable directly on the node's IP at port 25001.
It tolerates all taints, ensuring it runs even on nodes marked unschedulable.
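A sketch of the relevant pod-spec fields (illustrative only; the container name is assumed, and the full manifest is `deploy/daemonset.yaml`):

```yaml
# Illustrative fragment — see deploy/daemonset.yaml for the full manifest.
spec:
  hostNetwork: true
  tolerations:
    - operator: Exists          # tolerate all taints
  containers:
    - name: readiness-endpoint  # name is a placeholder
      ports:
        - containerPort: 25001
          hostPort: 25001
      env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
```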
## Configure your external load balancer

Example for HAProxy / OPNsense:

- Check type: HTTP
- URI: `/health`
- Port: `25001` (configurable via the `LISTEN_PORT` env var)
- Interval: 5–10 s
- Rise: 2
- Fall: 3
- Expect: `2xx`
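For HAProxy specifically, the settings above might translate into a backend fragment like this (server names and addresses are placeholders):

```
backend k8s-nodes
    option httpchk GET /health
    http-check expect status 200
    default-server inter 5s rise 2 fall 3
    server node1 192.0.2.10:25001 check
    server node2 192.0.2.11:25001 check
```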
## Endpoint usage

### Query parameter

Use the `check` query parameter to select which checks to run (comma-separated).
When omitted, only `node_ready` runs.

| Request | Checks run |
|---|---|
| `GET /health` | `node_ready` |
| `GET /health?check=okd_router_1936` | `okd_router_1936` only |
| `GET /health?check=node_ready,okd_router_1936` | `node_ready` and `okd_router_1936` |

Note: specifying `check=` replaces the default. Include `node_ready` explicitly if you need it alongside other checks.
## Response format

```jsonc
{
  "status": "ready" | "not-ready",
  "checks": [
    {
      "name": "<check-name>",
      "passed": true | false,
      "reason": "<failure reason, omitted on success>",
      "duration_ms": 42
    }
  ],
  "total_duration_ms": 42
}
```
### Healthy node (default)

```
HTTP/1.1 200 OK

{
  "status": "ready",
  "checks": [{ "name": "node_ready", "passed": true, "duration_ms": 42 }],
  "total_duration_ms": 42
}
```
### Unhealthy node

```
HTTP/1.1 503 Service Unavailable

{
  "status": "not-ready",
  "checks": [
    { "name": "node_ready", "passed": false, "reason": "KubeletNotReady", "duration_ms": 35 }
  ],
  "total_duration_ms": 35
}
```
### API server unreachable (fail-open)

```
HTTP/1.1 200 OK

{
  "status": "ready",
  "checks": [{ "name": "node_ready", "passed": true, "duration_ms": 1001 }],
  "total_duration_ms": 1001
}
```

(A warning is logged: `Kubernetes API appears to be down … Assuming node is ready.`)
## Configuration

| Env var | Default | Description |
|---|---|---|
| `NODE_NAME` | required | Node name, injected via the Downward API |
| `LISTEN_PORT` | `25001` | TCP port the HTTP server binds to |
| `RUST_LOG` | — | Log level (e.g. `info`, `debug`) |
## Development

```shell
# Run locally
NODE_NAME=my-test-node cargo run

# Run tests
cargo test
```
Minimal, auditable, and built for production bare-metal Kubernetes environments.