
harmony-node-readiness-endpoint

A lightweight, standalone Rust service for Kubernetes node health checking.

Designed for bare-metal Kubernetes clusters with external load balancers (HAProxy, OPNsense, F5, etc.).

It exposes a simple, reliable HTTP endpoint (/health) on each node that returns:

  • 200 OK — node is healthy and ready to receive traffic
  • 503 Service Unavailable — node should be removed from the load balancer pool

This project is not dependent on Harmony, but is commonly used as part of Harmony bare-metal Kubernetes deployments.

Why this project exists

In bare-metal environments, external load balancers often rely on pod-level or router-level checks that can lag behind the authoritative Kubernetes Node.status.conditions[Ready].
This service reports that authoritative Node condition directly, giving the load balancer a fast, accurate signal from the source of truth.

Features & Roadmap

| Check | Description | Status | Check Name |
|-------|-------------|--------|------------|
| Node readiness (API) | Queries Node.status.conditions[Ready] via Kubernetes API | Implemented | node_ready |
| OKD Router health | Probes OpenShift router healthz on port 1936 | Implemented | okd_router_1936 |
| Filesystem readonly | Detects read-only mounts via /proc/mounts | To be implemented | filesystem_ro |
| Kubelet running | Local probe to kubelet /healthz (port 10248) | To be implemented | kubelet |
| CRI-O / container runtime health | Socket check + runtime status | To be implemented | container_runtime |
| Disk / inode pressure | Threshold checks on key filesystems | To be implemented | disk_pressure |
| Network reachability | DNS resolution + gateway connectivity | To be implemented | network |
| Custom NodeConditions | Reacts to extra conditions (NPD, etc.) | To be implemented | custom_conditions |

All checks are combined with logical AND — any failure results in 503.
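
The AND combination can be sketched as a small function over check results; a minimal illustration (the CheckResult struct and names here are assumptions mirroring the JSON response format below, not the actual service code):

```rust
// Illustrative sketch: combining check results with logical AND.
// Field names mirror the JSON response format documented below;
// they are assumptions, not the real implementation.
struct CheckResult {
    name: &'static str,
    passed: bool,
}

/// Returns true (HTTP 200) only if every selected check passed.
fn overall_ready(checks: &[CheckResult]) -> bool {
    checks.iter().all(|c| c.passed)
}

fn main() {
    let checks = [
        CheckResult { name: "node_ready", passed: true },
        CheckResult { name: "okd_router_1936", passed: false },
    ];
    for c in &checks {
        println!("{}: {}", c.name, c.passed);
    }
    // Any single failure makes the endpoint return 503.
    println!("ready: {}", overall_ready(&checks)); // prints "ready: false"
}
```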

How it works

Node Name Discovery

The service automatically discovers its own node name using the Kubernetes Downward API:

env:
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
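
On the Rust side, the injected variable is read from the environment at startup. A minimal sketch of the intended behavior (function name and error message are illustrative, not the actual code):

```rust
use std::env;

/// Resolve the node name injected by the Downward API.
/// The service fails fast if the variable is missing, since the
/// node_ready check cannot query the API without it.
fn node_name_from(value: Option<String>) -> Result<String, &'static str> {
    match value {
        Some(n) if !n.is_empty() => Ok(n),
        _ => Err("NODE_NAME is not set; configure it via the Downward API"),
    }
}

fn main() {
    match node_name_from(env::var("NODE_NAME").ok()) {
        Ok(name) => println!("running checks for node {name}"),
        Err(e) => eprintln!("{e}"),
    }
}
```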

Kubernetes API Authentication

  • Uses standard in-cluster configuration (no external credentials needed).
  • The ServiceAccount token and CA certificate are automatically mounted by Kubernetes at /var/run/secrets/kubernetes.io/serviceaccount/.
  • The application (via kube-rs or your Harmony higher-level client) calls the equivalent of kube's Config::incluster().
  • Requires only minimal RBAC: get permission on the nodes resource (see deploy/rbac.yaml).
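
For reference, the grant is small; a sketch of the kind of ClusterRole deploy/rbac.yaml provides (names here are illustrative, the repository file is authoritative):

```yaml
# Minimal RBAC sketch: read-only access to Node objects.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: harmony-node-readiness
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get"]
```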

Quick Start

1. Build and push

cargo build --release --bin harmony-node-readiness-endpoint

docker build -t your-registry/harmony-node-readiness-endpoint:v1.0.0 .
docker push your-registry/harmony-node-readiness-endpoint:v1.0.0

2. Deploy

kubectl apply -f deploy/namespace.yaml
kubectl apply -f deploy/rbac.yaml
kubectl apply -f deploy/daemonset.yaml

(The DaemonSet uses hostPort: 25001 by default so the endpoint is reachable directly on the node's IP.)

3. Configure your external load balancer

Example for HAProxy / OPNsense:

  • Check type: HTTP
  • URI: /health
  • Port: 25001 (configurable via LISTEN_PORT)
  • Interval: 5–10 s
  • Rise: 2
  • Fall: 3
  • Expect: 2xx
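
For HAProxy specifically, the bullet points above map onto a backend along these lines (backend name, server names, and addresses are placeholders):

```haproxy
backend k8s_nodes
    option httpchk GET /health
    http-check expect status 200
    default-server inter 5s rise 2 fall 3
    server node1 192.0.2.10:25001 check
    server node2 192.0.2.11:25001 check
```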

Health Endpoint Examples

Query Parameter

Use the check query parameter to specify which checks to run. Multiple checks can be comma-separated.

| Request | Behavior |
|---------|----------|
| GET /health | Runs node_ready (default) |
| GET /health?check=okd_router_1936 | Runs only the OKD router check |
| GET /health?check=node_ready,okd_router_1936 | Runs both checks |

Note: When the check parameter is provided, only the specified checks run. You must explicitly include node_ready if you want it along with other checks.
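
The selection behavior described above can be sketched as a small parser (function name is illustrative, not the actual code):

```rust
/// Parse the `check` query parameter into a list of check names.
/// Absent parameter: only the default node_ready runs.
/// Present parameter: exactly the listed checks run; node_ready is
/// NOT implied and must be included explicitly.
fn selected_checks(param: Option<&str>) -> Vec<String> {
    match param {
        None => vec!["node_ready".to_string()],
        Some(list) => list
            .split(',')
            .map(|s| s.trim().to_string())
            .filter(|s| !s.is_empty())
            .collect(),
    }
}

fn main() {
    // Default: no parameter means only node_ready runs.
    println!("{:?}", selected_checks(None));
    // Explicit list: node_ready must be named by hand.
    println!("{:?}", selected_checks(Some("node_ready,okd_router_1936")));
}
```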

Response Format

Each check result includes:

  • name: The check identifier
  • passed: Boolean indicating success or failure
  • reason: (Optional) Failure reason if the check failed
  • duration_ms: Time taken to execute the check in milliseconds

Healthy node (default check)

HTTP/1.1 200 OK
Content-Type: application/json

{
  "status": "ready",
  "checks": [
    {
      "name": "node_ready",
      "passed": true,
      "duration_ms": 42
    }
  ]
}

Healthy node (multiple checks)

GET /health?check=node_ready,okd_router_1936

HTTP/1.1 200 OK
Content-Type: application/json

{
  "status": "ready",
  "checks": [
    {
      "name": "node_ready",
      "passed": true,
      "duration_ms": 38
    },
    {
      "name": "okd_router_1936",
      "passed": true,
      "duration_ms": 12
    }
  ]
}

Unhealthy node (one check failed)

GET /health?check=node_ready,okd_router_1936

HTTP/1.1 503 Service Unavailable
Content-Type: application/json

{
  "status": "not-ready",
  "checks": [
    {
      "name": "node_ready",
      "passed": true,
      "duration_ms": 41
    },
    {
      "name": "okd_router_1936",
      "passed": false,
      "reason": "Failed to connect to OKD router: connection refused",
      "duration_ms": 5
    }
  ]
}

Unhealthy node (default check)

HTTP/1.1 503 Service Unavailable
Content-Type: application/json

{
  "status": "not-ready",
  "checks": [
    {
      "name": "node_ready",
      "passed": false,
      "reason": "KubeletNotReady",
      "duration_ms": 35
    }
  ]
}

Configuration (via DaemonSet env vars)

env:
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
- name: LISTEN_PORT
  value: "25001"
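
LISTEN_PORT is parsed with a fallback to the default port; a sketch of the intended behavior (function name is illustrative, and the fallback-on-invalid-value behavior is an assumption):

```rust
use std::env;

/// Resolve the listen port from LISTEN_PORT, defaulting to 25001
/// when the variable is unset or not a valid port number.
fn listen_port(raw: Option<String>) -> u16 {
    raw.and_then(|v| v.parse::<u16>().ok()).unwrap_or(25001)
}

fn main() {
    let port = listen_port(env::var("LISTEN_PORT").ok());
    println!("listening on 0.0.0.0:{port}");
}
```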

Checks are selected via the check query parameter on the /health endpoint. See the usage examples above.

Development

# Run locally (set NODE_NAME env var)
NODE_NAME=my-test-node cargo run

Minimal, auditable, and built for production bare-metal Kubernetes environments.