# harmony-node-readiness-endpoint

**A lightweight, standalone Rust service for Kubernetes node health checking.**

Designed for **bare-metal Kubernetes clusters** with external load balancers (HAProxy, OPNsense, F5, etc.).

It exposes a simple, reliable HTTP endpoint (`/health`) on each node that returns:

- **200 OK**: the node is healthy and ready to receive traffic
- **503 Service Unavailable**: the node should be removed from the load balancer pool

This project is **not dependent on Harmony**, but is commonly used as part of Harmony bare-metal Kubernetes deployments.

## Why this project exists

In bare-metal environments, external load balancers often rely on pod-level or router-level checks that can lag behind the authoritative Kubernetes signal, `Node.status.conditions[Ready]`.

This service reads that condition directly from the Kubernetes API and exposes it over HTTP, so the load balancer reacts to the source of truth itself rather than to a lagging indirect signal.

## Features & Roadmap

| Check                            | Description                                                | Status            | Check Name          |
|----------------------------------|------------------------------------------------------------|-------------------|---------------------|
| **Node readiness (API)**         | Queries `Node.status.conditions[Ready]` via Kubernetes API | **Implemented**   | `node_ready`        |
| **OKD Router health**            | Probes OpenShift router healthz on port 1936               | **Implemented**   | `okd_router_1936`   |
| Filesystem read-only             | Detects read-only mounts via `/proc/mounts`                | To be implemented | `filesystem_ro`     |
| Kubelet running                  | Local probe to kubelet `/healthz` (port 10248)             | To be implemented | `kubelet`           |
| CRI-O / container runtime health | Socket check + runtime status                              | To be implemented | `container_runtime` |
| Disk / inode pressure            | Threshold checks on key filesystems                        | To be implemented | `disk_pressure`     |
| Network reachability             | DNS resolution + gateway connectivity                      | To be implemented | `network`           |
| Custom NodeConditions            | Reacts to extra conditions (NPD, etc.)                     | To be implemented | `custom_conditions` |

All requested checks are combined with logical **AND**: if any one of them fails, the endpoint returns 503.
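
The AND rule fits in a few lines of Rust. This is a minimal illustration, not the service's actual code; `CheckOutcome` and `aggregate_status` are hypothetical names chosen for the sketch.

```rust
/// Outcome of one named check (hypothetical type, for illustration only).
#[derive(Debug, Clone)]
struct CheckOutcome {
    name: &'static str,
    passed: bool,
}

/// Combines outcomes with logical AND: 200 if every check passed, 503 otherwise.
/// Note: `all` returns true for an empty slice, so callers should make sure
/// at least one check was requested.
fn aggregate_status(outcomes: &[CheckOutcome]) -> u16 {
    if outcomes.iter().all(|o| o.passed) {
        200
    } else {
        503
    }
}

fn main() {
    let outcomes = [
        CheckOutcome { name: "node_ready", passed: true },
        CheckOutcome { name: "okd_router_1936", passed: false },
    ];
    for o in &outcomes {
        println!("{}: {}", o.name, if o.passed { "pass" } else { "fail" });
    }
    // A single failed check is enough to take the node out of the pool.
    println!("HTTP {}", aggregate_status(&outcomes));
}
```

Because `Iterator::all` is vacuously true on an empty list, a real implementation should guarantee that at least one check always runs, for example by falling back to the default check.
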

## How it works

### Node Name Discovery

The service discovers the name of the node it runs on through the **Kubernetes Downward API**, which injects it as the `NODE_NAME` environment variable:

```yaml
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        # spec.nodeName is the node this pod is scheduled on
        # (metadata.name would be the pod's own name)
        fieldPath: spec.nodeName
```
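
On the Rust side, the injected value only needs to be read and validated once at startup. A minimal sketch, assuming the service reads `NODE_NAME` when it starts; `resolve_node_name` is a hypothetical helper, and taking the raw value as an `Option` keeps it testable without modifying the process environment.

```rust
use std::env;

/// Validates the raw value of `NODE_NAME` (hypothetical helper).
/// `None` means the variable is unset or not valid Unicode.
fn resolve_node_name(raw: Option<String>) -> Result<String, String> {
    match raw {
        Some(name) if !name.trim().is_empty() => Ok(name.trim().to_string()),
        Some(_) => Err("NODE_NAME is set but empty".to_string()),
        None => Err("NODE_NAME is not set; check the Downward API env entry".to_string()),
    }
}

fn main() {
    // env::var returns Err for unset or non-Unicode values; .ok() maps both to None.
    match resolve_node_name(env::var("NODE_NAME").ok()) {
        Ok(name) => println!("checking readiness of node {name}"),
        Err(msg) => eprintln!("startup error: {msg}"),
    }
}
```

Failing fast on a missing name is safer than guessing: without it, the service cannot tell which `Node` object to query.
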

### Kubernetes API Authentication

- Uses the standard **in-cluster configuration** (no external credentials needed).
- Kubernetes automatically mounts the ServiceAccount token and CA certificate at `/var/run/secrets/kubernetes.io/serviceaccount/`.
- The application loads this configuration via `kube-rs` (or a higher-level Harmony client), using the equivalent of kube-rs's `Config::incluster()`.
- Requires only minimal RBAC: `get` permission on the cluster-scoped `nodes` resource, granted through a ClusterRole (see `deploy/rbac.yaml`).
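
Once the client has fetched the Node object (with kube-rs, something like `Api::<Node>::all(client).get(&node_name).await`), the `node_ready` check comes down to inspecting a single condition. The sketch below uses a simplified stand-in for k8s-openapi's `NodeCondition` (only `type_` and `status`) so it runs without the Kubernetes crates; `check_node_ready` is a hypothetical name, and treating `Unknown` or a missing condition as not ready is an assumption of this sketch.

```rust
/// Simplified stand-in for k8s-openapi's `NodeCondition`; the real type has
/// more fields, but only `type_` and `status` matter for this check.
struct NodeCondition {
    type_: String,
    status: String,
}

/// Evaluates `Node.status.conditions[Ready]`: passes only when a `Ready`
/// condition exists with status "True". In this sketch, "False", "Unknown",
/// and a missing condition all count as not ready.
fn check_node_ready(conditions: &[NodeCondition]) -> Result<(), String> {
    match conditions.iter().find(|c| c.type_ == "Ready") {
        Some(c) if c.status == "True" => Ok(()),
        Some(c) => Err(format!("Ready condition status is {}", c.status)),
        None => Err("node has no Ready condition".to_string()),
    }
}

fn main() {
    let conditions = vec![
        NodeCondition { type_: "MemoryPressure".into(), status: "False".into() },
        NodeCondition { type_: "Ready".into(), status: "Unknown".into() },
    ];
    match check_node_ready(&conditions) {
        Ok(()) => println!("node_ready: pass"),
        Err(reason) => println!("node_ready: fail ({reason})"),
    }
}
```
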

## Quick Start

### 1. Build and push

```bash
# Build the release binary
cargo build --release --bin harmony-node-readiness-endpoint

# Build and publish the container image (replace your-registry with your own)
docker build -t your-registry/harmony-node-readiness-endpoint:v1.0.0 .
docker push your-registry/harmony-node-readiness-endpoint:v1.0.0
```

### 2. Deploy

```bash
# Namespace, RBAC (read access to nodes), then the DaemonSet itself
kubectl apply -f deploy/namespace.yaml
kubectl apply -f deploy/rbac.yaml
kubectl apply -f deploy/daemonset.yaml
```

The DaemonSet uses `hostPort: 25001` by default, so the endpoint is reachable directly on each node's IP.

### 3. Configure your external load balancer

**Example for HAProxy / OPNsense:**

- Check type: **HTTP**
- URI: `/health`
- Port: `25001` (configurable via `LISTEN_PORT`)
- Interval: 5–10 s
- Rise: 2
- Fall: 3
- Expect: `2xx`
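
To see how these settings interact with the endpoint, here is a simplified Rust model of rise/fall counting as HAProxy-style health checkers apply it: a server is marked down after `fall` consecutive failed probes and back up after `rise` consecutive successful ones. It illustrates the semantics only, it is not HAProxy's implementation, and it assumes the server starts in the up state.

```rust
/// Simplified model of rise/fall health-check counting (HAProxy-style).
/// An illustration of the semantics only, not HAProxy's implementation.
struct ServerState {
    up: bool,
    rise: u32,
    fall: u32,
    consecutive_ok: u32,
    consecutive_fail: u32,
}

impl ServerState {
    /// Assumption: the server starts in the "up" state.
    fn new(rise: u32, fall: u32) -> Self {
        ServerState { up: true, rise, fall, consecutive_ok: 0, consecutive_fail: 0 }
    }

    /// Records one probe: `true` for a 2xx from /health, `false` for a 503,
    /// a timeout, or a connection error.
    fn record(&mut self, probe_ok: bool) {
        if probe_ok {
            self.consecutive_ok = self.consecutive_ok.saturating_add(1);
            self.consecutive_fail = 0;
            if !self.up && self.consecutive_ok >= self.rise {
                self.up = true;
            }
        } else {
            self.consecutive_fail = self.consecutive_fail.saturating_add(1);
            self.consecutive_ok = 0;
            if self.up && self.consecutive_fail >= self.fall {
                self.up = false;
            }
        }
    }
}

fn main() {
    // Rise 2, Fall 3, as in the example settings above.
    let mut server = ServerState::new(2, 3);
    let probes = [true, false, false, false, true, true];
    for (i, ok) in probes.iter().enumerate() {
        server.record(*ok);
        println!("probe {}: ok={} -> {}", i + 1, ok, if server.up { "in pool" } else { "removed" });
    }
}
```

With Fall 3 and a 5–10 s interval, a node whose `/health` starts returning 503 leaves the pool after roughly 15–30 s; with Rise 2, it rejoins after roughly 10–20 s of passing checks.
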

## Health Endpoint Examples

### Query Parameter

Use the `check` query parameter to specify which checks to run. Multiple checks can be comma-separated.

| Request                                        | Behavior                        |
|------------------------------------------------|---------------------------------|
| `GET /health`                                  | Runs `node_ready` (default)     |
| `GET /health?check=okd_router_1936`            | Runs only the OKD router check  |
| `GET /health?check=node_ready,okd_router_1936` | Runs both checks                |

**Note:** When the `check` parameter is provided, only the specified checks run. You must explicitly include `node_ready` if you want it along with other checks.
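
The selection rules above can be sketched as a small parsing function. This is a hypothetical helper, not the service's code: it assumes the HTTP framework has already URL-decoded the parameter, and rejecting an empty `check=` value or an unknown check name is this sketch's own choice, since the README does not document those cases.

```rust
/// Check used when the `check` parameter is absent (from the table above).
const DEFAULT_CHECK: &str = "node_ready";
/// Checks currently marked as implemented.
const KNOWN_CHECKS: &[&str] = &["node_ready", "okd_router_1936"];

/// Turns the raw `check` value into a list of check names (hypothetical helper).
/// `None` (parameter absent) falls back to `node_ready`; otherwise the value is
/// split on commas, trimmed, and empty entries are dropped.
fn requested_checks(check_param: Option<&str>) -> Result<Vec<String>, String> {
    let names: Vec<String> = match check_param {
        None => vec![DEFAULT_CHECK.to_string()],
        Some(raw) => raw
            .split(',')
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect(),
    };
    if names.is_empty() {
        return Err("`check` parameter is present but names no checks".to_string());
    }
    if let Some(unknown) = names.iter().find(|n| !KNOWN_CHECKS.contains(&n.as_str())) {
        return Err(format!("unknown check: {unknown}"));
    }
    Ok(names)
}

fn main() {
    for param in [None, Some("okd_router_1936"), Some("node_ready,okd_router_1936")] {
        println!("{:?} -> {:?}", param, requested_checks(param));
    }
}
```
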

### Response Format

Each check result includes:

- `name`: the check identifier
- `passed`: boolean indicating success or failure
- `reason`: (optional) failure reason, present if the check failed
- `duration_ms`: time taken to execute the check, in milliseconds
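
A per-check result with these four fields could be produced by wrapping each check in a timer. A minimal sketch: `CheckResult` mirrors the field list above, while `run_timed` and the convention that a check returns `Err(reason)` on failure are assumptions for illustration; JSON serialization (for example with serde) is left out.

```rust
use std::time::Instant;

/// One entry of the health response, mirroring the fields listed above.
#[derive(Debug)]
struct CheckResult {
    name: String,
    passed: bool,
    reason: Option<String>,
    duration_ms: u64,
}

/// Runs a check, measures its wall-clock duration, and packages the outcome
/// (hypothetical helper). The check returns Ok(()) or Err(reason).
fn run_timed<F>(name: &str, check: F) -> CheckResult
where
    F: FnOnce() -> Result<(), String>,
{
    let start = Instant::now();
    let outcome = check();
    // as_millis() returns u128; saturate rather than truncate on conversion.
    let duration_ms = u64::try_from(start.elapsed().as_millis()).unwrap_or(u64::MAX);
    CheckResult {
        name: name.to_string(),
        passed: outcome.is_ok(),
        reason: outcome.err(),
        duration_ms,
    }
}

fn main() {
    let results = vec![
        run_timed("node_ready", || Ok(())),
        run_timed("okd_router_1936", || Err("connection refused".to_string())),
    ];
    for r in &results {
        println!("{r:?}");
    }
}
```
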

**Healthy node (default check)**

```http
HTTP/1.1 200 OK
Content-Type: application/json

```

**Healthy node (multiple checks)**

```http
GET /health?check=node_ready,okd_router_1936

HTTP/1.1 200 OK
Content-Type: application/json

```

**Unhealthy node (one check failed)**

```http
GET /health?check=node_ready,okd_router_1936

HTTP/1.1 503 Service Unavailable
Content-Type: application/json

```

**Unhealthy node (default check)**

```http
HTTP/1.1 503 Service Unavailable
Content-Type: application/json

```

## Configuration (via DaemonSet env vars)

```yaml
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName   # the node this pod is scheduled on
  - name: LISTEN_PORT
    value: "25001"                 # port the /health endpoint listens on
```
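
On the service side, `LISTEN_PORT` could be turned into a bind address along these lines. A sketch only: `listen_port` is a hypothetical helper, and falling back to 25001 when the variable is unset mirrors the DaemonSet value above but is an assumption about the service's behavior.

```rust
use std::env;
use std::net::{Ipv4Addr, SocketAddr};

/// Assumed fallback when LISTEN_PORT is unset (mirrors the DaemonSet value).
const DEFAULT_PORT: u16 = 25001;

/// Parses the raw LISTEN_PORT value (hypothetical helper); `None` means unset.
fn listen_port(raw: Option<&str>) -> Result<u16, String> {
    match raw {
        None => Ok(DEFAULT_PORT),
        Some(s) => match s.trim().parse::<u16>() {
            Ok(0) => Err("LISTEN_PORT must not be 0".to_string()),
            Ok(p) => Ok(p),
            Err(e) => Err(format!("invalid LISTEN_PORT {s:?}: {e}")),
        },
    }
}

fn main() {
    let raw = env::var("LISTEN_PORT").ok();
    match listen_port(raw.as_deref()) {
        Ok(port) => {
            // Bind on all interfaces so traffic forwarded via hostPort reaches the pod.
            let addr = SocketAddr::from((Ipv4Addr::UNSPECIFIED, port));
            println!("would listen on {addr}");
        }
        Err(msg) => eprintln!("config error: {msg}"),
    }
}
```
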

Checks are selected via the `check` query parameter on the `/health` endpoint. See the usage examples above.

## Development

```bash
# Run locally (NODE_NAME must be set)
NODE_NAME=my-test-node cargo run
```

---

*Minimal, auditable, and built for production bare-metal Kubernetes environments.*