All checks were successful
Run Check Script / check (pull_request) Successful in 2m25s
The IoT vocabulary was anchoring the codebase to one customer's
domain. The reconciler pattern is generic: an operator in k8s, NATS
KV as the desired-state bus, and agents reconciling podman / OKD /
KVM / anything that can register. "Fleet" captures that neutrally;
IoT stays acknowledged in docs as the first customer use case.

The rename is done now, while nothing is deployed. Once a partner
fleet lands, changing the CRD group alone becomes a multi-quarter
migration.

Scope (nothing left over):

Paths + crates
- iot/ → fleet/
- iot/iot-operator-v0 → fleet/harmony-fleet-operator
- iot/iot-agent-v0 → fleet/harmony-fleet-agent
- harmony/src/modules/iot → harmony/src/modules/fleet
- ROADMAP/iot_platform → ROADMAP/fleet_platform
- examples/iot_{vm_setup, load_test, nats_install} → examples/fleet_*
- -v0 suffix dropped on the operator + agent crates (semver in
  Cargo.toml already tracks version)

Rust identifiers
- enum IotScore (podman score payload) → ReconcileScore
- struct IotDeviceSetupScore/Config → FleetDeviceSetupScore/Config
- InterpretName::IotDeviceSetup → InterpretName::FleetDeviceSetup
- HarmonyIotPool → HarmonyFleetPool (libvirt pool)
- HARMONY_IOT_POOL_NAME (default "harmony-iot") → HARMONY_FLEET_POOL_NAME ("harmony-fleet")
- IotSshKeypair → FleetSshKeypair
- ensure_iot_ssh_keypair / ensure_harmony_iot_pool /
  check_iot_smoke_preflight_for_arch → fleet-prefixed variants

Wire / config surfaces
- CRD group `iot.nationtech.io` → `fleet.nationtech.io`
- Finalizer `iot.nationtech.io/finalizer` → `fleet.nationtech.io/finalizer`
- Shortnames iotdep/iotdevice → fleetdep/fleetdev
- Env var IOT_AGENT_CONFIG → FLEET_AGENT_CONFIG
- Env var IOT_VM_ADMIN_PASSWORD → FLEET_VM_ADMIN_PASSWORD
- Binary /usr/local/bin/iot-agent → /usr/local/bin/fleet-agent
- Systemd user `iot-agent` → `fleet-agent`
- VM admin user `iot-admin` → `fleet-admin`

Defaults
- Namespaces iot-system/iot-demo/iot-load → fleet-system/fleet-demo/fleet-load
- Helm release iot-nats → fleet-nats
- Helm release iot-operator-v0 → harmony-fleet-operator
- Container image localhost/iot-operator-v0:latest →
  localhost/harmony-fleet-operator:latest
- On-disk cache $HARMONY_DATA_DIR/iot/ → $HARMONY_DATA_DIR/fleet/
  (cloud-images, ssh keypairs, libvirt pool)

What stayed
- harmony-reconciler-contracts: already neutrally named
- Wire types (DeviceInfo, DeploymentState, HeartbeatPayload,
  DeploymentName): already neutral
- KV buckets (device-info, device-state, device-heartbeat,
  desired-state): already neutral
- CRD kind names (Deployment, Device): already neutral
- NatsBasicScore / NatsHelmChartScore / HelmChart / etc.:
  framework-scope, unchanged

Verification
- cargo check --workspace --all-targets: clean
- All harmony lib tests (114), fleet-operator (6), fleet-agent
  (7), harmony-reconciler-contracts (13): green
- End-to-end load-test (20 devices / 3 CRs / 20s under
  fleet/scripts/load-test.sh): PASS. Image built as
  localhost/harmony-fleet-operator:latest, chart installed as
  release harmony-fleet-operator in namespace fleet-system,
  all CR aggregates correct.

Zero stragglers: grep across the tree for \biot\b / IOT_ /
\bIot[A-Z] returns empty (excluding docs explicitly talking about
IoT as the first customer's domain).
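
The straggler check can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not the command that was actually run: the function names `find_stragglers` and `check_no_stragglers` are hypothetical, the excluded directories (`.git`, `target`, `docs`) are guesses at where VCS data, build output, and IoT-specific docs live, and `\b` inside `grep -E` relies on GNU grep.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list leftover IoT-era identifiers under a directory.
# The three patterns are the ones quoted in the commit message; the
# excluded directory names are assumptions, adjust them to the real tree.
find_stragglers() {
    local root="${1:-.}"
    # grep exits 1 when nothing matches; "|| true" keeps that from
    # aborting callers that run under "set -e".
    grep -rnE \
        --exclude-dir=.git \
        --exclude-dir=target \
        --exclude-dir=docs \
        '\biot\b|IOT_|\bIot[A-Z]' "$root" || true
}

# Returns 0 when the tree is clean, 1 (with the hits on stderr) otherwise.
check_no_stragglers() {
    local hits
    hits="$(find_stragglers "${1:-.}")"
    if [[ -n "$hits" ]]; then
        printf 'stragglers found:\n%s\n' "$hits" >&2
        return 1
    fi
}
```

Running `check_no_stragglers "$REPO_ROOT"` as a CI step would turn the one-off grep into a regression guard. The match is deliberately case-sensitive, so prose that spells the domain "IoT" does not trip it.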
345 lines
13 KiB
Bash
Executable File
#!/usr/bin/env bash
# End-to-end smoke test for the IoT walking skeleton (ROADMAP/fleet_platform/
# v0_walking_skeleton.md §9.A1 and §5.4 agent dispatch).
#
#   Deployment CR ─apply─▶ operator ─KV put─▶ NATS ◀─watch─ agent ─podman─▶ nginx
#                                                                             │
#                                                              curl :8080 ◀───┘
#
# Stands up a NATS server container + a k3d cluster + the operator + the
# on-host agent. Applies a test CR, asserts the key reaches NATS KV, that the
# agent reconciles it into a running container, that curl returns nginx, and
# then that deleting the CR propagates back through KV delete → agent →
# container removal. Everything is torn down in the cleanup trap.
#
# Requirements on the host:
#   - podman (rootless OK) with a running user socket at $XDG_RUNTIME_DIR/podman/podman.sock
#   - cargo (for building/running the operator and agent)
#   - kubectl
#   - a k3d binary (defaults to Harmony's downloaded copy)

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
OPERATOR_DIR="$REPO_ROOT/fleet/harmony-fleet-operator"
AGENT_DIR="$REPO_ROOT/fleet/harmony-fleet-agent"

K3D_BIN="${K3D_BIN:-$HOME/.local/share/harmony/k3d/k3d}"
CLUSTER_NAME="${CLUSTER_NAME:-fleet-smoke}"
NATS_CONTAINER="${NATS_CONTAINER:-fleet-smoke-nats}"
NATS_NET_NAME="${NATS_NET_NAME:-fleet-smoke-net}"
NATS_IMAGE="${NATS_IMAGE:-docker.io/library/nats:2.10-alpine}"
NATSBOX_IMAGE="${NATSBOX_IMAGE:-docker.io/natsio/nats-box:latest}"
NATS_PORT="${NATS_PORT:-4222}"
TARGET_DEVICE="${TARGET_DEVICE:-pi-demo-01}"
DEPLOY_NAME="${DEPLOY_NAME:-hello-world}"
DEPLOY_NS="${DEPLOY_NS:-fleet-demo}"
HELLO_CONTAINER="${HELLO_CONTAINER:-hello}"
HELLO_PORT="${HELLO_PORT:-8080}"

OPERATOR_LOG="$(mktemp -t harmony-fleet-operator.XXXXXX.log)"
OPERATOR_PID=""
AGENT_LOG="$(mktemp -t fleet-agent.XXXXXX.log)"
AGENT_PID=""
AGENT_CONFIG_FILE=""
KUBECONFIG_FILE=""

log() { printf '\033[1;34m[smoke]\033[0m %s\n' "$*"; }
fail() { printf '\033[1;31m[smoke FAIL]\033[0m %s\n' "$*" >&2; exit 1; }

cleanup() {
    local rc=$?
    log "cleanup…"
    if [[ -n "$OPERATOR_PID" ]] && kill -0 "$OPERATOR_PID" 2>/dev/null; then
        kill "$OPERATOR_PID" 2>/dev/null || true
        wait "$OPERATOR_PID" 2>/dev/null || true
    fi
    if [[ -n "$AGENT_PID" ]] && kill -0 "$AGENT_PID" 2>/dev/null; then
        kill "$AGENT_PID" 2>/dev/null || true
        wait "$AGENT_PID" 2>/dev/null || true
    fi
    # Always remove the demo container we may have created on the host podman,
    # even if KEEP=1 — leaving a rogue nginx on host:8080 is annoying.
    podman rm -f "$HELLO_CONTAINER" >/dev/null 2>&1 || true
    if [[ "${KEEP:-0}" != "1" ]]; then
        "$K3D_BIN" cluster delete "$CLUSTER_NAME" >/dev/null 2>&1 || true
        podman rm -f "$NATS_CONTAINER" >/dev/null 2>&1 || true
        podman network rm "$NATS_NET_NAME" >/dev/null 2>&1 || true
        [[ -n "$KUBECONFIG_FILE" ]] && rm -f "$KUBECONFIG_FILE"
        [[ -n "$AGENT_CONFIG_FILE" ]] && rm -f "$AGENT_CONFIG_FILE"
    else
        log "KEEP=1 — leaving cluster '$CLUSTER_NAME' and container '$NATS_CONTAINER' running"
        log "KUBECONFIG=$KUBECONFIG_FILE"
        log "agent config: $AGENT_CONFIG_FILE"
    fi
    if [[ $rc -ne 0 ]]; then
        log "operator log at $OPERATOR_LOG"
        echo "----- operator log tail -----"
        tail -n 60 "$OPERATOR_LOG" 2>/dev/null || true
        log "agent log at $AGENT_LOG"
        echo "----- agent log tail -----"
        tail -n 60 "$AGENT_LOG" 2>/dev/null || true
    else
        rm -f "$OPERATOR_LOG" "$AGENT_LOG"
    fi
    exit $rc
}
trap cleanup EXIT INT TERM

require() { command -v "$1" >/dev/null 2>&1 || fail "missing required tool: $1"; }
require podman
require cargo
require kubectl
[[ -x "$K3D_BIN" ]] || fail "k3d binary not executable at $K3D_BIN (set K3D_BIN=…)"

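natsbox() {
    # Inside the podman network NATS always listens on container port 4222;
    # $NATS_PORT is only the host-side published port, so it must not be used here.
    podman run --rm --network "$NATS_NET_NAME" "$NATSBOX_IMAGE" \
        nats --server "nats://$NATS_CONTAINER:4222" "$@"
}
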
###############################################################################
# phase 1 — NATS
###############################################################################
log "phase 1: start NATS"
podman network exists "$NATS_NET_NAME" || podman network create "$NATS_NET_NAME" >/dev/null
podman rm -f "$NATS_CONTAINER" >/dev/null 2>&1 || true
podman run -d \
    --name "$NATS_CONTAINER" \
    --network "$NATS_NET_NAME" \
    -p "$NATS_PORT:4222" \
    "$NATS_IMAGE" -js >/dev/null
log "waiting for NATS"
for _ in $(seq 1 30); do
    if podman run --rm --network "$NATS_NET_NAME" "$NATSBOX_IMAGE" \
        nats --server "nats://$NATS_CONTAINER:4222" server check connection >/dev/null 2>&1; then
        break
    fi
    sleep 1
done
natsbox server check connection >/dev/null || fail "NATS never became ready"

###############################################################################
# phase 2 — k3d cluster + CRD
###############################################################################
log "phase 2: create k3d cluster '$CLUSTER_NAME'"
"$K3D_BIN" cluster delete "$CLUSTER_NAME" >/dev/null 2>&1 || true
"$K3D_BIN" cluster create "$CLUSTER_NAME" --wait --timeout 90s >/dev/null

KUBECONFIG_FILE="$(mktemp -t fleet-smoke-kubeconfig.XXXXXX)"
"$K3D_BIN" kubeconfig get "$CLUSTER_NAME" > "$KUBECONFIG_FILE"
export KUBECONFIG="$KUBECONFIG_FILE"

log "install CRD via operator's install subcommand (typed Rust — no yaml, no kubectl apply)"
( cd "$OPERATOR_DIR" && cargo run -q -- install ) >/dev/null
kubectl wait --for=condition=Established "crd/deployments.fleet.nationtech.io" --timeout=30s >/dev/null

kubectl get ns "$DEPLOY_NS" >/dev/null 2>&1 || kubectl create namespace "$DEPLOY_NS" >/dev/null

###############################################################################
# phase 2b — CEL discriminator guardrail: an invalid score.type must be rejected
# by the apiserver (tests x-kubernetes-validations on spec.score)
###############################################################################
log "phase 2b: apiserver rejects invalid score.type"
BAD_CR=$(cat <<EOF
apiVersion: fleet.nationtech.io/v1alpha1
kind: Deployment
metadata:
  name: bad-discriminator
  namespace: $DEPLOY_NS
spec:
  targetDevices: [$TARGET_DEVICE]
  score:
    type: "has spaces"
    data: {}
  rollout:
    strategy: Immediate
EOF
)
BAD_OUT="$(echo "$BAD_CR" | kubectl apply -f - 2>&1 || true)"
if echo "$BAD_OUT" | grep -q "must be a valid Rust identifier"; then
    log "apiserver rejected invalid discriminator as expected"
else
    fail "expected CEL rejection for score.type='has spaces'; got: $BAD_OUT"
fi
# Belt-and-braces: make sure nothing was persisted
if kubectl -n "$DEPLOY_NS" get deployment.fleet.nationtech.io bad-discriminator >/dev/null 2>&1; then
    kubectl -n "$DEPLOY_NS" delete deployment.fleet.nationtech.io bad-discriminator >/dev/null 2>&1 || true
    fail "apiserver should have rejected 'bad-discriminator' but it was persisted"
fi

###############################################################################
# phase 3 — operator
###############################################################################
log "phase 3: start operator"
(
    cd "$OPERATOR_DIR"
    cargo build -q
)
NATS_URL="nats://127.0.0.1:$NATS_PORT" \
KV_BUCKET="desired-state" \
RUST_LOG="info,kube_runtime=warn" \
"$REPO_ROOT/target/debug/harmony-fleet-operator" \
    >"$OPERATOR_LOG" 2>&1 &
OPERATOR_PID=$!
log "operator pid=$OPERATOR_PID (log: $OPERATOR_LOG)"

for _ in $(seq 1 30); do
    if grep -q "starting Deployment controller" "$OPERATOR_LOG"; then break; fi
    if ! kill -0 "$OPERATOR_PID" 2>/dev/null; then fail "operator exited early"; fi
    sleep 0.5
done
grep -q "starting Deployment controller" "$OPERATOR_LOG" \
    || fail "operator never logged 'starting Deployment controller'"
grep -q "KV bucket ready" "$OPERATOR_LOG" \
    || fail "operator never confirmed KV bucket ready"

###############################################################################
# phase 3b — agent on localhost podman
###############################################################################
log "phase 3b: build + start agent"
(
    cd "$AGENT_DIR"
    cargo build -q
)

# Belt-and-braces: nuke any prior demo container so a previous aborted run
# doesn't occupy the host port before we even start.
podman rm -f "$HELLO_CONTAINER" >/dev/null 2>&1 || true

AGENT_CONFIG_FILE="$(mktemp -t fleet-agent-config.XXXXXX.toml)"
cat >"$AGENT_CONFIG_FILE" <<EOF
[agent]
device_id = "$TARGET_DEVICE"

[credentials]
type = "toml-shared"
nats_user = "smoke"
nats_pass = "smoke"

[nats]
urls = ["nats://127.0.0.1:$NATS_PORT"]
EOF

FLEET_AGENT_CONFIG="$AGENT_CONFIG_FILE" \
RUST_LOG="info,async_nats=warn" \
"$REPO_ROOT/target/debug/harmony-fleet-agent" \
    >"$AGENT_LOG" 2>&1 &
AGENT_PID=$!
log "agent pid=$AGENT_PID (log: $AGENT_LOG)"

for _ in $(seq 1 30); do
    if grep -q "watching KV keys" "$AGENT_LOG"; then break; fi
    if ! kill -0 "$AGENT_PID" 2>/dev/null; then fail "agent exited early"; fi
    sleep 0.5
done
grep -q "watching KV keys" "$AGENT_LOG" \
    || fail "agent never logged 'watching KV keys'"

###############################################################################
# phase 4 — apply Deployment CR
###############################################################################
log "phase 4: apply Deployment CR"
cat <<EOF | kubectl apply -f - >/dev/null
apiVersion: fleet.nationtech.io/v1alpha1
kind: Deployment
metadata:
  name: $DEPLOY_NAME
  namespace: $DEPLOY_NS
spec:
  targetDevices: [$TARGET_DEVICE]
  score:
    type: PodmanV0
    data:
      services:
        - name: hello
          image: docker.io/library/nginx:alpine
          ports: ["8080:80"]
  rollout:
    strategy: Immediate
EOF

log "wait for KV key $TARGET_DEVICE.$DEPLOY_NAME"
KV_VALUE=""
for _ in $(seq 1 30); do
    if KV_VALUE="$(natsbox kv get desired-state "$TARGET_DEVICE.$DEPLOY_NAME" --raw 2>/dev/null)"; then
        [[ -n "$KV_VALUE" ]] && break
    fi
    sleep 1
done
[[ -n "$KV_VALUE" ]] || fail "KV key never appeared"
echo "$KV_VALUE" | grep -q '"type":"PodmanV0"' \
    || fail "KV value missing \"type\":\"PodmanV0\" discriminator — got: $KV_VALUE"
echo "$KV_VALUE" | grep -q '"image":"docker.io/library/nginx:alpine"' \
    || fail "KV value missing nginx image — got: $KV_VALUE"

log "wait for .status.observedScoreString"
OBSERVED=""
for _ in $(seq 1 30); do
    OBSERVED="$(kubectl -n "$DEPLOY_NS" get deployment.fleet.nationtech.io "$DEPLOY_NAME" \
        -o jsonpath='{.status.observedScoreString}' 2>/dev/null || true)"
    [[ -n "$OBSERVED" ]] && break
    sleep 1
done
[[ -n "$OBSERVED" ]] || fail ".status.observedScoreString never set"
# fail() prints its argument through printf '%s', which does not expand "\n",
# so real newlines are spliced in with $'\n'.
[[ "$OBSERVED" == "$KV_VALUE" ]] \
    || fail "observedScoreString does not match KV value:"$'\n'"  status=$OBSERVED"$'\n'"  kv    =$KV_VALUE"

###############################################################################
# phase 4b — agent reconciled container running on host podman
###############################################################################
log "phase 4b: wait for agent to start container '$HELLO_CONTAINER'"
CONTAINER_ID=""
for _ in $(seq 1 120); do
    CONTAINER_ID="$(podman ps --filter "name=^${HELLO_CONTAINER}\$" --format '{{.ID}}' 2>/dev/null || true)"
    [[ -n "$CONTAINER_ID" ]] && break
    sleep 1
done
[[ -n "$CONTAINER_ID" ]] \
    || fail "agent never started container '$HELLO_CONTAINER'"
log "container running: $CONTAINER_ID"

log "curl http://127.0.0.1:$HELLO_PORT/"
CURL_OUT=""
for _ in $(seq 1 30); do
    if CURL_OUT="$(curl -fsS --max-time 2 "http://127.0.0.1:$HELLO_PORT/" 2>/dev/null)"; then
        break
    fi
    sleep 1
done
echo "$CURL_OUT" | grep -qi "nginx\|welcome" \
    || fail "curl did not return nginx welcome page (got: ${CURL_OUT:0:200})"
log "nginx responded"

###############################################################################
# phase 5 — delete CR, expect cleanup via finalizer + agent
###############################################################################
log "phase 5: delete Deployment CR — finalizer + agent should remove KV and container"
kubectl -n "$DEPLOY_NS" delete deployment.fleet.nationtech.io "$DEPLOY_NAME" --wait=true >/dev/null

log "wait for KV key removal"
for _ in $(seq 1 30); do
    if ! natsbox kv get desired-state "$TARGET_DEVICE.$DEPLOY_NAME" --raw >/dev/null 2>&1; then
        log "KV key gone"
        break
    fi
    sleep 1
done
if natsbox kv get desired-state "$TARGET_DEVICE.$DEPLOY_NAME" --raw >/dev/null 2>&1; then
    fail "KV key still present after CR delete"
fi

log "wait for agent to remove container"
for _ in $(seq 1 60); do
    if ! podman ps -a --filter "name=^${HELLO_CONTAINER}\$" --format '{{.ID}}' 2>/dev/null | grep -q .; then
        log "container removed"
        break
    fi
    sleep 1
done
if podman ps -a --filter "name=^${HELLO_CONTAINER}\$" --format '{{.ID}}' 2>/dev/null | grep -q .; then
    fail "container '$HELLO_CONTAINER' still present after CR delete"
fi

log "PASS"