Review feedback: writing yaml and shelling out to kubectl is the exact anti-pattern harmony exists to eliminate. The operator already has typed Rust for its CRD (`#[derive(CustomResource)]`), and harmony-k8s already has a typed apply path. So the "install" step should be a Score, not `cargo run -- gen-crd | kubectl apply -f -`.

Changes:

- **New** `iot/iot-operator-v0/src/install.rs` — `install_crds()` builds `Deployment::crd()` via `kube::CustomResourceExt`, wraps it in `harmony::modules::k8s::resource::K8sResourceScore`, and executes the Score against a tiny local `InstallTopology` that just carries a `K8sClient` loaded from `KUBECONFIG` (a rough sketch follows this list). The local topology exists because `K8sAnywhereTopology::ensure_ready` does a lot of product-level setup (cert-manager, tenant manager, helm probes) that isn't appropriate for a narrow "apply a CRD" action. A 30-line inline topology that implements `K8sclient` plus a no-op `ensure_ready` is the right-sized abstraction for now. When a larger "install the operator in-cluster" Score lands (Deployment + SA + RBAC + ClusterRoleBinding), that may justify promoting the topology to a shared crate.
- **Renamed subcommand** `gen-crd` → `install` (also sketched below). Old path: print yaml to stdout for kubectl to consume. New path: apply the CRD directly via the Score, using whatever `KUBECONFIG` points at.
- **Deleted** `iot/iot-operator-v0/deploy/crd.yaml` and `deploy/operator.yaml`. The CRD yaml was derived from Rust and committed alongside the source — a drift hazard (nothing guaranteed they stayed in sync). `operator.yaml` was never actually applied by any smoke script; it existed only for documentation. Both go.
- **Rewired** `iot/scripts/smoke-a1.sh` phase 2 to call the `install` subcommand instead of piping yaml to kubectl. Everything downstream (kubectl wait for Established, apiserver CEL rejection check, operator + agent + container lifecycle) is unchanged.
- **Dropped** `serde_yaml` from the operator's `Cargo.toml` — it was only used to print the CRD as yaml. Added `harmony`, `harmony-k8s`, and `async-trait` deps.

Verification — `smoke-a1.sh` PASSes end-to-end on x86_64 k3d: k3d cluster → install CRD via Score → apiserver rejects bad score.type (CEL still works through the Score-applied CRD) → operator → agent → nginx container up → curl 200 → delete CR → KV + container removed.

Out of scope / follow-up: a proper "install operator in-cluster" Score that also applies Namespace + SA + ClusterRole + ClusterRoleBinding + Deployment (the manifests that used to live in the deleted operator.yaml). Smoke-a1 currently runs the operator as a host-side process, so that Score isn't on the test path today.
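To make the new install path concrete, here is a rough sketch of what `install.rs` could look like given the description above. Treat every harmony-specific detail as an assumption: the `Topology`/`K8sclient` trait shapes, the `K8sClient::from_kubeconfig` constructor, the `K8sResourceScore::single` constructor, and the `interpret` entry point are placeholders for whatever harmony and harmony-k8s actually expose. Only `Deployment::crd()` via `kube::CustomResourceExt` is standard kube-rs.

```rust
// install.rs (sketch). The harmony/harmony-k8s items below are assumed
// stand-ins for the real traits and constructors; only CustomResourceExt::crd()
// is the actual kube-rs call.
use std::sync::Arc;

use kube::CustomResourceExt;

use crate::crd::Deployment; // the #[derive(CustomResource)] type (module path assumed)

/// Tiny local topology: it only carries a K8sClient loaded from KUBECONFIG and
/// does nothing in ensure_ready, so a narrow "apply a CRD" action does not drag
/// in the product-level setup that K8sAnywhereTopology::ensure_ready performs.
struct InstallTopology {
    client: Arc<harmony_k8s::K8sClient>, // type path assumed
}

#[async_trait::async_trait]
impl harmony::topology::K8sclient for InstallTopology { // trait path/shape assumed
    async fn k8s_client(&self) -> Result<Arc<harmony_k8s::K8sClient>, String> {
        Ok(self.client.clone())
    }
}

#[async_trait::async_trait]
impl harmony::topology::Topology for InstallTopology { // trait path/shape assumed
    async fn ensure_ready(&self) -> Result<(), String> {
        Ok(()) // no-op: nothing to prepare before applying a single CRD
    }
}

/// Build the Deployment CRD from the typed Rust definition and apply it
/// through a Score, instead of printing yaml for kubectl to consume.
pub async fn install_crds() -> Result<(), Box<dyn std::error::Error>> {
    // Typed CRD straight from #[derive(CustomResource)]; no committed yaml to drift.
    let crd = Deployment::crd();

    let topology = InstallTopology {
        client: Arc::new(harmony_k8s::K8sClient::from_kubeconfig().await?), // constructor assumed
    };

    // Wrap the CRD in the typed apply Score and execute it against the local
    // topology; the constructor and interpret() entry point are assumptions
    // about harmony's Score API.
    let score = harmony::modules::k8s::resource::K8sResourceScore::single(crd);
    score.interpret(&topology).await?;
    Ok(())
}
```

The point of the sketch is the shape, not the exact names: a CRD built from the typed Rust definition, wrapped in a Score, executed against a topology that is deliberately too small to do anything beyond handing out a client.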
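The renamed CLI entry point would then dispatch roughly as below. Only the `install` subcommand name comes from this change; the clap layout, the `install` module reference, and the `run_operator()` stub are illustrative placeholders for the operator's existing main.

```rust
// main.rs (sketch of the renamed subcommand). Only `install` is from this
// change; the clap structure and run_operator() are placeholders.
use clap::{Parser, Subcommand};

mod install; // the install_crds() sketch above

#[derive(Parser)]
#[command(name = "iot-operator-v0")]
struct Cli {
    #[command(subcommand)]
    command: Option<Command>,
}

#[derive(Subcommand)]
enum Command {
    /// Apply the Deployment CRD to whatever cluster KUBECONFIG points at
    /// (replaces the old `gen-crd`, which printed yaml for kubectl to consume).
    Install,
}

// Placeholder for the existing controller entry point (not part of this change).
async fn run_operator() -> Result<(), Box<dyn std::error::Error>> {
    unimplemented!("existing Deployment controller loop")
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    match Cli::parse().command {
        Some(Command::Install) => install::install_crds().await,
        None => run_operator().await, // default: run the controller
    }
}
```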
#!/usr/bin/env bash
# End-to-end smoke test for the IoT walking skeleton (ROADMAP/iot_platform/
# v0_walking_skeleton.md §9.A1 and §5.4 agent dispatch).
#
# Deployment CR ─apply─▶ operator ─KV put─▶ NATS ◀─watch─ agent ─podman─▶ nginx
#                                                                           │
#                                                            curl :8080 ◀───┘
#
# Stands up a NATS server container + a k3d cluster + the operator + the
# on-host agent. Applies a test CR, asserts the key reaches NATS KV, that the
# agent reconciles it into a running container, that curl returns nginx, and
# then that deleting the CR propagates back through KV delete → agent →
# container removal. Everything is torn down in the cleanup trap.
#
# Requirements on the host:
# - podman (rootless OK) with a running user socket at $XDG_RUNTIME_DIR/podman/podman.sock
# - cargo (for building/running the operator and agent)
# - kubectl
# - a k3d binary (defaults to Harmony's downloaded copy)

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
OPERATOR_DIR="$REPO_ROOT/iot/iot-operator-v0"
AGENT_DIR="$REPO_ROOT/iot/iot-agent-v0"

K3D_BIN="${K3D_BIN:-$HOME/.local/share/harmony/k3d/k3d}"
CLUSTER_NAME="${CLUSTER_NAME:-iot-smoke}"
NATS_CONTAINER="${NATS_CONTAINER:-iot-smoke-nats}"
NATS_NET_NAME="${NATS_NET_NAME:-iot-smoke-net}"
NATS_IMAGE="${NATS_IMAGE:-docker.io/library/nats:2.10-alpine}"
NATSBOX_IMAGE="${NATSBOX_IMAGE:-docker.io/natsio/nats-box:latest}"
NATS_PORT="${NATS_PORT:-4222}"
TARGET_DEVICE="${TARGET_DEVICE:-pi-demo-01}"
DEPLOY_NAME="${DEPLOY_NAME:-hello-world}"
DEPLOY_NS="${DEPLOY_NS:-iot-demo}"
HELLO_CONTAINER="${HELLO_CONTAINER:-hello}"
HELLO_PORT="${HELLO_PORT:-8080}"

OPERATOR_LOG="$(mktemp -t iot-operator.XXXXXX.log)"
OPERATOR_PID=""
AGENT_LOG="$(mktemp -t iot-agent.XXXXXX.log)"
AGENT_PID=""
AGENT_CONFIG_FILE=""
KUBECONFIG_FILE=""

log() { printf '\033[1;34m[smoke]\033[0m %s\n' "$*"; }
fail() { printf '\033[1;31m[smoke FAIL]\033[0m %s\n' "$*" >&2; exit 1; }

cleanup() {
  local rc=$?
  log "cleanup…"
  if [[ -n "$OPERATOR_PID" ]] && kill -0 "$OPERATOR_PID" 2>/dev/null; then
    kill "$OPERATOR_PID" 2>/dev/null || true
    wait "$OPERATOR_PID" 2>/dev/null || true
  fi
  if [[ -n "$AGENT_PID" ]] && kill -0 "$AGENT_PID" 2>/dev/null; then
    kill "$AGENT_PID" 2>/dev/null || true
    wait "$AGENT_PID" 2>/dev/null || true
  fi
  # Always remove the demo container we may have created on the host podman,
  # even if KEEP=1 — leaving a rogue nginx on host:8080 is annoying.
  podman rm -f "$HELLO_CONTAINER" >/dev/null 2>&1 || true
  if [[ "${KEEP:-0}" != "1" ]]; then
    "$K3D_BIN" cluster delete "$CLUSTER_NAME" >/dev/null 2>&1 || true
    podman rm -f "$NATS_CONTAINER" >/dev/null 2>&1 || true
    podman network rm "$NATS_NET_NAME" >/dev/null 2>&1 || true
    [[ -n "$KUBECONFIG_FILE" ]] && rm -f "$KUBECONFIG_FILE"
    [[ -n "$AGENT_CONFIG_FILE" ]] && rm -f "$AGENT_CONFIG_FILE"
  else
    log "KEEP=1 — leaving cluster '$CLUSTER_NAME' and container '$NATS_CONTAINER' running"
    log "KUBECONFIG=$KUBECONFIG_FILE"
    log "agent config: $AGENT_CONFIG_FILE"
  fi
  if [[ $rc -ne 0 ]]; then
    log "operator log at $OPERATOR_LOG"
    echo "----- operator log tail -----"
    tail -n 60 "$OPERATOR_LOG" 2>/dev/null || true
    log "agent log at $AGENT_LOG"
    echo "----- agent log tail -----"
    tail -n 60 "$AGENT_LOG" 2>/dev/null || true
  else
    rm -f "$OPERATOR_LOG" "$AGENT_LOG"
  fi
  exit $rc
}
trap cleanup EXIT INT TERM

require() { command -v "$1" >/dev/null 2>&1 || fail "missing required tool: $1"; }
require podman
require cargo
require kubectl
[[ -x "$K3D_BIN" ]] || fail "k3d binary not executable at $K3D_BIN (set K3D_BIN=…)"

natsbox() {
  podman run --rm --network "$NATS_NET_NAME" "$NATSBOX_IMAGE" \
    nats --server "nats://$NATS_CONTAINER:$NATS_PORT" "$@"
}

###############################################################################
# phase 1 — NATS
###############################################################################
log "phase 1: start NATS"
podman network exists "$NATS_NET_NAME" || podman network create "$NATS_NET_NAME" >/dev/null
podman rm -f "$NATS_CONTAINER" >/dev/null 2>&1 || true
podman run -d \
  --name "$NATS_CONTAINER" \
  --network "$NATS_NET_NAME" \
  -p "$NATS_PORT:4222" \
  "$NATS_IMAGE" -js >/dev/null
log "waiting for NATS"
for _ in $(seq 1 30); do
  if podman run --rm --network "$NATS_NET_NAME" "$NATSBOX_IMAGE" \
       nats --server "nats://$NATS_CONTAINER:4222" server check connection >/dev/null 2>&1; then
    break
  fi
  sleep 1
done
natsbox server check connection >/dev/null || fail "NATS never became ready"

###############################################################################
# phase 2 — k3d cluster + CRD
###############################################################################
log "phase 2: create k3d cluster '$CLUSTER_NAME'"
"$K3D_BIN" cluster delete "$CLUSTER_NAME" >/dev/null 2>&1 || true
"$K3D_BIN" cluster create "$CLUSTER_NAME" --wait --timeout 90s >/dev/null

KUBECONFIG_FILE="$(mktemp -t iot-smoke-kubeconfig.XXXXXX)"
"$K3D_BIN" kubeconfig get "$CLUSTER_NAME" > "$KUBECONFIG_FILE"
export KUBECONFIG="$KUBECONFIG_FILE"

log "install CRD via operator's install subcommand (typed Rust — no yaml, no kubectl apply)"
( cd "$OPERATOR_DIR" && cargo run -q -- install ) >/dev/null
kubectl wait --for=condition=Established "crd/deployments.iot.nationtech.io" --timeout=30s >/dev/null

kubectl get ns "$DEPLOY_NS" >/dev/null 2>&1 || kubectl create namespace "$DEPLOY_NS" >/dev/null

###############################################################################
# phase 2b — CEL discriminator guardrail: an invalid score.type must be rejected
# by the apiserver (tests x-kubernetes-validations on spec.score)
###############################################################################
log "phase 2b: apiserver rejects invalid score.type"
BAD_CR=$(cat <<EOF
apiVersion: iot.nationtech.io/v1alpha1
kind: Deployment
metadata:
  name: bad-discriminator
  namespace: $DEPLOY_NS
spec:
  targetDevices: [$TARGET_DEVICE]
  score:
    type: "has spaces"
    data: {}
  rollout:
    strategy: Immediate
EOF
)
BAD_OUT="$(echo "$BAD_CR" | kubectl apply -f - 2>&1 || true)"
if echo "$BAD_OUT" | grep -q "must be a valid Rust identifier"; then
  log "apiserver rejected invalid discriminator as expected"
else
  fail "expected CEL rejection for score.type='has spaces'; got: $BAD_OUT"
fi
# Belt-and-braces: make sure nothing was persisted
if kubectl -n "$DEPLOY_NS" get deployment.iot.nationtech.io bad-discriminator >/dev/null 2>&1; then
  kubectl -n "$DEPLOY_NS" delete deployment.iot.nationtech.io bad-discriminator >/dev/null 2>&1 || true
  fail "apiserver should have rejected 'bad-discriminator' but it was persisted"
fi

###############################################################################
# phase 3 — operator
###############################################################################
log "phase 3: start operator"
(
  cd "$OPERATOR_DIR"
  cargo build -q
)
NATS_URL="nats://127.0.0.1:$NATS_PORT" \
  KV_BUCKET="desired-state" \
  RUST_LOG="info,kube_runtime=warn" \
  "$REPO_ROOT/target/debug/iot-operator-v0" \
  >"$OPERATOR_LOG" 2>&1 &
OPERATOR_PID=$!
log "operator pid=$OPERATOR_PID (log: $OPERATOR_LOG)"

for _ in $(seq 1 30); do
  if grep -q "starting Deployment controller" "$OPERATOR_LOG"; then break; fi
  if ! kill -0 "$OPERATOR_PID" 2>/dev/null; then fail "operator exited early"; fi
  sleep 0.5
done
grep -q "starting Deployment controller" "$OPERATOR_LOG" \
  || fail "operator never logged 'starting Deployment controller'"
grep -q "KV bucket ready" "$OPERATOR_LOG" \
  || fail "operator never confirmed KV bucket ready"

###############################################################################
# phase 3b — agent on localhost podman
###############################################################################
log "phase 3b: build + start agent"
(
  cd "$AGENT_DIR"
  cargo build -q
)

# Belt-and-braces: nuke any prior demo container so a previous aborted run
# doesn't occupy the host port before we even start.
podman rm -f "$HELLO_CONTAINER" >/dev/null 2>&1 || true

AGENT_CONFIG_FILE="$(mktemp -t iot-agent-config.XXXXXX.toml)"
cat >"$AGENT_CONFIG_FILE" <<EOF
[agent]
device_id = "$TARGET_DEVICE"

[credentials]
type = "toml-shared"
nats_user = "smoke"
nats_pass = "smoke"

[nats]
urls = ["nats://127.0.0.1:$NATS_PORT"]
EOF

IOT_AGENT_CONFIG="$AGENT_CONFIG_FILE" \
  RUST_LOG="info,async_nats=warn" \
  "$REPO_ROOT/target/debug/iot-agent-v0" \
  >"$AGENT_LOG" 2>&1 &
AGENT_PID=$!
log "agent pid=$AGENT_PID (log: $AGENT_LOG)"

for _ in $(seq 1 30); do
  if grep -q "watching KV keys" "$AGENT_LOG"; then break; fi
  if ! kill -0 "$AGENT_PID" 2>/dev/null; then fail "agent exited early"; fi
  sleep 0.5
done
grep -q "watching KV keys" "$AGENT_LOG" \
  || fail "agent never logged 'watching KV keys'"

###############################################################################
# phase 4 — apply Deployment CR
###############################################################################
log "phase 4: apply Deployment CR"
cat <<EOF | kubectl apply -f - >/dev/null
apiVersion: iot.nationtech.io/v1alpha1
kind: Deployment
metadata:
  name: $DEPLOY_NAME
  namespace: $DEPLOY_NS
spec:
  targetDevices: [$TARGET_DEVICE]
  score:
    type: PodmanV0
    data:
      services:
        - name: hello
          image: docker.io/library/nginx:alpine
          ports: ["8080:80"]
  rollout:
    strategy: Immediate
EOF

log "wait for KV key $TARGET_DEVICE.$DEPLOY_NAME"
KV_VALUE=""
for _ in $(seq 1 30); do
  if KV_VALUE="$(natsbox kv get desired-state "$TARGET_DEVICE.$DEPLOY_NAME" --raw 2>/dev/null)"; then
    [[ -n "$KV_VALUE" ]] && break
  fi
  sleep 1
done
[[ -n "$KV_VALUE" ]] || fail "KV key never appeared"
echo "$KV_VALUE" | grep -q '"type":"PodmanV0"' \
  || fail "KV value missing \"type\":\"PodmanV0\" discriminator — got: $KV_VALUE"
echo "$KV_VALUE" | grep -q '"image":"docker.io/library/nginx:alpine"' \
  || fail "KV value missing nginx image — got: $KV_VALUE"

log "wait for .status.observedScoreString"
OBSERVED=""
for _ in $(seq 1 30); do
  OBSERVED="$(kubectl -n "$DEPLOY_NS" get deployment.iot.nationtech.io "$DEPLOY_NAME" \
    -o jsonpath='{.status.observedScoreString}' 2>/dev/null || true)"
  [[ -n "$OBSERVED" ]] && break
  sleep 1
done
[[ -n "$OBSERVED" ]] || fail ".status.observedScoreString never set"
[[ "$OBSERVED" == "$KV_VALUE" ]] \
  || fail "observedScoreString does not match KV value:\n status=$OBSERVED\n kv =$KV_VALUE"

###############################################################################
# phase 4b — agent reconciled container running on host podman
###############################################################################
log "phase 4b: wait for agent to start container '$HELLO_CONTAINER'"
CONTAINER_ID=""
for _ in $(seq 1 120); do
  CONTAINER_ID="$(podman ps --filter "name=^${HELLO_CONTAINER}\$" --format '{{.ID}}' 2>/dev/null || true)"
  [[ -n "$CONTAINER_ID" ]] && break
  sleep 1
done
[[ -n "$CONTAINER_ID" ]] \
  || fail "agent never started container '$HELLO_CONTAINER'"
log "container running: $CONTAINER_ID"

log "curl http://127.0.0.1:$HELLO_PORT/"
CURL_OUT=""
for _ in $(seq 1 30); do
  if CURL_OUT="$(curl -fsS --max-time 2 "http://127.0.0.1:$HELLO_PORT/" 2>/dev/null)"; then
    break
  fi
  sleep 1
done
echo "$CURL_OUT" | grep -qi "nginx\|welcome" \
  || fail "curl did not return nginx welcome page (got: ${CURL_OUT:0:200})"
log "nginx responded"

###############################################################################
# phase 5 — delete CR, expect cleanup via finalizer + agent
###############################################################################
log "phase 5: delete Deployment CR — finalizer + agent should remove KV and container"
kubectl -n "$DEPLOY_NS" delete deployment.iot.nationtech.io "$DEPLOY_NAME" --wait=true >/dev/null

log "wait for KV key removal"
for _ in $(seq 1 30); do
  if ! natsbox kv get desired-state "$TARGET_DEVICE.$DEPLOY_NAME" --raw >/dev/null 2>&1; then
    log "KV key gone"
    break
  fi
  sleep 1
done
if natsbox kv get desired-state "$TARGET_DEVICE.$DEPLOY_NAME" --raw >/dev/null 2>&1; then
  fail "KV key still present after CR delete"
fi

log "wait for agent to remove container"
for _ in $(seq 1 60); do
  if ! podman ps -a --filter "name=^${HELLO_CONTAINER}\$" --format '{{.ID}}' 2>/dev/null | grep -q .; then
    log "container removed"
    break
  fi
  sleep 1
done
if podman ps -a --filter "name=^${HELLO_CONTAINER}\$" --format '{{.ID}}' 2>/dev/null | grep -q .; then
  fail "container '$HELLO_CONTAINER' still present after CR delete"
fi

log "PASS"