Hands-on walkthrough for the 48-hour customer demo:
- Operator: build/push the callout image → fleet-staging-deploy → capture project_id + cli_client_id from the printed panel.
- Developer: fleet-sso-login proves Zitadel SSO works end-to-end.
- Pi onboarding: extract iam-admin-pat from the staging cluster, cross-compile the agent for aarch64, run fleet-rpi-setup once per device with --bootstrap-token. Each Pi's agent connects to NATS over WSS using the JWT-bearer token minted from its per-device keyfile.
- Deploy a container to a labeled subset via example_harmony_apply_deployment with --env / --volume / --restart flags (env + bind mounts + restart policy that work_item #1 added).
- Observe the cross-device security model holding via the auth callout's logs.

Also captures what's deliberately NOT in the demo (compose auto-translation, UI, Tailscale backdoor, device-join-request flow, OpenBao, K8s OIDC) so the customer call has clean expectation-setting. The runbook is the closing piece of the 48h-demo work plan, sequenced after the eight feat / refactor commits that built the underlying functionality.
Fleet Platform Demo Runbook
48-hour-demo edition. Covers the operator side (NationTech) and the customer-developer side (two devs onboarding two Pis and applying a container deployment to them). Hands-on, no UI yet.
Roles
- NationTech operator: runs fleet-staging-deploy once against the customer's OKD cluster.
- Customer developer: runs fleet-sso-login to prove auth works, then runs fleet-rpi-setup for each Pi, then applies their workload via the existing harmony-apply-deployment example.
Prerequisites
Cluster (operator-side)
- OKD ≥ 4.10 (HAProxy ingress, edge-TLS).
- Wildcard DNS *.<base-domain> pointing at the cluster ingress IP (e.g. *.customer1.nationtech.io).
- Wildcard cert that the HAProxy router serves for that domain (the default OKD pattern).
- cert-manager and cloudnative-pg operators installed (the Zitadel chart depends on them via K8sAnywhereTopology's ensure_ready).
- Access to a container registry the cluster can pull from. The customer may have their own; the default in fleet-staging-deploy is quay.io/nationtech/harmony-nats-callout:demo.
Driver machine (operator + developers)
- kubectl with kubeconfig wired up.
- cargo (Rust toolchain).
- podman (used to build the agent image / fleet-callout image).
- ssh into the Pis from the developers' machines.
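A quick pre-flight check for the driver-machine tools listed above might look like the following. This is a minimal sketch, not part of the harness; the `missing_tools` helper name is hypothetical.

```shell
# Hypothetical helper: print every tool in "$@" that is not on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the four tools this runbook relies on.
missing=$(missing_tools kubectl cargo podman ssh)
if [ -n "$missing" ]; then
  printf 'missing tools:\n%s\n' "$missing"
else
  echo "all driver-machine tools found"
fi
```

Running it before the call gives each developer a single place to see what still needs installing.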
Pis
- Pi OS Lite booted, SSH server enabled, developer's SSH pubkey in ~/.ssh/authorized_keys. fleet-rpi-setup handles the rest.
Operator: deploy the staging stack
# 1. Build the callout image and push it to the customer's registry.
cargo build --release -p harmony-nats-callout
podman build -t quay.io/nationtech/harmony-nats-callout:demo \
  -f nats/callout/Dockerfile .
podman push quay.io/nationtech/harmony-nats-callout:demo
# 2. Deploy the central stack.
cargo run -p example-fleet-staging-deploy -- \
  --base-domain customer1.nationtech.io \
  --kube-context customer1-prod \
  --callout-image quay.io/nationtech/harmony-nats-callout:demo \
  --nats-auth-pass "$(openssl rand -hex 16)" \
  --nats-system-pass "$(openssl rand -hex 16)"
Expected output ends with a "next steps" panel containing the project
ID, the harmony-cli client_id, the NATS WSS URL, and the exact
follow-up commands. Save those — both developers will need them.
Developer: prove SSO works
cargo run -p example-fleet-sso-login -- \
  --base-domain customer1.nationtech.io \
  --client-id <CLI_CLIENT_ID printed by staging deploy>
A browser opens, the developer logs into Zitadel, and the CLI prints
Welcome <name> <email> and persists the session to ~/.local/share/harmony/sso-session.json.
Each of the two developers does this once with their own Zitadel account.
Operator (or developer with an admin PAT): onboard a Pi
# Extract the Zitadel admin PAT once (it's in a K8s secret on the
# staging cluster).
PAT=$(kubectl --context customer1-prod \
  -n zitadel get secret iam-admin-pat \
  -o jsonpath='{.data.pat}' | base64 -d)
# Cross-compile the agent for aarch64 (one-time per agent rev).
cargo build --release --target aarch64-unknown-linux-gnu -p harmony-fleet-agent
# Onboard Pi #1 — sensor on the floor with arch=aarch64, group=group-a.
cargo run -p example-fleet-rpi-setup -- \
  --pi-host 192.168.1.42 \
  --pi-user pi \
  --device-id sensor-floor-01 \
  --labels "group=group-a,arch=aarch64,role=sensor" \
  --bootstrap-token "$PAT" \
  --zitadel-issuer-url https://zitadel.customer1.nationtech.io \
  --zitadel-project-id <PROJECT_ID printed by staging deploy> \
  --nats-url wss://nats.customer1.nationtech.io/ \
  --agent-binary ./target/aarch64-unknown-linux-gnu/release/fleet-agent
# Onboard Pi #2 — different group label so we can target by selector.
cargo run -p example-fleet-rpi-setup -- \
  --pi-host 192.168.1.43 \
  --pi-user pi \
  --device-id sensor-shelf-02 \
  --labels "group=group-b,arch=aarch64,role=sensor" \
  --bootstrap-token "$PAT" \
  --zitadel-issuer-url https://zitadel.customer1.nationtech.io \
  --zitadel-project-id <PROJECT_ID> \
  --nats-url wss://nats.customer1.nationtech.io/ \
  --agent-binary ./target/aarch64-unknown-linux-gnu/release/fleet-agent
Each Pi onboarding does the following on the device:
- Installs podman + systemd-container.
- Creates the fleet-agent user (with subuid/subgid for rootless podman + linger).
- Drops the per-device Zitadel JSON key at /etc/fleet-agent/zitadel-key.json (mode 0640, owner fleet-agent).
- Renders /etc/fleet-agent/config.toml with type = "zitadel-jwt" pointing at the keyfile.
- Starts fleet-agent.service under systemd.
The agent connects to NATS over WSS using the JWT-bearer token it mints from its keyfile. async-nats's auto-reconnect, together with the auth callback, re-mints the token on every reconnect attempt, so the "never lose connectivity" property holds across:
- Token expiry (12h Zitadel default → re-minted ~5 minutes before).
- NATS pod restart (chart upgrade, drain, etc.).
- Pi network blip (DHCP renewal, Wi-Fi roam).
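The token-expiry timing above can be sketched as simple arithmetic. This is an illustrative sketch only, assuming a 12h lifetime and a 5-minute margin as stated; the function names are hypothetical and the real agent does this in Rust inside its auth callback.

```shell
# Hypothetical helper: seconds to wait after minting before the next re-mint.
# $1: token lifetime in seconds, $2: safety margin in seconds
remint_delay_secs() {
  if [ "$1" -le "$2" ]; then
    echo 0                      # lifetime no longer than the margin: re-mint at once
  else
    echo $(( $1 - $2 ))
  fi
}

# Hypothetical helper: exit 0 when the token should be re-minted now.
# $1: current time (epoch secs), $2: token expiry (epoch secs), $3: margin secs
needs_remint() {
  [ $(( $2 - $1 )) -le "$3" ]
}

# 12h lifetime with a 5-minute margin.
remint_delay_secs $((12 * 3600)) $((5 * 60))   # prints 42900
```

With those values the agent would re-mint 42900 seconds (11h55m) after each mint, independent of any reconnect-triggered re-mint.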
Verify the fleet from the operator side
kubectl --context customer1-prod -n fleet-system get device.fleet.nationtech.io
# NAME LABELS
# sensor-floor-01 arch=aarch64,group=group-a,role=sensor
# sensor-shelf-02 arch=aarch64,group=group-b,role=sensor
kubectl --context customer1-prod -n fleet-system logs deployment/fleet-callout
# ... received auth callout request
# ... Zitadel JWT validated, generating user JWT device_id=sensor-floor-01 role=device
Developer: deploy a container to a labeled subset
# Apply the customer's backend (single service + sqlite volume + envs)
# to every device with group=group-a.
cargo run -p example_harmony_apply_deployment -- \
  --namespace fleet-demo \
  --name customer-backend \
  --selector group=group-a \
  --image registry.example.com/customer/backend:1.4 \
  --port 8080:8080 \
  --env DATABASE_URL=sqlite:///data/app.db \
  --env LOG_LEVEL=info \
  --volume /var/lib/customer-backend:/data \
  --restart unless-stopped
The operator sees one Deployment CR materialized, NATS KV gets a
desired-state.<device-id>.customer-backend entry per matched
device, and each Pi's agent reconciles podman to match. The
container's data persists across agent restarts and Pi reboots
because the bind mount survives both.
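To make the fan-out concrete, the sketch below derives which desired-state keys a selector would produce for the two Pis onboarded earlier. It assumes the selector is a single key=value equality match against the device's comma-separated labels; the actual matching rules in the operator may be richer, and the helper names are hypothetical.

```shell
# Hypothetical helper: exit 0 if comma-separated labels $1 contain selector $2.
labels_match() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Key layout taken from the runbook: desired-state.<device-id>.<deployment>
kv_key() {
  printf 'desired-state.%s.%s\n' "$1" "$2"
}

# Reads "device-id labels" lines on stdin; prints one key per matched device.
# $1: selector (key=value), $2: deployment name
matched_keys() {
  while read -r device labels; do
    if labels_match "$labels" "$1"; then
      kv_key "$device" "$2"
    fi
  done
}

printf '%s\n' \
  'sensor-floor-01 group=group-a,arch=aarch64,role=sensor' \
  'sensor-shelf-02 group=group-b,arch=aarch64,role=sensor' |
  matched_keys group=group-a customer-backend
# prints: desired-state.sensor-floor-01.customer-backend
```

Only sensor-floor-01 carries group=group-a, so only its agent receives a desired-state entry to reconcile.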
kubectl get device shows the agents heartbeating; their per-deployment
state shows up on Device.status.aggregate (Chapter 2 reflect-back
already in place).
Translating a docker-compose to a Deployment CR
For the call: walk through the customer's compose file once, paste
the equivalent --env/--volume/--port flags. Bind mounts only;
named volumes need a separate decision per service. Most compose
shapes translate mechanically; depends_on / startup ordering does
not (PodmanV0 has no ordering primitive — design out of scope for
the demo).
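The manual translation can be sketched per compose entry. The helpers below are hypothetical and assume that only entries with an absolute host path count as bind mounts; anything else (named volumes, relative paths) is reported for a manual decision rather than translated.

```shell
# Hypothetical helper: map one compose "volumes:" entry to a --volume flag.
# Assumption: only "/host/path:/container/path" entries are bind mounts.
volume_flag() {
  case "$1" in
    /*:/*) printf '%s %s\n' --volume "$1" ;;
    *) printf 'manual decision needed for volume: %s\n' "$1" >&2
       return 1 ;;
  esac
}

# Hypothetical helper: map one compose "environment:" entry (KEY=VALUE) to --env.
env_flag() {
  case "$1" in
    ?*=*) printf '%s %s\n' --env "$1" ;;
    *) printf 'not KEY=VALUE: %s\n' "$1" >&2
       return 1 ;;
  esac
}

volume_flag /var/lib/customer-backend:/data   # prints: --volume /var/lib/customer-backend:/data
env_flag LOG_LEVEL=info                       # prints: --env LOG_LEVEL=info
volume_flag dbdata:/data || true              # named volume: reported on stderr
```

Startup ordering (depends_on) has no flag equivalent here, so it stays a discussion point on the call rather than something the sketch tries to emit.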
Cross-device security model (worth showing)
- Pi A's NATS connection has a user JWT permissioned to device-state.sensor-floor-01.> and device-commands.sensor-floor-01.>.
- Pi A cannot publish or subscribe to sensor-shelf-02's subjects; the auth callout never grants them.
- An admin user (Zitadel role fleet-admin) gets > on both publish + subscribe, so they observe every device.
- A user with no fleet role is rejected at NATS connect time.
This is the same security model the local examples/fleet_auth_callout
suite (3 cargo tests sharing a OnceCell k3d cluster) verifies in CI.
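The per-device isolation follows from how NATS evaluates a trailing ">" wildcard, which matches one or more remaining tokens. The sketch below models only that rule plus exact matches; it is not the callout's code, the helper names are hypothetical, and the "heartbeat" token is an invented example subject.

```shell
# Hypothetical helper: exit 0 if permission pattern $1 covers subject $2.
# Handles literal subjects and a trailing ">" only; the single-token "*"
# wildcard is deliberately left out of this sketch.
subject_allowed() {
  case "$1" in
    '>')
      [ -n "$2" ] ;;
    *'.>')
      _prefix=${1%'>'}          # e.g. "device-state.sensor-floor-01."
      case "$2" in
        "$_prefix"?*) return 0 ;;
        *) return 1 ;;
      esac ;;
    *)
      [ "$1" = "$2" ] ;;
  esac
}

# Exit 0 if device $1 may use subject $2 under the two per-device grants.
device_may_use() {
  subject_allowed "device-state.$1.>" "$2" ||
    subject_allowed "device-commands.$1.>" "$2"
}

check() {
  if device_may_use "$1" "$2"; then echo "allowed"; else echo "denied"; fi
}

check sensor-floor-01 device-state.sensor-floor-01.heartbeat   # prints: allowed
check sensor-floor-01 device-state.sensor-shelf-02.heartbeat   # prints: denied
```

An admin grant of a bare ">" passes the first branch for any non-empty subject, which is why that role can observe every device.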
What's NOT in the demo
- Compose-to-Deployment auto-translation (low priority — manual translation during the call works).
- A web UI for harmony fleet apply (post-demo).
- Tailscale/Headscale-based SSH backdoor to the Pis (separate daemon, out of scope).
- Device-join-request + admin-approve flow (would replace bootstrap-PAT pattern; out of scope).
- OpenBao for non-NATS secrets (env-var-only is fine for demo).
- K8s OIDC integration so kubectl accepts Zitadel JWTs (post-demo).
Re-run idempotency
Every harness in this runbook is idempotent.
- fleet-staging-deploy rides helm-upgrade-by-default, the ZitadelSetupScore search-then-create loop, and a persisted issuer NKey in a K8s secret.
- fleet-rpi-setup byte-compares the rendered TOML against the device's existing config and only reapplies on drift; the keyfile drop + agent restart only happen when something actually changed.
- harmony-apply-deployment is a kube::Api::patch(...) apply, so re-running with the same fields is a server-side no-op.
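The byte-compare-then-apply pattern used for the Pi config can be illustrated locally. This is a minimal sketch under the assumption that a plain `cmp` is an acceptable stand-in for the harness's comparison; `apply_if_changed` is a hypothetical name and the real tool works over SSH and also restarts the agent on change.

```shell
# Hypothetical helper: install $1 over $2 only when the bytes differ.
# Prints "applied" or "unchanged" so the caller can decide whether to restart.
apply_if_changed() {
  if [ -f "$2" ] && cmp -s "$1" "$2"; then
    echo unchanged
    return 0
  fi
  cp "$1" "$2"                  # a real harness would restart the agent after this
  echo applied
}

workdir=$(mktemp -d)
printf 'type = "zitadel-jwt"\n' > "$workdir/rendered.toml"
apply_if_changed "$workdir/rendered.toml" "$workdir/config.toml"   # prints: applied
apply_if_changed "$workdir/rendered.toml" "$workdir/config.toml"   # prints: unchanged
rm -rf "$workdir"
```

The second call is the re-run case: nothing drifted, so nothing is written and no restart would be triggered.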