Adds `examples/fleet_e2e_demo/`, which composes fleet_auth_callout's existing pieces (Zitadel + auth callout deploy) with per-device machine-user provisioning (one ZitadelSetupScore call per VM) and FleetDeviceSetupScore using FleetDeviceAuth::ZitadelJwt.

The harness expects pre-provisioned libvirt VMs (one per device) reachable via `FLEET_E2E_VM_<i>_IP` env vars. Full VM provisioning via ProvisionVmScore is a follow-up; this keeps the harness observable in pieces during the cold-start debugging tomorrow.

Constituent helpers in `fleet_auth_callout::lib.rs` flipped from private to `pub` (deploy_zitadel, wait_for_zitadel_ready, ensure_issuer_seed, build_and_load_callout_image, etc.) so the new harness composes them rather than re-implementing.

`bring_up_full_stack`:

1. Ensure k3d cluster (re-uses fleet_auth_callout's create_k3d).
2. Deploy Zitadel + Postgres.
3. CoreDNS rewrite + wait for Zitadel HTTP + wait for the chart-provisioned `iam-admin-pat` secret. (Last step is new and load-bearing: without it ZitadelSetupScore races the chart's setup job and fails on first cold-run.)
4. ZitadelSetupScore for project + API app + roles + admin machine-user (admin gets fleet-admin role grant).
5. Issuer NKey from a persisted secret + NATS deploy with auth_callout block + callout pod.
6. For each device i: per-device ZitadelSetupScore (machine-user with `device` role grant), pull the JSON keyfile from cache, render the agent's TOML with the keyfile path. (FleetDeviceSetupScore invocation is wired structurally; the SSH-and-apply step is gated behind the VM provisioning follow-up.)

`HostsEntry` + `merge_hosts_file` added to FleetDeviceSetupScore so VMs on a libvirt NAT can resolve `sso.fleet.local` to the host gateway. Managed-block markers in /etc/hosts make the merge idempotent across re-runs and removable when entries are dropped from the score. Four new unit tests cover the merge invariants (insert, replace, strip, byte-stable).
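The managed-block merge can be sketched as follows. This is a minimal illustration of the idea, not the code in this commit: the marker strings, the `HostsEntry` field names, and the `merge_hosts_file` signature shown here are all assumptions.

```rust
// Hypothetical marker strings; the markers the real Score writes are not shown here.
const BEGIN: &str = "# BEGIN fleet-managed hosts";
const END: &str = "# END fleet-managed hosts";

/// Illustrative stand-in for the real `HostsEntry`; field names are assumed.
pub struct HostsEntry {
    pub ip: String,
    pub hostname: String,
}

/// Returns `existing` with any previous managed block removed and, if
/// `entries` is non-empty, a fresh managed block appended at the end.
pub fn merge_hosts_file(existing: &str, entries: &[HostsEntry]) -> String {
    let mut out = String::new();
    let mut in_block = false;
    for line in existing.lines() {
        match line.trim() {
            BEGIN => in_block = true,           // start skipping the old block
            END if in_block => in_block = false, // old block ends here
            _ if in_block => {}                  // drop stale managed entries
            _ => {
                // Unmanaged lines are carried over untouched.
                out.push_str(line);
                out.push('\n');
            }
        }
    }
    if !entries.is_empty() {
        out.push_str(BEGIN);
        out.push('\n');
        for e in entries {
            out.push_str(&format!("{} {}\n", e.ip, e.hostname));
        }
        out.push_str(END);
        out.push('\n');
    }
    out
}
```

Under these assumptions, merging the same entries twice returns identical bytes, and merging an empty entry list removes the block while leaving unmanaged lines in place.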
Test skeleton in `tests/e2e_walking_skeleton.rs`:

- `both_devices_heartbeat_within_60s`: implemented; reads from device-info KV via admin token.
- `admin_jwt_reads_any_device_subject`: implemented; subscribes to `device-state.>` as admin.
- `cross_device_isolation_enforced_in_vm`: `#[ignore]` pending per-device-key plumbing through E2eHandles.
- `agent_recovers_from_nats_pod_restart`: `#[ignore]` pending the NATS-pod-restart driver.

The two `#[ignore]`d tests cover the load-bearing reconnect and isolation invariants. Wiring them is the morning-of-rehearsal priority since those are the customer-facing claims.

Out of scope for this commit (called out in the roadmap doc):

- ProvisionVmScore integration (today the operator runs fleet_vm_setup out-of-band).
- Operator install via Helm (smoke-a4 runs operator host-side; this harness inherits that pattern).
- Full SSH-based agent install via FleetDeviceSetupScore: Score built, invocation gated.
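The 60-second bound in `both_devices_heartbeat_within_60s` amounts to polling a condition until a deadline. A minimal synchronous sketch of that pattern follows; `wait_until` is a hypothetical helper, and the real test presumably uses async code and a NATS client rather than this std-only loop.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `probe` until it yields `Some(value)` or `timeout` elapses.
/// Hypothetical helper for illustration; not part of the harness.
pub fn wait_until<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Result<T, String> {
    let deadline = Instant::now() + timeout;
    loop {
        // Probe first so an already-satisfied condition returns immediately.
        if let Some(value) = probe() {
            return Ok(value);
        }
        if Instant::now() >= deadline {
            return Err(format!("condition not met within {timeout:?}"));
        }
        sleep(interval);
    }
}
```

A heartbeat check would pass a probe that looks up both device entries and returns `Some` only once both are present, with a timeout of `Duration::from_secs(60)`.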
example_iot_vm_setup
End-to-end driver for the IoT walking-skeleton VM-as-device flow. Runs two Harmony Scores in sequence:
1. KvmVmScore: provision a libvirt VM from an Ubuntu 24.04 cloud image with a cloud-init seed ISO that authorizes one SSH key. Returns the booted VM's IP.
2. FleetDeviceSetupScore: SSH into the VM (via the Ansible-backed HostConfigurationProvider) and install podman + the fleet-agent binary, drop the TOML config, bring up the systemd unit.
After a successful run, the VM is a fleet member reporting to NATS under
the --device-id you chose, carrying the --group label you passed.
One-time setup
WORK=/var/tmp/harmony-iot-smoke
mkdir -p "$WORK/ssh"
# 1. Ubuntu 24.04 cloud image (~700 MB) — cached between runs.
curl -o "$WORK/ubuntu-24.04-server-cloudimg-amd64.img" \
  https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-amd64.img
# 2. SSH keypair the VM will trust.
ssh-keygen -t ed25519 -N '' -f "$WORK/ssh/id_ed25519"
# 3. Runtime deps — Harmony self-installs Ansible into a managed venv
# under $HARMONY_DATA_DIR/ansible-venv on first run, so you only need
# python3 + venv on the runner. No system-wide `ansible` needed.
# On Arch:
# sudo pacman -S libvirt qemu-full xorriso python
# On Debian/Ubuntu:
# sudo apt install libvirt-daemon-system qemu-kvm xorriso python3 python3-venv
# 4. libvirt default network.
sudo virsh net-start default
sudo virsh net-autostart default
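Before running the example, it can help to confirm that the artifacts from steps 1 and 2 exist. The following is a small sketch of such a check; `missing_setup_artifacts` is a hypothetical helper, not something the example provides, and it only looks at the file names used in the commands above.

```rust
use std::path::{Path, PathBuf};

/// Returns the one-time-setup files that are missing under `work`
/// (the `$WORK` directory from the setup commands).
/// Hypothetical helper for illustration only.
pub fn missing_setup_artifacts(work: &Path) -> Vec<PathBuf> {
    [
        "ubuntu-24.04-server-cloudimg-amd64.img", // step 1: cloud image
        "ssh/id_ed25519",                         // step 2: private key
        "ssh/id_ed25519.pub",                     // step 2: public key
    ]
    .iter()
    .map(|rel| work.join(rel))
    .filter(|path| !path.is_file())
    .collect()
}
```

An empty result means the image and keypair are in place; it says nothing about libvirt or the runtime packages, which would need separate checks.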
Run
cargo build -p fleet-agent-v0
cargo run -p example_iot_vm_setup -- \
  --base-image /var/tmp/harmony-iot-smoke/ubuntu-24.04-server-cloudimg-amd64.img \
  --ssh-pubkey /var/tmp/harmony-iot-smoke/ssh/id_ed25519.pub \
  --ssh-privkey /var/tmp/harmony-iot-smoke/ssh/id_ed25519 \
  --work-dir /var/tmp/harmony-iot-smoke \
  --agent-binary target/debug/fleet-agent-v0 \
  --nats-url nats://192.168.122.1:4222
Changing groups
Re-running with a different --group rewrites
/etc/fleet-agent/config.toml on the VM and restarts the agent. The VM
itself is untouched.
cargo run -p example_iot_vm_setup -- ... --group group-b
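The rewrite-and-restart behaviour can be pictured as rendering the desired config and comparing it with what is on the VM. The sketch below is illustrative only: the TOML key names (`device_id`, `group`, `nats_url`) and both function names are assumptions, since the real schema of /etc/fleet-agent/config.toml is not shown here, and skipping the restart when nothing changed is likewise an assumption.

```rust
/// Quotes a value as a TOML basic string. Only backslash and double quote
/// are escaped; control characters are not handled in this sketch.
fn toml_string(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Renders an agent config. Key names are assumed for illustration.
pub fn render_agent_config(device_id: &str, group: &str, nats_url: &str) -> String {
    format!(
        "device_id = {}\ngroup = {}\nnats_url = {}\n",
        toml_string(device_id),
        toml_string(group),
        toml_string(nats_url)
    )
}

#[derive(Debug, PartialEq)]
pub enum ConfigAction {
    /// The file on the VM already matches; nothing to do.
    Unchanged,
    /// The file is missing or differs; write it and restart the agent.
    RewriteAndRestart,
}

/// Decides what to do given the current file content (if any) and the
/// freshly rendered config.
pub fn plan_config_update(on_disk: Option<&str>, desired: &str) -> ConfigAction {
    if on_disk == Some(desired) {
        ConfigAction::Unchanged
    } else {
        ConfigAction::RewriteAndRestart
    }
}
```

With this shape, switching from one group to another changes only the rendered `group` line, which is enough to trigger the rewrite and restart while leaving the VM itself alone.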
Full end-to-end via smoke test
See fleet/scripts/smoke-a3.sh, which stands up NATS in a podman container,
runs this example, and asserts that the agent's status lands in NATS.