harmony/examples/fleet_vm_setup
Jean-Gabriel Gill-Couture 1d453dd9aa feat(e2e-demo): VM-based rehearsal harness + /etc/hosts injection
Adds `examples/fleet_e2e_demo/` — composes fleet_auth_callout's
existing pieces (Zitadel + auth callout deploy) with per-device
machine-user provisioning (one ZitadelSetupScore call per VM) and
FleetDeviceSetupScore using FleetDeviceAuth::ZitadelJwt. The harness
expects pre-provisioned libvirt VMs (one per device) reachable via
`FLEET_E2E_VM_<i>_IP` env vars; full VM provisioning via
ProvisionVmScore is a follow-up, which keeps the harness observable in
pieces during tomorrow's cold-start debugging.
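
As an illustration of the `FLEET_E2E_VM_<i>_IP` convention, here is a minimal sketch of how a harness might collect the per-device addresses. The function names, the zero-based index, and the "stop at the first unset index" rule are assumptions made for this sketch and are not taken from the harness itself. The lookup is passed in as a closure so the parsing logic can be exercised without touching the process environment.

```rust
use std::net::IpAddr;

/// Collects VM addresses from `FLEET_E2E_VM_<i>_IP`, starting at index 0
/// and stopping at the first index that is not set.
/// (Zero-based numbering is an assumption of this sketch.)
pub fn vm_ips_from<F>(lookup: F) -> Result<Vec<IpAddr>, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut ips = Vec::new();
    for i in 0.. {
        let key = format!("FLEET_E2E_VM_{i}_IP");
        // A missing variable marks the end of the device list.
        let Some(raw) = lookup(&key) else { break };
        // A present but malformed value is an error, not a silent skip.
        let ip = raw
            .trim()
            .parse::<IpAddr>()
            .map_err(|e| format!("{key}={raw:?}: {e}"))?;
        ips.push(ip);
    }
    Ok(ips)
}

/// Convenience wrapper that reads from the real process environment.
pub fn vm_ips_from_env() -> Result<Vec<IpAddr>, String> {
    vm_ips_from(|key| std::env::var(key).ok())
}
```

Failing loudly on a malformed address is a deliberate choice here: a typo in one variable would otherwise shrink the device list without any visible signal.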

Constituent helpers in `fleet_auth_callout::lib.rs` flipped from
private to `pub` (deploy_zitadel, wait_for_zitadel_ready,
ensure_issuer_seed, build_and_load_callout_image, etc.) so the new
harness composes them rather than re-implementing.

`bring_up_full_stack`:
1. Ensure k3d cluster (re-uses fleet_auth_callout's create_k3d).
2. Deploy Zitadel + Postgres.
3. CoreDNS rewrite + wait for Zitadel HTTP + wait for the
   chart-provisioned `iam-admin-pat` secret. (Last step is new and
   load-bearing — without it ZitadelSetupScore races the chart's
   setup job and fails on first cold-run.)
4. ZitadelSetupScore for project + API app + roles + admin
   machine-user (admin gets fleet-admin role grant).
5. Issuer NKey from a persisted secret + NATS deploy with
   auth_callout block + callout pod.
6. For each device i: per-device ZitadelSetupScore (machine-user
   with `device` role grant), pull the JSON keyfile from cache,
   render the agent's TOML with the keyfile path. (FleetDeviceSetupScore
   invocation is wired structurally; the SSH-and-apply step is
   gated behind the VM provisioning follow-up.)
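
The wait in step 3 is a poll-until-ready loop with a deadline. The sketch below shows that general pattern only; `wait_until` is a hypothetical helper, and the readiness check is a plain closure standing in for whatever the harness actually does to detect the `iam-admin-pat` secret.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `ready` repeatedly until it returns true or `timeout` elapses.
/// Returns the number of attempts made on success.
pub fn wait_until<F>(timeout: Duration, interval: Duration, mut ready: F) -> Result<u32, String>
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    let mut attempts = 0;
    loop {
        attempts += 1;
        if ready() {
            return Ok(attempts);
        }
        // Check the deadline after the probe so at least one attempt always runs.
        if Instant::now() >= deadline {
            return Err(format!("not ready after {attempts} attempts"));
        }
        sleep(interval);
    }
}
```

With a gate like this in front of step 4, a cold run blocks until the chart's setup job has produced the secret instead of racing it, and a job that never finishes shows up as a timeout error.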

`HostsEntry` + `merge_hosts_file` added to FleetDeviceSetupScore so
VMs on a libvirt NAT can resolve `sso.fleet.local` to the host
gateway. Managed-block markers in /etc/hosts make the merge
idempotent across re-runs and removable when entries are dropped
from the score. Four new unit tests cover the merge invariants
(insert, replace, strip, byte-stable).
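
To illustrate the managed-block idea, here is a minimal sketch of such a merge. It is not the code in FleetDeviceSetupScore: the marker strings, the `HostsEntry` field names, and the choice to place the block at the end of the file are all assumptions made for this example.

```rust
// Hypothetical markers; the real ones are not shown in this commit message.
const BEGIN: &str = "# BEGIN harmony-managed hosts";
const END: &str = "# END harmony-managed hosts";

/// One `ip hostname` mapping. Field names are assumptions of this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HostsEntry {
    pub ip: String,
    pub hostname: String,
}

/// Returns `existing` with the managed block replaced by `entries`.
/// An empty `entries` slice removes the block entirely.
pub fn merge_hosts_file(existing: &str, entries: &[HostsEntry]) -> String {
    let mut out = String::new();
    let mut in_block = false;
    // Copy every line outside the managed block, dropping the old block.
    for line in existing.lines() {
        match line.trim() {
            BEGIN => in_block = true,
            END => in_block = false,
            _ if !in_block => {
                out.push_str(line);
                out.push('\n');
            }
            _ => {} // stale managed entry: discard
        }
    }
    // Re-emit the block only when there is something to manage.
    if !entries.is_empty() {
        out.push_str(BEGIN);
        out.push('\n');
        for e in entries {
            out.push_str(&format!("{} {}\n", e.ip, e.hostname));
        }
        out.push_str(END);
        out.push('\n');
    }
    out
}
```

Because the old block is always dropped before the new one is written, merging the same entries twice yields identical bytes, and merging an empty list restores the unmanaged content. For example, merging `192.168.122.1 sso.fleet.local` into a file that holds only `127.0.0.1 localhost` appends a three-line block after the existing line.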

Test skeleton in `tests/e2e_walking_skeleton.rs`:
- `both_devices_heartbeat_within_60s` — implemented; reads from
  device-info KV via admin token.
- `admin_jwt_reads_any_device_subject` — implemented; subscribes
  to `device-state.>` as admin.
- `cross_device_isolation_enforced_in_vm` — `#[ignore]` pending
  per-device-key plumbing through E2eHandles.
- `agent_recovers_from_nats_pod_restart` — `#[ignore]` pending
  the NATS-pod-restart driver.

The two `#[ignore]`d tests cover the load-bearing reconnect and
isolation invariants. Wiring them is the morning-of-rehearsal
priority since those are the customer-facing claims.

Out of scope of this commit (called out in the roadmap doc):
- ProvisionVmScore integration (today the operator runs fleet_vm_setup
  out-of-band).
- Operator install via Helm (smoke-a4 runs operator host-side; this
  harness inherits that pattern).
- Full SSH-based agent install via FleetDeviceSetupScore — Score
  built, invocation gated.
2026-05-03 17:07:40 -04:00

example_iot_vm_setup

End-to-end driver for the IoT walking-skeleton VM-as-device flow. Runs two Harmony Scores in sequence:

  1. KvmVmScore — provision a libvirt VM from an Ubuntu 24.04 cloud image with a cloud-init seed ISO that authorizes one SSH key. Returns the booted VM's IP.
  2. FleetDeviceSetupScore — SSH into the VM (via the Ansible-backed HostConfigurationProvider) and install podman + the fleet-agent binary, drop the TOML config, bring up the systemd unit.

After a successful run, the VM is a fleet member reporting to NATS under the --device-id you chose, carrying the --group label you passed.

One-time setup

WORK=/var/tmp/harmony-iot-smoke
mkdir -p "$WORK/ssh"

# 1. Ubuntu 24.04 cloud image (~700 MB), cached between runs.
#    -f fails on HTTP errors instead of saving an error page; -L follows redirects.
curl -fL -o "$WORK/ubuntu-24.04-server-cloudimg-amd64.img" \
     https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-amd64.img

# 2. SSH keypair the VM will trust.
ssh-keygen -t ed25519 -N '' -f "$WORK/ssh/id_ed25519"

# 3. Runtime deps — Harmony self-installs Ansible into a managed venv
#    under $HARMONY_DATA_DIR/ansible-venv on first run, so you only need
#    python3 + venv on the runner. No system-wide `ansible` needed.
# On Arch:
#   sudo pacman -S libvirt qemu-full xorriso python
# On Debian/Ubuntu:
#   sudo apt install libvirt-daemon-system qemu-kvm xorriso python3 python3-venv

# 4. libvirt default network.
sudo virsh net-start default
sudo virsh net-autostart default

Run

cargo build -p fleet-agent-v0

cargo run -p example_iot_vm_setup -- \
  --base-image /var/tmp/harmony-iot-smoke/ubuntu-24.04-server-cloudimg-amd64.img \
  --ssh-pubkey /var/tmp/harmony-iot-smoke/ssh/id_ed25519.pub \
  --ssh-privkey /var/tmp/harmony-iot-smoke/ssh/id_ed25519 \
  --work-dir /var/tmp/harmony-iot-smoke \
  --agent-binary target/debug/fleet-agent-v0 \
  --nats-url nats://192.168.122.1:4222

Changing groups

Re-running with a different --group rewrites /etc/fleet-agent/config.toml on the VM and restarts the agent. The VM itself is untouched.

cargo run -p example_iot_vm_setup -- ... --group group-b

Full end-to-end via smoke test

See fleet/scripts/smoke-a3.sh, which stands up NATS in a podman container, runs this example, and asserts that the agent's status lands in NATS.