Jean-Gabriel Gill-Couture bc2edf4530 feat(podman): init containers with k8s-style run-to-completion semantics
Customer apps frequently need a one-shot setup step (DB migration,
config render, cache warm-up) to succeed before the long-running
service starts. Without init containers each customer either inlines
the step into the service entrypoint (slow, racy, no failure surface)
or bolts on a sidecar that the platform can't introspect. This change
adds k8s-style init containers at the score layer so the contract is
the same one the customer already knows.

Score:
- New `InitContainer { name, image, args, env, volumes, timeout }`
  in `harmony::modules::podman`.
- `PodmanV0Score.init_containers: Vec<InitContainer>` with
  `#[serde(default)]` — pre-init-container wire payloads parse as an
  empty vec and behave unchanged.
- `DEFAULT_INIT_CONTAINER_TIMEOUT = 300s`; timeout serializes as
  whole seconds for operator readability.
- Idempotency is the customer's contract — documented at module
  level: init containers re-run on every reconcile that needs a
  fresh main container set.
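
The timeout rule above (default of 300 s, whole seconds on the wire) can be modelled without any serialization library. This is a minimal std-only sketch, not the crate's code: it assumes an absent timeout field arrives as `None`, and the function names `timeout_from_wire` and `timeout_to_wire` are hypothetical. The real score presumably does this through serde attributes.

```rust
use std::time::Duration;

/// Mirrors the constant named in the commit message (300 s).
pub const DEFAULT_INIT_CONTAINER_TIMEOUT: Duration = Duration::from_secs(300);

/// Hydrate a wire value (whole seconds, possibly absent) into a `Duration`.
/// Assumption: a payload that omits the timeout is seen here as `None`.
pub fn timeout_from_wire(secs: Option<u64>) -> Duration {
    secs.map(Duration::from_secs)
        .unwrap_or(DEFAULT_INIT_CONTAINER_TIMEOUT)
}

/// Serialize as whole seconds; any sub-second part is dropped.
pub fn timeout_to_wire(timeout: Duration) -> u64 {
    timeout.as_secs()
}
```

Truncating to whole seconds means a round trip is lossless only for timeouts that were whole seconds to begin with, which holds for the default.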

Runtime contract:
- `ContainerRuntime::run_to_completion(spec, timeout) -> RunOutcome`
  added to the domain trait. `RunOutcome::Exited { exit_code }`
  vs `TimedOut { waited }` — distinct arms because the caller's
  failure path is different (operator gets the exit code for
  actionable diagnosis).
- Init containers are NOT surfaced via `list_managed_services`;
  they're removed after they exit so the host's managed-container
  surface stays bounded to long-running services.

PodmanTopology implementation:
- Pre-remove any prior container with the same name (retry-safe).
- Restart policy forced to `No` — a retrying init defeats the
  run-to-completion contract.
- `tokio::time::timeout` around `podman wait`; force-remove + return
  `TimedOut` on deadline.
- Single 200ms retry on inspect for the libpod race where state can
  briefly read `running` between `wait` returning and conmon writing
  the exit code.
- `INIT_CONTAINER_LABEL` on every init container so operators can
  `podman ps -a --filter label=...` to spot init failures.
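
The single-retry rule for the inspect race can be sketched as follows. This is an illustration under assumptions, not the PodmanTopology code: `InspectState` and `exit_code_with_single_retry` are hypothetical names, the inspect call is passed in as a closure, and the sketch blocks with `std::thread::sleep` where the real implementation presumably awaits on tokio.

```rust
use std::thread::sleep;
use std::time::Duration;

/// What a hypothetical inspect call reports about a container.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InspectState {
    Running,
    Exited(i32),
}

/// Read the exit code after `wait` has returned.
///
/// If the first inspect still says `Running`, sleep once for `delay`
/// (200 ms in the commit message) and inspect a second time. A second
/// `Running` yields `None` rather than looping forever.
pub fn exit_code_with_single_retry<F>(mut inspect: F, delay: Duration) -> Option<i32>
where
    F: FnMut() -> InspectState,
{
    for attempt in 0..2 {
        match inspect() {
            InspectState::Exited(code) => return Some(code),
            // Only the first miss earns a retry.
            InspectState::Running if attempt == 0 => sleep(delay),
            InspectState::Running => {}
        }
    }
    None
}
```

Capping the retry at one keeps the worst-case added latency at a single `delay` and leaves a persistent `running` state visible to the caller as a distinct result.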

Interpret:
- Init containers run sequentially before any service. Non-zero exit
  or timeout fails the deployment with a typed `InterpretError`
  carrying the container name + cause.
- Success message reports both counts.
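
A minimal sketch of the sequential contract, assuming a synchronous runner closure in place of `ContainerRuntime::run_to_completion` (which is presumably async). `InitFailure` is a hypothetical stand-in for the real `InterpretError`, and the field types on `RunOutcome` are assumptions; only the arm names come from the commit message.

```rust
use std::fmt;
use std::time::Duration;

/// The two outcome arms named in the commit message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RunOutcome {
    Exited { exit_code: i32 },
    TimedOut { waited: Duration },
}

/// Hypothetical typed failure carrying the container name and the cause.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InitFailure {
    NonZeroExit { container: String, exit_code: i32 },
    TimedOut { container: String, waited: Duration },
}

impl fmt::Display for InitFailure {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            InitFailure::NonZeroExit { container, exit_code } => {
                write!(f, "init container '{}' exited with code {}", container, exit_code)
            }
            InitFailure::TimedOut { container, waited } => {
                write!(f, "init container '{}' timed out after {}s", container, waited.as_secs())
            }
        }
    }
}

/// Run init containers in declaration order and stop at the first failure.
/// Returns how many ran successfully, so the caller can report the count.
pub fn run_init_containers<F>(
    containers: &[(&str, Duration)],
    mut run: F,
) -> Result<usize, InitFailure>
where
    F: FnMut(&str, Duration) -> RunOutcome,
{
    for &(name, timeout) in containers {
        match run(name, timeout) {
            RunOutcome::Exited { exit_code: 0 } => {}
            RunOutcome::Exited { exit_code } => {
                return Err(InitFailure::NonZeroExit { container: name.to_string(), exit_code })
            }
            RunOutcome::TimedOut { waited } => {
                return Err(InitFailure::TimedOut { container: name.to_string(), waited })
            }
        }
    }
    Ok(containers.len())
}
```

Because the loop returns on the first failure, later init containers and all services are never started, which matches the "before any service" ordering described above.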

Tests (in tree):
- 3 new wire-format tests in `podman::score`: roundtrip, default
  timeout hydration, ordering preservation.
- All 10 existing podman::score tests still pass; legacy roundtrip
  test now also asserts `init_containers.is_empty()` as a wire-compat
  canary.

Call-site updates (5 sites) — all existing constructors of
`PodmanV0Score` add `init_containers: vec![]`: harmony_apply_deployment
example, fleet_load_test example, operator e2e, vm_deploy_lifecycle
e2e, vm_isolation e2e.

Deferred: per-version "run-once" semantics (customer can build with a
marker file today); the agent-side handler for surfacing init logs to
the operator dashboard (covered by the logs companion PR's deferred
work).
2026-05-24 21:56:39 -04:00

harmony-fleet-e2e

End-to-end test harness for the fleet stack. Brings up NATS (in k3d) plus one or more fleet-agent instances — either as in-cluster Pods (cheap, no podman) or on real libvirt VMs (expensive, real podman, matches the production Raspberry Pi target).

Per ADR-023 P2, the harness composes the same *Score types production uses (FleetNatsScore, FleetAgentScore, ProvisionVmScore, FleetDeviceSetupScore). The only thing this crate owns is the test-fixture wiring: per-binary OnceCell bring-up, RAII cleanup of namespaces + VMs, and admin-side KV helpers.
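
The two fixture patterns named above, once-per-binary bring-up and RAII cleanup, can be sketched with the standard library alone. This is an illustration, not the crate's code: it swaps in `std::sync::OnceLock` for whatever OnceCell the harness uses, and `SharedStack`, `shared_stack`, `BRING_UPS`, and `CleanupGuard` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical shared fixture; the real stack would hold NATS and agent handles.
pub struct SharedStack {
    pub namespace: String,
}

/// Counts bring-ups so the once-only behaviour is observable.
pub static BRING_UPS: AtomicUsize = AtomicUsize::new(0);
static STACK: OnceLock<SharedStack> = OnceLock::new();

/// Every test in the binary calls this; only the first call pays for bring-up.
pub fn shared_stack() -> &'static SharedStack {
    STACK.get_or_init(|| {
        BRING_UPS.fetch_add(1, Ordering::SeqCst);
        SharedStack { namespace: "e2e-demo".to_string() }
    })
}

/// RAII guard: runs `cleanup` when dropped unless `keep` is set
/// (the role FLEET_E2E_KEEP=1 plays in the harness).
pub struct CleanupGuard<F: FnMut()> {
    cleanup: F,
    keep: bool,
}

impl<F: FnMut()> CleanupGuard<F> {
    pub fn new(cleanup: F, keep: bool) -> Self {
        CleanupGuard { cleanup, keep }
    }
}

impl<F: FnMut()> Drop for CleanupGuard<F> {
    fn drop(&mut self) {
        if !self.keep {
            (self.cleanup)();
        }
    }
}
```

One caveat of this simplified shape: values stored in a `static` are never dropped, so a guard has to be owned by something with a scope, not parked inside the once-cell, or it will not fire at process exit.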

File map

src/
├── lib.rs                  # entry, re-exports
├── stack.rs                # Pod-target stack (NATS + Pod agents, num_devices=0 = infra-only)
├── images.rs               # cargo build + podman build + k3d image import (Pod path)
├── namespace.rs            # k8s namespace RAII guard
├── kv_admin.rs             # admin KV helpers: put/delete desired state + wait_for_phase
└── vm/                     # VM-target harness
    ├── stack.rs             # VmStack = infra Stack + Vec<VmDevice>
    ├── device.rs            # one libvirt VM: ProvisionVmScore + FleetDeviceSetupScore
    ├── agent_build.rs       # build the agent for the requested guest arch (aarch64 cross / x86_64 native)
    └── network.rs           # libvirt default-network gateway IP discovery

Tests in tests/ map 1:1 to scenarios:

| File | What it asserts | Cost |
|---|---|---|
| ping.rs | Pod agent replies to Verb::Ping over NATS | ~30 s (k3d + image build) |
| operator.rs | Operator adds Fleet Deployment finalizers and reconciles desired-state KV create/delete | ~30 s (k3d + image build) |
| vm_ping.rs | VM agent replies to Verb::Ping over NATS | ~75 s (x86 KVM) / ~7 min (aarch64 TCG) |
| vm_isolation.rs | VM agent does NOT react to another device's KV key | ~75 s (x86 KVM) / ~8 min (aarch64 TCG) |
| vm_deploy_lifecycle.rs | deploy → upgrade → delete podman deployment, KV phases + podman ps ground truth | ~90 s (x86 KVM) / ~7-8 min (aarch64 TCG) |

Env gates

Every test in this crate is gated so cargo test --workspace stays cheap.

| Var | Purpose |
|---|---|
| HARMONY_FLEET_E2E=1 | Enable the Pod-target test (ping.rs). Needs k3d + podman on PATH. |
| HARMONY_FLEET_VM_E2E=1 | Enable the VM-target tests (vm_*). Needs libvirt + qemu (+ aarch64 cross-toolchain when running the default arch). |
| FLEET_E2E_KEEP=1 | Leave the k8s namespace + libvirt VM in place on test exit (debug). |
| FLEET_E2E_VM_ARCH=x86_64 | Boot an x86_64 KVM guest instead of an aarch64 TCG guest. Default aarch64 (production target). x86 runs ~3-4× faster, which is useful for iteration. |
| RUST_LOG=... | Standard tracing filter; default is info. |
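
A gate of this kind usually reduces to a small helper that each test calls first. The sketch below is one way to write it, not the harness's actual code: `gate_enabled`, `gate_enabled_from_env`, `GuestArch`, and `parse_guest_arch` are hypothetical names, and treating only the exact value `1` as enabled and rejecting unknown arch strings are both assumptions.

```rust
use std::env;

/// True only when the gate value is exactly "1" (assumption: no other
/// spelling such as "true" is accepted).
pub fn gate_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Convenience wrapper that reads the named variable from the environment.
pub fn gate_enabled_from_env(var: &str) -> bool {
    gate_enabled(env::var(var).ok().as_deref())
}

/// Guest architectures listed for FLEET_E2E_VM_ARCH.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuestArch {
    Aarch64,
    X86_64,
}

/// Unset means aarch64, the production target.
/// Assumption: any other string is an error instead of a silent fallback.
pub fn parse_guest_arch(value: Option<&str>) -> Result<GuestArch, String> {
    match value {
        None | Some("aarch64") => Ok(GuestArch::Aarch64),
        Some("x86_64") => Ok(GuestArch::X86_64),
        Some(other) => Err(format!("unsupported FLEET_E2E_VM_ARCH: {}", other)),
    }
}
```

A gated test would then start with something like `if !gate_enabled_from_env("HARMONY_FLEET_VM_E2E") { return; }`, so an ungated `cargo test --workspace` run passes without touching k3d or libvirt.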

Running tests

Pod-target (cheap, fast iteration)

HARMONY_FLEET_E2E=1 cargo test -p harmony-fleet-e2e --test ping -- --nocapture

VM-target — pick aarch64 (prod parity) or x86_64 (fast iteration)

The same three tests run against either guest arch; set FLEET_E2E_VM_ARCH to switch. The default is aarch64 (the Raspberry Pi target).

| Path | Guest CPU | Wall-clock for vm_ping (warm caches) | Use when |
|---|---|---|---|
| FLEET_E2E_VM_ARCH=x86_64 | native KVM | ~75 s | dev iteration loop |
| (default, aarch64) | qemu TCG emulation | ~7 min | pre-push / CI / arch-drift catch |

CI must run aarch64 — even though x86 covers the logic, a new crate dep with a broken aarch64 build or a podman call that segfaults under TCG will only surface on the real target.

# ---- dev iteration loop (x86_64 KVM, ~3× faster end-to-end) ----
HARMONY_FLEET_VM_E2E=1 FLEET_E2E_VM_ARCH=x86_64 RUST_LOG=info \
    cargo test -p harmony-fleet-e2e --test vm_ping -- --nocapture
HARMONY_FLEET_VM_E2E=1 FLEET_E2E_VM_ARCH=x86_64 RUST_LOG=info \
    cargo test -p harmony-fleet-e2e --test vm_isolation -- --nocapture
HARMONY_FLEET_VM_E2E=1 FLEET_E2E_VM_ARCH=x86_64 RUST_LOG=info \
    cargo test -p harmony-fleet-e2e --test vm_deploy_lifecycle -- --nocapture

# ---- pre-push / CI (aarch64 — production target) ----
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info \
    cargo test -p harmony-fleet-e2e --test vm_ping -- --nocapture
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info \
    cargo test -p harmony-fleet-e2e --test vm_isolation -- --nocapture
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info \
    cargo test -p harmony-fleet-e2e --test vm_deploy_lifecycle -- --nocapture

# ---- all three sequentially (each is a separate binary → its own VM bring-up) ----
HARMONY_FLEET_VM_E2E=1 FLEET_E2E_VM_ARCH=x86_64 RUST_LOG=info cargo test -p harmony-fleet-e2e \
    --test vm_ping --test vm_isolation --test vm_deploy_lifecycle -- --nocapture --test-threads=1

# ---- everything in the crate at once (pod + vm, gates honored per-test) ----
HARMONY_FLEET_E2E=1 HARMONY_FLEET_VM_E2E=1 RUST_LOG=info \
    cargo test -p harmony-fleet-e2e -- --nocapture --test-threads=1

Wall-clock breakdown (measured on this host)

vm_ping from cold libvirt + cold cargo cache (one-time pain) to a green test:

| Step | aarch64 TCG | x86_64 KVM | Speedup |
|---|---|---|---|
| Agent build (cold) | 85 s (cross) | 72 s (native) | 1.2× |
| qemu start → DHCP | 48 s | 9 s | 5.3× |
| sshd accepts | 9 s | <1 s | ≥10× |
| Ansible Python detect | 15 s | 1 s | 15× |
| apt install podman + systemd-container | 261 s | 23 s | 11.3× |
| FleetDeviceSetup steps 3-7 + restart | ~50 s | ~4 s | ~12× |
| wait_until_ready ping retry | ~2 s | <1 s | 2× |
| Total test future (finished in …s) | 440 s | 149 s | 2.95× |

The single biggest swing is apt install podman inside the guest: 4 min 21 s on TCG vs 23 s on KVM. The whole-test speedup is only 2.95× because the cold cargo build costs about the same either way (~80 s, cross or native); the in-guest work is where the x86 path pulls ahead. Warm-cache iteration is closer to 6× because the cargo build drops out.

Debugging a failed bring-up

# Leave the VM + namespace alive; inspect by hand.
FLEET_E2E_KEEP=1 HARMONY_FLEET_VM_E2E=1 RUST_LOG=debug \
    cargo test -p harmony-fleet-e2e --test vm_ping -- --nocapture

# After the test exits, the harness logs the cleanup commands you'd run:
#   kubectl delete namespace e2e-<uuid>
#   virsh destroy fleet-e2e-vm-<run>-<i>
#   virsh undefine --nvram --remove-all-storage fleet-e2e-vm-<run>-<i>

# Tail the VM agent's journal:
ssh -i ~/.local/share/harmony/fleet/ssh/id_ed25519 \
    fleet-admin@<vm-ip> -- 'journalctl -u fleet-agent -f'

Host prerequisites

The Pod path needs: k3d, podman, cargo, kubectl.

The VM path adds:

# Arch
sudo pacman -S libvirt qemu-full libisoburn python podman \
               aarch64-linux-gnu-gcc
rustup target add aarch64-unknown-linux-gnu

# Debian / Ubuntu
sudo apt install libvirt-daemon-system qemu-kvm xorriso python3 python3-venv \
                 podman gcc-aarch64-linux-gnu
rustup target add aarch64-unknown-linux-gnu

# One-time libvirt setup
sudo usermod -aG libvirt "$USER"   # then re-login
sudo virsh net-start default
sudo virsh net-autostart default

fleet/scripts/smoke-a3-arm.sh is the bash equivalent of vm_ping.rs and a useful sanity check when the Rust path misbehaves — same underlying Scores, fewer moving parts.

How the VM tests reach NATS

NATS runs in k3d. The harness publishes it as a NodePort Service on host port 30423. The test process connects directly to nats://127.0.0.1:30423; the VM connects to the same NodePort via the libvirt default-network gateway (typically 192.168.122.1) — vm::network::libvirt_default_gateway_ip discovers the IP at bring-up.
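
The two connection paths differ only in the host part of the URL. A minimal sketch, assuming the gateway IP has already been discovered; `NATS_NODE_PORT`, `host_nats_url`, and `vm_nats_url` are hypothetical names, not the crate's API.

```rust
use std::net::Ipv4Addr;

/// Host port on which the harness publishes the NATS NodePort Service.
pub const NATS_NODE_PORT: u16 = 30423;

/// URL used by the test process, which runs on the host itself.
pub fn host_nats_url() -> String {
    format!("nats://{}:{}", Ipv4Addr::LOCALHOST, NATS_NODE_PORT)
}

/// URL used inside a VM: same port, reached through the libvirt
/// default-network gateway address passed in by the caller.
pub fn vm_nats_url(gateway: Ipv4Addr) -> String {
    format!("nats://{}:{}", gateway, NATS_NODE_PORT)
}
```

Passing the gateway in as a parameter keeps the typical 192.168.122.1 value out of the code, since the default network can be configured differently on a given host.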

What's deliberately not tested here

  • Operator-side aggregation. The operator's KV-watch → CR-status reflection is covered by the operator crate's own suite. The agent tests bypass the operator and talk to NATS directly to keep the failure surface narrow: when an agent test fails, you know it's the agent.
  • Real Zitadel auth. All VM tests run in FleetNatsScore::user_pass mode. The Zitadel-JWT path is exercised by examples/fleet_e2e_demo (currently #[ignore]'d pending a CI runner with full bring-up capacity).