harmony/fleet/harmony-fleet-e2e/README.md
2026-05-20 21:42:47 -04:00


# harmony-fleet-e2e
End-to-end test harness for the fleet stack. Brings up NATS (in k3d)
plus one or more `fleet-agent` instances, either as in-cluster Pods
(cheap; the agent runs without podman) or on real libvirt VMs
(expensive; real podman, matching the production Raspberry Pi target).
Per ADR-023 P2, the harness composes the **same `*Score` types
production uses** (`FleetNatsScore`, `FleetAgentScore`,
`ProvisionVmScore`, `FleetDeviceSetupScore`). The only thing this
crate owns is the test-fixture wiring: per-binary `OnceCell` bring-up,
RAII cleanup of namespaces + VMs, and admin-side KV helpers.
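
The RAII cleanup mentioned above can be pictured with a small guard type. This is a minimal sketch, not the crate's actual guard: `CleanupGuard`, `run_with_guard`, and the `e2e-demo` name are hypothetical, and the `keep` flag stands in for whatever the harness derives from `FLEET_E2E_KEEP`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for the harness's namespace/VM guards.
/// `cleanup` is whatever tears the resource down.
struct CleanupGuard<F: FnMut()> {
    keep: bool,
    cleanup: F,
}

impl<F: FnMut()> CleanupGuard<F> {
    fn new(keep: bool, cleanup: F) -> Self {
        Self { keep, cleanup }
    }
}

impl<F: FnMut()> Drop for CleanupGuard<F> {
    fn drop(&mut self) {
        // With `keep` set, the resource is left in place for inspection.
        if !self.keep {
            (self.cleanup)();
        }
    }
}

/// Holds a guard for the duration of a (pretend) test body and
/// returns the names of the resources that were cleaned up.
fn run_with_guard(keep: bool, name: &str) -> Vec<String> {
    let cleaned = Rc::new(RefCell::new(Vec::new()));
    {
        let log = Rc::clone(&cleaned);
        let owned = name.to_string();
        let _guard = CleanupGuard::new(keep, move || log.borrow_mut().push(owned.clone()));
        // The test body would run here; the guard also fires on unwind.
    }
    let result = cleaned.borrow().clone();
    result
}
```

Because teardown lives in `Drop`, it runs when the guard goes out of scope, including when the test body returns early or panics and unwinds.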
## File map
```
src/
├── lib.rs              # entry, re-exports
├── stack.rs            # Pod-target stack (NATS + Pod agents, num_devices=0 = infra-only)
├── images.rs           # cargo build + podman build + k3d image import (Pod path)
├── namespace.rs        # k8s namespace RAII guard
├── kv_admin.rs         # admin KV helpers: put/delete desired state + wait_for_phase
└── vm/                 # VM-target harness
    ├── stack.rs        # VmStack = infra Stack + Vec<VmDevice>
    ├── device.rs       # one libvirt VM: ProvisionVmScore + FleetDeviceSetupScore
    ├── agent_build.rs  # cross-build the agent for aarch64-unknown-linux-gnu
    └── network.rs      # libvirt default-network gateway IP discovery
```
Tests in `tests/` map 1:1 to scenarios:
| File | What it asserts | Cost |
|---|---|---|
| `ping.rs` | Pod agent replies to `Verb::Ping` over NATS | ~30 s (k3d + image build) |
| `vm_ping.rs` | VM agent replies to `Verb::Ping` over NATS | aarch64 VM bring-up |
| `vm_isolation.rs` | VM agent does NOT react to another device's KV key | shared VM |
| `vm_deploy_lifecycle.rs` | deploy → upgrade → delete podman deployment, KV phases + `podman ps` ground truth | shared VM + image pulls |
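
Asserting on KV phases means waiting until an agent reports the expected phase. The sketch below shows one way such a wait could be written as a poll with a deadline. It is an assumption-laden illustration: the real `wait_for_phase` signature is not shown in this README, `read_phase` is a hypothetical stand-in for a KV read, and the phase names used in examples are placeholders.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `read_phase` until it returns `want` or `timeout` elapses.
/// Returns the number of polls it took, or an error describing the
/// last phase seen. Hypothetical helper, not the crate's API.
fn wait_for_phase_sketch<F>(
    mut read_phase: F,
    want: &str,
    timeout: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> Option<String>,
{
    let deadline = Instant::now() + timeout;
    let mut polls = 0;
    loop {
        polls += 1;
        let seen = read_phase();
        if seen.as_deref() == Some(want) {
            return Ok(polls);
        }
        if Instant::now() >= deadline {
            return Err(format!("timed out waiting for {want}; last seen {seen:?}"));
        }
        // Short fixed back-off between polls; a real helper might watch the key instead.
        sleep(Duration::from_millis(10));
    }
}
```

Reporting the last phase seen in the error keeps a timeout diagnosable without rerunning the test at a higher log level.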
## Env gates
Every test in this crate is gated so `cargo test --workspace` stays cheap.
| Var | Purpose |
|---|---|
| `HARMONY_FLEET_E2E=1` | Enable the Pod-target test (`ping.rs`). Needs k3d + podman on PATH. |
| `HARMONY_FLEET_VM_E2E=1` | Enable the VM-target tests (`vm_*`). Needs libvirt + qemu + aarch64 cross-toolchain. |
| `FLEET_E2E_KEEP=1` | Leave the k8s namespace + libvirt VM in place on test exit (debug). |
| `RUST_LOG=...` | Standard tracing filter; default is `info`. |
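
A gate like the ones in the table can be sketched as an early return at the top of each test. The helper names here (`gate_enabled`, `env_gate`, `gated_test_body`) are hypothetical, and treating only the exact value `1` as "on" is an assumption; the crate may accept other values.

```rust
use std::env;

/// Pure check: the gate is on only when the variable is set to exactly "1".
/// (Assumption: the real harness may be more lenient.)
fn gate_enabled(value: Option<&str>) -> bool {
    matches!(value, Some("1"))
}

/// Reads the named variable from the process environment.
fn env_gate(var: &str) -> bool {
    gate_enabled(env::var(var).ok().as_deref())
}

/// Shape of a gated test body: return early when the gate is off,
/// so the test is reported as passed without doing any work.
fn gated_test_body() -> &'static str {
    if !env_gate("HARMONY_FLEET_E2E") {
        eprintln!("skipping: set HARMONY_FLEET_E2E=1 to run");
        return "skipped";
    }
    // Stack bring-up and assertions would go here.
    "ran"
}
```

Splitting the pure check from the environment read keeps the gate logic testable without mutating process-wide state.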
## Running tests
### Pod-target (cheap, fast iteration)
```bash
HARMONY_FLEET_E2E=1 cargo test -p harmony-fleet-e2e --test ping -- --nocapture
```
### VM-target (expensive, real podman + aarch64 boot)
```bash
# One scenario at a time. Each test binary brings up its own VM
# (cargo runs each integration test file as a separate binary, so the
# per-binary `shared_vm_stack` OnceCell does not amortize across binaries).
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info cargo test -p harmony-fleet-e2e --test vm_ping -- --nocapture
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info cargo test -p harmony-fleet-e2e --test vm_isolation -- --nocapture
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info cargo test -p harmony-fleet-e2e --test vm_deploy_lifecycle -- --nocapture
# All three sequentially:
HARMONY_FLEET_VM_E2E=1 RUST_LOG=info cargo test -p harmony-fleet-e2e \
    --test vm_ping --test vm_isolation --test vm_deploy_lifecycle -- --nocapture --test-threads=1
# Everything in the crate at once (both gates are set, so nothing is skipped):
HARMONY_FLEET_E2E=1 HARMONY_FLEET_VM_E2E=1 RUST_LOG=info \
    cargo test -p harmony-fleet-e2e -- --nocapture --test-threads=1
```
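
The note about the per-binary cell is easier to see in code. The following sketch uses `std::sync::OnceLock` as a stand-in (the crate's `OnceCell` may be a different, possibly async, type), and `FakeVmStack`, `shared_stack`, and the VM name are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the (pretend) expensive bring-up actually ran.
static BRING_UPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical placeholder for the real VM stack handle.
struct FakeVmStack {
    name: String,
}

/// One cell per process: every test in the same binary shares it,
/// but each test binary is a separate process with its own empty cell.
static STACK: OnceLock<FakeVmStack> = OnceLock::new();

fn shared_stack() -> &'static FakeVmStack {
    STACK.get_or_init(|| {
        // Runs at most once per process, no matter how many tests call in.
        BRING_UPS.fetch_add(1, Ordering::SeqCst);
        FakeVmStack { name: "fleet-e2e-vm-demo".to_string() }
    })
}
```

Statics are not shared between processes, which is why three `--test` binaries pay for three bring-ups while several `#[test]` functions inside one binary pay for one.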
### Debugging a failed bring-up
```bash
# Leave the VM + namespace alive; inspect by hand.
FLEET_E2E_KEEP=1 HARMONY_FLEET_VM_E2E=1 RUST_LOG=debug \
cargo test -p harmony-fleet-e2e --test vm_ping -- --nocapture
# After the test exits, the harness logs the cleanup commands you'd run:
# kubectl delete namespace e2e-<uuid>
# virsh destroy fleet-e2e-vm-<run>-<i>
# virsh undefine --nvram --remove-all-storage fleet-e2e-vm-<run>-<i>
# Tail the VM agent's journal:
ssh -i ~/.local/share/harmony/fleet/ssh/id_ed25519 \
fleet-admin@<vm-ip> -- 'journalctl -u fleet-agent -f'
```
## Host prerequisites
The Pod path needs: `k3d`, `podman`, `cargo`, `kubectl`.

The VM path adds:
```bash
# Arch
sudo pacman -S libvirt qemu-full libisoburn python podman \
aarch64-linux-gnu-gcc
rustup target add aarch64-unknown-linux-gnu
# Debian / Ubuntu
sudo apt install libvirt-daemon-system qemu-kvm xorriso python3 python3-venv \
podman gcc-aarch64-linux-gnu
rustup target add aarch64-unknown-linux-gnu
# One-time libvirt setup
sudo usermod -aG libvirt "$USER" # then re-login
sudo virsh net-start default
sudo virsh net-autostart default
```
`fleet/scripts/smoke-a3-arm.sh` is the bash equivalent of `vm_ping.rs`
and a useful sanity check when the Rust path misbehaves — same
underlying Scores, fewer moving parts.
## How the VM tests reach NATS
NATS runs in k3d. The harness publishes it as a `NodePort` Service
on host port `30423`. The test process connects directly to
`nats://127.0.0.1:30423`; the VM connects to the same NodePort via
the libvirt default-network gateway (typically `192.168.122.1`) —
`vm::network::libvirt_default_gateway_ip` discovers the IP at
bring-up.
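
The two connection paths differ only in the host part of the URL. A minimal sketch, assuming the gateway IP has already been discovered; `test_side_url` and `vm_side_url` are hypothetical names, not functions from this crate:

```rust
use std::net::Ipv4Addr;

/// NodePort the harness publishes NATS on (value taken from the text above).
const NATS_NODE_PORT: u16 = 30423;

/// URL the test process uses: the NodePort via the host loopback.
fn test_side_url() -> String {
    format!("nats://{}:{}", Ipv4Addr::LOCALHOST, NATS_NODE_PORT)
}

/// URL a VM uses: the same NodePort, reached through the libvirt gateway.
fn vm_side_url(gateway: Ipv4Addr) -> String {
    format!("nats://{}:{}", gateway, NATS_NODE_PORT)
}
```

With the typical default-network gateway, `vm_side_url(Ipv4Addr::new(192, 168, 122, 1))` yields `nats://192.168.122.1:30423`.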
## What's deliberately not tested here
- **Operator-side aggregation.** The operator's KV-watch → CR-status
reflection is covered by the operator crate's own suite. These
tests bypass the operator and talk to NATS directly to keep the
failure surface narrow — when an agent test fails, you know
it's the agent.
- **Real Zitadel auth.** All VM tests run against the
`FleetNatsScore::user_pass` mode. The Zitadel-JWT path is
exercised by `examples/fleet_e2e_demo` (currently `#[ignore]`'d
pending a CI runner with full bring-up capacity).
- **x86_64 VM bring-up.** Locked to aarch64 because that's the
production target. An x86_64 fast-path can be added by widening
`VmStackOptions::arch`; out of scope today.