feat/iot-arm-vm #269

Merged
johnride merged 14 commits from feat/iot-arm-vm into feat/iot-walking-skeleton 2026-04-21 19:04:53 +00:00
No description provided.
johnride added 14 commits 2026-04-21 18:41:21 +00:00

The smoke test now runs end-to-end against a pristine host with only
generic deps installed (libvirt, qemu, xorriso, python3, podman,
cargo, kubectl) — no manual ISO downloads, ssh-keygen rituals, or
chmod dances. Pairs with a hard power-cycle recovery phase that
matches ROADMAP §8's "power cycle test" shape.

Harmony-side bootstrap (all under $HARMONY_DATA_DIR/iot/):

- `modules::iot::assets` — SHA256-verified Ubuntu 24.04 cloud image
  download (cached, streaming via reqwest) + ed25519 SSH keypair
  generation. OnceCell-cached like `ensure_ansible_venv`.

- `modules::iot::libvirt_pool` — user-owned dir-backed libvirt
  storage pool at $HARMONY_DATA_DIR/iot/kvm/pool/. Per-VM overlay
  disks + seed ISOs land here; libvirt dynamic-ownership handles the
  libvirt-qemu chown transitions we used to do by hand. Pool is
  defined/built once via the `virt` crate inside a spawn_blocking,
  then auto-started and flagged for autostart on every process boot.

- `modules::iot::preflight::check_iot_smoke_preflight()` — fail-fast
  checks for every runner-host prereq (`virsh`, `qemu-img`, `xorriso`,
  `python3`, `ssh-keygen`, libvirt-group membership, default
  network active). Each missing piece surfaces with the Arch/Debian/
  Fedora install command inline.
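
The once-per-process bootstrap caching in `modules::iot::assets` can be sketched with std's `OnceLock` (the tree reportedly uses an async OnceCell; the path and function name below are illustrative placeholders):

```rust
use std::sync::OnceLock;

// Process-wide cache: the expensive download + sha256 verification
// runs at most once; later callers get the memoized path.
// NOTE: the path and function name are placeholders, not real names.
static IMAGE_PATH: OnceLock<String> = OnceLock::new();

fn ensure_cloud_image() -> &'static str {
    IMAGE_PATH.get_or_init(|| {
        // real code: streaming download via reqwest + sha256 check
        "cloud-images/noble-server-cloudimg-amd64.img".to_string()
    })
}
```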

KvmVmScore now owns these calls internally — `CloudInitVmConfig`
loses `base_image_path`, `seed_output_dir`, `authorized_key`. The
Score returns the SSH private-key path in its outcome details so the
caller can hand it straight to `LinuxHostTopology`.

smoke-a3.sh dropped from 125 lines of manual setup to a thin
orchestration script. Adds phase 5: `virsh destroy` + `sleep` +
`virsh start`, then a wall-clock gate that rejects any status writes
from before the reboot. Verified: real power-cycles produce
timestamps ~14s after the gate (agent boot + connect latency); the
gate catches in-flight writes that happen during destroy.

Verified end-to-end from a fully nuked `$HARMONY_DATA_DIR/iot/`:
- cold boot: downloads 600MB cloud image (~25s), generates SSH key,
  defines + starts libvirt pool, provisions VM, onboards device,
  verifies phase 5 power-cycle recovery
- warm boot: cache hits on all bootstrap steps; same end-to-end
  PASS in 2-3 minutes total

aarch64 cross-compile still green.

Structural changes (the biggest items from the review):

- `HostConfigurationProvider` split into five narrower capabilities:
  `HostReachable`, `PackageInstaller`, `FileDelivery`,
  `UnixUserManager`, `SystemdManager`. Each implementation now only
  implements what it can actually deliver — a future cloud-init /
  ignition / podman-agent backend can pick a subset without
  inheriting systemd assumptions it can't honour. Added an umbrella
  trait `LinuxHostConfiguration` blanket-impl'd for any type that
  has all five, so Scores keep a single bound.

- New `VirtualMachineHost` capability in domain/topology/: `list_vms`
  / `ensure_vm` / `delete_vm` / `get_vm_info`, with generic
  `VirtualMachineSpec` carrying a typed optional `VmFirstBootConfig`
  (hostname, admin user, authorized keys). `KvmHost` trait and
  `KvmHostTopology` deleted; `KvmVirtualMachineHost` is the
  concrete libvirt implementation. Cloud-init stays a KVM-impl
  detail — callers never see it.

- `KvmVmScore` + `CloudInitVmConfig` deleted; replaced by a generic
  `ProvisionVmScore` in `modules::iot::vm_score` bound to
  `T: VirtualMachineHost`. The Score itself has no knowledge of the
  hypervisor or its first-boot delivery mechanism.

- `IotDeviceSetupConfig.device_id` is now `harmony_types::id::Id`
  (timestamp-prefixed, sortable-by-creation, collision-safe).

- `ensure_ready` on `KvmVirtualMachineHost` is a Noop with a TODO
  pointing at ROADMAP/12-code-review-april-2026.md §12.1 (phased
  topology). Captures the concern about eagerly probing the
  hypervisor even when the current run doesn't need KVM.
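
The five-way split plus the blanket umbrella impl can be sketched like this (trait names are from the list above; real method sets are richer, reduced here to one illustrative method):

```rust
// Narrow capabilities; one marker method kept for the demo.
trait HostReachable { fn is_reachable(&self) -> bool; }
trait PackageInstaller {}
trait FileDelivery {}
trait UnixUserManager {}
trait SystemdManager {}

// Umbrella trait with a blanket impl: any type providing all five
// capabilities gets it for free, so Scores keep a single bound.
trait LinuxHostConfiguration:
    HostReachable + PackageInstaller + FileDelivery + UnixUserManager + SystemdManager
{
}

impl<T> LinuxHostConfiguration for T where
    T: HostReachable + PackageInstaller + FileDelivery + UnixUserManager + SystemdManager
{
}

// Demo type implementing all five, usable behind the umbrella bound.
struct DemoHost;
impl HostReachable for DemoHost { fn is_reachable(&self) -> bool { true } }
impl PackageInstaller for DemoHost {}
impl FileDelivery for DemoHost {}
impl UnixUserManager for DemoHost {}
impl SystemdManager for DemoHost {}

fn score_step<T: LinuxHostConfiguration>(host: &T) -> bool {
    host.is_reachable()
}
```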

Code quality fixes from the line-level comments:

- `render_toml` / `render_systemd_unit` / `render_user_data`
  rewritten as `format!` with raw-string templates (no more
  push_str chains).

- Every `Command::new(…).arg().arg().arg()` chain in the touched
  files converted to `.args([…])`.

- Ansible module args are now typed Rust structs (`AptArgs`,
  `AnsibleFileArgs`, `AnsibleUserArgs`, `AnsibleCopyArgs`,
  `AnsibleSystemdArgs`, `AnsibleCommandArgs`, `AnsibleStatArgs`)
  serialized via `serde_json::to_value`. No more `json!` macros
  with ad-hoc string keys.

- `ensure_linger`: no more shell sentinel. Uses
  `ansible.builtin.stat` on `/var/lib/systemd/linger/<user>` for
  the idempotent change-state check, then `ansible.builtin.command
  loginctl enable-linger` only on miss. `loginctl` is required
  (not just `file state=touch`) because systemd-logind needs the
  dbus signal to actually start the user manager; a plain file
  touch doesn't wake it up and every subsequent `systemctl --user
  …` fails with "Failed to connect to bus". Documented in-place.

- `ensure_user_unit_active`: picks up the user's UID first via
  `ansible.builtin.command id -u <user>` and wraps the
  `systemctl --user enable --now <unit>` invocation in `env
  XDG_RUNTIME_DIR=/run/user/<UID>`. The systemd module's
  task-level `environment:` keyword isn't available in ad-hoc
  mode; this is the cleanest equivalent. Documented the
  inline-playbook path as a future option for when more
  task-level-env call sites appear.

- `ensure_package` comment clarified: distro dispatch is this
  function's job; Debian-family is the first concrete target and
  extending to RHEL/Fedora/Alpine is an implementation detail,
  not a capability change.

- Kubespray line removed.
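
The `ensure_linger` check-then-act flow generalizes to a small sketch; the remote executor is abstracted as a closure here (an assumption made for testability), with rc 0 from the probe meaning the linger file already exists:

```rust
// Remote execution abstracted as a closure returning the command's
// exit code; in the tree an SSH-backed executor plays this role.
fn ensure_linger<F>(user: &str, mut exec: F) -> Result<bool, String>
where
    F: FnMut(&str) -> Result<i32, String>,
{
    // Idempotence probe: rc 0 means the linger file already exists.
    let probe = format!("test -e /var/lib/systemd/linger/{user}");
    if exec(&probe)? == 0 {
        return Ok(false); // no change needed
    }
    // loginctl (not a bare file touch) so logind gets the dbus
    // signal and actually starts the user manager.
    let enable = format!("sudo loginctl enable-linger {user}");
    if exec(&enable)? != 0 {
        return Err(format!("loginctl enable-linger failed for {user}"));
    }
    Ok(true) // changed
}
```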

Verified: from a primed `$HARMONY_DATA_DIR/iot/`, smoke-a3.sh
still completes all 5 phases (bootstrap + provision + 9 setup
changes + initial NATS status + power-cycle recovery).

Ansible's `command` module is a Python-wrapped SSH round trip with
zero added value when the operation isn't built around Ansible's
idempotency primitives. `russh` is already a workspace dep and
gives us the exit code + stdout + stderr in a typed struct, with
one round trip. Moving the two call sites that were using
`ansible.builtin.command` to russh directly:

- New `modules::linux::ssh_executor::ssh_exec(host, creds, cmd)`
  returning `SshCommandOutput { rc, stdout, stderr }`. Loads the
  private key via `russh::keys::load_secret_key`, authenticates,
  opens an exec channel, drains all `ChannelMsg` until the
  channel closes, returns the collected data. Draining past `Eof`
  matters: some sshd implementations emit `ExitStatus` *after*
  `Eof`, and an early break loses the rc.

- `ensure_linger`: `test -e /var/lib/systemd/linger/<user>` over
  russh for the check, then `sudo loginctl enable-linger <user>`
  only on miss. Two SSH round trips, no Ansible. Same semantics
  as the previous `stat` + `command` pair but without the Python
  hop.

- `ensure_user_unit_active`: `id -u <user>` + `sudo -u <user>
  env XDG_RUNTIME_DIR=/run/user/<uid> systemctl --user enable
  --now <unit>`. This is the case that couldn't be done cleanly
  via ad-hoc `ansible.builtin.systemd` in the first place because
  task-level `environment:` isn't available in ad-hoc; russh makes
  it a one-liner.
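
The Eof-ordering detail in `ssh_exec`'s drain loop is worth sketching; `Msg` below is a simplified stand-in for russh's `ChannelMsg` (stderr handling omitted), not the real type:

```rust
// Simplified stand-in for the channel message stream. The point:
// some sshd implementations emit ExitStatus AFTER Eof, so breaking
// the loop on Eof loses the exit code.
enum Msg {
    Data(Vec<u8>),
    ExitStatus(u32),
    Eof,
    Close,
}

struct SshCommandOutput {
    rc: u32,
    stdout: Vec<u8>,
}

fn drain(msgs: Vec<Msg>) -> SshCommandOutput {
    let mut rc = 0;
    let mut stdout = Vec::new();
    for msg in msgs {
        match msg {
            Msg::Data(d) => stdout.extend_from_slice(&d),
            Msg::ExitStatus(code) => rc = code, // may arrive after Eof
            Msg::Eof => continue,               // keep draining!
            Msg::Close => break,                // channel truly closed
        }
    }
    SshCommandOutput { rc, stdout }
}
```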

Ansible still owns: `apt` (distro dispatch + cache), `user`
(idempotent account management), `copy` (file delivery with
content-diff change reporting), `file` (directory/mode), `systemd`
(daemon-reload + enable + start as one atomic call). Those are
where Ansible's value is real; `command` was a category error.

Verified: smoke-a3 PASS end-to-end — same 9-change initial setup,
NATS status, and power-cycle recovery as before.

Adds the type-safe arch dimension for the aarch64-on-x86_64
emulation work to follow. No behaviour change: every existing call
site gets `VmArchitecture::X86_64` via `Default`, and the XML
renderer (unchanged in this commit) emits the same bytes it
always did.

- `VmArchitecture { X86_64 (default), Aarch64 }` in
  domain/topology/virtualization.rs, with `as_str()` and
  `ubuntu_cloudimg_suffix()` helpers (Ubuntu uses `amd64`/`arm64`
  in filenames, not the `uname -m` spelling).
- `VirtualMachineSpec.architecture` + `#[serde(default)]` for
  on-disk compat.
- `VmConfig.architecture` + `VmConfig.firmware: Option<UefiFirmware>`
  in modules/kvm/types.rs. `UefiFirmware { code, vars }` is the
  typed pair libvirt's `<loader>` + `<nvram>` need for aarch64
  guests; x86_64 leaves it None. `VmConfigBuilder::architecture()`
  / `firmware()` setters added.
- `KvmVirtualMachineHost::ensure_vm` threads the arch through to
  VmConfig; firmware wiring is commit 3.
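
A sketch of the enum and its helpers (variant and method names are from the bullets above; the method bodies are the obvious mappings and are assumptions here):

```rust
// Arch enum with the x86_64 default every existing call site gets.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
enum VmArchitecture {
    #[default]
    X86_64,
    Aarch64,
}

impl VmArchitecture {
    fn as_str(self) -> &'static str {
        match self {
            Self::X86_64 => "x86_64",
            Self::Aarch64 => "aarch64",
        }
    }

    // Ubuntu cloud images use amd64/arm64 in filenames, not the
    // `uname -m` spelling.
    fn ubuntu_cloudimg_suffix(self) -> &'static str {
        match self {
            Self::X86_64 => "amd64",
            Self::Aarch64 => "arm64",
        }
    }
}
```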

Re-exported: `VmArchitecture`, `UefiFirmware` from
`modules::kvm`. `VmArchitecture` is a type-alias re-export from
domain/topology so the arch enum lives in one place.

Verified: cargo check clean, fmt clean, aarch64 cross-compile of
harmony + iot crates still green.

Rewrites `domain_xml` to consume a resolved `DomainXmlParams`
(domain_type / arch / machine / emulator / cpu_block / firmware)
so per-arch branching happens once — at param resolution — and
the XML template itself stays a single readable format-string.

Per-arch values (from Linaro's "QEMU: A Tale of Performance
analysis" Jan 2025 for the aarch64 TCG knobs):

- **x86_64** → `<domain type='kvm'>` + machine `q35` + emulator
  `qemu-system-x86_64` + `<cpu mode='host-model'/>`. No firmware.
  (Unchanged — all existing XML still emits byte-identical output
  on the default arch.)

- **aarch64** → `<domain type='qemu'>` (TCG emulation), machine
  `virt`, emulator `qemu-system-aarch64`, custom CPU
  `<model>max</model>` with `<feature policy='require'
  name='pauth-impdef'/>`. MTTCG (`-accel tcg,thread=multi`) is
  the default in QEMU ≥ 9.1 so no libvirt-side knob is needed.
  UEFI via `<loader readonly='yes' type='pflash'>CODE</loader>`
  + `<nvram>VARS</nvram>` — a `UefiFirmware` pair is required
  (populated by `KvmVirtualMachineHost` in commit 3).
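
The resolve-once shape can be sketched as follows (field set trimmed, `cpu_block` and firmware omitted; the emulator paths are typical distro locations and an assumption here):

```rust
// Per-arch branching happens once, at param resolution; the XML
// template just interpolates the resolved values.
struct DomainXmlParams {
    domain_type: &'static str,
    machine: &'static str,
    emulator: &'static str,
}

fn resolve_params(aarch64: bool) -> DomainXmlParams {
    if aarch64 {
        DomainXmlParams {
            domain_type: "qemu", // TCG emulation, no cross-arch KVM
            machine: "virt",
            emulator: "/usr/bin/qemu-system-aarch64",
        }
    } else {
        DomainXmlParams {
            domain_type: "kvm",
            machine: "q35",
            emulator: "/usr/bin/qemu-system-x86_64",
        }
    }
}
```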

Four new unit tests verify the aarch64 path emits the right
domain type, arch, machine, emulator, CPU features, and firmware
elements — and that x86_64 stays BIOS-default with no loader/
nvram leakage. 26/26 `modules::kvm::xml` tests green.

When a native-aarch64 runner (Ampere) shows up, it's a one-line
fork inside `DomainXmlParams::for_vm` to switch to `kvm` +
`host-model` for the aarch64 branch — the shape already handles
it.

aarch64 guests boot via UEFI — there is no SeaBIOS equivalent for
the arm64 `virt` machine type. Libvirt needs two paths:

  - CODE (read-only firmware image, shared across VMs)
  - VARS (writable NVRAM, per-VM)

Every distro ships these under a different filename. New module
`modules/kvm/firmware.rs`:

- `AarchFirmware { code, vars_template }` — typed pair.
- `discover_aarch64_firmware()` walks four known-paths groups
  (Arch `edk2-armvirt`, Arch old naming, Debian/Ubuntu
  `qemu-efi-aarch64`, Fedora `edk2-aarch64`). First pair where
  both files exist wins. Miss → `ExecutorError` carrying the
  per-distro `pacman`/`apt`/`dnf` install command + the full
  candidate list for diagnosis.
- `copy_vars_template_for_vm(fw, dest)` produces the per-VM NVRAM
  at `$pool/<vm>-VARS.fd` and chmods 0644 so libvirt-qemu's
  dynamic-ownership chown on VM start works.
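
The first-pair-that-exists probe is straightforward; here is a sketch with the candidate list passed in (the real function hardcodes the distro path groups and builds a per-distro install-hint error on a miss):

```rust
use std::path::{Path, PathBuf};

// First candidate pair where both files exist wins; a miss returns
// None here (the real code returns an ExecutorError with hints).
fn discover_aarch64_firmware(
    candidates: &[(&str, &str)],
) -> Option<(PathBuf, PathBuf)> {
    candidates.iter().find_map(|(code, vars)| {
        let (c, v) = (Path::new(code), Path::new(vars));
        (c.exists() && v.exists()).then(|| (c.to_path_buf(), v.to_path_buf()))
    })
}
```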

Wired into `KvmVirtualMachineHost::ensure_vm`: when
`spec.architecture == Aarch64`, the topology runs firmware
discovery + per-VM copy before composing the `VmConfig`, then
hands the resolved `UefiFirmware` to the XML renderer
(commit 2 already consumes it). x86_64 path unchanged.

Firmware discovery is deliberately a runtime check with a clear
error, not a preflight — this lets x86_64-only runs succeed on
hosts without AAVMF installed. Commit 4 adds an arch-aware
preflight that surfaces it upfront when a caller asks for
aarch64.

Verified: 26/26 kvm::xml tests still green, cargo check clean,
cargo fmt clean.

Add the pinned Ubuntu 24.04 arm64 cloud image alongside the existing
amd64 pin, with sha256 verification and a per-arch OnceCell cache so
both images can coexist under $HARMONY_DATA_DIR/iot/cloud-images/.

New entry point `ensure_ubuntu_2404_cloud_image_for_arch` selects the
right URL/sha256/filename tuple by VmArchitecture; the existing
`ensure_ubuntu_2404_cloud_image` becomes a back-compat shim pointing
at x86_64 so current callers don't need to thread an arch through yet.
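
The back-compat shape can be sketched like this (filenames are the real Ubuntu 24.04 "noble" cloud-image names; the functions themselves are simplified stand-ins for the download + verify entry points):

```rust
// Select the per-arch filename; the real function picks a full
// (url, sha256, filename) tuple and downloads + verifies.
fn cloud_image_name_for_arch(arch: &str) -> &'static str {
    match arch {
        "aarch64" => "noble-server-cloudimg-arm64.img",
        _ => "noble-server-cloudimg-amd64.img",
    }
}

// Back-compat shim: existing callers don't thread an arch through yet.
fn cloud_image_name() -> &'static str {
    cloud_image_name_for_arch("x86_64")
}
```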

Preflight gains `check_iot_smoke_preflight_for_arch`: on top of the
host-generic checks, an aarch64 target additionally requires
`qemu-system-aarch64` on PATH and a usable AAVMF firmware pair
(same `discover_aarch64_firmware` call the topology makes at
ensure_vm time — preflight surfaces it up front). Package-map
helpers learn `qemu-system-aarch64` for pacman/apt/dnf.

Wire the VmArchitecture story all the way to the user-facing entry
points so an arm64 smoke run is a single env flip.

Example (`example_iot_vm_setup`):
  * New `--arch {x86-64|aarch64}` flag (default x86-64) backed by a
    `CliArch` enum that converts cleanly to `VmArchitecture`.
  * Preflight and cloud-image bootstrap now call the `_for_arch`
    variants, and the `VirtualMachineSpec.architecture` field gets
    the real value instead of `Default::default()`.

Smoke script (`iot/scripts/smoke-a3.sh`):
  * Reads `ARCH=x86-64|aarch64` from env (default x86-64).
  * When `ARCH=aarch64`, `rustup target add aarch64-unknown-linux-gnu`
    + `cargo build --target ...` produces an arm64 agent binary;
    otherwise the existing host-target build path is kept.
  * Threads `--arch` to the example.
  * Extends the phase-4 initial-status timeout (60s → 300s) and the
    phase-5 post-reboot wait (240s → 900s) under TCG, which runs
    3-5× slower than native KVM.

New `smoke-a3-arm.sh` wrapper: exports `ARCH=aarch64` and a separate
`VM_NAME` / NATS container name so an arm smoke run can coexist with
an x86 one on the same host without stepping on libvirt state.

Topology side (`KvmVirtualMachineHost::ensure_vm`): `wait_for_ip`
timeout is now arch-derived — 300s for x86_64, 900s for aarch64 —
because first-boot cloud-init under TCG routinely needs 8-12 min
on a constrained worker.

The on-device agent builds `harmony` with `default-features = false,
features = ["podman"]`, which does not pull in the `kvm` feature.
Cross-compiling iot-agent-v0 for `aarch64-unknown-linux-gnu` to put
it on a Pi / arm64 VM currently fails with:

    error[E0433]: failed to resolve: could not find `kvm` in `modules`
     --> harmony/src/modules/iot/preflight.rs:18:21
        use crate::modules::kvm::firmware::discover_aarch64_firmware;

Gate the import and the `discover_aarch64_firmware()` call inside
`check_iot_smoke_preflight_for_arch` behind `#[cfg(feature = "kvm")]`.
Callers who build `harmony` without kvm (the agent) still get the
`qemu-system-aarch64` PATH check — the firmware probe only matters
to the host that will actually boot the VM, and that host always
builds with `kvm` enabled anyway.
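
The gating pattern, sketched with stand-in function names (the real probe is `discover_aarch64_firmware` in modules/kvm/firmware.rs):

```rust
// Only compiled when the `kvm` feature is enabled.
#[cfg(feature = "kvm")]
fn probe_firmware() -> Result<(), String> {
    Ok(())
}

fn check_preflight_for_aarch64() -> Result<(), String> {
    // The qemu-system-aarch64 PATH check would run unconditionally
    // here, for every build of the crate.
    #[cfg(feature = "kvm")]
    probe_firmware()?;
    Ok(())
}
```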

Verification: `cargo build --release --target aarch64-unknown-linux-gnu
-p iot-agent-v0` now succeeds and produces a valid ELF aarch64 binary
(~13 MB).

Current Arch edk2-armvirt ships the pair as
  /usr/share/edk2/aarch64/QEMU_EFI.fd
  /usr/share/edk2/aarch64/QEMU_VARS.fd
(plus a compatibility copy under /usr/share/edk2-armvirt/aarch64/).
The previous CANDIDATES list looked for `QEMU_CODE.fd` and
`vars-template-pflash.raw` — neither name matches the actual
distro layout, so `discover_aarch64_firmware` reported
"no firmware found" on a fully-provisioned Arch host.

Add the `QEMU_EFI.fd` + `QEMU_VARS.fd` pair at both Arch paths at the
top of the probe order; keep the older raw-pflash variant and the
speculative CODE/VARS naming as later fallbacks. Sync the error
message's "checked paths" hint with the new list so the diagnostic
matches what's actually probed.

Verified against /usr/share/edk2/aarch64/QEMU_{EFI,VARS}.fd on this
host — `discover_aarch64_firmware` now returns the pair and
`cargo run -p example_iot_vm_setup -- --arch aarch64 --bootstrap-only`
completes (downloads + sha256-verifies the 598 MB arm64 image and
caches it under $HARMONY_DATA_DIR/iot/cloud-images/).

Three fixes landed during arm smoke debugging. Each is a real
correctness / perf issue that would bite anyone running aarch64
under TCG via libvirt, independent of any particular firmware.

**xml.rs — qemu:commandline overrides for -cpu and -accel**

`pauth-impdef=on` is a QEMU property of `-cpu max`, not a libvirt
`<feature>` entry. Putting it under `<cpu><feature policy='require'
name='pauth-impdef'/>` is rejected by libvirt with:

    error: unsupported configuration: unknown CPU feature: pauth-impdef

Route it instead via `<qemu:commandline>` (with the qemu namespace
declared on `<domain>`). QEMU takes the LAST `-cpu` arg as
authoritative, so libvirt's `-cpu max` followed by our
`-cpu max,pauth-impdef=on` yields max + pauth-impdef.

Same mechanism forces MTTCG: despite docs claiming QEMU ≥ 9.1
defaults to `thread=multi` on aarch64, observation on QEMU 10.2
shows cross-arch `-accel tcg` runs single-threaded (`vcpu.1.time`
stays at 0 forever). Appending `-accel tcg,thread=multi` creates
a real per-vcpu thread and roughly halves cold-boot wall time.
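
The appended fragment looks roughly like this (exact attribute layout is an assumption; the `qemu` namespace must be declared on `<domain>` as `xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'`):

```rust
// Passthrough args appended after libvirt's own command line; QEMU
// takes the LAST -cpu as authoritative, so this wins over libvirt's
// plain `-cpu max`.
const QEMU_COMMANDLINE: &str = r#"  <qemu:commandline>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='max,pauth-impdef=on'/>
    <qemu:arg value='-accel'/>
    <qemu:arg value='tcg,thread=multi'/>
  </qemu:commandline>"#;
```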

Also added a `<rng model='virtio'>` device feeding host `/dev/urandom`.
aarch64 cloud-init blocks minutes on first-boot SSH host-key
generation without it under TCG (entropy pool never fills on its
own). Cheap insurance on x86_64 too.

**topology.rs — 30-min wait_for_ip budget for aarch64**

Cold boot under TCG on an 8-core x86 host is 10-15 min even with
virtio-rng + pauth-impdef + MTTCG. The previous 900s ceiling
trips healthy boots; 1800s covers slower CI workers.

**smoke-a3.sh — cleanup must pass --nvram**

`virsh undefine --remove-all-storage` refuses to remove an aarch64
domain without `--nvram`, because NVRAM files aren't considered
"storage." Before this, a failed run left the domain definition
behind with yesterday's XML — subsequent runs would replay the
stale XML (ensure_vm is idempotent and doesn't redefine when the
domain already exists), masking any XML change until a manual
`virsh undefine` was issued. Also bump REBOOT_STEPS to match the
new topology-side budget.

Verified: `cargo test -p harmony --lib kvm::xml` passes (26/26),
including the 5 aarch64 assertions (namespace, cpu block, pflash
wiring, qemu:commandline contents for both -cpu and -accel).

QEMU's `virt` machine hardwires pflash unit 0 as a CFI flash device
of fixed size 64 MiB. When libvirt's `<loader type='pflash'>` points
at a file smaller than that, qemu refuses to start:

    cfi.pflash01 device '/machine/virt.flash0' requires 67108864
    bytes, block backend provides 3145728 bytes

Different distros ship the CODE firmware differently:

- Pre-padded (upstream QEMU pc-bios/edk2-aarch64-code.fd, Debian/
  Ubuntu qemu-efi-aarch64): file is exactly 64 MiB, zero-padded at
  the tail. Works as-is with libvirt's pflash loader.
- Raw edk2 build output (Arch `edk2-aarch64 202508+`): file is
  ~2-4 MiB, just the firmware volume without pflash padding. Has
  to be padded before libvirt accepts it.

Our discovery previously handed the discovered path straight to
libvirt. That works on pre-padded distros and silently fails on
raw-output distros.

Add `ensure_code_pflash_padded` in modules/kvm/firmware.rs:

- If the source is already 64 MiB, return the path unchanged —
  no copy, no bytes moved.
- If smaller, check a cache path (pool_dir/aarch64-code-padded.fd)
  for a correctly-sized copy newer than the source and reuse it.
- Otherwise copy + `File::set_len(64 MiB)` (sparse zero pad, one
  syscall), chmod 0644, return the cached path.
- If larger than 64 MiB, error out — no amount of padding saves us.
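
A sketch of the padding step under those rules (the cache-freshness check and chmod from the list above are omitted):

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

// QEMU's virt machine expects pflash unit 0 to be exactly 64 MiB.
const PFLASH_SIZE: u64 = 64 * 1024 * 1024;

fn pad_code_pflash(src: &Path, cached: &Path) -> io::Result<PathBuf> {
    let len = fs::metadata(src)?.len();
    if len == PFLASH_SIZE {
        return Ok(src.to_path_buf()); // pre-padded distro file, as-is
    }
    if len > PFLASH_SIZE {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "firmware image exceeds the 64 MiB pflash unit",
        ));
    }
    fs::copy(src, cached)?;
    // Sparse zero pad in one syscall; no tail bytes actually written.
    OpenOptions::new().write(true).open(cached)?.set_len(PFLASH_SIZE)?;
    Ok(cached.to_path_buf())
}
```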

`ensure_vm_firmware` in topology.rs now runs the discovered code
through the padder before handing it to libvirt. One padded copy
per pool, reused across every aarch64 VM on that pool.

Verification path: `cargo test -p harmony --lib kvm::` passes
(26 tests — XML suite unchanged since this is runtime-only).

fix(kvm): wait for port 22 after DHCP lease when first_boot is set
762e3b5b99
`wait_for_ip` returns as soon as libvirt sees a DHCP lease, but the
guest may still be minutes away from accepting SSH connections —
cloud-init is usually mid-firstboot (SSH host-key generation, runcmd,
etc.). Any Score that SSHes in immediately after `ensure_vm` resolves
will therefore race sshd startup:

    ansible.builtin.ping failed against 192.168.122.11: UNREACHABLE!
    ssh: connect to host 192.168.122.11 port 22: Connection refused

This is painful on native KVM (seconds) and catastrophic under TCG
(1-3 min between DHCP and sshd listening).

When `spec.first_boot.is_some()` — i.e. the caller asked us to run
cloud-init and therefore almost certainly intends to SSH next — also
block on `wait_for_tcp_port(ip, 22, budget)` before returning. The
budget is reused from `wait_for_ip` (300 s x86_64 / 1800 s aarch64)
because if cloud-init takes that long to bring SSH up, something is
broken that a longer wait wouldn't fix.

`wait_for_tcp_port` uses 1 s backoff polling with a 5 s per-attempt
TCP connect timeout, so a silently dropped SYN doesn't burn half
the budget on a single hung syscall.
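
A minimal sketch of that polling loop (signature assumed):

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::{Duration, Instant};

// Per-attempt connect timeout so one hung SYN can't eat the whole
// budget; 1 s backoff between attempts.
fn wait_for_tcp_port(addr: SocketAddr, budget: Duration) -> bool {
    let deadline = Instant::now() + budget;
    while Instant::now() < deadline {
        if TcpStream::connect_timeout(&addr, Duration::from_secs(5)).is_ok() {
            return true; // something is listening
        }
        sleep(Duration::from_secs(1));
    }
    false // budget exhausted
}
```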

Cases without `first_boot` (caller bringing their own pre-baked
image and not expecting SSH) get the old behavior: return as soon
as DHCP resolves.

johnride reviewed 2026-04-21 19:02:16 +00:00
@@ -0,0 +81,4 @@
.cloned()
}
/// Back-compat shim — returns the x86_64 image. Prefer

This should leverage harmony_assets crate, 90% of the code here is duplication.
@@ -0,0 +242,4 @@
info!("generating ed25519 ssh keypair at {priv_path:?} (one-time)");
let status = Command::new("ssh-keygen")
.arg("-t")

Never chain `.arg().arg().arg()`; use `.args([...])` instead.

johnride merged commit 4e787ddb71 into feat/iot-walking-skeleton 2026-04-21 19:04:53 +00:00
johnride deleted branch feat/iot-arm-vm 2026-04-21 19:04:54 +00:00
Reference: NationTech/harmony#269