aarch64 VM support — plan
Why
The v0 walking skeleton's whole point is validating the IoT agent against the actual distribution, arch, and package set the end-customer's Pi 5 devices run on (ROADMAP §1). Everything green so far runs the agent against an x86_64 Ubuntu cloud image with an x86_64 Rust binary, which proves the code path works but not that the ARM target works. Every passing smoke-a3 run today is evidence that the wrong thing works.
This plan adds arm64 emulation on x86_64 hosts (no hardware needed for CI) so:
- the VM runs the same Ubuntu 24.04 arm64 cloud image customers will eventually flash onto a Pi;
- the iot-agent shipped to it is a real aarch64 binary produced by our existing cross-compile toolchain;
- apt/systemd/podman on the VM are the actual arm64 packages; and
- smoke-a3 exercises all of it end-to-end.
Acceptable cost: emulated boot is 5-15× slower than KVM-accelerated boot. That's the price of the target-arch validation.
Shape of the change
Additive, type-safe, default-preserving. Existing callers of
VirtualMachineSpec keep working unchanged; arm64 is opt-in via a
new field.
1. Architecture enum on the VM spec
Introduce VmArchitecture in harmony/src/domain/topology/virtualization.rs:
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize, Default)]
pub enum VmArchitecture {
    #[default]
    X86_64,
    Aarch64,
}
Add pub architecture: VmArchitecture to VirtualMachineSpec. With
#[derive(Default)] + VmArchitecture::X86_64 as default, every
existing call site that builds the spec with ..Default::default()
continues to compile; a call site that lists every field explicitly
needs the new field added. New constructor:
VirtualMachineSpec::new_aarch64(name) for clarity.
Same treatment on VmConfig in modules/kvm/types.rs — add a
pub architecture: VmArchitecture field with Default impl.
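A minimal sketch of the shape described in this section, assuming a cut-down VirtualMachineSpec with only name and architecture fields (the real struct has more). The serde derives are left out so the snippet builds with the standard library alone, and the body of new_aarch64 and the legacy_spec helper are illustrative guesses, not the project's code.

```rust
/// Guest CPU architecture. x86_64 stays the default so existing specs are unchanged.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum VmArchitecture {
    #[default]
    X86_64,
    Aarch64,
}

/// Simplified stand-in for the real spec: only the fields needed here.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct VirtualMachineSpec {
    pub name: String,
    pub architecture: VmArchitecture,
}

impl VirtualMachineSpec {
    /// Hypothetical body for the proposed opt-in constructor.
    pub fn new_aarch64(name: impl Into<String>) -> Self {
        Self {
            name: name.into(),
            architecture: VmArchitecture::Aarch64,
        }
    }
}

/// An existing-style call site (hypothetical): struct-update syntax
/// picks up the default for the newly added field.
pub fn legacy_spec(name: &str) -> VirtualMachineSpec {
    VirtualMachineSpec {
        name: name.to_string(),
        ..Default::default()
    }
}
```

Call sites written like legacy_spec need no change when the field lands, which is what makes the field additive.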
2. Libvirt XML parameterization
Rewrite modules/kvm/xml.rs::domain_xml to branch on arch. What
changes per-arch (the QEMU flags you gave as reference map directly
to libvirt XML):
| QEMU flag | libvirt XML | x86_64 | aarch64 |
|---|---|---|---|
| -accel kvm vs -accel tcg | <domain type='…'> | kvm | qemu |
| -M virt / -M q35 | <os><type machine='…'> | q35 | virt |
| arch | <os><type arch='…'> | x86_64 | aarch64 |
| emulator binary | <emulator>…</emulator> | /usr/bin/qemu-system-x86_64 | /usr/bin/qemu-system-aarch64 |
| -cpu max,pauth-impdef=on | <cpu mode='custom'><model>max</model><feature …/></cpu> | host-model (current) | max + pauth-impdef |
| -bios QEMU_EFI.fd | <os><loader readonly='yes' type='pflash'>…</loader><nvram>…</nvram></os> | none (BIOS) | AAVMF CODE + VARS pflash pair |
| -accel tcg,thread=multi | MTTCG is default-on when type='qemu' + QEMU ≥ 9.1 | n/a | implicit |
Type safety: introduce a DomainXmlParams struct that captures
the arch-specific knobs (domain_type, arch, machine, emulator path,
cpu mode, firmware) and derives from VmArchitecture. The top-level
domain_xml then consumes a fully-resolved DomainXmlParams rather
than branching with if arch == X86_64 strings.
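One way the resolved-params idea could look, as a sketch only. The field values come from the table in this section; the CpuConfig and Firmware enums, the From impl, and os_type_element are hypothetical names introduced for illustration, and the real DomainXmlParams may carry different fields.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum VmArchitecture {
    #[default]
    X86_64,
    Aarch64,
}

/// How the guest CPU is described in the <cpu> element (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CpuConfig {
    /// mode='host-model', the current x86_64 behaviour.
    HostModel,
    /// mode='custom' with a named model and features to enable.
    Custom { model: &'static str, features: &'static [&'static str] },
}

/// Boot firmware flavour (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Firmware {
    Bios,
    UefiPflash,
}

/// Every arch-specific knob, resolved once so the XML writer never branches on arch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DomainXmlParams {
    pub domain_type: &'static str,
    pub arch: &'static str,
    pub machine: &'static str,
    pub emulator: &'static str,
    pub cpu: CpuConfig,
    pub firmware: Firmware,
}

impl From<VmArchitecture> for DomainXmlParams {
    fn from(arch: VmArchitecture) -> Self {
        match arch {
            VmArchitecture::X86_64 => Self {
                domain_type: "kvm",
                arch: "x86_64",
                machine: "q35",
                emulator: "/usr/bin/qemu-system-x86_64",
                cpu: CpuConfig::HostModel,
                firmware: Firmware::Bios,
            },
            VmArchitecture::Aarch64 => Self {
                domain_type: "qemu",
                arch: "aarch64",
                machine: "virt",
                emulator: "/usr/bin/qemu-system-aarch64",
                cpu: CpuConfig::Custom { model: "max", features: &["pauth-impdef"] },
                firmware: Firmware::UefiPflash,
            },
        }
    }
}

/// Renders just the <os><type> element to show how the params are consumed.
pub fn os_type_element(p: &DomainXmlParams) -> String {
    format!("<type arch='{}' machine='{}'>hvm</type>", p.arch, p.machine)
}
```

Because the match is exhaustive, adding a third architecture later becomes a compile error at this one site rather than a string comparison that silently falls through.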
3. UEFI firmware discovery
aarch64 guests boot via UEFI, not BIOS. libvirt needs two files:
- AAVMF_CODE.fd: the firmware code (read-only, shared)
- AAVMF_VARS.fd: per-VM NVRAM (writable, per-domain copy)
Common paths across distros:
| Distro | CODE | VARS (template) |
|---|---|---|
| Arch | /usr/share/edk2/aarch64/QEMU_CODE.fd | /usr/share/edk2/aarch64/QEMU_VARS.fd |
| Debian/Ubuntu | /usr/share/AAVMF/AAVMF_CODE.fd | /usr/share/AAVMF/AAVMF_VARS.fd |
| Fedora | /usr/share/edk2/aarch64/QEMU_EFI-pflash.raw | /usr/share/edk2/aarch64/vars-template-pflash.raw |
New module harmony/src/modules/kvm/firmware.rs:
- pub fn discover_aarch64_firmware() -> Result<AarchFirmware, KvmError> walks a small known-paths list and returns the first viable pair. Returns a typed AarchFirmware { code: PathBuf, vars_template: PathBuf }.
- Per-VM NVRAM copy is handled in KvmVirtualMachineHost: at ensure_vm time, copy vars_template into $pool/<vm_name>-VARS.fd and reference it in the domain XML.
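The discovery walk could be sketched roughly as follows. This is an assumption-laden illustration: the existence check is injected as a closure so the logic can be exercised without touching the filesystem, Option stands in for the planned Result<AarchFirmware, KvmError>, and first_viable_pair, KNOWN_FIRMWARE_PAIRS and nvram_path are hypothetical names.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AarchFirmware {
    pub code: PathBuf,
    pub vars_template: PathBuf,
}

/// Known (CODE, VARS template) pairs, in the order of the distro table.
pub const KNOWN_FIRMWARE_PAIRS: &[(&str, &str)] = &[
    ("/usr/share/edk2/aarch64/QEMU_CODE.fd", "/usr/share/edk2/aarch64/QEMU_VARS.fd"),
    ("/usr/share/AAVMF/AAVMF_CODE.fd", "/usr/share/AAVMF/AAVMF_VARS.fd"),
    (
        "/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw",
        "/usr/share/edk2/aarch64/vars-template-pflash.raw",
    ),
];

/// Returns the first pair for which BOTH files satisfy `exists`.
pub fn first_viable_pair(
    candidates: &[(&str, &str)],
    exists: impl Fn(&Path) -> bool,
) -> Option<AarchFirmware> {
    candidates.iter().find_map(|(code, vars)| {
        let (code, vars) = (Path::new(code), Path::new(vars));
        (exists(code) && exists(vars)).then(|| AarchFirmware {
            code: code.to_path_buf(),
            vars_template: vars.to_path_buf(),
        })
    })
}

/// Filesystem-backed variant; Option stands in for Result<_, KvmError> here.
pub fn discover_aarch64_firmware() -> Option<AarchFirmware> {
    first_viable_pair(KNOWN_FIRMWARE_PAIRS, |p| p.is_file())
}

/// Destination of the per-VM NVRAM copy inside the storage pool directory.
pub fn nvram_path(pool: &Path, vm_name: &str) -> PathBuf {
    pool.join(format!("{vm_name}-VARS.fd"))
}
```

Requiring both files of a pair avoids mixing a CODE image from one package with a VARS template from another. At ensure_vm time the caller would then copy vars_template to nvram_path(pool, name), for example with std::fs::copy.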
4. Cloud image for arm64
Add to modules/iot/assets.rs:
pub const UBUNTU_2404_CLOUDIMG_ARM64_URL: &str =
"https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-arm64.img";
pub const UBUNTU_2404_CLOUDIMG_ARM64_SHA256: &str = "<pinned>";
pub async fn ensure_ubuntu_2404_cloud_image_for_arch(
    arch: VmArchitecture,
) -> Result<PathBuf, ExecutorError>;
The existing ensure_ubuntu_2404_cloud_image() becomes a thin
wrapper that calls the arch-aware fn with X86_64, preserving all
callers. SHA256 gets pinned against the live Ubuntu arm64 image at
commit time.
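The thin-wrapper arrangement can be illustrated with a synchronous sketch that only selects the URL; downloading, caching and checksum verification are left out. The amd64 URL is an assumption (it mirrors the arm64 naming and is not taken from this plan), and cloud_image_url_for_arch and cloud_image_url are hypothetical helper names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum VmArchitecture {
    #[default]
    X86_64,
    Aarch64,
}

pub const UBUNTU_2404_CLOUDIMG_ARM64_URL: &str =
    "https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-arm64.img";

// Assumption: the existing x86_64 constant follows the same naming with `amd64`.
pub const UBUNTU_2404_CLOUDIMG_AMD64_URL: &str =
    "https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-amd64.img";

/// Arch-aware entry point (hypothetical): picks the image for the requested guest arch.
pub fn cloud_image_url_for_arch(arch: VmArchitecture) -> &'static str {
    match arch {
        VmArchitecture::X86_64 => UBUNTU_2404_CLOUDIMG_AMD64_URL,
        VmArchitecture::Aarch64 => UBUNTU_2404_CLOUDIMG_ARM64_URL,
    }
}

/// Arch-less entry point kept for existing callers; it delegates with the default arch.
pub fn cloud_image_url() -> &'static str {
    cloud_image_url_for_arch(VmArchitecture::default())
}
```

The async ensure_ubuntu_2404_cloud_image() wrapper would delegate in the same way, so existing callers keep getting the x86_64 image without any source change.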
5. Preflight additions
In modules/iot/preflight.rs, when the caller asks for arm64 VMs
(new check_iot_smoke_preflight_for_arch(VmArchitecture) wrapper):
- verify qemu-system-aarch64 is on PATH;
- verify the aarch64 firmware pair exists (reuse the discovery fn);
- verify QEMU version ≥ 9.1 (MTTCG is a real perf multiplier; this is a warning, not a hard block, if the host is older).
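The version check in the last bullet might be sketched like this. It assumes the first line of `qemu-system-aarch64 --version` has the usual "QEMU emulator version X.Y.Z" form; the VersionCheck enum and both function names are hypothetical, and running the binary itself (for example via std::process::Command) is left to the caller.

```rust
/// Outcome of the soft version gate (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum VersionCheck {
    Ok,
    /// Older than the threshold: warn, but do not block the run.
    Warn(String),
    /// The version output could not be parsed.
    Unknown,
}

/// Extracts (major, minor) from the first line of the `--version` output,
/// e.g. "QEMU emulator version 9.1.0".
pub fn parse_qemu_version(output: &str) -> Option<(u32, u32)> {
    let first_line = output.lines().next()?;
    let after_keyword = first_line.split("version ").nth(1)?;
    let token = after_keyword.split_whitespace().next()?;
    let mut parts = token.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Applies the ≥ 9.1 rule as a warning rather than a hard failure.
pub fn check_qemu_version(output: &str) -> VersionCheck {
    match parse_qemu_version(output) {
        // Tuples compare lexicographically, so (10, 0) >= (9, 1) holds.
        Some(version) if version >= (9, 1) => VersionCheck::Ok,
        Some((major, minor)) => VersionCheck::Warn(format!(
            "QEMU {major}.{minor} is older than 9.1; emulated boot may be slower"
        )),
        None => VersionCheck::Unknown,
    }
}
```

Returning a dedicated Warn variant keeps the "warning, not a hard block" policy visible in the type, so the preflight caller decides how to report it.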
6. Cross-compiled agent
smoke-a3.sh phase 2 currently does native cargo build --release -p iot-agent-v0. When arch=aarch64:
- cargo build --release --target aarch64-unknown-linux-gnu -p iot-agent-v0
- AGENT_BINARY points at target/aarch64-unknown-linux-gnu/release/iot-agent-v0
Opt-in via --arch aarch64 CLI flag on both
example_iot_vm_setup and smoke-a3.sh. Default stays x86_64.
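On the Rust side (the example binary), the flag handling and the arch-dependent build outputs could look something like the sketch below. The FromStr impl, cargo_build_args and agent_binary_path are hypothetical helpers; the native path assumes Cargo's default target/release output directory.

```rust
use std::path::PathBuf;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum VmArchitecture {
    #[default]
    X86_64,
    Aarch64,
}

/// Parses the value given to `--arch`; unknown values are rejected up front.
impl FromStr for VmArchitecture {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "x86_64" => Ok(Self::X86_64),
            "aarch64" => Ok(Self::Aarch64),
            other => Err(format!("unsupported --arch value: {other}")),
        }
    }
}

/// Arguments passed to `cargo` for the agent build (hypothetical helper).
pub fn cargo_build_args(arch: VmArchitecture) -> Vec<&'static str> {
    let mut args = vec!["build", "--release"];
    if arch == VmArchitecture::Aarch64 {
        args.extend(["--target", "aarch64-unknown-linux-gnu"]);
    }
    args.extend(["-p", "iot-agent-v0"]);
    args
}

/// Where the built agent ends up, relative to the workspace root.
pub fn agent_binary_path(arch: VmArchitecture) -> PathBuf {
    match arch {
        VmArchitecture::X86_64 => PathBuf::from("target/release/iot-agent-v0"),
        VmArchitecture::Aarch64 => {
            PathBuf::from("target/aarch64-unknown-linux-gnu/release/iot-agent-v0")
        }
    }
}
```

Keeping the build arguments and the binary path keyed off the same enum value means the two cannot drift apart when the flag changes.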
7. Timeout bumps
First-boot cloud-init on emulated aarch64 takes 3-6× longer than
KVM-accel x86_64. Bump wait_for_ip timeout from 300s → 900s when
arch=aarch64. Smoke-a3's phase 5 reboot gate also lengthens.
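A small sketch of an arch-keyed timeout, using the 300s and 900s values from this section. The function names and the polling helper are hypothetical; the real wait_for_ip may be structured differently (for example, async).

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum VmArchitecture {
    #[default]
    X86_64,
    Aarch64,
}

/// Timeout for the guest to obtain an IP after first boot.
pub fn wait_for_ip_timeout(arch: VmArchitecture) -> Duration {
    match arch {
        VmArchitecture::X86_64 => Duration::from_secs(300),
        // Emulated boot is several times slower, so allow more time.
        VmArchitecture::Aarch64 => Duration::from_secs(900),
    }
}

/// Polls `ready` until it returns true or `timeout` elapses.
/// Returns whether the condition was met in time.
pub fn wait_until(
    timeout: Duration,
    interval: Duration,
    mut ready: impl FnMut() -> bool,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        std::thread::sleep(interval);
    }
}
```

A caller would pass wait_for_ip_timeout(arch) as the timeout, so the x86_64 path keeps its current 300s budget.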
Files to touch
| File | Change |
|---|---|
| harmony/src/domain/topology/virtualization.rs | Add VmArchitecture, field on VirtualMachineSpec, constructor helper. |
| harmony/src/modules/kvm/types.rs | Add architecture field on VmConfig, VmConfigBuilder setter. |
| harmony/src/modules/kvm/xml.rs | Rewrite domain_xml to take DomainXmlParams resolved from arch. |
| harmony/src/modules/kvm/firmware.rs (new) | Discovery of AAVMF code+vars paths; AarchFirmware struct. |
| harmony/src/modules/kvm/topology.rs | Copy per-VM NVRAM template on ensure_vm; thread arch through to XML. |
| harmony/src/modules/iot/assets.rs | ensure_ubuntu_2404_cloud_image_for_arch(arch); pin arm64 URL+sha256. |
| harmony/src/modules/iot/preflight.rs | Arch-aware preflight; qemu-system-aarch64 + firmware + qemu-version. |
| examples/iot_vm_setup/src/main.rs | `--arch x86_64 |
| iot/scripts/smoke-a3.sh | Arch flag plumbing; cross-compile; extended timeouts; preflight. |
| iot/scripts/smoke-a3-arm.sh (new) | Dedicated arm smoke as the CI hook: ARCH=aarch64 ./smoke-a3.sh. |
Out of scope
- Migrating OPNsense + other KVM examples to VirtualMachineHost/ProvisionVmScore: a real inconsistency in the codebase but a separate refactor, orthogonal to the ARM work. Filing as follow-up.
- KVM-accelerated aarch64-on-aarch64 (e.g. running on an Ampere runner). Emulation covers the x86 CI story; native aarch64 runners would use <domain type='kvm'> and no MTTCG flags, which the arch enum + existing x86_64 XML path already model, so this is effectively free when we get there.
- Supporting multiple simultaneous guest arches on one host in the same smoke run. Single-arch-per-run keeps everything simple.
- Pinning AAVMF firmware like we pin the cloud image. Firmware is distro-package-managed; pin when we hit a regression.
Commit plan (in order)
- VmArchitecture domain type + VirtualMachineSpec.architecture field: tiny, just the enum and struct field; no behaviour change (all callers get X86_64 via Default).
- XML parameterization via DomainXmlParams: rewrite domain_xml to be arch-driven. Tests under harmony/src/modules/kvm/xml.rs get an arm64 variant.
- AAVMF firmware discovery + per-VM NVRAM copy: firmware.rs + the copy in topology.rs::ensure_vm.
- arm64 cloud image asset + preflight: ensure_ubuntu_2404_cloud_image_for_arch(arch) plus preflight extensions. SHA256 pinned at commit time via a one-off curl | sha256sum.
- Example + smoke script plumbing: --arch flag, cross-compile, timeout bumps, smoke-a3-arm.sh wrapper.
- End-to-end verification: run smoke-a3-arm.sh from a fresh $HARMONY_DATA_DIR/iot/ and confirm the aarch64 agent boots, joins NATS, and survives a power-cycle. Document timing in the commit message.
Verification
- cargo check --all-targets --features kvm: clean.
- cargo clippy --no-deps -- -D warnings on touched files: clean.
- cargo fmt --check: clean.
- aarch64 cross-compile of harmony + iot crates: still green.
- Fresh-cache arm64 smoke-a3: PASS, timing documented.
- Existing x86_64 smoke-a3: still PASS (regression guard).