Jean-Gabriel Gill-Couture 50f62b6437 chore: warning sweep — auto-fix pass + scoped allows for generated code
Workspace warning count: 408 → 105.

Three buckets cleared:

* Auto-fixable (`cargo fix` + `cargo clippy --fix`): unused imports
  removed, unused variables prefixed with `_`, deprecated method
  calls updated. Applied across harmony, harmony-k8s, harmony-agent,
  harmony_inventory_agent, the fleet/ workspace, and ~15 examples.
* Generated code (opnsense-api/src/generated/): 269 snake_case
  warnings + ~10 unreachable-pattern warnings come from
  CamelCase-preserving bindings to OPNsense's HAProxy/Caddy XML
  schemas. Scoped a single `#[allow(non_snake_case,
  unreachable_patterns)]` at `pub mod generated;` rather than
  fighting the codegen — renaming would break serde round-trips
  and the codegen would regenerate them anyway.
* opnsense-codegen parser's defensive `let...else` guards on
  `XmlNode` (currently single-variant): file-level
  `#![allow(irrefutable_let_patterns)]` with a comment explaining
  why we keep the `else` arms (they re-arm if the IR grows a
  second variant).

`harmony_inventory_agent::local_presence::{DiscoveryEvent,
discover_agents}` re-exports were stripped twice by the auto-fix
passes (consumers live in another crate, so the local crate looks
"unused" to lint). Anchored with explicit `pub use` + an
`#[allow(unused_imports)]` annotation noting why.

All 151 harmony lib tests still pass. Remaining ~105 warnings are
mostly real dead code in non-fleet modules + a handful of
unused-imports/variables clippy couldn't auto-resolve; cleared in
the next pass.
2026-05-06 22:51:44 -04:00

Example: Fleet Device Enroll

Enrolls a device into the fleet by minting its Zitadel machine user + JSON key inline (browser SSO or pre-acquired admin token), then runs FleetDeviceSetupScore against the device to install podman, drop the keyfile + agent config, and bring up the agent under systemd.

Two operator workflows land on the same code path:

  • Dev-on-device — developer runs the score on a Pi with keyboard + display attached. Browser opens locally, dev signs in with their personal SSO account, the score provisions credentials for that one device.
  • Production-via-SSH — operator runs the score from a workstation, targets each device over SSH. Browser opens once on the workstation. (Per-batch token caching is on the roadmap; v0 re-prompts per device but the browser session cookie keeps the click cheap.)

How to use

Prerequisites

  • A running staging install (Zitadel + NATS + auth callout + operator) — see examples/fleet_staging_install/.
  • The Zitadel project ID for fleet (from the staging install output).
  • A cross-compiled fleet-agent binary for the target arch.
  • For VM rehearsal: libvirt + qemu-system-aarch64 + xorriso installed locally. Run cargo run -p example_fleet_vm_setup -- --bootstrap-only --arch aarch64 once to prime the asset cache and SSH keys.
  • Your Zitadel SSO account must hold a role permitting machine-user, role-grant, and machine-key creation (typically IAM_OWNER or ORG_OWNER).

Build flavors

The crate has two flavors selected by Cargo features:

  • Workstation (default): everything, i.e. --launch-pi-vm, --vm-rehearsal, and full enrollment. Pulls in libvirt via the vm-rehearsal feature.
    cargo build --release -p example_fleet_device_enroll
  • Device-side (cross-compile): enrollment only, with no VM-rehearsal flags and no libvirt. Builds for arm64. Use the musl target, not gnu (see below).
    cargo build --release --target aarch64-unknown-linux-musl -p example_fleet_device_enroll --no-default-features

Why musl, not gnu

Building with --target aarch64-unknown-linux-gnu links against the host's glibc. On a current Arch / Fedora workstation that's glibc 2.41+; on the device it might be glibc 2.36 (Debian 12) or 2.41 (Debian 13). When the workstation's glibc is newer than the device's, the binary fails to start with:

./fleet_device_enroll: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.39' not found

aarch64-unknown-linux-musl produces a fully static binary with musl libc bundled in. It runs on any aarch64 Linux regardless of the device's libc version: Debian 12, Debian 13, Pi OS, and Alpine all behave the same. That's what we want for a device-side binary that gets shipped onto whatever userland the production line happens to flash.

One-time musl setup

rustup target add aarch64-unknown-linux-musl
# Arch:   aarch64-linux-musl from the AUR (install it with an AUR helper; pacman can't fetch AUR packages), or use mold-aarch64
# Fedora: sudo dnf install gcc-aarch64-linux-gnu  (the musl libc itself ships with the rustup target)

You may need to point Cargo at the right linker. In ~/.cargo/config.toml:

[target.aarch64-unknown-linux-musl]
linker = "aarch64-linux-musl-gcc"

Or use cross (cargo install cross) which handles the toolchain automatically:

cross build --release --target aarch64-unknown-linux-musl \
  -p example_fleet_device_enroll --no-default-features

Copying to the device

scp target/aarch64-unknown-linux-musl/release/fleet_device_enroll pi@<host>:

Then SSH to the device and run it as documented in Dev-on-device below.

Quickstart — Pi-equivalent VM rehearsal

The rehearsal VM runs Debian bookworm arm64 generic-cloud, the same Debian base Pi OS is built on (Pi OS itself is locked to Pi hardware and won't boot under generic KVM). Boot it with one command:

cargo run -p example_fleet_device_enroll -- --launch-pi-vm

The command boots the VM and exits, printing the SSH connection details and a suggested next command. From there, enroll the running VM:

./target/debug/fleet_device_enroll \
  --target ssh://fleet-admin@<VM_IP> \
  --device-id pi-rehearsal-01 \
  --issuer-url https://sso-staging.cb1.nationtech.io \
  --audience <PROJECT_ID> \
  --nats-url wss://nats-fleet-staging.cb1.nationtech.io \
  --admin-oidc-client-id <CLIENT_ID> \
  --agent-binary target/aarch64-unknown-linux-gnu/release/fleet-agent

--device-id is required and validated against RFC1123 subdomain rules (lowercase alphanumeric + -, must start and end with an alphanumeric, ≤253 chars total / ≤63 chars per label). The same id is reused for the agent's TOML, the Zitadel machine username (device-<id>), and the Kubernetes Device CR, so anything Kubernetes wouldn't accept as a metadata.name is rejected up front here instead of three layers down at operator-reconcile time.
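A minimal sketch of that validation, assuming only the rules stated above. `validate_device_id` and `zitadel_username` are hypothetical names for illustration, not the crate's actual API:

```rust
/// Hypothetical validator (not the crate's API) for the RFC 1123 subdomain
/// rules listed above: lowercase alphanumerics and '-', each label starting and
/// ending with an alphanumeric, <=63 chars per label, <=253 chars total.
fn validate_device_id(id: &str) -> Result<(), String> {
    if id.is_empty() || id.len() > 253 {
        return Err(format!("device id must be 1..=253 chars, got {}", id.len()));
    }
    // Alphanumeric in the RFC 1123 sense: a-z or 0-9 (uppercase is rejected).
    let is_alnum = |c: char| c.is_ascii_lowercase() || c.is_ascii_digit();
    for label in id.split('.') {
        if label.is_empty() || label.len() > 63 {
            return Err(format!("label {label:?} must be 1..=63 chars"));
        }
        if !label.chars().all(|c| is_alnum(c) || c == '-') {
            return Err(format!("label {label:?} contains chars outside [a-z0-9-]"));
        }
        // Non-empty was checked above, so first/last always exist.
        let first = label.chars().next().unwrap();
        let last = label.chars().last().unwrap();
        if !is_alnum(first) || !is_alnum(last) {
            return Err(format!("label {label:?} must start and end with [a-z0-9]"));
        }
    }
    Ok(())
}

/// The same id feeds the Zitadel machine username (`device-<id>`).
fn zitadel_username(id: &str) -> String {
    format!("device-{id}")
}
```

Because a check like this runs before any Zitadel or SSH call, a bad id fails fast with a message naming the offending label.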

The browser opens to Zitadel's device-code login. Sign in with your SSO account; the score mints the per-device user, drops the keyfile, and brings up the agent.

Dev-on-device

Run the binary on the Pi itself and omit --target entirely. The score uses ansible's local connection and runs everything on the same machine: no SSH, no keypair.

fleet_device_enroll \
  --issuer-url https://sso.example.com \
  --audience <PROJECT_ID> \
  --nats-url wss://nats.example.com \
  --admin-oidc-client-id <CLIENT_ID> \
  --agent-binary /usr/local/bin/fleet-agent \
  --device-id pi-001 \
  --labels group=lab,arch=aarch64

The browser opens on the Pi's local display. The dev signs in once; the score handles the rest. If passwordless sudo isn't configured (Debian's default), sudo prompts for the operator's password, which is fine.

On first run the score auto-installs python3-venv if it's missing (Debian splits it out of base python3): it detects the venv-create failure, runs sudo apt-get install -y python3-venv, and retries the venv create.
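The detect/install/retry step follows a generic remediate-once pattern. A sketch, assuming the failure can be recognized from its error (for example, a missing-ensurepip message); `retry_after_remediation` is a hypothetical helper, not the score's actual code:

```rust
/// Hypothetical helper: run `attempt`; if it fails with an error that
/// `is_remediable` recognizes, run `remediate` once (for example
/// `sudo apt-get install -y python3-venv`) and retry `attempt` exactly once.
/// Any other error, or a failure of `remediate`, is returned unchanged.
fn retry_after_remediation<T, E>(
    mut attempt: impl FnMut() -> Result<T, E>,
    is_remediable: impl Fn(&E) -> bool,
    remediate: impl FnOnce() -> Result<(), E>,
) -> Result<T, E> {
    match attempt() {
        Err(e) if is_remediable(&e) => {
            remediate()?;
            attempt()
        }
        other => other,
    }
}
```

Retrying only once keeps a broken package install from turning into a loop; an unrelated failure surfaces immediately.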

Production-via-SSH

Operator runs from a workstation, targeting devices on the LAN:

fleet_device_enroll \
  --target ssh://pi@10.0.0.42 \
  --issuer-url https://sso.example.com \
  --audience <PROJECT_ID> \
  --nats-url wss://nats.example.com \
  --agent-binary ./build/fleet-agent-aarch64 \
  --device-id batch7-042 \
  --labels group=batch7,site=warehouse-east

Each invocation re-prompts the browser. Token caching across runs is tracked in ROADMAP/fleet_platform/device_enrollment_token_caching.md.
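The --target value selects the connection. A sketch of that mapping, assuming the ssh://user@host form used throughout this README (no port handling); `Target` and `parse_target` are hypothetical names:

```rust
/// Hypothetical model of the --target flag (not the crate's actual types).
#[derive(Debug, PartialEq)]
enum Target {
    /// --target omitted: ansible's local connection, no SSH.
    Local,
    /// --target ssh://user@host
    Ssh { user: String, host: String },
}

fn parse_target(arg: Option<&str>) -> Result<Target, String> {
    let Some(raw) = arg else {
        return Ok(Target::Local);
    };
    let rest = raw
        .strip_prefix("ssh://")
        .ok_or_else(|| format!("unsupported target {raw:?}: expected ssh://user@host"))?;
    let (user, host) = rest
        .split_once('@')
        .ok_or_else(|| format!("target {raw:?} is missing the user@ part"))?;
    if user.is_empty() || host.is_empty() {
        return Err(format!("target {raw:?} needs both a user and a host"));
    }
    Ok(Target::Ssh { user: user.to_string(), host: host.to_string() })
}
```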

Non-interactive (CI / scripted)

Skip the browser by supplying a Bearer token (PAT or access token) in HARMONY_ZITADEL_ADMIN_TOKEN:

HARMONY_ZITADEL_ADMIN_TOKEN=<pat-or-access-token> \
fleet_device_enroll \
  --target ssh://pi@10.0.0.42 \
  --issuer-url https://sso.example.com \
  --audience <PROJECT_ID> \
  --nats-url wss://nats.example.com \
  --agent-binary ./build/fleet-agent-aarch64 \
  --device-id <device-id>
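How the admin credential gets picked can be sketched as follows, assuming the environment variable takes precedence over the device-code login and that an empty value counts as unset. `AdminAuth` and `resolve_admin_auth` are hypothetical, and the sketch ignores the workstation-side client-config cache, so it is a simplification:

```rust
/// Hypothetical model of the two ways to authenticate the management calls.
#[derive(Debug, PartialEq)]
enum AdminAuth {
    /// HARMONY_ZITADEL_ADMIN_TOKEN was set: send it as a Bearer token, no browser.
    Bearer(String),
    /// Otherwise: interactive device-code login using this OIDC client_id.
    DeviceCode { client_id: String },
}

/// Env token wins; an empty or whitespace-only value is treated as unset (assumption).
fn resolve_admin_auth(
    env_token: Option<String>,
    client_id: Option<String>,
) -> Result<AdminAuth, String> {
    match env_token.filter(|t| !t.trim().is_empty()) {
        Some(token) => Ok(AdminAuth::Bearer(token.trim().to_string())),
        None => client_id
            .map(|client_id| AdminAuth::DeviceCode { client_id })
            .ok_or_else(|| {
                "set HARMONY_ZITADEL_ADMIN_TOKEN or pass --admin-oidc-client-id".to_string()
            }),
    }
}
```

A caller would pass `std::env::var("HARMONY_ZITADEL_ADMIN_TOKEN").ok()` as the first argument; a Bearer token then goes out as an `Authorization: Bearer <token>` header.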

What the score does on the device

For each invocation the score:

  1. Calls Zitadel /management/v1/* with the admin token to find-or-create the device's machine user, grant it the device role on the fleet project, and mint a JSON key (idempotent on user + grant; always mints a new key because Zitadel doesn't return existing material).
  2. Connects to the target (over SSH, or locally when --target is omitted), ensures the podman and systemd-container packages are installed, creates the fleet-agent user with linger enabled, and activates the user-scoped podman socket.
  3. Uploads the agent binary to /usr/local/bin/fleet-agent.
  4. Drops the JSON keyfile at /etc/fleet-agent/zitadel-key.json (mode 0640, owned by fleet-agent).
  5. Renders /etc/fleet-agent/config.toml with the agent's NATS URLs, labels, and [credentials] block pointing at the keyfile.
  6. Installs and starts fleet-agent.service. Restarts only if config / binary / unit changed.
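Steps 1 and 6 are where the idempotency lives. A sketch against an in-memory stand-in, assuming nothing about the real Zitadel client or how the score tracks deployed state; `FakeZitadel`, `fingerprint`, and `needs_restart` are hypothetical:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory stand-in for the Zitadel management API.
#[derive(Default)]
struct FakeZitadel {
    users: HashMap<String, u64>,    // machine username -> user id
    grants: HashSet<(u64, String)>, // (user id, project role)
    keys: HashMap<u64, Vec<u64>>,   // user id -> ids of every key minted
    next_id: u64,
}

impl FakeZitadel {
    /// Step 1: find-or-create the user and grant (idempotent), then always
    /// mint a fresh key, since existing key material can't be read back.
    fn enroll(&mut self, username: &str, role: &str) -> (u64, u64) {
        let user_id = match self.users.get(username).copied() {
            Some(id) => id,
            None => {
                self.next_id += 1;
                self.users.insert(username.to_string(), self.next_id);
                self.next_id
            }
        };
        // HashSet insert is a no-op when the grant already exists.
        self.grants.insert((user_id, role.to_string()));
        self.next_id += 1;
        self.keys.entry(user_id).or_default().push(self.next_id);
        (user_id, self.next_id)
    }
}

/// Step 6: fingerprint what would be deployed (config, binary, unit).
fn fingerprint(config: &[u8], binary: &[u8], unit: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    (config, binary, unit).hash(&mut hasher);
    hasher.finish()
}

/// Restart only when there is no record of a previous deploy or something changed.
fn needs_restart(previous: Option<u64>, config: &[u8], binary: &[u8], unit: &[u8]) -> bool {
    previous != Some(fingerprint(config, binary, unit))
}
```

`DefaultHasher` output isn't guaranteed stable across Rust releases, so anything that persists fingerprints between runs would want a cryptographic digest (e.g. SHA-256) or the provisioning tool's own changed/unchanged reporting instead.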

The agent then mints NATS JWTs from the keyfile via the auth callout's JWT-bearer flow and registers itself in the device-info KV.

Verification

After enrollment, the device's heartbeat should appear within seconds:

nats kv get fleet-device-info <device-id>

Or watch via the operator's dashboard / CRs:

kubectl get fleetdev   # devices CRD
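To script the check, poll until the KV entry appears. A sketch, assuming the nats CLI is on PATH and exits non-zero while the key is missing; `wait_for` and `device_info_present` are hypothetical helpers:

```rust
use std::process::Command;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: call `check` every `interval` until it returns Some,
/// or give up with None once `timeout` has elapsed.
fn wait_for<T>(
    timeout: Duration,
    interval: Duration,
    mut check: impl FnMut() -> Option<T>,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(value) = check() {
            return Some(value);
        }
        if Instant::now() >= deadline {
            return None;
        }
        thread::sleep(interval);
    }
}

/// Assumes `nats kv get` exits non-zero while the key is absent.
/// Output is captured (not printed); a missing `nats` binary counts as "not present".
fn device_info_present(device_id: &str) -> bool {
    Command::new("nats")
        .args(["kv", "get", "fleet-device-info", device_id])
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}
```

For example, `wait_for(Duration::from_secs(30), Duration::from_secs(2), || device_info_present("pi-001").then_some(()))` returns `Some(())` once the heartbeat lands.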

SSO client_id — where to get it

--admin-oidc-client-id is the numeric Zitadel-assigned client_id, not the human-readable app name. When fleet_staging_install provisions the harmony-cli device-code app, Zitadel generates a numeric client_id like 371639797157987125@fleet. The staging install prints this value in its final summary block — copy it from there.

If you ever need to look it up after the fact, it's in the staging-install operator's local cache:

jq -r '.apps."harmony-cli"' ~/.local/share/harmony/zitadel/client-config.json

That cache is on the operator's workstation (the host that ran fleet_staging_install). The device itself doesn't have it — the operator must pass --admin-oidc-client-id <numeric> explicitly when running enrollment from the device, or set HARMONY_ZITADEL_ADMIN_TOKEN to skip SSO entirely.
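Since the most common mistake is passing the app name, a pre-flight heuristic can catch it before Zitadel answers with invalid_client. This assumes client IDs keep the `<digits>@<suffix>` shape shown above, which is an observation from this README rather than a documented Zitadel guarantee; `looks_like_zitadel_client_id` is hypothetical:

```rust
/// Hypothetical heuristic: accept `<digits>` or `<digits>@<anything>`,
/// e.g. `371639797157987125@fleet`; reject app names such as `harmony-cli`.
fn looks_like_zitadel_client_id(value: &str) -> bool {
    // Without an '@', the whole value must be the numeric part.
    let (id, _suffix) = value.split_once('@').unwrap_or((value, ""));
    !id.is_empty() && id.chars().all(|c| c.is_ascii_digit())
}
```

A warning rather than a hard error is the safer use of this, since the heuristic could reject a valid id if the format ever changes.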

Common failure modes

  • invalid_client: no active client not found means --admin-oidc-client-id is wrong. Most likely you passed the app name (harmony-cli) instead of the numeric client_id. See above.
  • Project '<name>' not visible to the current Zitadel token — your SSO token's primary org differs from where the project lives. Most common when the staging install created the project as the system iam-admin user (system org) and you're signing in with a personal Zitadel account (your own org). Pass --admin-org-id <id> (find it in Zitadel UI → Organization → Resource ID). Alternatively, the score now logs projects visible in current org context: … right before the error — that list shows what your token CAN see, which usually pinpoints the org mismatch.
  • 403 on management API — operator SSO account doesn't hold a role permitting management calls. Grant IAM_OWNER (or equivalent scoped permission) in Zitadel admin UI.
  • CaUsedAsEndEntity from rustls — talking to a dev cluster with a self-signed cert. Pass --danger-accept-invalid-certs.
  • Browser doesn't open over SSH: webbrowser can't find a GUI. The score still prints the URL; copy it into a browser on your workstation.

CLI flags

Run fleet_device_enroll --help for the full surface.