# Example: Fleet Device Enroll

Enrolls a device into the fleet by minting its Zitadel machine user + JSON key inline (browser SSO or pre-acquired admin token), then runs `FleetDeviceSetupScore` against the device to install podman, drop the keyfile + agent config, and bring up the agent under systemd.

Two operator workflows land on the same code path:

- **Dev-on-device** — developer runs the score on a Pi with keyboard + display attached. Browser opens locally, dev signs in with their personal SSO account, the score provisions credentials for that one device.
- **Production-via-SSH** — operator runs the score from a workstation, targets each device over SSH. Browser opens once on the workstation. (Per-batch token caching is on the roadmap; v0 re-prompts per device but the browser session cookie keeps the click cheap.)

## How to use

### Prerequisites

- A running staging install (Zitadel + NATS + auth callout + operator) — see `examples/fleet_staging_install/`.
- The Zitadel project ID for `fleet` (from the staging install output).
- A cross-compiled `fleet-agent` binary for the target arch.
- For VM rehearsal: libvirt + qemu-system-aarch64 + xorriso installed locally. Run `cargo run -p example_fleet_vm_setup -- --bootstrap-only --arch aarch64` once to prime the asset cache and SSH keys.
- Your Zitadel SSO account must hold a role permitting machine-user, role-grant, and machine-key creation (typically `IAM_OWNER` or `ORG_OWNER`).

### Build flavors

The crate has two flavors selected by Cargo features:

| Flavor | Command | What it includes |
|---|---|---|
| **Workstation** (default) | `cargo build --release -p example_fleet_device_enroll` | Everything: `--launch-pi-vm`, `--vm-rehearsal`, full enrollment. Pulls in libvirt via the `vm-rehearsal` feature. |
| **Device-side** (cross-compile) | `cargo build --release --target aarch64-unknown-linux-musl -p example_fleet_device_enroll --no-default-features` | Enrollment-only — no VM-rehearsal flags, no libvirt. Builds for arm64. **Use the musl target, not gnu** (see below). |

#### Why musl, not gnu

Building with `--target aarch64-unknown-linux-gnu` links against the glibc that ships with your workstation's cross toolchain. On a current Arch / Fedora workstation that's glibc 2.41+; on the device it might be glibc 2.36 (Debian 12) or 2.41 (Debian 13). When the binary is linked against a newer glibc than the device provides, it fails to start with:

```
./fleet_device_enroll: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.39' not found
```

`aarch64-unknown-linux-musl` produces a **fully static binary** with musl libc bundled in. It runs on any aarch64 Linux regardless of which libc the device ships: Debian 12, Debian 13, Pi OS, and Alpine all behave the same. That's what we want for a device-side binary that gets shipped onto whatever userland the production line happens to flash.
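To check ahead of time whether a gnu-target build would load on a given device, compare the glibc version named in the loader error with the version the device ships. The sketch below is illustrative only: `glibc_satisfies` is a hypothetical helper, not part of this crate, and it assumes a `sort` that supports `-V` (GNU coreutils).

```bash
# Hypothetical helper (not part of fleet_device_enroll): succeeds when the
# device's glibc version ($2) is at least the version the binary needs ($1).
glibc_satisfies() {
  # sort -V compares version components numerically, so 2.4 sorts before 2.36.
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$1" ]
}

# Example: the binary needs GLIBC_2.39, the device ships Debian 12's 2.36.
if glibc_satisfies 2.39 2.36; then
  echo "gnu build would load"
else
  echo "gnu build would fail to start; use the musl target"
fi
```

On a glibc-based device, `getconf GNU_LIBC_VERSION` reports the installed version, which can be passed as the second argument.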

#### One-time musl setup

```bash
rustup target add aarch64-unknown-linux-musl
# Arch:   sudo pacman -S aarch64-linux-musl   (AUR) or use mold-aarch64
# Fedora: sudo dnf install gcc-aarch64-linux-gnu  (we use musl-cross via rustup)
```

You may need to point Cargo at the right linker. In `~/.cargo/config.toml`:

```toml
[target.aarch64-unknown-linux-musl]
linker = "aarch64-linux-musl-gcc"
```

Or use `cross` (`cargo install cross`) which handles the toolchain automatically:

```bash
cross build --release --target aarch64-unknown-linux-musl \
  -p example_fleet_device_enroll --no-default-features
```

#### Copying to the device

```bash
scp target/aarch64-unknown-linux-musl/release/fleet_device_enroll pi@<host>:
```

Then SSH to the device and run it as documented in [Dev-on-device](#dev-on-device) below.

### Quickstart — Pi-equivalent VM rehearsal

A Pi-equivalent VM stands in for real hardware. It runs Debian bookworm arm64 generic-cloud, the same Debian base that Pi OS is built on; Pi OS itself is locked to Pi hardware and won't boot in generic KVM. Boot it with one command:

```bash
cargo run -p example_fleet_device_enroll -- --launch-pi-vm
```

The command boots the VM and exits, printing the SSH connection details and a suggested next command. From there, enroll the running VM:

```bash
./target/debug/fleet_device_enroll \
  --target ssh://fleet-admin@<VM_IP> \
  --device-id pi-rehearsal-01 \
  --issuer-url https://sso-staging.cb1.nationtech.io \
  --audience <PROJECT_ID> \
  --nats-url wss://nats-fleet-staging.cb1.nationtech.io \
  --admin-oidc-client-id <CLIENT_ID> \
  --agent-binary target/aarch64-unknown-linux-gnu/release/fleet-agent
```

`--device-id` is required and validated against RFC 1123 subdomain rules (lowercase alphanumerics and `-`, must start and end with an alphanumeric, ≤253 chars total / ≤63 chars per label). The same ID is reused for the agent's TOML, the Zitadel machine username (`device-<id>`), and the Kubernetes Device CR, so anything Kubernetes wouldn't accept as a `metadata.name` is rejected up front instead of three layers down at operator-reconcile time.
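To vet a list of candidate IDs before running enrollment, the rules above can be approximated in shell. This is a sketch, not the tool's own validator: `is_valid_device_id` is a hypothetical name, and it assumes dots are accepted as label separators, as the RFC 1123 subdomain form allows.

```bash
# Hypothetical pre-check (not the tool's own validator): succeeds when $1
# is a valid RFC 1123 subdomain as described above.
is_valid_device_id() {
  # Overall length must be 1..253 characters.
  [ -n "$1" ] && [ "${#1}" -le 253 ] || return 1
  # Each dot-separated label: 1..63 chars of [a-z0-9-], alphanumeric at both ends.
  label='[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?'
  printf '%s\n' "$1" | LC_ALL=C grep -Eq "^${label}(\.${label})*\$"
}

# Try a few candidates; the last three break the rules.
for id in pi-rehearsal-01 batch7-042 Pi-001 -pi-001 pi_001; do
  if is_valid_device_id "$id"; then
    echo "$id: ok"
  else
    echo "$id: rejected"
  fi
done
```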

The browser opens to Zitadel's device-code login. Sign in with your SSO account; the score mints the per-device user, drops the keyfile, and brings up the agent.

### Dev-on-device

Run the binary on the Pi itself and omit `--target` entirely. The score uses ansible's local connection and runs everything on the same machine, with no SSH and no keypair:

```bash
fleet_device_enroll \
  --issuer-url https://sso.example.com \
  --audience <PROJECT_ID> \
  --nats-url wss://nats.example.com \
  --admin-oidc-client-id <CLIENT_ID> \
  --agent-binary /usr/local/bin/fleet-agent \
  --device-id pi-001 \
  --labels group=lab,arch=aarch64
```

The browser opens on the Pi's local display. The dev signs in once; the score handles the rest. `sudo` prompts for the operator's password if passwordless sudo isn't configured, which is fine (it's Debian's default).

If `python3-venv` is missing on first run (Debian splits it out of the base `python3` package), the score detects the failed venv creation, runs `sudo apt-get install -y python3-venv`, and retries.
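The `--labels` value in the command above is a comma-separated list of `key=value` pairs. The sketch below shows how such a string breaks down into individual labels. `split_labels` is a hypothetical helper for illustration; the tool's real parser may handle edge cases differently.

```bash
# Hypothetical illustration; the tool's real parser may differ on edge cases.
# Prints one "key<TAB>value" line per label in a "k1=v1,k2=v2" string.
split_labels() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS='=' read -r key value; do
    # Ignore empty entries such as the one left by a trailing comma.
    if [ -n "$key" ]; then
      printf '%s\t%s\n' "$key" "$value"
    fi
  done
}

split_labels 'group=lab,arch=aarch64'
```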

### Production-via-SSH

Operator runs from a workstation, targeting devices on the LAN:

```bash
fleet_device_enroll \
  --target ssh://pi@10.0.0.42 \
  --issuer-url https://sso.example.com \
  --audience <PROJECT_ID> \
  --nats-url wss://nats.example.com \
  --agent-binary ./build/fleet-agent-aarch64 \
  --device-id batch7-042 \
  --labels group=batch7,site=warehouse-east
```

Each invocation opens the browser prompt again. Token caching across runs is tracked in `ROADMAP/fleet_platform/device_enrollment_token_caching.md`.

### Non-interactive (CI / scripted)

Skip the browser by supplying a bearer token through the `HARMONY_ZITADEL_ADMIN_TOKEN` environment variable:

```bash
HARMONY_ZITADEL_ADMIN_TOKEN=<pat-or-access-token> \
fleet_device_enroll \
  --target ssh://pi@10.0.0.42 \
  --issuer-url https://sso.example.com \
  --audience <PROJECT_ID> \
  --nats-url wss://nats.example.com \
  --agent-binary ./build/fleet-agent-aarch64 \
  --device-id batch7-042
```
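For scripted batches, the single-device command above can be wrapped in a loop over a device list. The following is a sketch under assumptions: `enroll_batch`, the `ENROLL_CMD` override, the `PROJECT_ID` variable, and the two-column input format are all hypothetical and not part of this crate. Only the flags already shown in this README are used.

```bash
# Hypothetical wrapper (not shipped with the crate): reads
# "<device-id> <ssh-target>" pairs from stdin and enrolls each one.
# ENROLL_CMD defaults to the real binary; set ENROLL_CMD=echo for a dry run.
enroll_batch() {
  while read -r device_id target; do
    # Skip blank lines and comment lines.
    case $device_id in ''|'#'*) continue ;; esac
    # </dev/null keeps the child process from consuming the device list.
    ${ENROLL_CMD:-fleet_device_enroll} \
      --target "$target" \
      --issuer-url https://sso.example.com \
      --audience "${PROJECT_ID:?set PROJECT_ID first}" \
      --nats-url wss://nats.example.com \
      --agent-binary ./build/fleet-agent-aarch64 \
      --device-id "$device_id" </dev/null || return 1
  done
}

# Dry run: print the commands that would be executed.
printf '%s\n' '# id        target' \
  'batch7-042 ssh://pi@10.0.0.42' \
  'batch7-043 ssh://pi@10.0.0.43' |
  PROJECT_ID='<PROJECT_ID>' ENROLL_CMD=echo enroll_batch
```

For a real run, export `HARMONY_ZITADEL_ADMIN_TOKEN` first and feed the list from a file, for example `enroll_batch < devices.txt`. The loop stops at the first device that fails to enroll.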

## What the score does on the device

For each invocation the score:

1. Calls Zitadel `/management/v1/*` with the admin token to find-or-create the device's machine user, grant it the `device` role on the fleet project, and mint a JSON key (idempotent on user + grant; always mints a new key because Zitadel doesn't return existing material).
2. SSHes to the target, ensures `podman` + `systemd-container` packages, creates the `fleet-agent` user with linger, activates the user-scoped podman socket.
3. Uploads the agent binary to `/usr/local/bin/fleet-agent`.
4. Drops the JSON keyfile at `/etc/fleet-agent/zitadel-key.json` (mode 0640, owned by `fleet-agent`).
5. Renders `/etc/fleet-agent/config.toml` with the agent's NATS URLs, labels, and `[credentials]` block pointing at the keyfile.
6. Installs and starts `fleet-agent.service`. Restarts only if config / binary / unit changed.

The agent then mints NATS JWTs from the keyfile via the auth callout's JWT-bearer flow and registers itself in the `device-info` KV.
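The "restart only if something changed" behaviour in step 6 follows a common idempotency pattern. The sketch below shows that pattern in plain shell for illustration. It is not the score's implementation, which runs through ansible, and `install_if_changed` is a hypothetical helper operating on scratch files.

```bash
# Hypothetical helper: copy $1 to $2 only when the contents differ.
# Returns 0 if the destination was updated, 1 if it was already current.
install_if_changed() {
  if cmp -s "$1" "$2"; then
    return 1          # identical: nothing to do
  fi
  cp "$1" "$2"        # missing or different: update the destination
}

# Demo against scratch files (paths and contents are illustrative only).
workdir=$(mktemp -d)
printf 'example = 1\n' > "$workdir/rendered.toml"

for attempt in 1 2; do
  if install_if_changed "$workdir/rendered.toml" "$workdir/config.toml"; then
    echo "run $attempt: config changed, restart needed"
  else
    echo "run $attempt: config unchanged, no restart"
  fi
done
rm -rf "$workdir"
```

The first pass installs the file and reports a change; the second pass finds identical contents and skips the restart, which is what makes repeated enrollment runs cheap.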

## Verification

After enrollment, the device's heartbeat should appear within seconds:

```bash
nats kv get fleet-device-info <device-id>
```

Or watch via the operator's dashboard / CRs:

```bash
kubectl get fleetdev   # devices CRD
```

## SSO `client_id` — where to get it

`--admin-oidc-client-id` is the **numeric Zitadel-assigned client_id**, not the human-readable app name. When `fleet_staging_install` provisions the `harmony-cli` device-code app, Zitadel generates a numeric client_id like `371639797157987125@fleet`. The staging install prints this value in its final summary block — copy it from there.

If you ever need to look it up after the fact, it's in the staging-install operator's local cache:

```bash
jq -r '.apps."harmony-cli"' ~/.local/share/harmony/zitadel/client-config.json
```

That cache is on the **operator's workstation** (the host that ran `fleet_staging_install`). The device itself doesn't have it — the operator must pass `--admin-oidc-client-id <numeric>` explicitly when running enrollment from the device, or set `HARMONY_ZITADEL_ADMIN_TOKEN` to skip SSO entirely.
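Because passing the app name instead of the client_id is an easy mistake, a wrapper script can sanity-check the value before calling the tool. This is a rough sketch: `looks_like_client_id` is a hypothetical helper, and its pattern (digits, optionally followed by `@<suffix>`) is inferred only from the single example above, so Zitadel may issue IDs in other shapes.

```bash
# Hypothetical sanity check: succeeds when $1 starts with a run of digits,
# optionally followed by an "@<suffix>" part, like the example above.
looks_like_client_id() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[0-9]+(@[^@]+)?$'
}

for candidate in '371639797157987125@fleet' 'harmony-cli'; do
  if looks_like_client_id "$candidate"; then
    echo "$candidate: looks like a client_id"
  else
    echo "$candidate: looks like an app name; pass the numeric client_id"
  fi
done
```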

## Common failure modes

- **`invalid_client: no active client not found`** — `--admin-oidc-client-id` is wrong. Most likely you passed the app name (`harmony-cli`) instead of the numeric client_id. See above.
- **`Project '<name>' not visible to the current Zitadel token`** — your SSO token's primary org differs from where the project lives. Most common when the staging install created the project as the system iam-admin user (system org) and you're signing in with a personal Zitadel account (your own org). Pass `--admin-org-id <id>` (find it in Zitadel UI → Organization → Resource ID). Alternatively, the score now logs `projects visible in current org context: …` right before the error — that list shows what your token CAN see, which usually pinpoints the org mismatch.
- **403 on management API** — operator SSO account doesn't hold a role permitting management calls. Grant `IAM_OWNER` (or equivalent scoped permission) in Zitadel admin UI.
- **`CaUsedAsEndEntity` from rustls** — talking to a dev cluster with a self-signed cert. Pass `--danger-accept-invalid-certs`.
- **Browser doesn't open over SSH** — `webbrowser` can't find a GUI. The score still prints the URL; copy it into a browser on your workstation.

## CLI flags

Run `fleet_device_enroll --help` for the full surface.