# Example: Fleet Device Enroll

Enrolls a device into the fleet by minting its Zitadel machine user + JSON key inline (browser SSO or pre-acquired admin token), then runs `FleetDeviceSetupScore` against the device to install podman, drop the keyfile + agent config, and bring up the agent under systemd.

Two operator workflows land on the same code path:

- **Dev-on-device**: a developer runs the score on a Pi with keyboard + display attached. The browser opens locally, the dev signs in with their personal SSO account, and the score provisions credentials for that one device.
- **Production-via-SSH**: an operator runs the score from a workstation and targets each device over SSH. The browser opens once on the workstation. (Per-batch token caching is on the roadmap; v0 re-prompts per device, but the browser session cookie keeps the click cheap.)

## How to use

> **Fast inner loop:** `fleet/scripts/dev-enroll-device.sh` wraps the cross-build
> and the enroll command below into one shot (per-device values from env / a local
> `.envrc`). Use it to iterate; the sections below explain what it runs.

### Prerequisites

- A running staging install (Zitadel + NATS + auth callout + operator); see `examples/fleet_staging_install/`.
- The Zitadel project ID for `fleet` (from the staging install output).
- A cross-compiled `fleet-agent` binary for the target arch.
- For VM rehearsal: libvirt + qemu-system-aarch64 + xorriso installed locally. Run `cargo run -p example_fleet_vm_setup -- --bootstrap-only --arch aarch64` once to prime the asset cache and SSH keys.
- Your Zitadel SSO account must hold a role permitting machine-user, role-grant, and machine-key creation (typically `IAM_OWNER` or `ORG_OWNER`).

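A quick preflight can catch missing VM-rehearsal tooling before the first run. The sketch below assumes the tools are exposed on `PATH` as `virsh` (libvirt's CLI), `qemu-system-aarch64`, and `xorriso`; `missing_tools` is an illustrative helper, not something the repo provides.

```bash
# Hypothetical helper: print every command from the argument list that is
# not found on PATH, one per line. Prints nothing when all are present.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Example: missing_tools virsh qemu-system-aarch64 xorriso
```

An empty result means the listed commands are all available; anything printed still needs installing.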
### Build flavors

The crate has two flavors selected by Cargo features:

| Flavor | Command | What it includes |
|---|---|---|
| **Workstation** (default) | `cargo build --release -p example_fleet_device_enroll` | Everything: `--launch-pi-vm`, `--vm-rehearsal`, full enrollment. Pulls in libvirt via the `vm-rehearsal` feature. |
| **Device-side** (cross-compile) | `cargo build --release --target aarch64-unknown-linux-musl -p example_fleet_device_enroll --no-default-features` | Enrollment only: no VM-rehearsal flags, no libvirt. Builds for arm64. **Use the musl target, not gnu** (see below). |

#### Why musl, not gnu

Building with `--target aarch64-unknown-linux-gnu` links against the build host's glibc. On a current Arch / Fedora workstation that's glibc 2.41+; on the device it might be glibc 2.36 (Debian 12) or 2.41 (Debian 13). When the workstation's glibc is newer than the device's, the binary fails to start with:

```
./fleet_device_enroll: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.39' not found
```

`aarch64-unknown-linux-musl` produces a **fully static binary** with musl libc bundled in. It runs on any aarch64 Linux regardless of the device's libc version: Debian 12, 13, Pi OS, Alpine, all the same. That's what we want for a device-side binary that gets shipped onto whatever userland the production line happens to flash.

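To see whether an existing gnu-target build would start on a given device, one can compare the highest glibc symbol version the binary requires with the device's glibc. A minimal sketch, assuming GNU `sort -V` and binutils' `readelf` on the workstation; `max_glibc_needed` and `version_le` are illustrative helpers, not part of the repo.

```bash
# Hypothetical helper: read `readelf -V <binary>` output on stdin and print
# the highest GLIBC_x.y version it mentions (nothing for a static binary).
max_glibc_needed() {
  grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -uV | tail -n 1
}

# Hypothetical helper: succeed when version $1 <= version $2.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example (device glibc taken from `ldd --version` on the device):
#   needed="$(readelf -V target/aarch64-unknown-linux-gnu/release/fleet_device_enroll | max_glibc_needed)"
#   version_le "$needed" 2.36 || echo "needs glibc $needed, device has 2.36"
```

A musl build lists no `GLIBC_*` versions at all, so `max_glibc_needed` prints nothing for it.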
#### One-time musl setup

```bash
rustup target add aarch64-unknown-linux-musl
# Arch: sudo pacman -S aarch64-linux-musl (AUR) or use mold-aarch64
# Fedora: sudo dnf install gcc-aarch64-linux-gnu (we use musl-cross via rustup)
```

You may need to point Cargo at the right linker. In `~/.cargo/config.toml`:

```toml
[target.aarch64-unknown-linux-musl]
linker = "aarch64-linux-musl-gcc"
```

Or use `cross` (`cargo install cross`), which handles the toolchain automatically:

```bash
cross build --release --target aarch64-unknown-linux-musl \
  -p example_fleet_device_enroll --no-default-features
```

#### Copying to the device

```bash
scp target/aarch64-unknown-linux-musl/release/fleet_device_enroll pi@<host>:
```

Then SSH to the device and run it as documented in [Dev-on-device](#dev-on-device) below.

### Quickstart: Pi-equivalent VM rehearsal

A Pi-equivalent VM stands in for real hardware: Debian bookworm arm64 generic-cloud, the same Debian base Pi OS is built on (Pi OS itself is locked to Pi hardware and won't boot in generic KVM). Boot it with one command:

```bash
cargo run -p example_fleet_device_enroll -- --launch-pi-vm
```

The command boots the VM and exits, printing the SSH connection details and a suggested next command. From there, enroll the running VM:

```bash
./target/debug/fleet_device_enroll \
  --target ssh://fleet-admin@<VM_IP> \
  --device-id pi-rehearsal-01 \
  --issuer-url https://sso-staging.cb1.nationtech.io \
  --audience <PROJECT_ID> \
  --nats-url wss://nats-fleet-staging.cb1.nationtech.io \
  --admin-oidc-client-id <CLIENT_ID> \
  --agent-binary target/aarch64-unknown-linux-gnu/release/fleet-agent
```

`--device-id` is required and validated against RFC 1123 subdomain rules (lowercase alphanumeric + `-`, must start and end with an alphanumeric, ≤253 chars total / ≤63 chars per label). The same ID is reused for the agent's TOML, the Zitadel machine username (`device-<id>`), and the Kubernetes Device CR, so anything kube wouldn't accept as a `metadata.name` is rejected upfront here instead of three layers down at operator-reconcile time.

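The `--device-id` rules can be pre-checked in a wrapper script before any network call. A minimal sketch in bash, assuming labels are separated by `.` (implied by the per-label limit); `is_valid_device_id` is an illustrative helper and not the validator the binary itself uses.

```bash
# Hypothetical helper: succeed when $1 looks like an RFC 1123 subdomain.
is_valid_device_id() {
  local LC_ALL=C  # keep [a-z] from matching uppercase in some locales
  local id="$1"
  # One label: 1-63 chars, lowercase alphanumeric or '-',
  # with an alphanumeric at both ends.
  local label='[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?'
  local re="^${label}(\\.${label})*\$"
  [ "${#id}" -le 253 ] && [[ "$id" =~ $re ]]
}

# Example: is_valid_device_id pi-rehearsal-01 || echo "bad device id" >&2
```

Under these assumptions `pi-rehearsal-01` and `batch7-042` pass, while `Pi-001`, `pi_001`, and `pi-` are rejected.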
The browser opens to Zitadel's device-code login. Sign in with your SSO account; the score mints the per-device user, drops the keyfile, and brings up the agent.

### Dev-on-device

Run the binary on the Pi itself and omit `--target` entirely. The score uses ansible's local connection and runs everything on the same machine (no SSH, no keypair):

```bash
fleet_device_enroll \
  --issuer-url https://sso.example.com \
  --audience <PROJECT_ID> \
  --nats-url wss://nats.example.com \
  --admin-oidc-client-id <CLIENT_ID> \
  --agent-binary /usr/local/bin/fleet-agent \
  --device-id pi-001 \
  --labels group=lab,arch=aarch64
```

The browser opens on the Pi's local display. The dev signs in once; the score handles the rest. `sudo` prompts for the operator's password if passwordless sudo isn't configured (which is fine; it's Debian's default).

The score auto-installs `python3-venv` on first run if it is missing (Debian splits it out of base python3): it detects the failed venv creation, runs `sudo apt-get install -y python3-venv`, and retries.

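The detect, fix, retry behaviour for `python3-venv` can be pictured with a small shell sketch. This only illustrates the control flow and is not the score's implementation; `retry_after_fix`, `create_venv`, `install_venv_pkg`, and `VENV_DIR` are hypothetical names.

```bash
# Hypothetical helper: run command/function $1; if it fails, run $2 once
# and try $1 again. Returns the status of the last command that ran.
retry_after_fix() {
  local attempt="$1" fix="$2"
  if "$attempt"; then
    return 0
  fi
  "$fix" && "$attempt"
}

# Example wiring for the venv case (VENV_DIR is an assumed variable):
#   create_venv()      { python3 -m venv "$VENV_DIR"; }
#   install_venv_pkg() { sudo apt-get install -y python3-venv; }
#   retry_after_fix create_venv install_venv_pkg
```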
### Production-via-SSH

Operator runs from a workstation, targeting devices on the LAN:

```bash
fleet_device_enroll \
  --target ssh://pi@10.0.0.42 \
  --issuer-url https://sso.example.com \
  --audience <PROJECT_ID> \
  --nats-url wss://nats.example.com \
  --agent-binary ./build/fleet-agent-aarch64 \
  --device-id batch7-042 \
  --labels group=batch7,site=warehouse-east
```

Each invocation re-prompts the browser. Token caching across runs is tracked in `ROADMAP/fleet_platform/device_enrollment_token_caching.md`.

### Non-interactive (CI / scripted)

Skip the browser by passing a Bearer token in `HARMONY_ZITADEL_ADMIN_TOKEN`:

```bash
HARMONY_ZITADEL_ADMIN_TOKEN=<pat-or-access-token> \
fleet_device_enroll \
  --target ssh://pi@10.0.0.42 \
  --issuer-url https://sso.example.com \
  --audience <PROJECT_ID> \
  --nats-url wss://nats.example.com \
  --agent-binary ./build/fleet-agent-aarch64 \
  --device-id batch7-042
```

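With a token in the environment, the same command can be looped over a batch of devices. A minimal sketch, assuming a whitespace-separated inventory file of `<host> <device-id>` pairs and the variables `ISSUER_URL`, `PROJECT_ID`, `NATS_URL`, and `AGENT_BINARY`; these names, the inventory format, and the `ENROLL_BIN` override are illustrative, not something the repo defines. Only flags that appear in the examples of this README are used.

```bash
# Hypothetical wrapper: enroll every "<host> <device-id>" line of file $1.
# Blank lines and lines starting with '#' are skipped; the loop stops at
# the first failed enrollment.
# HARMONY_ZITADEL_ADMIN_TOKEN must already be exported for the binary.
enroll_batch() {
  local inventory="$1" host device_id
  while read -r host device_id || [ -n "$host" ]; do
    case "$host" in ''|\#*) continue ;; esac
    # stdin is redirected so the child (and its ssh session) cannot
    # swallow the remaining inventory lines.
    "${ENROLL_BIN:-fleet_device_enroll}" \
      --target "ssh://pi@${host}" \
      --device-id "$device_id" \
      --issuer-url "$ISSUER_URL" \
      --audience "$PROJECT_ID" \
      --nats-url "$NATS_URL" \
      --agent-binary "$AGENT_BINARY" \
      < /dev/null || return 1
  done < "$inventory"
}

# Example: enroll_batch devices.txt
```

Setting `ENROLL_BIN=echo` turns the loop into a dry run that prints the arguments for each device.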
## What the score does on the device

For each invocation the score:

1. Calls Zitadel `/management/v1/*` with the admin token to find-or-create the device's machine user, grant it the `device` role on the fleet project, and mint a JSON key (idempotent on user + grant; always mints a new key because Zitadel doesn't return existing key material).
2. SSHes to the target (or uses the local connection when `--target` is omitted), ensures the `podman` + `systemd-container` packages are installed, creates the `fleet-agent` user with linger, and activates the user-scoped podman socket.
3. Uploads the agent binary to `/usr/local/bin/fleet-agent`.
4. Drops the JSON keyfile at `/etc/fleet-agent/zitadel-key.json` (mode 0640, owned by `fleet-agent`).
5. Renders `/etc/fleet-agent/config.toml` with the agent's NATS URLs, labels, and `[credentials]` block pointing at the keyfile.
6. Installs and starts `fleet-agent.service`. Restarts only if config / binary / unit changed.

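The restart-only-on-change behaviour in step 6 follows a common idempotent-install pattern. The sketch below illustrates that pattern in plain shell; it is not the score's implementation (which runs through ansible), and `install_if_changed` plus the `NEW_*` variables are hypothetical.

```bash
# Hypothetical helper: copy $1 over $2 only when the contents differ.
# Exit status: 0 = file was installed/updated, 1 = already up to date,
# 2 = the copy itself failed.
install_if_changed() {
  local src="$1" dest="$2"
  if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
    return 1
  fi
  cp "$src" "$dest" || return 2
}

# Example: restart once if any of the inputs changed.
#   changed=0
#   install_if_changed "$NEW_CONFIG" /etc/fleet-agent/config.toml && changed=1
#   install_if_changed "$NEW_BINARY" /usr/local/bin/fleet-agent && changed=1
#   [ "$changed" -eq 1 ] && systemctl restart fleet-agent.service
```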
The agent then mints NATS JWTs from the keyfile via the auth callout's JWT-bearer flow and registers itself in the `device-info` KV.

## Verification

After enrollment, the device's heartbeat should appear within seconds:

```bash
nats kv get fleet-device-info <device-id>
```

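In scripts, "within seconds" is easier to handle as a bounded poll than as a single lookup. A minimal sketch; `wait_for` and the 30-second budget in the example are assumptions for illustration.

```bash
# Hypothetical helper: retry a command once per second until it succeeds
# or $1 seconds have passed. Usage: wait_for <seconds> <command...>
wait_for() {
  local timeout="$1"
  shift
  local deadline=$(( $(date +%s) + timeout ))
  until "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep 1
  done
}

# Example: wait_for 30 nats kv get fleet-device-info pi-001
```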
Or watch via the operator's dashboard / CRs:

```bash
kubectl get fleetdev   # devices CRD
```

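If the heartbeat never shows up, checking the on-device artifacts from the enrollment steps is a reasonable first move. A sketch to run on the device, assuming GNU `stat` and assuming `fleet-agent.service` is a system-level unit; `has_mode_owner` is a hypothetical helper.

```bash
# Hypothetical helper: succeed when file $1 has octal mode $2 and owner $3.
has_mode_owner() {
  [ "$(stat -c '%a %U' "$1" 2>/dev/null)" = "$2 $3" ]
}

# Example, using the keyfile path and unit name from the enrollment steps:
#   has_mode_owner /etc/fleet-agent/zitadel-key.json 640 fleet-agent \
#     || echo "keyfile mode/owner is off" >&2
#   systemctl is-active --quiet fleet-agent.service \
#     || journalctl -u fleet-agent.service -n 50
```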
## SSO `client_id`: where to get it

`--admin-oidc-client-id` is the **numeric Zitadel-assigned client_id** of the `harmony-cli` device-code app: just the number (e.g. `371683318111994677`), with no suffix such as `@fleet`. It is not the human-readable app name (`harmony-cli`), and not the app's Resource ID. The staging install prints this value in its final summary block; copy it from there.

If you ever need to look it up after the fact, it's in the staging-install operator's local cache:

```bash
jq -r '.apps."harmony-cli"' ~/.local/share/harmony/zitadel/client-config.json
```

That cache is on the **operator's workstation** (the host that ran `fleet_staging_install`). The device itself doesn't have it, so when running enrollment from the device the operator must pass `--admin-oidc-client-id <numeric>` explicitly, or set `HARMONY_ZITADEL_ADMIN_TOKEN` to skip SSO entirely.

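Because the app name and suffixed forms are easy to pass by mistake, a wrapper script can reject non-numeric values before contacting Zitadel. A minimal sketch; `is_numeric_client_id` is a hypothetical helper, and it only checks the shape of the value, not whether the client exists.

```bash
# Hypothetical helper: succeed only when $1 is a non-empty string of digits.
is_numeric_client_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example: is_numeric_client_id "$CLIENT_ID" || echo "expected digits only" >&2
```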
## Common failure modes

- **`invalid_client: no active client not found`**: `--admin-oidc-client-id` is wrong. Most likely you passed the app name (`harmony-cli`) instead of the numeric client_id. See above.
- **`Project '<name>' not visible to the current Zitadel token`**: your SSO token's primary org differs from the org where the project lives. This is most common when the staging install created the project as the system iam-admin user (system org) and you're signing in with a personal Zitadel account (your own org). Pass `--admin-org-id <id>` (find it in Zitadel UI → Organization → Resource ID). The score also logs `projects visible in current org context: …` right before the error; that list shows what your token can see, which usually pinpoints the org mismatch.
- **403 on management API**: the operator's SSO account doesn't hold a role permitting management calls. Grant `IAM_OWNER` (or an equivalent scoped permission) in the Zitadel admin UI.
- **`CaUsedAsEndEntity` from rustls**: you're talking to a dev cluster with a self-signed cert. Pass `--danger-accept-invalid-certs`.
- **Browser doesn't open over SSH**: `webbrowser` can't find a GUI. The score still prints the URL; copy it into a browser on your workstation.

## CLI flags

Run `fleet_device_enroll --help` for the full surface.