harmony/ROADMAP/fleet_platform/device_enrollment_token_caching.md
Jean-Gabriel Gill-Couture cf350d890a feat(fleet): device enrollment via Zitadel SSO + Pi-equivalent rehearsal VM
`FleetDeviceSetupScore` gains `FleetDeviceAuth::ZitadelEnroll` —
resolves the device's Zitadel machine user + JSON key inline, then
falls through to the existing keyfile-drop flow exactly as if a
pre-resolved `ZitadelJwt` had been passed.

Two operator workflows fall out of this:

* Dev-on-device — developer runs the score on a Pi with display
  attached, browser opens locally to Zitadel SSO, dev signs in with
  their personal account (must hold IAM_OWNER or equivalent), score
  mints credentials for that one device and brings up the agent.
* Production-via-SSH — operator runs from a workstation, targets
  each device over SSH. Browser opens once on the workstation; the
  resulting access token is in-memory only for v0 (per-batch token
  caching tracked in
  ROADMAP/fleet_platform/device_enrollment_token_caching.md).

Implementation:

* `harmony/src/modules/zitadel/admin_auth.rs` — RFC 8628 device-code
  flow against Zitadel. Tries `webbrowser::open`, falls back to
  printing the URL (SSH sessions just see the URL). Minimum scope
  set is `openid urn:zitadel:iam:org:project:id:zitadel:aud`,
  enough to call `/management/v1/*`, nothing more.
* `harmony/src/modules/zitadel/setup.rs` — `mint_device_credentials`
  helper that reuses the existing find-or-create methods (project,
  machine user, user grant) plus `create_machine_key`. Idempotent on
  user + grant; always mints a new key because Zitadel does not
  return existing key material.
* `harmony/src/modules/fleet/setup_score.rs` — new `ZitadelEnroll`
  variant + `AdminAuth::{Sso, Token}`. Resolution runs at the top
  of execute(); the rest of the score sees a single shape.
  render_toml's match collapses both Zitadel variants into one arm
  (they share the issuer/audience/danger fields).
* `harmony/src/modules/fleet/assets.rs` — Debian bookworm arm64
  generic-cloud image fetcher. This is the same Debian base
  Raspberry Pi OS is built on; Pi OS itself is locked to Pi
  hardware (Broadcom firmware) and won't boot in generic KVM.
  No sha pin (Debian's `latest/` URL rotates per point release);
  swap to a dated subdir if you need cryptographic provenance.
* `examples/fleet_device_enroll/` — single CLI covering both
  workflows + a `--launch-pi-vm` switch that boots a Pi-equivalent
  VM with one command and prints the SSH details + suggested
  follow-up enrollment command. README walks the three flows.

Tests: `render_toml_zitadel_enroll_renders_same_as_zitadel_jwt`
locks the byte-equivalence between the unresolved (Enroll) and
resolved (Jwt) variants — the invariant `execute()` relies on so
TOML rendering is independent of when admin auth resolves.

Adds `webbrowser` as a regular dependency on `harmony` (small,
no feature gate).
2026-05-05 22:08:59 -04:00


Device enrollment — admin token caching

Status: deferred. Out of scope for v0; targeted at production-line batch enrollment.

Background

FleetDeviceSetupScore gained a FleetDeviceAuth::ZitadelEnroll variant (see examples/fleet_device_enroll/). When the operator picks AdminAuth::Sso, the score runs an OIDC device-code flow against Zitadel, gets an admin Bearer token, mints the device's machine user + key, and proceeds with the existing on-device install.

The token is held in memory only for the duration of one score run. Re-running for the next device requires the operator to sign in through the browser again. For developer-on-device scenarios that's fine — devs typically enroll one machine at a time and the browser is already authenticated to Zitadel anyway, so the device-code prompt auto-completes with one click.

What's deferred

For production-line batch enrollment (the target deployment shape: an operator stamping out tens or hundreds of devices per shift), an in-memory-only token forces a re-prompt per device. The browser-cached session means the click is fast, but it's still a click + a window focus per device — not what we want at scale.

The unbuilt feature: cache the admin access token (and its refresh_token if Zitadel's device-code response includes one) on the operator's workstation under a known path (e.g. ~/.local/state/harmony/zitadel-admin-token.json), with appropriate file mode (0600). Re-runs of the score within the token's lifetime re-use the cached token and skip the browser entirely; expired tokens trigger one re-login for the whole batch.
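
A minimal sketch of the on-disk write, assuming a Unix workstation and only the Rust standard library. `write_private_file` is a hypothetical helper, not an existing harmony function. It creates the file with mode 0600 and renames it into place, so an interrupted write should not leave a truncated cache behind.

```rust
use std::fs::{self, OpenOptions, Permissions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Write `contents` to `path` so that only the owning user can read or
/// write it. Hypothetical helper for illustration; not part of harmony.
pub fn write_private_file(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Make sure the state directory exists (e.g. ~/.local/state/harmony).
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }

    // Write to a sibling temp file first, then rename over the target.
    let tmp = path.with_extension("tmp");
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // only honoured when the file is newly created
        .open(&tmp)?;

    // Tighten the mode explicitly in case a stale temp file already
    // existed with wider permissions.
    file.set_permissions(Permissions::from_mode(0o600))?;
    file.write_all(contents)?;
    file.sync_all()?;
    drop(file);

    fs::rename(&tmp, path)
}
```

A caller would serialize the token to JSON and pass the bytes along with the cache path. The parent directory is created with the process umask here; restricting it further is left out of the sketch.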

Sketch of the implementation

  • New module: harmony/src/modules/zitadel/admin_token_cache.rs, containing a serde struct with access_token, expires_at, and an optional refresh_token, plus the issuer and client_id the token was issued for. Loaded eagerly at the start of device_code_login if a cache file exists; rejected if it is expired and cannot be refreshed, or if it was issued for a different issuer/client_id.
  • AdminAuth::Sso gains a cache: TokenCachePolicy field ({ Default, NoCache, Refresh }). Default-on for v0.1.
  • A --no-token-cache flag on the example mirrors the policy.
  • Refresh handling: if Zitadel returns a refresh_token (it does for the device-code grant when the offline_access scope is requested, which the current minimum scope set does not include), use it transparently before falling back to a fresh interactive login.
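
The acceptance rule in the list above could look roughly like the following sketch. Everything here is an assumption for illustration: the `CachedAdminToken` and `CacheDecision` names, the `decide` function, the 30-second expiry skew, storing `expires_at` as Unix seconds, and in particular the reading of `TokenCachePolicy::Refresh` as "never trust the cached access token, go straight to the refresh token". The serde derives are omitted so the block compiles with the standard library alone.

```rust
/// Hypothetical on-disk shape of the cached admin token.
#[derive(Debug, Clone, PartialEq)]
pub struct CachedAdminToken {
    pub issuer: String,
    pub client_id: String,
    pub access_token: String,
    /// Expiry as Unix seconds (assumed representation).
    pub expires_at: u64,
    pub refresh_token: Option<String>,
}

/// Variant names follow the roadmap; their exact semantics are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenCachePolicy {
    Default,
    NoCache,
    Refresh,
}

/// What the login path should do next.
#[derive(Debug, PartialEq, Eq)]
pub enum CacheDecision {
    UseCached,
    TryRefresh,
    InteractiveLogin,
}

/// Treat tokens that expire within this window as already expired.
const EXPIRY_SKEW_SECS: u64 = 30;

pub fn decide(
    policy: TokenCachePolicy,
    cached: Option<&CachedAdminToken>,
    issuer: &str,
    client_id: &str,
    now_unix: u64,
) -> CacheDecision {
    if policy == TokenCachePolicy::NoCache {
        return CacheDecision::InteractiveLogin;
    }
    let tok = match cached {
        Some(tok) => tok,
        None => return CacheDecision::InteractiveLogin,
    };
    // A token minted for another issuer or client must never be reused.
    if tok.issuer != issuer || tok.client_id != client_id {
        return CacheDecision::InteractiveLogin;
    }
    let fresh = tok.expires_at > now_unix.saturating_add(EXPIRY_SKEW_SECS);
    if fresh && policy == TokenCachePolicy::Default {
        return CacheDecision::UseCached;
    }
    // Expired (or refresh forced): use the refresh token if we have one.
    if tok.refresh_token.is_some() {
        CacheDecision::TryRefresh
    } else {
        CacheDecision::InteractiveLogin
    }
}
```

Keeping the decision a pure function of the cached record, the expected issuer/client_id, and the current time makes it easy to unit-test without touching the filesystem or Zitadel. A failed refresh would still need to fall back to the interactive login at the call site.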

When to do this

Pick this up when the production-line workflow becomes a real cadence (>10 devices/run typical) or when the demo target customer reports re-prompt friction. For now, the in-memory token + browser session cookie covers the dev-on-device + small-batch admin cases adequately.