feat/prepare-rpi #280

Merged
stremblay merged 14 commits from feat/prepare-rpi into feat/iot-walking-skeleton 2026-05-04 17:28:45 +00:00
Adds a runnable example (examples/fleet_rpi_setup) for onboarding an already-booted Raspberry Pi via FleetDeviceSetupScore, plus the framework-level changes needed to make that flow clean.

What's new

  • examples/fleet_rpi_setup — physical-device sibling of fleet_vm_setup. No libvirt step; SSH straight to the Pi. Includes env.sh with the standard secret-store / database-url variables.
  • Pre-flight config diff in FleetDeviceSetupScore — reads the existing /etc/fleet-agent/config.toml, shows a unified diff against the desired one, prompts before overwriting.
  • Optional sudo password — when the bootstrap admin doesn't have passwordless sudo, the example auto-detects (sudo -n true probe) and fetches the password through SecretManager::get_or_prompt::<SudoPassword>() (cached after first prompt). Ansible gets it via ANSIBLE_BECOME_PASSWORD_FILE (path in env, never
    argv); direct ssh_exec sudo paths use sudo -S with the password piped through a new stdin: Option<&str> on ssh_exec.
  • FileFetcher capability trait + SudoPassword Secret type.

Score UX (FleetDeviceSetupScore)

  • Tagged per-step traces ([FleetSetup/<device>] Step N/7 — …).
  • Structured Outcome.details recap surfaced as a bullet list under 🚀 All done! via harmony_cli's standard reporter — no bespoke renderer in the example.

Framework-level fixes (apply to every score)

  • fix(linux) ensure_user_unit_active was unconditionally CHANGED; now probes is-enabled + is-active and reports NOOP correctly.
  • fix(cli) cli_reporter was filtering out NOOP details, dropping the recap on idempotent re-runs.
  • fix(linux) drops the redundant (ansible-exit=…, stderr=…, stdout=…) envelope when parse_err already carries the message.

Behavior unchanged

Default passwordless-sudo flow is identical to before — the probe runs first and only triggers the prompt on hosts that need it. SSH auth is still key-only; password-based SSH login is documented as a planned future extension (TODO on SshCredentials).

stremblay added 14 commits 2026-05-01 18:53:21 +00:00
systemctl --user enable --now is systemd-level idempotent, but the
prior implementation always returned ChangeReport::CHANGED. This made
every re-run of any score that touches a user-scoped unit (notably
FleetDeviceSetupScore's podman.socket step) lie about its change
count, defeating the noop detection the rest of the score honors.

Probe is-enabled --quiet && is-active --quiet first; only call
enable --now (and report CHANGED) when the unit isn't already in the
desired state. Mirrors the existing ensure_linger pattern in the
same file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
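The probe-then-enable logic from the commit above can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: `Change` stands in for the real `ChangeReport`, and the `Runner` trait and `FakeHost` test double are hypothetical names introduced only to keep the example self-contained.

```rust
/// Stand-in for the real ChangeReport type (assumption: two states suffice here).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Change {
    Changed,
    Noop,
}

/// Hypothetical abstraction: run a command on the device, report exit-0 or not.
pub trait Runner {
    fn succeeds(&mut self, cmd: &str) -> bool;
}

/// Probe first; only call `enable --now` (and report Changed) when needed.
pub fn ensure_user_unit_active<R: Runner>(runner: &mut R, unit: &str) -> Result<Change, String> {
    let probe = format!(
        "systemctl --user is-enabled --quiet {0} && systemctl --user is-active --quiet {0}",
        unit
    );
    if runner.succeeds(&probe) {
        // Already enabled and active: nothing to do.
        return Ok(Change::Noop);
    }
    let enable = format!("systemctl --user enable --now {}", unit);
    if runner.succeeds(&enable) {
        Ok(Change::Changed)
    } else {
        Err(format!("failed to enable {}", unit))
    }
}

/// Test double that pretends to be a host with a single unit.
pub struct FakeHost {
    pub unit_up: bool,
    pub calls: Vec<String>,
}

impl Runner for FakeHost {
    fn succeeds(&mut self, cmd: &str) -> bool {
        self.calls.push(cmd.to_string());
        if cmd.contains("is-enabled") {
            // The probe succeeds only if the unit is already up.
            self.unit_up
        } else {
            // `enable --now` brings the unit up.
            self.unit_up = true;
            true
        }
    }
}
```

Running the function twice against the same `FakeHost` reports `Changed` on the first call and `Noop` on the second, which is the behavior the fix restores for re-runs.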
Sibling of fleet_vm_setup with the libvirt provisioning step removed:
the operator has already booted Pi OS Lite themselves (rpi-imager,
preloaded SSH key, passwordless sudo on the admin user), so the
example goes straight to applying FleetDeviceSetupScore over SSH.

Defaults match the typical rpi-imager flow (--pi-user pi,
--ssh-key ~/.ssh/id_ed25519); --ssh-key supports tilde expansion.
The harmony dep is pulled in without the kvm feature since no VM is
created here. RUST_LOG defaults to info so the score's per-step
traces show up without the operator having to set the env var.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When stdout already parses into UNREACHABLE!/FAILED! + msg, the
trailing (ansible-exit=..., stderr=..., stdout=...) envelope just
duplicated the same text. Strip it when stderr is empty and the
verb is recognized; keep it when it adds debug signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
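A rough sketch of the envelope rule described in the commit above. The `ParsedFailure` struct, the function name, and the exact output layout are assumptions made for illustration; only the rule itself (strip the envelope when stderr is empty and the verb is recognized) comes from the commit message.

```rust
/// Illustrative parsed form of an Ansible failure (field names are assumptions).
pub struct ParsedFailure {
    pub verb: String,
    pub msg: String,
}

/// Verbs named in the commit message.
fn is_known_verb(verb: &str) -> bool {
    matches!(verb, "UNREACHABLE!" | "FAILED!")
}

/// Render an error, appending the debug envelope only when it adds signal.
pub fn render_ansible_error(
    parsed: Option<&ParsedFailure>,
    exit: i32,
    stderr: &str,
    stdout: &str,
) -> String {
    let envelope = format!(
        "(ansible-exit={}, stderr={}, stdout={})",
        exit,
        stderr.trim(),
        stdout.trim()
    );
    match parsed {
        // Recognized verb and nothing on stderr: the message is enough.
        Some(p) if is_known_verb(&p.verb) && stderr.trim().is_empty() => {
            format!("{} {}", p.verb, p.msg)
        }
        // Parsed, but stderr or an unknown verb may carry extra debug signal.
        Some(p) => format!("{} {} {}", p.verb, p.msg, envelope),
        // Nothing parsed: the envelope is all there is.
        None => envelope,
    }
}
```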
Replace the opaque change-log with tagged per-step info traces and
a human-readable Outcome.details recap (Device ID / NATS / Labels /
User / Agent binary -> remote / Service). User and Service lines
carry their own per-line state markers (such as 🔄); the final line
uses a distinct marker for noop and 🎉 for runs that applied changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the bespoke framed renderer, failure hint catalog, and custom
env_logger setup. Score output now flows through harmony_cli's
standard reporter (bullet list under "🚀 All done!"), matching the
other examples. cli_logger::init() at the top of main so early
logs (ensure_ansible_venv) get the same formatting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cli_reporter only accumulated details for SUCCESS, dropping the
recap on idempotent re-runs that legitimately return NOOP with
populated details. FleetDeviceSetupScore is the first score to
exercise this path; the filter was over-restrictive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
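The filter change in the commit above might look something like this in isolation. `Status`, `Outcome`, and `collect_details` are simplified stand-ins, not the actual harmony types, and excluding failed outcomes from the recap is an assumption of this sketch.

```rust
/// Simplified outcome status (assumption: the real enum has more variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Success,
    Noop,
    Failure,
}

/// Simplified outcome carrying the human-readable recap lines.
pub struct Outcome {
    pub status: Status,
    pub details: Vec<String>,
}

/// Accumulate recap lines for both SUCCESS and NOOP outcomes.
/// Before the fix, only SUCCESS would have matched here.
pub fn collect_details(outcomes: &[Outcome]) -> Vec<String> {
    outcomes
        .iter()
        .filter(|o| matches!(o.status, Status::Success | Status::Noop))
        .flat_map(|o| o.details.iter().cloned())
        .collect()
}
```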
Folds the "-> /usr/local/bin/fleet-agent" continuation into the
"Agent binary:" line. Removes the hardcoded-indent fragility (bullet
prefix shifts in cli_reporter would have broken alignment).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors FileDelivery in the opposite direction: returns Some(content)
or None if the file doesn't exist. AnsibleHostConfigurator implements
it via two SSH calls (sudo test -e + sudo cat), routed through sudo
to handle root- or service-owned config files. Added to the
LinuxHostConfiguration umbrella so any score with that bound gets it.

Enables scores to pre-flight-compare desired state against current
state before committing to a destructive change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
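For readers unfamiliar with the shape of the capability, here is a minimal synchronous sketch. The method signature is an assumption (the real trait may be async and use a different error type), and `InMemoryHost` is a hypothetical test double that mimics the "exists check, then read" sequence the commit describes for the SSH-backed implementation.

```rust
use std::collections::HashMap;

/// Assumed shape of the capability: Some(content) if the file exists, None otherwise.
pub trait FileFetcher {
    fn fetch_file(&self, path: &str) -> Result<Option<String>, String>;
}

/// Hypothetical in-memory host used only for illustration and tests.
pub struct InMemoryHost {
    pub files: HashMap<String, String>,
}

impl FileFetcher for InMemoryHost {
    fn fetch_file(&self, path: &str) -> Result<Option<String>, String> {
        // Step 1: existence check (the real implementation runs `sudo test -e`).
        if !self.files.contains_key(path) {
            return Ok(None);
        }
        // Step 2: read the content (the real implementation runs `sudo cat`).
        Ok(self.files.get(path).cloned())
    }
}
```

Returning `Ok(None)` for a missing file, rather than an error, lets a caller treat "not installed yet" as an ordinary branch.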
New first step (1/7): read /etc/fleet-agent/config.toml off the
device and compare against the rendered desired config. Three
branches:

  - missing  → info, first install
  - matches  → warn, converge anyway
  - differs  → warn + unified diff (similar::TextDiff with 2-line
    context radius, '-/+' marker style) + inquire::Confirm prompt
    defaulting to N. Aborts with InterpretError if declined.

Existing 6 steps renumbered to 2/7-7/7. The diff replaces the
previous "dump both full configs" approach which was unreadable
even for one-line differences.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
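The three branches listed in the commit above reduce to a small decision function. The sketch below leaves out the diff rendering (done with similar::TextDiff in the commit) and replaces the inquire::Confirm prompt with a caller-supplied closure; the `Preflight` enum, the function name, and the error string are illustrative assumptions.

```rust
/// Which of the three pre-flight branches was taken (names are illustrative).
#[derive(Debug, PartialEq, Eq)]
pub enum Preflight {
    FirstInstall,
    Matches,
    Differs,
}

/// Compare the config found on the device against the desired one.
/// `confirm` is only invoked when the two differ; returning false aborts.
pub fn preflight<F>(current: Option<&str>, desired: &str, confirm: F) -> Result<Preflight, String>
where
    F: FnOnce() -> bool,
{
    match current {
        // No config on the device yet: first install.
        None => Ok(Preflight::FirstInstall),
        // Identical content: converge anyway, nothing to confirm.
        Some(existing) if existing == desired => Ok(Preflight::Matches),
        // Content differs: the operator must agree before overwriting.
        Some(_) => {
            if confirm() {
                Ok(Preflight::Differs)
            } else {
                Err("aborted: operator declined to overwrite config".to_string())
            }
        }
    }
}
```

Passing `|| false` as the closure models the prompt's default of N: a differing config is left untouched unless the operator explicitly accepts.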
Sudo password for a Linux bootstrap admin user. Stored under key
"SudoPassword" via SecretManager when a host doesn't have
passwordless sudo configured. Same shape as the other single-field
Secret types in this file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets callers populate creds.sudo_password when the bootstrap admin
doesn't have passwordless sudo. None = current behavior unchanged.

Wire-level injection:
- ansible runs: when Some, write to a tempfile::NamedTempFile and
  pass ANSIBLE_BECOME_PASSWORD_FILE=<path> via Command::env. Path
  in env, never value in argv. File deletes on drop.
- direct ssh_exec sudo paths (ensure_linger, ensure_user_unit_active,
  fetch_file): new sudo_exec helper that uses `sudo -S` with the
  password piped via the new ssh_exec stdin parameter, otherwise
  plain sudo. ensure_user_unit_active's && chain folded into one
  sudo+sh -c call since `sudo -S` only reads stdin once.

ssh_executor.rs: ssh_exec gains an optional stdin: Option<&str>; on
Some, writes via channel.data() then channel.eof() so the remote
reader doesn't hang. Existing 4 call sites pass None.

fleet_vm_setup updated to set sudo_password: None (behavior
identical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
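One way to picture the sudo_exec helper from the commit above is as a pure function that decides the command line and the stdin payload. This sketch is an assumption about its shape, not the repository code: `RemoteCommand`, `sudo_command`, and `sh_quote` are hypothetical names, and the `-p ''` flag (which silences sudo's prompt text) is an addition of this example.

```rust
/// What would be handed to ssh_exec: a command line plus optional stdin.
pub struct RemoteCommand {
    pub command: String,
    pub stdin: Option<String>,
}

/// Single-quote a string for a POSIX shell.
fn sh_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// Wrap `script` in one sudo + sh -c call. With a password, use `sudo -S`
/// and send the password on stdin so it never appears in argv.
pub fn sudo_command(script: &str, sudo_password: Option<&str>) -> RemoteCommand {
    let quoted = sh_quote(script);
    match sudo_password {
        Some(password) => RemoteCommand {
            command: format!("sudo -S -p '' sh -c {}", quoted),
            // sudo -S reads one line from stdin, hence the trailing newline.
            stdin: Some(format!("{}\n", password)),
        },
        None => RemoteCommand {
            command: format!("sudo sh -c {}", quoted),
            stdin: None,
        },
    }
}
```

Folding a whole `&&` chain into a single `sh -c` script matches the commit's note that `sudo -S` only reads stdin once: one sudo invocation, one password line.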
Probe `sudo -n true` over SSH before constructing the topology. If
the probe succeeds (passwordless sudo, the typical rpi-imager
default), proceed silently. If it fails, fetch the password through
SecretManager::get_or_prompt::<SudoPassword>() — first run prompts
the operator, subsequent runs reuse the cached value (same flow
SshKeyPair etc. use).

Adds harmony_secret dep, env.sh with the standard
HARMONY_SECRET_NAMESPACE / HARMONY_SECRET_STORE / HARMONY_DATABASE_URL
/ RUST_LOG variables, and a doc snippet at the top of main.rs
pointing at it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
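The control flow in the commit above (probe first, prompt only on failure) can be sketched with two closures. Both closures are placeholders: in the example the first would run `sudo -n true` over SSH and the second would call SecretManager::get_or_prompt::<SudoPassword>(); the function name and error type here are assumptions.

```rust
/// Decide whether a sudo password is needed.
/// Returns Ok(None) for passwordless sudo, Ok(Some(password)) otherwise.
pub fn resolve_sudo_password<P, G>(
    probe_passwordless: P,
    get_or_prompt: G,
) -> Result<Option<String>, String>
where
    P: FnOnce() -> bool,
    G: FnOnce() -> Result<String, String>,
{
    if probe_passwordless() {
        // Probe succeeded: proceed silently, never touch the secret store.
        return Ok(None);
    }
    // Probe failed: fetch the (possibly cached) password.
    get_or_prompt().map(Some)
}
```

The resulting `Option<String>` maps directly onto the `sudo_password` field described earlier, where `None` keeps the previous behavior.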
The new sudo_password field is strictly for privilege escalation on
the remote host (sudo -S, ansible become) — not for SSH login. SSH
auth is still key-only. Adds a TODO on SshCredentials pointing at
where SSH password support would land if/when we want it, and a
matching note on the SudoPassword Secret type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat: add little script to call the fleet_rpi_setup example
Some checks failed
Run Check Script / check (pull_request) Failing after -44h57m30s
b86f8f11f9
stremblay merged commit ebd199b22e into feat/iot-walking-skeleton 2026-05-04 17:28:45 +00:00
stremblay deleted branch feat/prepare-rpi 2026-05-04 17:28:45 +00:00
Reference: NationTech/harmony#280