feat(fleet-deploy): smoke-test contract as a Score companion #292

Open
johnride wants to merge 3 commits from feat/smoke-test-contract into master

3 Commits

269ab2fbed Merge pull request 'docs(fleet): v0.3 last-mile roadmap' (#296) from docs/v0-3-roadmap into feat/smoke-test-contract
All checks were successful
Run Check Script / check (pull_request) Successful in 2m20s
Reviewed-on: #296
2026-05-25 12:28:47 +00:00
9deebab1ff docs(fleet): v0.3 last-mile roadmap
All checks were successful
Run Check Script / check (pull_request) Successful in 2m25s
Authoritative plan for the last mile before fleet ships to a real
customer. Picks up where v0_2_plan.md left the chapter structure.

Twelve chapters, organized in execution order:

  1. Dashboard role enforcement (security gap, do right now)
  2. Operator restart + aggregator recovery (more critical than smoke)
  3. Application log forwarding companion (dashboard utility)
  4. Agent self-upgrade, NATS-coordinated, systemd-resident
  5. Graceful deployment upgrade (roll-forward only — customer ask)
  6. Init containers in PodmanV0Score
  7. System upgrade, rollback deferred to v0.4
  8. Secrets via Zitadel + OpenBao (blocked on harmony_secret work)
  9. Agent time-drift verification
  10. Phase 1 smoke wiring
  11. CI yaml minimization (longer-term)
  12. NATS callout CI hardening (minimal)

Customer constraints baked in: deployments are roll-forward only
(no auto-rollback on Deployment failure); the system-rollback half of
the upgrade ADR is deferred to v0.4 (a snapshot is created but not
used for revert in v0.3); secrets must go through Zitadel + OpenBao
(no plaintext shortcut).

Includes:
  - feature checklist as a status table (14 items),
  - sequencing table with ordering rationale,
  - per-chapter goal / current state with file:line citations /
    plan / open questions / "done when",
  - out-of-scope table with target version + reason,
  - cross-cutting open questions Q1–Q5.

Format follows the user's "tables over prose" preference: every
multi-item section is either a table or bold-led bullets with
nested supporting detail. Scannable at three depths (30-second
scroll for bold leads, 2-minute read for nested detail, deep read
with code where it matters).
2026-05-24 10:54:08 -04:00
1e898a7328 feat(fleet-deploy): smoke-test contract as a Score companion
All checks were successful
Run Check Script / check (pull_request) Successful in 2m33s
Phase 0 of the smoke-test contract from ADR-023 P4 ("deploy returns
only after smoke-test success"). Lands inside harmony-fleet-deploy
under companion/smoke/ so the shape can be validated against a real
consumer before promoting to a top-level crate.

Three small types + a wrapper, zero edits to the Score / Interpret /
Maestro public API:

  Probe         single-attempt unit; classifies its own observation
                as Ok / Retry / Fatal via ProbeAttempt.
  SmokeSuite    ordered list of (probe, RetryPolicy) stages; sequential
                by design.
  SmokeTest     companion trait paired with a Score by associated type
                — `SM: SmokeTest<T, Score = S>` is the compile-time
                lock (ADR-024 §2 / JG's compile-time-feedback principle).
  deploy /      free functions binding Score::interpret to an optional
  deploy_with_  SmokeTest. The smoke variant returns only after the
  smoke         suite passes; a failed probe is DeployError::SmokeFailed
                with the full report attached.
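
The compile-time lock described above can be sketched roughly as follows.
This is a minimal illustration, not the code in this PR: the Score trait
here is a hypothetical one-method stand-in, SmokeTest::run collapses the
SmokeSuite into a single call, and Topo / WebScore / WebSmoke are invented
for the example. Only the names SmokeTest, deploy, deploy_with_smoke and
DeployError::SmokeFailed come from the text.

```rust
/// Hypothetical stand-in for the real Score trait (not the harmony API).
pub trait Score<T> {
    fn interpret(&self, topology: &T) -> Result<(), String>;
}

/// Companion trait: the associated type ties a smoke test to exactly one Score.
pub trait SmokeTest<T> {
    type Score: Score<T>;
    /// Ok(()) when every stage passed, otherwise a report of the failed stages.
    /// (Assumption: the real trait yields a SmokeSuite instead of running it.)
    fn run(&self, score: &Self::Score, topology: &T) -> Result<(), Vec<String>>;
}

#[derive(Debug, PartialEq)]
pub enum DeployError {
    Interpret(String),
    SmokeFailed(Vec<String>),
}

/// Plain deploy: interpret the score, no smoke test.
pub fn deploy<T, S: Score<T>>(score: &S, topology: &T) -> Result<(), DeployError> {
    score.interpret(topology).map_err(DeployError::Interpret)
}

/// Smoke variant: returns Ok only after the smoke test passes.
/// `SM: SmokeTest<T, Score = S>` rejects a smoke test written for another Score.
pub fn deploy_with_smoke<T, S, SM>(score: &S, topology: &T, smoke: &SM) -> Result<(), DeployError>
where
    S: Score<T>,
    SM: SmokeTest<T, Score = S>,
{
    deploy(score, topology)?;
    smoke.run(score, topology).map_err(DeployError::SmokeFailed)
}

// Illustrative pairing (all three types are hypothetical).
pub struct Topo;

pub struct WebScore {
    pub port: u16,
}

impl Score<Topo> for WebScore {
    fn interpret(&self, _topology: &Topo) -> Result<(), String> {
        Ok(())
    }
}

pub struct WebSmoke {
    pub healthy: bool,
}

impl SmokeTest<Topo> for WebSmoke {
    type Score = WebScore;

    fn run(&self, score: &WebScore, _topology: &Topo) -> Result<(), Vec<String>> {
        if self.healthy {
            Ok(())
        } else {
            Err(vec![format!("tcp:{} rejected", score.port)])
        }
    }
}
```

Passing a smoke test whose associated Score is some other type to
deploy_with_smoke fails to type-check, which is the feedback the
associated-type pairing is meant to give.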

Cardinality choices follow JG's *For the Love of Compilers* talk:
ProbeName is a validated newtype (rejected: empty / control-chars /
>128 bytes), ProbeAttempt is a three-arm sum because "not ready yet"
and "definitively no" drive different orchestration paths, ProbeFailure
splits Timeout from Rejected because the operator actions are different.
RetryPolicy::polling panics on a zero interval at construction rather
than spinning the executor in production.
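
A rough sketch of the two construction-time checks named above, under
stated assumptions: the error enum ProbeNameError, the field names, and
the two-argument shape of RetryPolicy::polling are guesses made for
illustration; only the rejection rules (empty, control characters,
more than 128 bytes) and the panic on a zero interval come from the text.

```rust
use std::time::Duration;

/// Validated newtype: a ProbeName can only be built through `new`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProbeName(String);

/// Hypothetical error type; the variant names are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeNameError {
    Empty,
    ControlChar,
    TooLong(usize),
}

impl ProbeName {
    pub const MAX_BYTES: usize = 128;

    pub fn new(raw: &str) -> Result<Self, ProbeNameError> {
        if raw.is_empty() {
            return Err(ProbeNameError::Empty);
        }
        // `str::len` counts bytes, matching the ">128 bytes" rule.
        if raw.len() > Self::MAX_BYTES {
            return Err(ProbeNameError::TooLong(raw.len()));
        }
        if raw.chars().any(char::is_control) {
            return Err(ProbeNameError::ControlChar);
        }
        Ok(Self(raw.to_owned()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Hypothetical field layout for the retry policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RetryPolicy {
    pub interval: Duration,
    pub timeout: Duration,
}

impl RetryPolicy {
    /// Panics on a zero interval so a busy loop is caught at construction.
    pub fn polling(interval: Duration, timeout: Duration) -> Self {
        assert!(
            !interval.is_zero(),
            "RetryPolicy::polling: interval must be non-zero"
        );
        Self { interval, timeout }
    }
}
```

With this shape an invalid name never reaches a probe, and a zero-interval
policy fails at the call site that built it.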

One concrete probe ships in this PR — TcpReachable — to validate the
trait shape against real network I/O. run_probe orchestrates retry +
timeout outside the probe itself so each probe stays single-attempt
and unit-testable.
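
The split between a single-attempt probe and an outer retry loop might
look like the sketch below. It is synchronous and std-only for brevity;
the real run_probe presumably runs on an async executor, and its
signature is not shown in this message. FnProbe, the field names, the
payload types and the (interval, timeout) parameters are assumptions
introduced for illustration.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// One observation, classified by the probe itself.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeAttempt {
    Ok,
    Retry(String),
    Fatal(String),
}

/// Timeout and Rejected are kept apart because the operator response differs.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeFailure {
    Timeout { attempts: u32, last: String },
    Rejected(String),
}

/// A probe makes exactly one attempt; it never loops or sleeps.
pub trait Probe {
    fn attempt(&self) -> ProbeAttempt;
}

pub struct TcpReachable {
    pub addr: SocketAddr,
    pub connect_timeout: Duration,
}

impl Probe for TcpReachable {
    fn attempt(&self) -> ProbeAttempt {
        // Assumption: every connect error is treated as "not ready yet".
        match TcpStream::connect_timeout(&self.addr, self.connect_timeout) {
            Ok(_) => ProbeAttempt::Ok,
            Err(e) => ProbeAttempt::Retry(e.to_string()),
        }
    }
}

/// Hypothetical closure adapter, handy for unit tests.
pub struct FnProbe<F>(pub F);

impl<F: Fn() -> ProbeAttempt> Probe for FnProbe<F> {
    fn attempt(&self) -> ProbeAttempt {
        (self.0)()
    }
}

/// Retry and timeout live here, outside the probe.
/// Returns the number of attempts it took to succeed.
pub fn run_probe(
    probe: &dyn Probe,
    interval: Duration,
    timeout: Duration,
) -> Result<u32, ProbeFailure> {
    let deadline = Instant::now() + timeout;
    let mut attempts = 0;
    loop {
        attempts += 1;
        match probe.attempt() {
            ProbeAttempt::Ok => return Ok(attempts),
            ProbeAttempt::Fatal(why) => return Err(ProbeFailure::Rejected(why)),
            ProbeAttempt::Retry(why) => {
                // Give up rather than sleep past the deadline.
                if Instant::now() + interval > deadline {
                    return Err(ProbeFailure::Timeout { attempts, last: why });
                }
                thread::sleep(interval);
            }
        }
    }
}
```

Because the loop owns the clock, a test can drive run_probe with a
scripted FnProbe and never open a socket.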

Stage progress is emitted via tracing::info_span! / info! today.
Adding HarmonyEvent::SmokeStage* variants is deferred to Phase 1
(Score / Interpret stay untouched in Phase 0).

Phase 1 follow-ups: HttpHealthy, K8sPodReady, NatsKvKeyExists probes;
a real FleetOperatorSmokeTest composed of those; optional dashboard
event variants. None of those touch the contract — that's the point
of locking the shape here.

Tests: 21 new (including 4 in tcp::tests, 8 in probe::tests, 3 in
suite::tests, 4 in deploy::tests). Full crate: 39 / 39 passing. No regressions.
build/check.sh equivalents: cargo check --all-targets --all-features
clean, cargo fmt --check clean, cargo clippy with zero findings on
the new module.
2026-05-23 16:29:35 -04:00