harmony/docs/adr/023-deploy-architecture.md
Jean-Gabriel Gill-Couture 020ebcb1f9 refactor(fleet): deploy-architecture cleanup per ADR-023 — Scores everywhere, deploy crate, principles in CLAUDE.md
The previous e2e harness handrolled k8s manifests in `stack.rs`,
bypassing the Score-Topology-Interpret machinery harmony exists to
provide. This commit:

1. **ADR-023** codifies the rules: deploy with Scores (not
   manifests), e2e uses the same Scores as production, one Score
   per component, deploy blocks on smoke-test success, deploy logic
   lives in `*-deploy` crates, topologies are compile-time,
   thiserror over anyhow. CLAUDE.md mirrors the principles.

2. **New `fleet/harmony-fleet-deploy` crate** is the canonical home
   for fleet-component Scores:
   - `FleetOperatorScore` + helm-chart generator + `install_crds`
     moved out of `harmony::modules::fleet::operator` (they should
     never have lived in `harmony` core). `FleetServerScore`
     (composite of NATS + operator + Zitadel + callout) moved too.
   - New `FleetNatsScore` (preset over `NatsHelmChartScore` with
     fleet's required values; v1 supports `UserPass` auth, callout
     mode reserved on the public API for PR 1.5).
   - New `FleetAgentScore` with `FleetAgentTarget::Pod`; `Vm`
     target is a future variant that absorbs `FleetDeviceSetupScore`.
   - `harmony-fleet-deploy` binary built on the existing
     `harmony_cli` crate — no new CLI scaffolding.

3. **Operator runtime binary trimmed**: `Install` and `Chart`
   subcommands removed; both jobs now belong to
   `harmony-fleet-deploy`. The runtime binary becomes leaner.

4. **E2E harness rewritten** as a thin Score composer:
   `harmony-fleet-e2e/src/stack.rs` deploys the stack via
   `FleetNatsScore` + `FleetAgentScore`. The inline NATS manifest
   factory and the bespoke agent Pod renderer are gone.
   - Bring-up runs once per test binary via `shared_stack` +
     `tokio::sync::OnceCell` (matches the `fleet_e2e_demo` pattern).
   - Stale `e2e-*` namespaces from prior runs get pruned at
     startup so the leaks the OnceCell creates don't compound.

5. **`thiserror` for the agent's `CommandServer`** — replaces the
   anyhow-based surface with typed `CommandError` /
   `CommandServerError`.

6. **Memory** captures eight load-bearing principles (saved to
   `~/.claude/projects/.../memory/`) so future sessions don't drift
   back into manifest-handrolling.

Verified: `cargo test -p harmony-fleet-e2e --test ping` green
end-to-end against k3d in 25s warm.
2026-05-18 22:54:50 -04:00

# Architecture Decision Record: Deploy Architecture — Scores, Deploy Crates, and the E2E Contract
Initial Author: Jean-Gabriel Gill-Couture
Initial Date: 2026-05-18
Last Updated Date: 2026-05-18
## Status
Accepted. Enforces the principles already documented in
`CLAUDE.md` (Score-Topology-Interpret); this ADR adds the *deploy*
side of the contract (what a deploy crate is, how e2e harnesses
relate to production deploys, how the CLI surface is shaped) and
the smoke-test-on-deploy semantics.
## Context
The harmony codebase has drifted in three related ways that this
ADR exists to stop:
1. **Test harnesses handrolling k8s manifests.** Recent e2e harness
work (`fleet/harmony-fleet-e2e/src/stack.rs`, since reverted in
this PR) inlined `Deployment` / `Service` / `ConfigMap` structs
via `k8s_openapi::api::*`. That's the YAML-mud-pit anti-pattern
harmony exists to eliminate, dressed up in Rust. The fact that
e2e bypassed Scores while production used them meant e2e and
prod could diverge silently.
2. **Deploy logic scattered between `harmony` core and example
crates.** `FleetOperatorScore` lives in
`harmony/src/modules/fleet/operator/`, but its real "how to apply
this end-to-end" logic ended up duplicated across
`examples/fleet_e2e_demo/src/lib.rs::deploy_operator` and the
operator's own `Chart`
subcommand. There's no single place a developer goes to deploy
the fleet operator.
3. **`deploy` returning before convergence.** Today
`Outcome::SUCCESS` means "apply submitted", same as `helm install`. Users
are left to figure out whether the result actually works. That's
exactly the user-experience problem harmony was built to fix.
The pattern below is the cleanup.
## Decision
Nine principles, grouped.
### Deployment as Scores
1. **Deploy with Scores, not handrolled manifests.** Capability
traits + compile-time bounds are the contract. No
`k8s_openapi::api::*` structs outside of `Score::interpret`
bodies. Test harnesses, examples, and CLI helpers compose
`*Score` types — they never reimplement deploys.
2. **E2E uses the same Scores as production.** Only the
`Topology` instance changes (local k3d, remote OKD, bare-metal
HA, …). A test harness is a `Score`-composer running against a
test Topology. If e2e needs something prod doesn't, add the
knob to the Score — don't fork the manifest in the harness.
3. **One Score per deployable component.** Composition is the
user-facing primitive: `MyAppScore` pulls in `PostgresScore`,
`HttpServerScore`, etc. Don't build monolithic "deploy
everything" Scores. Each primitive Score must be independently
testable and substitutable.
4. **Deploy returns only after smoke-test success.** Every Score
owns a readiness + smoke-test contract that the framework runs
and blocks on. Convergence errors must be actionable, in the
style of `rustc`'s error messages, not "exit code 1 from helm".
The *implementation* of the smoke-test contract (separate
trait? required Score method? companion struct?) is left open
for a follow-up ADR; the principle is locked in.
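
Principles 1 to 3 can be illustrated with a minimal, self-contained sketch. The `Score` and `HelmCapable` traits below are simplified stand-ins invented for this example, not harmony's actual API, and `NatsScore`, `AgentScore`, `K3dLocal`, `RemoteOkd`, and `deploy_stack` are hypothetical. The only point is that the e2e harness and the production path call the same composition, and that the capability bound is checked at compile time.

```rust
/// Hypothetical capability trait: a topology that can install Helm releases.
trait HelmCapable {
    fn install_release(&mut self, release: &str) -> Result<(), String>;
}

/// Simplified stand-in for a Score: it applies itself to any topology
/// that offers the capabilities named in the impl's bounds.
trait Score<T> {
    fn interpret(&self, topology: &mut T) -> Result<(), String>;
}

struct NatsScore;
struct AgentScore {
    replicas: u32,
}

impl<T: HelmCapable> Score<T> for NatsScore {
    fn interpret(&self, topology: &mut T) -> Result<(), String> {
        topology.install_release("nats")
    }
}

impl<T: HelmCapable> Score<T> for AgentScore {
    fn interpret(&self, topology: &mut T) -> Result<(), String> {
        topology.install_release(&format!("agent-x{}", self.replicas))
    }
}

/// The single composition shared by production and e2e. A topology that
/// does not implement `HelmCapable` is rejected by the compiler.
fn deploy_stack<T: HelmCapable>(topology: &mut T, replicas: u32) -> Result<(), String> {
    NatsScore.interpret(&mut *topology)?;
    AgentScore { replicas }.interpret(&mut *topology)
}

/// Test topology: records what was installed.
#[derive(Default)]
struct K3dLocal {
    installed: Vec<String>,
}

impl HelmCapable for K3dLocal {
    fn install_release(&mut self, release: &str) -> Result<(), String> {
        self.installed.push(release.to_string());
        Ok(())
    }
}

/// Production-like topology: same capability, different behavior.
#[derive(Default)]
struct RemoteOkd {
    installed: Vec<String>,
}

impl HelmCapable for RemoteOkd {
    fn install_release(&mut self, release: &str) -> Result<(), String> {
        self.installed.push(format!("okd:{}", release));
        Ok(())
    }
}
```

In this sketch, a harness would call `deploy_stack(&mut K3dLocal::default(), 1)` while a production binary would call the same function with a `RemoteOkd`; if e2e needs a different replica count, that is a knob on the Score rather than a forked manifest.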
### Where deploy logic lives
5. **Deploy logic lives in a `*-deploy` crate** that depends on
both `harmony` and the runtime crate. Runtime binaries (the
thing that ships to constrained devices and to in-cluster pods)
stay free of the `harmony` dep. Pattern already established by
`harmony_agent/deploy`.
For the fleet stack specifically: **one**
`fleet/harmony-fleet-deploy` crate holds every fleet-component Score
(`FleetOperatorScore`, `FleetAgentScore`, `FleetNatsScore`,
`FleetCalloutScore`). The same crate is consumed by:
- production CLI (`harmony-fleet-deploy <component> --topology <name>`)
- the e2e harness (composes the Scores against a k3d Topology)
- whatever future control-plane / web tool drives deploys
Fleet Scores that currently live in `harmony/src/modules/fleet/`
are migrated into `harmony-fleet-deploy` — they should never
have been in harmony core. This is a one-shot move done in the
PR that introduces the deploy crate.
### Topology selection
6. **Topologies are compile-time, selected at runtime.** A deploy
binary statically lists its supported topologies; the user
picks one at deploy time. Adding a brand-new topology backend
(AWS, GCP) is a rebuild, which is an acceptable cost because
dynamic-discovery topologies like `K8sAnywhere` already cover "any
physical place that runs k8s". No `Box<dyn Topology>` plugin
loaders.
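
One way to read principle 6 is as a closed enum of topologies that the binary matches on. The sketch below assumes a simplified `Topology` trait and hypothetical names (`SupportedTopology`, `select`, `run`, the `k3d` and `okd` keys, and the placeholder API URL); it is not harmony's actual selection code.

```rust
/// Simplified stand-in for a topology.
trait Topology {
    fn describe(&self) -> String;
}

struct K3dLocal;
struct RemoteOkd {
    api_url: String,
}

impl Topology for K3dLocal {
    fn describe(&self) -> String {
        "local k3d cluster".to_string()
    }
}

impl Topology for RemoteOkd {
    fn describe(&self) -> String {
        format!("remote OKD at {}", self.api_url)
    }
}

/// Every topology this binary supports, fixed at compile time.
enum SupportedTopology {
    K3dLocal(K3dLocal),
    RemoteOkd(RemoteOkd),
}

#[derive(Debug, PartialEq)]
struct UnknownTopology(String);

/// Runtime selection: map the user's choice onto the static list.
fn select(name: &str) -> Result<SupportedTopology, UnknownTopology> {
    match name {
        "k3d" => Ok(SupportedTopology::K3dLocal(K3dLocal)),
        "okd" => Ok(SupportedTopology::RemoteOkd(RemoteOkd {
            // Placeholder URL, for illustration only.
            api_url: "https://okd.example.invalid:6443".to_string(),
        })),
        other => Err(UnknownTopology(other.to_string())),
    }
}

/// Generic deploy step, monomorphized per topology.
fn deploy_on<T: Topology>(topology: &T) -> String {
    format!("deploying on {}", topology.describe())
}

/// Each arm calls `deploy_on` with a concrete type, so there is no
/// `Box<dyn Topology>` anywhere.
fn run(name: &str) -> Result<String, UnknownTopology> {
    Ok(match select(name)? {
        SupportedTopology::K3dLocal(t) => deploy_on(&t),
        SupportedTopology::RemoteOkd(t) => deploy_on(&t),
    })
}
```

Adding a new backend in this shape means adding an enum variant and a match arm, which is exactly the rebuild the principle accepts.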
### Framework evolution
7. **Extend Scores with companions, not API changes.** New
capabilities the framework wants to attach to Scores (planning,
dry-run, observability, eventually smoke-test) default to a
*companion* type or trait that wraps a Score rather than a new
method on `Score`/`Interpret`. The base public API stays simple.
The exception is principles every Score must honor (which may
force a required method) — but only after the principle has
been validated in practice via the companion-first iteration.
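
As a rough illustration of the companion-first approach, the sketch below attaches a smoke test to a Score through a wrapper type instead of a new `Score` method. The shape of the smoke-test contract is explicitly undecided in this ADR, so everything here (`SmokeTest`, `Checked`, `DeployError`, the `reachable` flag, and the error text) is a hypothetical example of one option, with a deliberately simplified `Score` trait.

```rust
/// Simplified stand-in for the base trait; it stays unchanged.
trait Score {
    fn interpret(&self) -> Result<(), String>;
}

/// Hypothetical companion trait carrying the new capability.
trait SmokeTest {
    fn smoke_test(&self) -> Result<(), String>;
}

#[derive(Debug, PartialEq)]
enum DeployError {
    Apply(String),
    SmokeTest(String),
}

/// Companion wrapper: deploy only succeeds once the smoke test passes.
struct Checked<S>(S);

impl<S: Score + SmokeTest> Checked<S> {
    fn deploy(&self) -> Result<(), DeployError> {
        self.0.interpret().map_err(DeployError::Apply)?;
        self.0.smoke_test().map_err(DeployError::SmokeTest)
    }
}

/// Toy Score whose smoke test outcome is controlled by a flag.
struct NatsScore {
    reachable: bool,
}

impl Score for NatsScore {
    fn interpret(&self) -> Result<(), String> {
        Ok(())
    }
}

impl SmokeTest for NatsScore {
    fn smoke_test(&self) -> Result<(), String> {
        if self.reachable {
            Ok(())
        } else {
            Err("NATS did not answer on port 4222; check the Service selector".to_string())
        }
    }
}
```

Scores that do not implement the companion trait keep compiling and deploying as before, which is what lets the idea be validated before it is promoted to a required method.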
### CLI
8. **CLI: hybrid, staged.** Today (B): first-party tools ship as
separate `harmony-*` binaries built on the existing
`harmony_cli` crate. Improve that surface. Tomorrow (C): a
top-level `harmony` binary discovers `harmony-*` plugin
binaries on `$PATH` (`kubectl`-style) so a third-party
`MyAppScore` author gets `harmony deploy my-app` for free. The
plugin protocol is **not** in scope for any current PR; it's
a dedicated future effort.
### Error handling
9. **thiserror almost everywhere; anyhow only at binary glue.**
Library code, public crate boundaries, anything callers might
want to match on — typed errors via `thiserror`. `anyhow` is
reserved for `main.rs`-level glue where the error is just
printed. This was the second drift this PR uncovered.
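
The difference shows up when a caller wants to branch on the failure. The sketch below writes the `Display` and `Error` impls by hand using only the standard library so that it compiles without dependencies; `thiserror`'s `#[derive(Error)]` with `#[error("...")]` attributes generates equivalent impls. The `CommandError` name comes from this PR, but its variants, `dispatch`, `retryable`, and `run_glue` are hypothetical.

```rust
use std::error::Error;
use std::fmt;

/// Typed library error: callers can match on the variant.
#[derive(Debug, PartialEq)]
enum CommandError {
    UnknownCommand(String),
    Timeout { seconds: u64 },
}

impl fmt::Display for CommandError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CommandError::UnknownCommand(name) => write!(f, "unknown command: {}", name),
            CommandError::Timeout { seconds } => write!(f, "timed out after {}s", seconds),
        }
    }
}

impl Error for CommandError {}

/// Library-level function: returns the typed error.
fn dispatch(name: &str) -> Result<&'static str, CommandError> {
    match name {
        "ping" => Ok("pong"),
        "slow" => Err(CommandError::Timeout { seconds: 30 }),
        other => Err(CommandError::UnknownCommand(other.to_string())),
    }
}

/// A caller deciding what to do based on the variant, which an opaque
/// error type would not allow without downcasting.
fn retryable(result: &Result<&'static str, CommandError>) -> bool {
    matches!(result, Err(CommandError::Timeout { .. }))
}

/// Binary-glue level: the error is only going to be printed, so an
/// opaque boxed error (the role `anyhow` plays) is enough here.
fn run_glue(name: &str) -> Result<(), Box<dyn Error>> {
    let reply = dispatch(name)?;
    println!("{}", reply);
    Ok(())
}
```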
## Out of scope (deferred, not rejected)
- **Score derive macro / deployment DSL.** Strategic intent from
day one; the framework's value-add concentrates here. Separate
design effort.
- **Score registry** (Crichton-style:
<https://willcrichton.net/rust-api-type-patterns/registries.html>).
Real itch — examples and Scores are hard to discover today.
Research + ADR first.
- **Inventory as capability-defined physical assets.** Inventory
is massively under-engineered today; the original idea is to
represent physical infrastructure (building → cable → switch
port → MAC) but most use cases ignore it. Decomposing inventory
into a capability set is a deep redesign.
- **Plug-in CLI discovery layer (C above).** Roadmap item;
explicitly named as the fix for the "too many disconnected
CLIs" cohesion problem.
- **`Application features` ↔ `capabilities` relationship.**
In-progress concept the project lead is personally unsure
about. Don't try to resolve in this ADR.
- **Concrete smoke-test contract shape.** See principle 4 —
principle locked, implementation deferred. Today's e2e test
suite plays the role of the smoke test until the trait/struct
shape is decided.
## Consequences
- The current `harmony-fleet-e2e/src/stack.rs` (introduced in this
PR) is **wrong** and gets rewritten as a Score composer with no
inline k8s manifests.
- `harmony::modules::fleet::operator::*` and any other fleet-deploy
modules in `harmony` core move into `fleet/harmony-fleet-deploy`.
Callers (`examples/fleet_e2e_demo`, the operator binary itself)
get updated paths.
- New Scores (`FleetAgentScore`, `FleetNatsScore`) land in
`fleet/harmony-fleet-deploy`. The agent crate gains nothing —
it stays a lean runtime binary.
- The deploy crate gets its own `main.rs` driven by `harmony_cli`,
exposing one subcommand per component plus an `all` composite.
- Future work (smoke-test contract, Score derive macro, registry,
CLI discovery) gets dedicated ADRs/PRs and does **not** sneak
into unrelated work.
## References
- `CLAUDE.md` — Score-Topology-Interpret pattern, capability
design rules.
- `docs/adr/002-hexagonal-architecture.md` — domain/adapter split
this builds on.
- `docs/adr/005-rust-dsl-over-yaml.md` — the original "no
YAML-mud-pit" call.
- `harmony_agent/deploy` — existing `*-deploy` crate pattern.
- `fleet/PLAN_requests_over_nats.md` — the working plan for the
request/reply work that was under way when this ADR landed.