harmony/docs/adr/023-deploy-architecture.md

# Architecture Decision Record: Deploy Architecture — Scores, Deploy Crates, and the E2E Contract

Initial Author: Jean-Gabriel Gill-Couture

Initial Date: 2026-05-18

Last Updated Date: 2026-05-18

## Status

Accepted. Enforces the principles already documented in
`CLAUDE.md` (Score-Topology-Interpret); this ADR adds the *deploy*
side of the contract (what a deploy crate is, how e2e harnesses
relate to production deploys, how the CLI surface is shaped) and
the smoke-test-on-deploy semantics.

## Context

The harmony codebase has drifted in three related ways that this
ADR exists to stop:

1. **Test harnesses handrolling k8s manifests.** Recent e2e harness
   work (`fleet/harmony-fleet-e2e/src/stack.rs`, since reverted in
   this PR) inlined `Deployment` / `Service` / `ConfigMap` structs
   via `k8s_openapi::api::*`. That's the YAML-mud-pit anti-pattern
   harmony exists to eliminate, dressed up in Rust. The fact that
   e2e bypassed Scores while production used them meant e2e and
   prod could diverge silently.

2. **Deploy logic scattered between `harmony` core and example
   crates.** `FleetOperatorScore` lives in
   `harmony/src/modules/fleet/operator/`, but its real "how to apply
   this end-to-end" logic ended up duplicated across
   `examples/fleet_e2e_demo/src/lib.rs::deploy_operator` and the
   operator's own `Chart` subcommand. There's no single place a
   developer goes to deploy the fleet operator.

3. **`deploy` returning before convergence.** Today
   `Outcome::SUCCESS` means "apply submitted", same as
   `helm install`. Users are left to figure out whether the result
   actually works. That's exactly the user-experience problem
   harmony was built to fix.

The pattern below is the cleanup.

## Decision

Nine principles, grouped.

### Deployment as Scores

1. **Deploy with Scores, not handrolled manifests.** Capability
   traits + compile-time bounds are the contract. No
   `k8s_openapi::api::*` structs outside of `Score::interpret`
   bodies. Test harnesses, examples, and CLI helpers compose
   `*Score` types — they never reimplement deploys.

2. **E2E uses the same Scores as production.** Only the
   `Topology` instance changes (local k3d, remote OKD, bare-metal
   HA, …). A test harness is a `Score`-composer running against a
   test Topology. If e2e needs something prod doesn't, add the
   knob to the Score — don't fork the manifest in the harness.

3. **One Score per deployable component.** Composition is the
   user-facing primitive: `MyAppScore` pulls in `PostgresScore`,
   `HttpServerScore`, etc. Don't build monolithic "deploy
   everything" Scores. Each primitive Score must be independently
   testable and substitutable.

4. **Deploy returns only after smoke-test success.** Every Score
   owns a readiness + smoke-test contract that the framework runs
   and blocks on. Convergence errors must be actionable, in the
   style of `rustc`'s error messages, not "exit code 1 from helm".
   The *implementation* of the smoke-test contract (separate
   trait? required Score method? companion struct?) is left open
   for a follow-up ADR; the principle is locked in.
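
A minimal sketch of principles 1 to 3, assuming heavily simplified stand-ins for the real traits: `Score<T>`, `ContainerRuntime`, `LocalK3d`, and `RemoteOkd` below are hypothetical and do not reproduce harmony's actual signatures. It shows a composite Score built from primitive Scores, with a capability bound as the compile-time contract, interpreted unchanged against two topologies.

```rust
// Simplified stand-ins for illustration only; harmony's real `Score`,
// `Topology`, and capability traits have different signatures.

/// Hypothetical capability: the topology can run a named workload.
trait ContainerRuntime {
    fn run(&mut self, workload: &str) -> Result<(), String>;
}

/// Simplified Score: "interpret me against a topology `T`".
trait Score<T> {
    fn interpret(&self, topology: &mut T) -> Result<(), String>;
}

struct PostgresScore;
struct HttpServerScore {
    replicas: u32,
}

// The bound `T: ContainerRuntime` is the compile-time contract: these
// Scores cannot be interpreted against a topology lacking the capability.
impl<T: ContainerRuntime> Score<T> for PostgresScore {
    fn interpret(&self, topology: &mut T) -> Result<(), String> {
        topology.run("postgres")
    }
}

impl<T: ContainerRuntime> Score<T> for HttpServerScore {
    fn interpret(&self, topology: &mut T) -> Result<(), String> {
        topology.run(&format!("http-server x{}", self.replicas))
    }
}

/// Composite Score: composes primitives instead of re-implementing them.
struct MyAppScore {
    http_replicas: u32,
}

impl<T: ContainerRuntime> Score<T> for MyAppScore {
    fn interpret(&self, topology: &mut T) -> Result<(), String> {
        PostgresScore.interpret(topology)?;
        HttpServerScore { replicas: self.http_replicas }.interpret(topology)
    }
}

/// Two hypothetical topologies that record what was applied to them.
#[derive(Default)]
struct LocalK3d {
    applied: Vec<String>,
}

#[derive(Default)]
struct RemoteOkd {
    applied: Vec<String>,
}

impl ContainerRuntime for LocalK3d {
    fn run(&mut self, workload: &str) -> Result<(), String> {
        self.applied.push(format!("k3d:{}", workload));
        Ok(())
    }
}

impl ContainerRuntime for RemoteOkd {
    fn run(&mut self, workload: &str) -> Result<(), String> {
        self.applied.push(format!("okd:{}", workload));
        Ok(())
    }
}
```

In this sketch an e2e harness and a production deploy would both call `MyAppScore { .. }.interpret(&mut topology)`; only the topology value differs.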

### Where deploy logic lives

5. **Deploy logic lives in a `*-deploy` crate** that depends on
   both `harmony` and the runtime crate. Runtime binaries (the
   thing that ships to constrained devices and to in-cluster pods)
   stay free of the `harmony` dep. Pattern already established by
   `harmony_agent/deploy`.

   For the fleet stack specifically: **one**
   `fleet/harmony-fleet-deploy` crate holds every fleet-component
   Score (`FleetOperatorScore`, `FleetAgentScore`, `FleetNatsScore`,
   `FleetCalloutScore`). The same crate is consumed by:
   - production CLI (`harmony-fleet-deploy <component>
     --topology <name>`)
   - the e2e harness (composes the Scores against a k3d Topology)
   - whatever future control-plane / web tool drives deploys

   Fleet Scores that currently live in `harmony/src/modules/fleet/`
   are migrated into `harmony-fleet-deploy` — they should never
   have been in harmony core. This is a one-shot move done in the
   PR that introduces the deploy crate.

### Topology selection

6. **Topologies are defined at compile time and selected at
   runtime.** A deploy binary statically lists its supported
   topologies; the user picks one at deploy time. Adding a brand-new
   topology backend (AWS, GCP) requires a rebuild. That cost is
   acceptable because dynamic-discovery topologies like
   `K8sAnywhere` already cover "any physical place that runs k8s".
   No `Box<dyn Topology>` plugin loaders.
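
One way to read "compile-time, selected at runtime" is a closed enum of supported topologies that is parsed from a CLI argument and then matched, so every branch dispatches statically. The sketch below assumes hypothetical names (`SupportedTopology`, `K3dTopology`, `OkdTopology`, the `k3d` / `okd` argument values) and a toy `Topology` trait; it is not harmony's actual API.

```rust
use std::str::FromStr;

/// The closed set of topologies this hypothetical binary was built with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SupportedTopology {
    LocalK3d,
    RemoteOkd,
}

impl FromStr for SupportedTopology {
    type Err = String;

    // Runtime selection: map the user's `--topology <name>` value onto
    // one of the statically known variants.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "k3d" => Ok(SupportedTopology::LocalK3d),
            "okd" => Ok(SupportedTopology::RemoteOkd),
            other => Err(format!(
                "unknown topology `{}`; supported: k3d, okd",
                other
            )),
        }
    }
}

/// Toy stand-in for a topology trait.
trait Topology {
    fn describe(&self) -> String;
}

struct K3dTopology;
struct OkdTopology;

impl Topology for K3dTopology {
    fn describe(&self) -> String {
        "local k3d".to_string()
    }
}

impl Topology for OkdTopology {
    fn describe(&self) -> String {
        "remote OKD".to_string()
    }
}

/// Generic over the topology type: monomorphized, no `Box<dyn Topology>`.
fn deploy_on<T: Topology>(topology: &T) -> String {
    format!("deploying on {}", topology.describe())
}

/// Each match arm calls `deploy_on` with a concrete type, so adding a
/// topology means adding a variant and an arm, then rebuilding.
fn run(topology_arg: &str) -> Result<String, String> {
    match topology_arg.parse::<SupportedTopology>()? {
        SupportedTopology::LocalK3d => Ok(deploy_on(&K3dTopology)),
        SupportedTopology::RemoteOkd => Ok(deploy_on(&OkdTopology)),
    }
}
```

The `match` is exhaustive, so forgetting to wire up a newly added variant is a compile error rather than a runtime surprise.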

### Framework evolution

7. **Extend Scores with companions, not API changes.** New
   capabilities the framework wants to attach to Scores (planning,
   dry-run, observability, eventually smoke-test) default to a
   *companion* type or trait that wraps a Score rather than a new
   method on `Score`/`Interpret`. The base public API stays simple.
   The exception is principles every Score must honor (which may
   force a required method) — but only after the principle has
   been validated in practice via the companion-first iteration.
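
As an illustration of the companion-first approach, the sketch below attaches a smoke test to a Score through a wrapper type instead of a new `Score` method. This is only one of the shapes principle 4 leaves open, not a decision: `SmokeTest`, `WithSmokeTest`, and `DemoScore` are hypothetical names, and the `Score` trait here is a toy stand-in.

```rust
/// Toy stand-in for the base Score API, which stays unchanged.
trait Score {
    fn name(&self) -> &str;
    fn interpret(&self) -> Result<(), String>;
}

/// Hypothetical companion trait, implemented alongside a Score.
trait SmokeTest {
    fn smoke_test(&self) -> Result<(), String>;
}

/// Hypothetical companion wrapper: deploys the inner Score and only
/// reports success once the smoke test has passed.
struct WithSmokeTest<S>(S);

impl<S: Score + SmokeTest> WithSmokeTest<S> {
    fn deploy(&self) -> Result<(), String> {
        self.0.interpret()?;
        self.0.smoke_test().map_err(|reason| {
            format!(
                "{} applied but failed smoke test: {}",
                self.0.name(),
                reason
            )
        })
    }
}

/// Demo Score whose smoke-test result is controlled by a flag.
struct DemoScore {
    healthy: bool,
}

impl Score for DemoScore {
    fn name(&self) -> &str {
        "demo"
    }

    fn interpret(&self) -> Result<(), String> {
        Ok(()) // pretend the apply was submitted successfully
    }
}

impl SmokeTest for DemoScore {
    fn smoke_test(&self) -> Result<(), String> {
        if self.healthy {
            Ok(())
        } else {
            Err("no response on client port".to_string())
        }
    }
}
```

Scores without a `SmokeTest` impl keep compiling against the base trait, which is the property the companion-first rule is meant to preserve while the contract is still being validated.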

### CLI

8. **CLI: hybrid, staged.** Today (B): first-party tools ship as
   separate `harmony-*` binaries built on the existing
   `harmony_cli` crate. Improve that surface. Tomorrow (C): a
   top-level `harmony` binary discovers `harmony-*` plugin
   binaries on `$PATH` (`kubectl`-style) so a third-party
   `MyAppScore` author gets `harmony deploy my-app` for free. The
   plugin protocol is **not** in scope for any current PR; it's
   a dedicated future effort.
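
For orientation only, `kubectl`-style discovery usually amounts to mapping a subcommand to a `harmony-<subcommand>` executable name and searching each `$PATH` entry for it. The sketch below assumes that naming rule and uses hypothetical function names; it says nothing about the eventual plugin protocol, which remains out of scope.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Lists, in `$PATH` order, where a plugin for `subcommand` could live.
/// Assumes the hypothetical naming rule `harmony-<subcommand>`.
fn plugin_candidates(path_var: &OsStr, subcommand: &str) -> Vec<PathBuf> {
    let binary = format!("harmony-{}", subcommand);
    env::split_paths(path_var)
        .map(|dir| dir.join(&binary))
        .collect()
}

/// Returns the first candidate that exists as a file, if any.
/// A real implementation would also check that it is executable.
fn find_plugin(path_var: &OsStr, subcommand: &str) -> Option<PathBuf> {
    plugin_candidates(path_var, subcommand)
        .into_iter()
        .find(|candidate| candidate.is_file())
}
```

A top-level binary could call `find_plugin(&std::env::var_os("PATH").unwrap_or_default(), "deploy")` and exec the result, falling back to an "unknown subcommand" error when it returns `None`.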

### Error handling

9. **`thiserror` almost everywhere; `anyhow` only at binary glue.**
   Library code, public crate boundaries, and anything callers might
   want to match on use typed errors via `thiserror`. `anyhow` is
   reserved for `main.rs`-level glue where the error is just
   printed. This was the second drift this PR uncovered.
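
The split can be sketched with the standard library alone. Below, the `Display` and `Error` impls are written by hand where `thiserror` would derive them, and `Box<dyn Error>` stands in for `anyhow::Error` at the glue layer. `DeployError`, its variants, and the component names are hypothetical.

```rust
use std::error::Error;
use std::fmt;

/// Typed library error: callers can match on the variant.
#[derive(Debug, PartialEq)]
enum DeployError {
    UnknownComponent(String),
    NotConverged { component: String, waited_secs: u64 },
}

// Hand-written here; `thiserror` would generate this from attributes.
impl fmt::Display for DeployError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DeployError::UnknownComponent(name) => {
                write!(f, "unknown component `{}`", name)
            }
            DeployError::NotConverged { component, waited_secs } => {
                write!(f, "{} did not converge after {}s", component, waited_secs)
            }
        }
    }
}

impl Error for DeployError {}

/// Library-level function: returns the typed error.
fn deploy(component: &str) -> Result<(), DeployError> {
    match component {
        "operator" => Ok(()),
        "agent" => Err(DeployError::NotConverged {
            component: "agent".to_string(),
            waited_secs: 30,
        }),
        other => Err(DeployError::UnknownComponent(other.to_string())),
    }
}

/// A caller that needs the type: retry only on convergence timeouts.
fn should_retry(err: &DeployError) -> bool {
    matches!(err, DeployError::NotConverged { .. })
}

/// Binary glue: the type is erased because the error is only printed.
/// `Box<dyn Error>` plays the role `anyhow::Error` would play here.
fn run(component: &str) -> Result<(), Box<dyn Error>> {
    deploy(component)?;
    Ok(())
}
```

Code that calls `deploy` can branch on `DeployError`, while `run` can only display the message, which is the reason type erasure belongs at the outermost layer.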

## Out of scope (deferred, not rejected)

- **Score derive macro / deployment DSL.** Strategic intent from
  day one; the framework's value-add concentrates here. Separate
  design effort.
- **Score registry** (Crichton-style:
  <https://willcrichton.net/rust-api-type-patterns/registries.html>).
  Real itch — examples and Scores are hard to discover today.
  Research + ADR first.
- **Inventory as capability-defined physical assets.** Inventory
  is massively under-engineered today; the original idea is to
  represent physical infrastructure (building → cable → switch
  port → MAC) but most use cases ignore it. Decomposing inventory
  into a capability set is a deep redesign.
- **Plug-in CLI discovery layer (C above).** Roadmap item;
  explicitly named as the fix for the "too many disconnected
  CLIs" cohesion problem.
- **`Application features` ↔ `capabilities` relationship.**
  In-progress concept the project lead is personally unsure
  about. Don't try to resolve in this ADR.
- **Concrete smoke-test contract shape.** See principle 4 —
  principle locked, implementation deferred. Today's e2e test
  suite plays the role of the smoke test until the trait/struct
  shape is decided.

## Consequences

- The current `harmony-fleet-e2e/src/stack.rs` (introduced in this
  PR) is **wrong** and gets rewritten as a Score composer with no
  inline k8s manifests.
- `harmony::modules::fleet::operator::*` and any other fleet-deploy
  modules in `harmony` core move into `fleet/harmony-fleet-deploy`.
  Callers (`examples/fleet_e2e_demo`, the operator binary itself)
  get updated paths.
- New Scores (`FleetAgentScore`, `FleetNatsScore`) land in
  `fleet/harmony-fleet-deploy`. The agent crate gains nothing —
  it stays a lean runtime binary.
- The deploy crate gets its own `main.rs` driven by `harmony_cli`,
  exposing one subcommand per component plus an `all` composite.
- Future work (smoke-test contract, Score derive macro, registry,
  CLI discovery) gets dedicated ADRs/PRs and does **not** sneak
  into unrelated work.
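
The "one subcommand per component plus an `all` composite" surface could resolve, roughly, as sketched below. This is plain Rust rather than the `harmony_cli` API, and the `Component` enum, the subcommand spellings, and the ordering used for `all` are assumptions made for illustration.

```rust
/// Hypothetical list of fleet components the deploy binary knows about.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Component {
    Nats,
    Callout,
    Operator,
    Agent,
}

/// Order used by `all`; the real dependency order is an assumption here.
const ALL_COMPONENTS: [Component; 4] = [
    Component::Nats,
    Component::Callout,
    Component::Operator,
    Component::Agent,
];

/// Maps a subcommand to the components to deploy. `all` expands to the
/// full list, so it composes the per-component path instead of
/// duplicating it.
fn resolve(subcommand: &str) -> Result<Vec<Component>, String> {
    match subcommand {
        "nats" => Ok(vec![Component::Nats]),
        "callout" => Ok(vec![Component::Callout]),
        "operator" => Ok(vec![Component::Operator]),
        "agent" => Ok(vec![Component::Agent]),
        "all" => Ok(ALL_COMPONENTS.to_vec()),
        other => Err(format!("unknown component `{}`", other)),
    }
}
```

Each resolved `Component` would then map to its Score in the deploy crate, keeping `all` a thin composition over the same code path that single-component deploys use.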

## References

- `CLAUDE.md` — Score-Topology-Interpret pattern, capability
  design rules.
- `docs/adr/002-hexagonal-architecture.md` — domain/adapter split
  this builds on.
- `docs/adr/005-rust-dsl-over-yaml.md` — the original "no
  YAML-mud-pit" call.
- `harmony_agent/deploy` — existing `*-deploy` crate pattern.
- `fleet/PLAN_requests_over_nats.md` — the working plan for the
  request/reply work that was underway when this ADR landed.