The previous e2e harness handrolled k8s manifests in `stack.rs`,
bypassing the Score-Topology-Interpret machinery harmony exists to
provide. This commit:

1. **ADR-023** codifies the rules: deploy with Scores (not
   manifests), e2e uses the same Scores as production, one Score
   per component, deploy blocks on smoke-test success, deploy logic
   lives in `*-deploy` crates, topologies are compile-time,
   thiserror over anyhow. CLAUDE.md mirrors the principles.

2. **New `fleet/harmony-fleet-deploy` crate** is the canonical home
   for fleet-component Scores:
   - `FleetOperatorScore` + helm-chart generator + `install_crds`
     moved out of `harmony::modules::fleet::operator` (they should
     never have lived in `harmony` core). `FleetServerScore`
     (composite of NATS + operator + Zitadel + callout) moved too.
   - New `FleetNatsScore` (preset over `NatsHelmChartScore` with
     fleet's required values; v1 supports `UserPass` auth, callout
     mode reserved on the public API for PR 1.5).
   - New `FleetAgentScore` with `FleetAgentTarget::Pod`; `Vm`
     target is a future variant that absorbs `FleetDeviceSetupScore`.
   - `harmony-fleet-deploy` binary built on the existing
     `harmony_cli` crate, with no new CLI scaffolding.

3. **Operator runtime binary trimmed**: `Install` and `Chart`
   subcommands removed; both jobs now belong to
   `harmony-fleet-deploy`. The runtime binary becomes leaner.

4. **E2E harness rewritten** as a thin Score composer:
   `harmony-fleet-e2e/src/stack.rs` deploys the stack via
   `FleetNatsScore` + `FleetAgentScore`. The inline NATS manifest
   factory and the bespoke agent Pod renderer are gone.
   - Bring-up runs once per test binary via `shared_stack` +
     `tokio::sync::OnceCell` (matches the `fleet_e2e_demo` pattern).
   - Stale `e2e-*` namespaces from prior runs get pruned at
     startup so the leaks the OnceCell creates don't compound.

5. **`thiserror` for the agent's `CommandServer`**: replaces the
   anyhow-based surface with typed `CommandError` /
   `CommandServerError`.

6. **Memory** captures eight load-bearing principles (saved to
   `~/.claude/projects/.../memory/`) so future sessions don't drift
   back into manifest-handrolling.

Verified: `cargo test -p harmony-fleet-e2e --test ping` green
end-to-end against k3d in 25s warm.
# Architecture Decision Record: Deploy Architecture — Scores, Deploy Crates, and the E2E Contract

Initial Author: Jean-Gabriel Gill-Couture

Initial Date: 2026-05-18

Last Updated Date: 2026-05-18

## Status

Accepted. Enforces the principles already documented in
`CLAUDE.md` (Score-Topology-Interpret); this ADR adds the *deploy*
side of the contract (what a deploy crate is, how e2e harnesses
relate to production deploys, how the CLI surface is shaped) and
the smoke-test-on-deploy semantics.

## Context

The harmony codebase has drifted in three related ways that this
ADR exists to stop:

1. **Test harnesses handrolling k8s manifests.** Recent e2e harness
   work (`fleet/harmony-fleet-e2e/src/stack.rs`, since reverted in
   this PR) inlined `Deployment` / `Service` / `ConfigMap` structs
   via `k8s_openapi::api::*`. That's the YAML-mud-pit anti-pattern
   harmony exists to eliminate, dressed up in Rust. The fact that
   e2e bypassed Scores while production used them meant e2e and
   prod could diverge silently.

2. **Deploy logic scattered between `harmony` core and example
   crates.** `FleetOperatorScore` lives in
   `harmony/src/modules/fleet/operator/`, but its real "how to apply
   this end-to-end" logic ended up duplicated across
   `examples/fleet_e2e_demo/src/lib.rs::deploy_operator` and the
   operator's own `Chart` subcommand. There's no single place a
   developer goes to deploy the fleet operator.

3. **`deploy` returning before convergence.** Today
   `Outcome::SUCCESS` means "apply submitted", same as
   `helm install`. Users are left to figure out whether the result
   actually works. That's exactly the user-experience problem
   harmony was built to fix.

The pattern below is the cleanup.

## Decision

Nine principles, grouped.

### Deployment as Scores

1. **Deploy with Scores, not handrolled manifests.** Capability
   traits + compile-time bounds are the contract. No
   `k8s_openapi::api::*` structs outside of `Score::interpret`
   bodies. Test harnesses, examples, and CLI helpers compose
   `*Score` types; they never reimplement deploys.

2. **E2E uses the same Scores as production.** Only the
   `Topology` instance changes (local k3d, remote OKD, bare-metal
   HA, …). A test harness is a `Score`-composer running against a
   test Topology. If e2e needs something prod doesn't, add the
   knob to the Score; don't fork the manifest in the harness.

3. **One Score per deployable component.** Composition is the
   user-facing primitive: `MyAppScore` pulls in `PostgresScore`,
   `HttpServerScore`, etc. Don't build monolithic "deploy
   everything" Scores. Each primitive Score must be independently
   testable and substitutable.

4. **Deploy returns only after smoke-test success.** Every Score
   owns a readiness + smoke-test contract that the framework runs
   and blocks on. Convergence errors must be actionable, in the
   style of `rustc`'s error messages, not "exit code 1 from helm".
   The *implementation* of the smoke-test contract (separate
   trait? required Score method? companion struct?) is left open
   for a follow-up ADR; the principle is locked in.
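
As a rough illustration of principles 1, 2 and 4 together, the sketch below runs one Score against two topologies and only reports success once a smoke test passes. Every name in it (`KubernetesApply`, `DeployError`, `LocalK3d`, `BrokenCluster`, `NatsScore`, `deploy`) is a hypothetical, simplified stand-in; harmony's real `Score` and `Topology` traits have different signatures, and the shape of the smoke-test contract is explicitly left open by this ADR.

```rust
use std::fmt;

// Hypothetical capability trait: what a topology must offer for this Score.
pub trait KubernetesApply {
    fn apply(&mut self, resource: &str);
    fn is_ready(&self, resource: &str) -> bool;
}

// Typed error that carries an actionable hint instead of a bare exit code.
#[derive(Debug, PartialEq)]
pub enum DeployError {
    SmokeTestFailed { score: String, hint: String },
}

impl fmt::Display for DeployError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::SmokeTestFailed { score, hint } => {
                write!(f, "smoke test failed for `{score}`: {hint}")
            }
        }
    }
}

impl std::error::Error for DeployError {}

// Hypothetical Score trait; the bound on `T` in each impl is the compile-time contract.
pub trait Score<T> {
    fn interpret(&self, topology: &mut T);
    fn smoke_test(&self, topology: &T) -> Result<(), DeployError>;
}

pub struct NatsScore {
    pub replicas: u32,
}

impl NatsScore {
    fn resource(&self) -> String {
        format!("nats(replicas={})", self.replicas)
    }
}

impl<T: KubernetesApply> Score<T> for NatsScore {
    fn interpret(&self, topology: &mut T) {
        topology.apply(&self.resource());
    }

    fn smoke_test(&self, topology: &T) -> Result<(), DeployError> {
        let resource = self.resource();
        if topology.is_ready(&resource) {
            Ok(())
        } else {
            Err(DeployError::SmokeTestFailed {
                score: "nats".to_string(),
                hint: format!("{resource} was applied but never became ready; check pod events"),
            })
        }
    }
}

// Test topology: records what was applied and reports it as ready.
#[derive(Default)]
pub struct LocalK3d {
    applied: Vec<String>,
}

impl KubernetesApply for LocalK3d {
    fn apply(&mut self, resource: &str) {
        self.applied.push(resource.to_string());
    }
    fn is_ready(&self, resource: &str) -> bool {
        self.applied.iter().any(|a| a == resource)
    }
}

// A topology where nothing converges, to exercise the failure path.
#[derive(Default)]
pub struct BrokenCluster;

impl KubernetesApply for BrokenCluster {
    fn apply(&mut self, _resource: &str) {}
    fn is_ready(&self, _resource: &str) -> bool {
        false
    }
}

// Returns only after the smoke test has passed, not after "apply submitted".
pub fn deploy<T, S: Score<T>>(score: &S, topology: &mut T) -> Result<(), DeployError> {
    score.interpret(topology);
    score.smoke_test(topology)
}
```

Calling `deploy(&NatsScore { replicas: 1 }, &mut LocalK3d::default())` and the same call with `BrokenCluster` exercise the same Score; only the topology differs, which is the property principle 2 asks for.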

### Where deploy logic lives

5. **Deploy logic lives in a `*-deploy` crate** that depends on
   both `harmony` and the runtime crate. Runtime binaries (the
   thing that ships to constrained devices and to in-cluster pods)
   stay free of the `harmony` dep. Pattern already established by
   `harmony_agent/deploy`.

   For the fleet stack specifically: **one**
   `fleet/harmony-fleet-deploy` crate holds every fleet-component
   Score (`FleetOperatorScore`, `FleetAgentScore`, `FleetNatsScore`,
   `FleetCalloutScore`). The same crate is consumed by:
   - production CLI
     (`harmony-fleet-deploy <component> --topology <name>`)
   - the e2e harness (composes the Scores against a k3d Topology)
   - whatever future control-plane / web tool drives deploys

   Fleet Scores that currently live in `harmony/src/modules/fleet/`
   are migrated into `harmony-fleet-deploy`; they should never
   have been in harmony core. This is a one-shot move done in the
   PR that introduces the deploy crate.

### Topology selection

6. **Topologies are compile-time, selected at runtime.** A deploy
   binary statically lists its supported topologies; the user
   picks one at deploy time. Adding a brand-new topology backend
   (AWS, GCP) is a rebuild, which is an acceptable cost because
   dynamic-discovery topologies like `K8sAnywhere` already cover
   "any physical place that runs k8s". No `Box<dyn Topology>`
   plugin loaders.
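
One way to read principle 6 is as a closed enum parsed from a CLI flag, with each variant dispatching to a generic function over a concrete topology type. The sketch below assumes exactly that; `SupportedTopology`, `LocalK3d`, `RemoteOkd`, `deploy_on` and `run` are illustrative names, not part of harmony or `harmony_cli`.

```rust
use std::str::FromStr;

// Hypothetical topology trait and types, for illustration only.
pub trait Topology {
    fn describe(&self) -> String;
}

pub struct LocalK3d;
pub struct RemoteOkd;

impl Topology for LocalK3d {
    fn describe(&self) -> String {
        "local k3d cluster".to_string()
    }
}

impl Topology for RemoteOkd {
    fn describe(&self) -> String {
        "remote OKD cluster".to_string()
    }
}

// The closed set of topologies compiled into this binary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SupportedTopology {
    K3d,
    Okd,
}

impl FromStr for SupportedTopology {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "k3d" => Ok(Self::K3d),
            "okd" => Ok(Self::Okd),
            other => Err(format!(
                "unknown topology `{other}`; this binary supports: k3d, okd"
            )),
        }
    }
}

// Generic over the concrete type: monomorphized per topology, no trait objects.
fn deploy_on<T: Topology>(topology: T) -> String {
    format!("deploying on {}", topology.describe())
}

// Runtime selection happens once, here; everything below it is statically typed.
pub fn run(topology_flag: &str) -> Result<String, String> {
    let selected: SupportedTopology = topology_flag.parse()?;
    Ok(match selected {
        SupportedTopology::K3d => deploy_on(LocalK3d),
        SupportedTopology::Okd => deploy_on(RemoteOkd),
    })
}
```

Under this reading, supporting a new backend means adding a variant and a match arm and rebuilding, which matches the cost the principle accepts.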

### Framework evolution

7. **Extend Scores with companions, not API changes.** New
   capabilities the framework wants to attach to Scores (planning,
   dry-run, observability, eventually smoke-test) default to a
   *companion* type or trait that wraps a Score rather than a new
   method on `Score`/`Interpret`. The base public API stays simple.
   The exception is principles every Score must honor (which may
   force a required method), but only after the principle has
   been validated in practice via the companion-first iteration.
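
A minimal sketch of the companion idea, assuming a deliberately tiny `Score` trait that is not harmony's real one: a `DryRun<S>` wrapper adds planning behaviour to any Score without adding a method to the base trait. `DryRun`, `plan`, and the `Vec<String>` action log are all hypothetical.

```rust
// Hypothetical minimal Score trait; harmony's real trait differs.
pub trait Score {
    fn name(&self) -> String;
    // Appends a description of each action taken to `log`.
    fn interpret(&self, log: &mut Vec<String>);
}

pub struct NatsScore;

impl Score for NatsScore {
    fn name(&self) -> String {
        "nats".to_string()
    }
    fn interpret(&self, log: &mut Vec<String>) {
        log.push("applied nats".to_string());
    }
}

// Companion type: wraps any Score and adds dry-run behaviour from the outside.
pub struct DryRun<S: Score>(pub S);

impl<S: Score> DryRun<S> {
    pub fn plan(&self) -> String {
        format!("would deploy `{}`", self.0.name())
    }
}

// The wrapper is itself a Score, so it composes like any other Score.
impl<S: Score> Score for DryRun<S> {
    fn name(&self) -> String {
        format!("dry-run({})", self.0.name())
    }
    fn interpret(&self, log: &mut Vec<String>) {
        // Record the plan instead of delegating to the wrapped Score.
        log.push(self.plan());
    }
}
```

The base trait stays at two methods here; if dry-run later proved to be something every Score must honor, it could be promoted to a required method, which is the exception the principle describes.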

### CLI

8. **CLI: hybrid, staged.** Today (B): first-party tools ship as
   separate `harmony-*` binaries built on the existing
   `harmony_cli` crate. Improve that surface. Tomorrow (C): a
   top-level `harmony` binary discovers `harmony-*` plugin
   binaries on `$PATH` (`kubectl`-style) so a third-party
   `MyAppScore` author gets `harmony deploy my-app` for free. The
   plugin protocol is **not** in scope for any current PR; it's
   a dedicated future effort.
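
To make the "Tomorrow (C)" idea concrete, here is a sketch of prefix-based discovery using only the standard library. It is not a proposal for the plugin protocol, which stays out of scope; `plugin_subcommand`, `discover_in` and `discover_on_path` are hypothetical helpers, and the sketch skips details a real implementation would need (executable-bit checks, `.exe` suffixes on Windows, argument forwarding).

```rust
use std::collections::BTreeSet;
use std::env;
use std::fs;
use std::path::PathBuf;

const PLUGIN_PREFIX: &str = "harmony-";

// Maps a file name such as `harmony-fleet-deploy` to the subcommand `fleet-deploy`.
pub fn plugin_subcommand(file_name: &str) -> Option<&str> {
    file_name
        .strip_prefix(PLUGIN_PREFIX)
        .filter(|rest| !rest.is_empty())
}

// Scans the given directories for `harmony-*` entries; unreadable directories are skipped.
pub fn discover_in(dirs: impl IntoIterator<Item = PathBuf>) -> BTreeSet<String> {
    let mut found = BTreeSet::new();
    for dir in dirs {
        let Ok(entries) = fs::read_dir(&dir) else {
            continue;
        };
        for entry in entries.flatten() {
            let name = entry.file_name();
            if let Some(sub) = name.to_str().and_then(plugin_subcommand) {
                found.insert(sub.to_string());
            }
        }
    }
    found
}

// Convenience wrapper over the directories listed in `$PATH`.
pub fn discover_on_path() -> BTreeSet<String> {
    match env::var_os("PATH") {
        Some(path) => discover_in(env::split_paths(&path)),
        None => BTreeSet::new(),
    }
}
```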

### Error handling

9. **thiserror almost everywhere; anyhow only at binary glue.**
   Library code, public crate boundaries, and anything callers
   might want to match on use typed errors via `thiserror`.
   `anyhow` is reserved for `main.rs`-level glue where the error
   is just printed. This was the second drift this PR uncovered.
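
The split in principle 9 can be sketched with the standard library alone. In the block below the `Display` and `Error` impls are written by hand; with the `thiserror` crate they would be generated by `#[derive(thiserror::Error)]` and `#[error("...")]` attributes. `Box<dyn Error>` stands in for `anyhow::Error` at the glue layer. The variants and the functions `run_command`, `should_retry` and `glue` are hypothetical and are not the agent's actual `CommandError` surface.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical typed error for a library boundary.
#[derive(Debug, PartialEq)]
pub enum CommandError {
    UnknownCommand(String),
    Timeout { seconds: u64 },
}

// Hand-written here; `thiserror` would derive this from `#[error("...")]`.
impl fmt::Display for CommandError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::UnknownCommand(name) => write!(f, "unknown command `{name}`"),
            Self::Timeout { seconds } => write!(f, "command timed out after {seconds}s"),
        }
    }
}

impl Error for CommandError {}

// Library surface: returns the typed error so callers can match on it.
pub fn run_command(name: &str) -> Result<&'static str, CommandError> {
    match name {
        "ping" => Ok("pong"),
        "sleep" => Err(CommandError::Timeout { seconds: 30 }),
        other => Err(CommandError::UnknownCommand(other.to_string())),
    }
}

// A caller branching on the variant, which an opaque error would not allow.
pub fn should_retry(err: &CommandError) -> bool {
    matches!(err, CommandError::Timeout { .. })
}

// Binary-level glue: the error is only printed, so its type is erased.
pub fn glue(name: &str) -> Result<(), Box<dyn Error>> {
    let reply = run_command(name)?;
    println!("{reply}");
    Ok(())
}
```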

## Out of scope (deferred, not rejected)

- **Score derive macro / deployment DSL.** Strategic intent from
  day one; the framework's value-add concentrates here. Separate
  design effort.
- **Score registry** (Crichton-style:
  <https://willcrichton.net/rust-api-type-patterns/registries.html>).
  Real itch: examples and Scores are hard to discover today.
  Research + ADR first.
- **Inventory as capability-defined physical assets.** Inventory
  is massively under-engineered today; the original idea is to
  represent physical infrastructure (building → cable → switch
  port → MAC) but most use cases ignore it. Decomposing inventory
  into a capability set is a deep redesign.
- **Plug-in CLI discovery layer (C above).** Roadmap item;
  explicitly named as the fix for the "too many disconnected
  CLIs" cohesion problem.
- **`Application features` ↔ `capabilities` relationship.**
  In-progress concept the project lead is personally unsure
  about. Don't try to resolve in this ADR.
- **Concrete smoke-test contract shape.** See principle 4:
  principle locked, implementation deferred. Today's e2e test
  suite plays the role of the smoke test until the trait/struct
  shape is decided.

## Consequences

- The current `harmony-fleet-e2e/src/stack.rs` (introduced in this
  PR) is **wrong** and gets rewritten as a Score composer with no
  inline k8s manifests.
- `harmony::modules::fleet::operator::*` and any other fleet-deploy
  modules in `harmony` core move into `fleet/harmony-fleet-deploy`.
  Callers (`examples/fleet_e2e_demo`, the operator binary itself)
  get updated paths.
- New Scores (`FleetAgentScore`, `FleetNatsScore`) land in
  `fleet/harmony-fleet-deploy`. The agent crate gains nothing;
  it stays a lean runtime binary.
- The deploy crate gets its own `main.rs` driven by `harmony_cli`,
  exposing one subcommand per component plus an `all` composite.
- Future work (smoke-test contract, Score derive macro, registry,
  CLI discovery) gets dedicated ADRs/PRs and does **not** sneak
  into unrelated work.

## References

- `CLAUDE.md`: Score-Topology-Interpret pattern, capability
  design rules.
- `docs/adr/002-hexagonal-architecture.md`: domain/adapter split
  this builds on.
- `docs/adr/005-rust-dsl-over-yaml.md`: the original "no
  YAML-mud-pit" call.
- `harmony_agent/deploy`: existing `*-deploy` crate pattern.
- `fleet/PLAN_requests_over_nats.md`: the working plan for the
  request/reply work this ADR landed during.