# Architecture Decision Record: Fleet Platform — Capability Decomposition

Initial Author: Jean-Gabriel Gill-Couture (with research by Claude)

Initial Date: 2026-05-20

Last Updated Date: 2026-05-20

## Status

**Draft — under review.** Captures the proposed shape for review;
not yet locked. If accepted, supersedes the as-built layout of
`harmony/src/modules/fleet/` documented in ADR-023's first
revision.

## Context

The fleet platform shipped under `feat/iot-walking-skeleton`
spans three concerns that today are spread across several locations:

1. **Domain logic** — what a `FleetDevice` is, what a
   `FleetDeployment` looks like, what the reconciler-contracts
   wire types mean.
2. **Adapters** — concrete NATS, Zitadel, Kubernetes, and Helm
   integrations that drive the domain.
3. **Deploy procedures** — how to bring up the operator, agent,
   NATS, and Zitadel as Scores against a Topology.

Today these live in `harmony/src/modules/fleet/` (mixed), the
`harmony-reconciler-contracts` crate (wire types only), the
`harmony-fleet-deploy` crate (Scores for deploy), and the
`harmony-fleet-operator`/`harmony-fleet-agent` binaries
(runtime). The boundary between domain and adapter is not
enforced at the type level: `harmony/src/modules/fleet/setup_score.rs`,
for example, reaches into Zitadel, NATS, Kube, and Helm directly.
Anyone wanting to swap NATS for a different transport would have
to touch every fleet file.

ADR-023 already addressed the *deploy* side of this (deploy
Scores live in `*-deploy` crates, not in `harmony` core). This
ADR proposes the *domain*-side decomposition: extract a thin
fleet-domain crate layered on top of the existing
reconciler-contracts, push provider-specific code into adapter
crates, and redirect the deploy crate to consume the domain
rather than the framework primitives directly.

## Decision (proposed)

The crates are layered by dependency direction; each arrow points
from a crate up to the crate it depends on:

```
harmony-reconciler-contracts   (existing — wire types only)
            ▲
            │
harmony-fleet-domain           (new — domain records + capability traits)
            ▲
            │
harmony-fleet-adapters-*       (new — one crate per provider)
            ▲                    (nats, zitadel, kube)
            │
harmony-fleet-deploy           (existing — bring-up Scores)
harmony-fleet-operator         (existing — daemon)
harmony-fleet-agent            (existing — daemon)
```

### `harmony-fleet-domain`

The domain crate. Depends only on `harmony-reconciler-contracts`
and `harmony_types`. Holds:

- **Domain records**: `FleetDevice`, `FleetDeployment`,
  `FleetState`, `EnrollmentIntent`, `DeviceCredential`.
- **Capability traits**: `DeviceRegistry`,
  `DesiredStatePublisher`, `ObservedStateConsumer`,
  `IdentityProvider`, `AgentLifecycle`. These are the seam
  between domain logic and provider-specific implementations.

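This is a minimal sketch of the domain crate's public surface. It uses synchronous, std-only Rust for brevity; real adapters would more likely be async. The field layouts, the `FleetError` type, the method signatures, and the `targets` helper are hypothetical, since the ADR names the records and traits but does not define them. `FleetState`, `EnrollmentIntent`, `DeviceCredential`, and the remaining traits are omitted. The sketch shows that no NATS, Zitadel, or kube type appears anywhere in the domain crate.

```rust
use std::collections::BTreeMap;
use std::fmt;

// Hypothetical record shapes; field names are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FleetDevice {
    pub id: String,
    pub labels: BTreeMap<String, String>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FleetDeployment {
    pub name: String,
    /// Label selector: a device is targeted if it carries every pair.
    pub selector: BTreeMap<String, String>,
}

impl FleetDeployment {
    /// Pure domain logic: no I/O and no provider types.
    pub fn targets(&self, device: &FleetDevice) -> bool {
        self.selector
            .iter()
            .all(|(k, v)| device.labels.get(k) == Some(v))
    }
}

/// Provider-neutral error; adapters would map their client errors into it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FleetError {
    NotFound(String),
    Backend(String),
}

impl fmt::Display for FleetError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FleetError::NotFound(id) => write!(f, "device not found: {id}"),
            FleetError::Backend(msg) => write!(f, "backend error: {msg}"),
        }
    }
}

// Capability traits: the seam between domain logic and adapters.
// Synchronous here for brevity (assumption, not the proposed signatures).
pub trait DeviceRegistry {
    fn register(&mut self, device: FleetDevice) -> Result<(), FleetError>;
    fn get(&self, id: &str) -> Result<FleetDevice, FleetError>;
    fn list(&self) -> Result<Vec<FleetDevice>, FleetError>;
}

pub trait DesiredStatePublisher {
    fn publish(
        &mut self,
        device_id: &str,
        deployment: &FleetDeployment,
    ) -> Result<(), FleetError>;
}
```

Because `targets` is pure domain logic over domain records, it can be unit-tested without a broker, identity service, or cluster.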
### `harmony-fleet-adapters-nats`, `-zitadel`, `-kube`

One crate per provider. Each implements the capability traits
above for its specific backend:

- `nats` — `NatsDeviceRegistry`, `NatsDesiredStatePublisher`,
  `NatsObservedStateConsumer`.
- `zitadel` — `ZitadelIdentityProvider`, machine-user
  provisioning, JWT-bearer minting.
- `kube` — `KubeFleetReflector` writes `Device` and
  `Deployment` CRDs as a *reflection* of domain state, not as
  the source of truth. CRD types move here from
  `harmony-fleet-operator`.

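The `kube` bullet distinguishes reflection from source of truth. That distinction can be sketched as a one-way projection: the reflector reads from whichever `DeviceRegistry` holds the truth and derives CRD-shaped records from it, never the reverse. The sketch assumes a trimmed-down, std-only `DeviceRegistry`. `InMemoryRegistry` is a hypothetical stand-in for an adapter such as `NatsDeviceRegistry`. `DeviceCr` and `reflect_devices` are illustrative names; they are not the real CRD types or kube client calls.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FleetDevice {
    pub id: String,
    pub online: bool,
}

// Trimmed-down capability trait (hypothetical signature).
pub trait DeviceRegistry {
    fn register(&mut self, device: FleetDevice);
    fn list(&self) -> Vec<FleetDevice>;
}

/// Hypothetical stand-in for a provider adapter such as NatsDeviceRegistry.
/// A real adapter would persist to its backend, not a local map.
#[derive(Default)]
pub struct InMemoryRegistry {
    devices: BTreeMap<String, FleetDevice>,
}

impl DeviceRegistry for InMemoryRegistry {
    fn register(&mut self, device: FleetDevice) {
        self.devices.insert(device.id.clone(), device);
    }
    fn list(&self) -> Vec<FleetDevice> {
        // BTreeMap iterates in key order, so the output is deterministic.
        self.devices.values().cloned().collect()
    }
}

/// Illustrative CRD-shaped record the kube adapter would write.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DeviceCr {
    pub name: String,
    pub phase: &'static str,
}

/// One-way projection: domain state in, CRD objects out. Nothing here
/// reads CRDs back, so the registry remains the source of truth.
pub fn reflect_devices<R: DeviceRegistry>(registry: &R) -> Vec<DeviceCr> {
    registry
        .list()
        .into_iter()
        .map(|d| DeviceCr {
            name: d.id,
            phase: if d.online { "Ready" } else { "Offline" },
        })
        .collect()
}
```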
### `harmony-fleet-deploy`

Stays as the home for `FleetOperatorScore`, `FleetAgentScore`,
`FleetNatsScore`, and `FleetCalloutScore`. What changes: it imports
`harmony-fleet-domain` for types and uses `harmony-fleet-adapters-*`
to compose Scores against the capability traits, rather than
reaching directly into NATS/Zitadel client crates.

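The deploy-side change can be sketched as a Score that is generic over a capability trait instead of holding a provider client. The sketch is heavily simplified and does not reproduce harmony's real `Score`/`Interpret` traits. `AgentBootstrapScore`, `render_agent_config`, the `mint_device_credential` method on `IdentityProvider`, and `StaticIdentity` are all hypothetical names used for illustration.

```rust
/// Capability trait as it might appear in harmony-fleet-domain
/// (hypothetical signature).
pub trait IdentityProvider {
    /// Mint a credential the agent on `device_id` can present.
    fn mint_device_credential(&self, device_id: &str) -> Result<String, String>;
}

/// Hypothetical, simplified Score in the spirit of FleetAgentScore:
/// it holds its inputs and a capability, never a Zitadel client.
pub struct AgentBootstrapScore<I: IdentityProvider> {
    pub device_id: String,
    pub nats_url: String,
    pub identity: I,
}

impl<I: IdentityProvider> AgentBootstrapScore<I> {
    /// Produce the agent's bootstrap config. Which identity backend
    /// minted the credential is invisible at this layer.
    pub fn render_agent_config(&self) -> Result<String, String> {
        let token = self.identity.mint_device_credential(&self.device_id)?;
        Ok(format!(
            "device_id={}\nnats_url={}\ntoken={}\n",
            self.device_id, self.nats_url, token
        ))
    }
}

/// Test double; an adapter crate would supply e.g. ZitadelIdentityProvider.
pub struct StaticIdentity(pub &'static str);

impl IdentityProvider for StaticIdentity {
    fn mint_device_credential(&self, device_id: &str) -> Result<String, String> {
        Ok(format!("{}-{}", self.0, device_id))
    }
}
```

Replacing `StaticIdentity` with a real adapter, such as the proposed `ZitadelIdentityProvider`, changes only the constructor call. The Score itself stays the same.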
### Direction of dependency

The fleet *domain* doesn't depend on the framework; the
framework's *deploy procedures* depend on the fleet's domain.
This inverts today's direction, where `harmony::modules::fleet`
imports from `harmony_secret`, `harmony_zitadel_auth`, NATS
client crates, kube client crates, etc.

After this ADR is implemented, `harmony::modules::fleet`
disappears entirely, and `harmony` core stays focused on
framework primitives.

## Open questions

These are the decision points pending review, flagged so the
review has concrete pivots:

- **Q1.** Is `IdentityProvider` the right capability name, or
  should we name the two distinct concerns separately
  (`DeviceCredentialMinter`, `OperatorTokenProvider`)? The
  CLAUDE.md rule says "if reality has two distinct concerns, two
  traits."
- **Q2.** Should the `Device` CRD exist at all, or should the
  agent publish to a kube `Node` (per the Alternative-D
  direction)? Today's middle ground (our own CRD that mirrors
  `Node`) is arguably the worst of both worlds.
- **Q3.** Where does `ReconcileScore`'s adjacently-tagged enum
  live? It is the canonical wire seam between operator and
  agent, so it should sit in `harmony-reconciler-contracts` (so
  both binaries import only that crate); confirm before the move.
- **Q4.** Does this redesign block the v0.1 production push, or
  does it land in v0.2 alongside the agent-upgrade work
  (ADR-022)? Public API churn after a customer is on it is more
  expensive than a 3-day delay before they are. Recommendation:
  ship the redesign first.
- **Q5.** Where do runtime tools (the `harmony-fleet` CLI plugin,
  the operator's frontend) sit in the dependency graph? If they
  depend on `harmony-fleet-domain` only, they build without
  pulling in helm/kube/ansible at compile time, which is also
  the right shape for the device-side enrollment binary
  (currently feature-gated).

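Here is a sketch of the two-trait option from Q1. The trait names come from Q1. Their methods, the `ZitadelLike` adapter (which does not talk to Zitadel), and the `enroll`/`auth_header` consumers are hypothetical. A single backend can implement both traits, while each consumer bounds on only the concern it uses.

```rust
/// Concern 1: minting per-device credentials during enrollment.
pub trait DeviceCredentialMinter {
    fn mint_for_device(&mut self, device_id: &str) -> String;
}

/// Concern 2: the operator authenticating itself to other services.
pub trait OperatorTokenProvider {
    fn operator_token(&self) -> String;
}

/// Hypothetical adapter showing that one backend can satisfy both traits.
#[derive(Default)]
pub struct ZitadelLike {
    issued: u32,
}

impl DeviceCredentialMinter for ZitadelLike {
    fn mint_for_device(&mut self, device_id: &str) -> String {
        self.issued += 1;
        format!("device:{device_id}:{}", self.issued)
    }
}

impl OperatorTokenProvider for ZitadelLike {
    fn operator_token(&self) -> String {
        "operator".to_string()
    }
}

/// Enrollment needs only minting; it cannot request operator tokens.
pub fn enroll<M: DeviceCredentialMinter>(minter: &mut M, device_ids: &[&str]) -> Vec<String> {
    device_ids.iter().map(|id| minter.mint_for_device(id)).collect()
}

/// Operator API calls need only a token; they cannot mint credentials.
pub fn auth_header<T: OperatorTokenProvider>(provider: &T) -> String {
    format!("Bearer {}", provider.operator_token())
}
```

With a single `IdentityProvider`, both consumers would see both capabilities. The split lets the compiler enforce which side of identity each piece of code touches.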
## Out of scope

- **Alternative D (kube-native devices).** A future v2.0
  destination, not v0.1 or v0.2 work. Captured as the long-term
  direction; the capability traits in this ADR are the
  intentional seam that makes the migration possible later.
- **Topology decomposition.** Whether `K8sBareTopology` /
  `K8sAnywhereTopology` should themselves be capability sets is a
  separate concern. Tracked as a working draft at
  `docs/adr/drafts/topology-proliferation.md`.

## Consequences

If accepted:

- New deployable fleet components author their Scores against
  capability traits in `harmony-fleet-domain`, not against
  provider clients directly. Swapping NATS for a different
  transport becomes a single-crate change.
- CRD types move out of operator code and into
  `harmony-fleet-adapters-kube`. The operator depends on the
  adapter crate; the runtime binary stays slim.
- `harmony` core has no fleet code. The framework's `modules/`
  directory is reserved for general-purpose primitives (DNS,
  K8s, Helm, NATS, PostgreSQL, …); domain-specific code lives
  in its own crate tree.
- Future fleet adapters (a different transport, a different
  identity provider) are additive: one new crate, no changes to
  domain or deploy.

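The single-crate-change and additive-adapter claims can be illustrated with one domain function and two interchangeable publishers. `roll_out`, `SubjectPublisher`, `HttpPushPublisher`, the subject and path formats, and the simplified `DesiredStatePublisher` signature are hypothetical. The two adapters stand in for a NATS-backed publisher and some other transport, and neither uses a real client library.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FleetDeployment {
    pub name: String,
    pub version: String,
}

/// Capability trait from the domain crate (simplified, hypothetical).
pub trait DesiredStatePublisher {
    fn publish(&mut self, device_id: &str, deployment: &FleetDeployment) -> Result<(), String>;
}

/// Domain logic, written once against the trait only.
pub fn roll_out<P: DesiredStatePublisher>(
    publisher: &mut P,
    device_ids: &[&str],
    deployment: &FleetDeployment,
) -> Result<usize, String> {
    for id in device_ids {
        publisher.publish(id, deployment)?;
    }
    Ok(device_ids.len())
}

/// Hypothetical adapter #1: stands in for a NATS-backed publisher.
/// The subject naming is illustrative.
#[derive(Default)]
pub struct SubjectPublisher {
    pub sent: Vec<(String, String)>,
}

impl DesiredStatePublisher for SubjectPublisher {
    fn publish(&mut self, device_id: &str, d: &FleetDeployment) -> Result<(), String> {
        self.sent.push((format!("fleet.desired.{device_id}"), d.version.clone()));
        Ok(())
    }
}

/// Hypothetical adapter #2: a different transport (e.g. HTTP push).
#[derive(Default)]
pub struct HttpPushPublisher {
    pub requests: Vec<String>,
}

impl DesiredStatePublisher for HttpPushPublisher {
    fn publish(&mut self, device_id: &str, d: &FleetDeployment) -> Result<(), String> {
        self.requests.push(format!("PUT /devices/{device_id}/desired {}", d.name));
        Ok(())
    }
}
```

Adding the second adapter required no edit to `roll_out` or `FleetDeployment`.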
## References

- `ROADMAP/fleet_platform/architecture_review.md` §§4–5 —
  comparison matrix and Alternative-B rationale from which this
  ADR is extracted.
- `docs/adr/023-deploy-architecture.md` — companion ADR for the
  deploy-side rules. This ADR is the domain-side companion.
- `docs/adr/022-fleet-agent-upgrade.md` — the agent-upgrade
  procedure, which sits cleanly on top of the
  `AgentLifecycle` capability proposed here.