Working document for the architectural redesign of the fleet
platform before v0.1 ships to production. Captures four sections
of research:
§1 — Current state inventory. Markdown-bullet map of every public
type, score, trait, and module across `harmony/modules/fleet/`,
`harmony-reconciler-contracts`, and `fleet/harmony-fleet-*/`.
Sorted by domain meaning (identity, desired state, observed
state, setup, plumbing) rather than location, so the
cross-cutting concerns become visible. Includes a text "diagram"
of the dependency graph showing the two problematic edges:
runtime crates importing CRD types from the framework crate
(`harmony-fleet-operator` ← `harmony::modules::fleet::operator::crd`
verified at `controller.rs:37`, `device_reconciler.rs:21`,
`main.rs:9`) and the agent importing podman wire types from the
framework crate (`harmony-fleet-agent` ← `harmony::modules::podman`
verified at `main.rs:21-22`, `reconciler.rs:11`).
§2 — Theory review. Pulls principles from JG's *Pour l'amour des
compilateurs* talk (2026-04-30), its references (Crichton,
Feldman, Maguire, Goedecke, Fowler), and harmony's own load-bearing
ADRs (002 hexagonal, 003 infrastructure abstractions, 015 higher-
order topologies, 016 agent + global mesh, 018 template hydration).
Synthesizes eight design principles for the redesign — including
Goedecke's guardrail that "type-driven" ≠ "type-everything" so we
don't over-fit the cardinality argument.
§3 — Ten concrete shape problems (P1–P10), framed as cardinality
mismatches, leaky boundaries, and "is this resolved yet" branches
rather than bugs. P1 is the placement issue JG flagged in code
review; P2 is `FleetDeviceAuth`'s mixed resolved/unresolved
states; P10 is the credential-shape staircase across operator
workstation / operator pod / agent.
§4 — Five design alternatives, each scored against P1–P10:
A. Move + thin façade (conservative cleanup).
B. Resolved-only at boundaries + capability traits (principled
incremental).
C. Dataflow reframe (events in, state out).
D. Fleet as kube control plane, period (deliberately weird).
E. Algebra of fleets (deliberately mathematical).
A is too little, C/D/E are right-shape but wrong-timing for the
3-day window. B is the working recommendation, with explicit
awareness that D is the v2.0 destination and the capability
traits in B are the seam that lets us migrate without breaking
callers.
§5 sketches a concrete shape for B: new `harmony-fleet/` domain
crate with no framework dependency, `harmony-fleet-adapters-*`
crates for NATS/Zitadel/kube, the existing operator/agent/auth
crates wire adapters together, the framework's
`harmony::modules::fleet` collapses to a re-export module that
goes away by v0.2.
§6 — Five open questions for JG's review before locking the
choice. §7 — explicit "spike one slice, then commit or back out"
process so we don't lock the wrong shape.
Not an ADR yet. The ADR happens after JG agrees on which
alternative is the working hypothesis and the spike confirms the
shape feels better in code than on paper.
887 lines
40 KiB
Markdown
# Fleet platform — architecture review

Working document for the architectural redesign of the fleet platform
before v0.1 ships to production. Started 2026-05-07.

This is a research + design document, not a plan to execute. The
output of this work is an ADR (or set of ADRs) that lock the new
shape; the v0.2 roadmap will reference whichever option we pick.

## Why now

- Three days from production. No customers depend on the API yet
  → API/UX/DX is still cheap to change. After ship, every breaking
  change costs us a week of customer-coordination overhead.
- The `harmony/modules/fleet/` placement is wrong — already flagged
  in code review. The reasons it ended up there are subtle (cross-
  module imports of `K8sAnywhereTopology`, `HelmChartScore`,
  `K8sResourceScore`, `harmony_secret`, `Topology` capability
  traits). Those need to be written down before the file move,
  not after.
- The plumbing — NATS + Zitadel + auth callout + operator + agent
  — is sound. Highly secure, scalable by design, low resource
  footprint. The redesign is about **moving code** and **better
  data structures**, not rebuilding mechanisms.
- The frame from JG's *Pour l'amour des compilateurs* talk:
  cardinality-matched types, "make impossible states impossible",
  expressive types as the deterministic feedback loop that scales
  with LLM-era code generation throughput. Apply that frame here.

## Working plan

1. **Inventory.** Map every public type, trait, score, module, and
   crate that participates in the fleet domain. Markdown-bullet
   shape; no diagrams.
2. **Read the room.** Pull principles from JG's talk, its
   references, and harmony's existing ADRs (002 hexagonal, 003
   infrastructure abstractions, 015 higher-order topologies, 016
   harmony agent + global mesh, 017 NATS interconnection, 018
   template hydration). Note where the existing fleet design
   already follows them and where it doesn't.
3. **Identify the design problems.** Not bugs — *shape* problems.
   Cardinality mismatches, leaky boundaries, "is this resolved
   yet" branches, location/dependency loops.
4. **Sketch alternatives.** Three to five. At least one
   conventional cleanup, at least one out-of-the-box that
   reframes the domain. Compare on the same axes (cardinality,
   placement, ergonomics, extensibility).
5. **Pick (or recommend) one.** Land as ADR.

This document covers steps 1–4. The pick happens in conversation
with JG before the ADR.

---

## §1 — Current state inventory

### §1.1 — Where the code lives

The fleet domain spans **three concerns** that today live in
**three locations**:

- **Framework-side scoring** (what runs on the operator's
  workstation when they `cargo run` the install) → lives in
  `harmony/src/modules/fleet/`. This is the wrong home; it's the
  thing this review is about moving.
  - `mod.rs` — re-exports
  - `assets.rs` — Ubuntu/Debian cloud image fetchers, libvirt SSH
    keypair management
  - `libvirt_pool.rs` — libvirt storage pool bring-up
  - `setup_score.rs` (1053 LOC, the monster) — `FleetDeviceSetupScore`,
    `FleetDeviceSetupConfig`, `FleetDeviceAuth`
    (TomlShared|ZitadelJwt|ZitadelEnroll), `AdminAuth`, `HostsEntry`,
    `merge_hosts_file`
  - `vm_score.rs` — `ProvisionVmScore` (libvirt VM bring-up)
  - `preflight.rs` — `check_fleet_smoke_preflight*` (host system
    checks)
  - `server.rs` — `FleetServerScore`, `FleetServerInterpret`
    (composed bring-up of Zitadel + NATS + callout + operator)
  - `operator/`
    - `mod.rs`, `score.rs` — `FleetOperatorScore`,
      `FleetOperatorInterpret` (operator helm install)
    - `chart.rs` (453 LOC) — chart rendering (`ChartOptions`,
      `OperatorCredentials`, `build_chart`, `operator_secret`,
      `build_operator_deployment`, `build_cluster_role`)
    - `crd.rs` — `Deployment` CRD type (`DeploymentSpec`,
      `Rollout`, `RolloutStrategy`, `DeploymentStatus`,
      `DeploymentAggregate`, `AggregateLastError`); `Device` CRD type
      (`DeviceSpec`)
- **Cross-boundary wire types** (the "contract" agent and operator
  both have to agree on) → lives in `harmony-reconciler-contracts/`.
  - `fleet.rs` — `DeviceInfo`, `DeploymentState`, `HeartbeatPayload`,
    `DeploymentName`, `InvalidDeploymentName`
  - `kv.rs` — bucket name constants + key-builder functions
  - `status.rs` — `Phase`, `InventorySnapshot`
  - re-exports `harmony_types::id::Id`
- **Runtime binaries** (what runs in the cluster + on devices) →
  lives in `fleet/`.
  - `harmony-fleet-operator/` — the operator pod. `controller.rs`,
    `device_reconciler.rs`, `fleet_aggregator.rs` (833 LOC),
    `install.rs`, `main.rs`. Pulls `Deployment`/`Device` CRDs from
    `harmony::modules::fleet::operator::crd` (cross-crate import
    that should give us pause).
  - `harmony-fleet-agent/` — the on-device daemon. `config.rs`,
    `reconciler.rs`, `fleet_publisher.rs`, `main.rs`.
  - `harmony-fleet-auth/` — JWT-bearer / NATS-credentials helpers
    used by both the operator AND the agent. `config.rs`,
    `credentials.rs` (553 LOC). Sits between contracts and the
    runtime crates.

### §1.2 — Public types, sorted by domain meaning (not location)

#### Identity & devices

- `harmony_types::id::Id` — opaque, sortable, collision-safe
  identifier. Used as device id, deployment id, …
- `DeploymentName` (newtype with validation, `harmony-reconciler-contracts`)
- `DeviceInfo` — heartbeat payload that materializes into a
  `Device` CR
- `DeviceSpec` — kube CRD, holds an optional `InventorySnapshot`
- `InventorySnapshot` — hardware/OS facts published once at
  registration

#### Deployment desired-state

- `DeploymentSpec` — kube CRD: `target_selector: LabelSelector`,
  `score: ReconcileScore`, `rollout: Rollout`
- `ReconcileScore` (in `harmony::modules::podman`, re-exported
  from `harmony::modules::fleet::operator::crd`) — externally-tagged
  enum, today only `PodmanV0(PodmanV0Score)`
- `PodmanV0Score`, `PodmanService`, `EnvVar`, `VolumeMount`,
  `RestartPolicy`
- `Rollout`, `RolloutStrategy::Immediate`

#### Deployment observed-state

- `DeploymentState` — what the agent publishes per device per
  deployment after reconcile
- `DeploymentStatus` (kube CRD) — operator-side rollup of all
  device states for one Deployment CR
- `DeploymentAggregate` — counts (matched, succeeded, failed,
  pending) + `last_error: Option<AggregateLastError>`
- `Phase` — `Pending | Running | Failed`

#### Authentication / identity provider

- `FleetDeviceAuth` — sum type with `TomlShared | ZitadelJwt |
  ZitadelEnroll`. **The `ZitadelEnroll` arm carries
  unresolved-state — admin credentials that must be turned into a
  device JSON key at execute time. Mixes resolved and unresolved
  states in one type, which is the cardinality bug we keep hitting.**
- `AdminAuth` — `Sso { client_id } | Token(String)` (used inside
  `ZitadelEnroll`)
- `CredentialsSection` — TOML-on-disk shape (in
  `harmony-fleet-auth`, parallel to `FleetDeviceAuth`)
- `CredentialSource` — runtime credential factory
- `NatsCredential` — what async-nats actually consumes
- `MachineKeyFile`, `CachedToken`

#### Setup procedures (Scores)

- `FleetDeviceSetupScore` (`FleetDeviceSetupConfig`) — the workhorse:
  installs podman, drops the agent binary, drops the credentials
  TOML, drops the keyfile, brings up the systemd unit.
- `FleetServerScore` — orchestrates Zitadel install + identity
  setup + NATS install + callout install + operator install. Wraps
  five other scores.
- `FleetOperatorScore` — operator helm chart render + install + the
  credentials Secret apply.
- `ProvisionVmScore` — libvirt VM bring-up. Used by VM rehearsals.
- (External, not in fleet/) `ZitadelScore`, `ZitadelSetupScore`,
  `NatsK8sScore`, `NatsAuthCalloutScore` — all consumed by the
  composed install.

#### Operator-internal types

- `FleetState`, `SharedFleetState`, `DeploymentKey`, `DevicePair`,
  `CachedDeployment`, `Context`, `Error` (the controller's local
  error type), `selector_matches`, `apply_state`, `drop_state`,
  `compute_aggregate`

#### Agent-internal types

- `AgentConfig`, `AgentSection`, `NatsSection`, `CredentialsSection`
- `FleetPublisher`, `Reconciler`

#### Fleet plumbing for development

- `FleetSshKeypair`, the cloud-image consts, `HarmonyFleetPool`,
  `merge_hosts_file`, `HostsEntry`, `check_fleet_smoke_preflight*`

#### NATS subjects + KV buckets (the wire seam)

- `BUCKET_DESIRED_STATE` = `"desired-state"`
- `BUCKET_DEVICE_INFO` = `"device-info"`
- `BUCKET_DEVICE_STATE` = `"device-state"`
- `BUCKET_DEVICE_HEARTBEAT` = `"device-heartbeat"`
- Key builders: `desired_state_key(device_id, deployment_name)`,
  `device_info_key(device_id)`, `device_state_key(device_id,
  deployment_name)`, `device_heartbeat_key(device_id)`

### §1.3 — Concept clusters

When you squint at the inventory, the domain falls into **five
clusters**:

1. **Identity** — who is this device, who is this deployment, who
   is the operator, what auth do they have.
2. **Desired state** — what should be running where.
3. **Observed state** — what is actually running where.
4. **Setup** — bringing all this into existence on a fresh
   cluster + fresh device.
5. **Plumbing** — the NATS/kube/Zitadel mechanisms that make 1–4
   work.

The current code does not cleanly separate these. Examples:

- `setup_score.rs` mixes **Setup** (drop binary, run systemd) with
  **Identity** (`FleetDeviceAuth`). 1053 LOC.
- `FleetDeviceAuth` mixes resolved-Identity (`ZitadelJwt` —
  here's a key) with Setup-time-Identity-resolution-intent
  (`ZitadelEnroll` — here's how to mint a key).
- The chart-render helpers (`build_operator_deployment`, etc.) are
  `pub` from `harmony::modules::fleet::operator::chart` so the
  composed-install scores can pluck the secret out before helm
  install. Plumbing leaking through Setup.
- `harmony::modules::fleet::operator::crd::DeploymentSpec` is the
  CRD definition AND it's the type the operator daemon imports to
  reconcile. Cross-crate import from a runtime crate
  (`harmony-fleet-operator`) into a framework crate (`harmony`).
  This is the placement bug.

### §1.4 — The shape problem in one diagram (text)

```
                          framework/operator workstation
                          │
harmony::modules::fleet ──┤ Scores: FleetServerScore, FleetDeviceSetupScore,
                          │         FleetOperatorScore, ProvisionVmScore
                          │ CRD types: Deployment, Device, DeploymentSpec, ...
                          │ Chart rendering helpers (operator/chart.rs)
                          │
harmony-reconciler-contracts ── wire types: DeviceInfo, DeploymentState,
                          │                 HeartbeatPayload, KV constants
                          │     ▲                             ▲
                          │     │                             │
                          │     │ imports             imports │
                          │     │                             │
              fleet/harmony-fleet-agent        fleet/harmony-fleet-operator
                          ▲                                   ▲
                          │                                   │
                          │ ALSO imports         ALSO imports │
                          │ from harmony::modules::   from harmony::modules::
                          │ podman (PodmanV0Score)    fleet::operator::crd
```

Two problematic edges:

1. `harmony-fleet-operator` imports `harmony::modules::fleet::operator::crd::Deployment`. The runtime daemon depends on the framework crate just for CRD type definitions.
2. `harmony-fleet-agent` imports `harmony::modules::podman::{PodmanV0Score, PodmanTopology, ReconcileScore}`. The agent depends on the framework crate's *podman module* for the score it deserializes off the wire.

Both edges should run *through* `harmony-reconciler-contracts`, not around it. That's the placement bug surfaced.

---

## §2 — Theory review

### §2.1 — From the talk

Pulling the load-bearing principles, ranked by relevance to this
redesign:

1. **Cardinality matters.** Types should match the cardinality of
   the real-world concept. `&str` for "primary color" admits
   infinite invalid inputs; `enum { Red, Yellow, Blue }` admits
   exactly three. Friction is proportional to mismatch.
2. **Make impossible states impossible.** Don't comment the
   constraint, code it. Push runtime errors to the design phase.
3. **Representations matter.** Same data, different shapes ↔
   different operations are cheap. Roman numerals ↔ addition; Arabic
   ↔ multiplication. "An API is a computational representation of
   real-world concepts."
4. **The compiler is a deterministic feedback channel.** In an era
   when LLMs generate code at 5–10K LOC/day, the only sensor that
   keeps up runs in milliseconds and is deterministic. Lean on it.
5. **Strong types reduce code volume + test boilerplate + token
   waste + review burden + CI time + production incidents** — and
   *increase* refactoring confidence and velocity-over-time. The
   bet is asymmetric.
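
The cardinality point in principle 1 can be made concrete with a small sketch. This is illustrative only: `PrimaryColor`, `parse_primary`, and `hex` are hypothetical names, not code from harmony. The stringly-typed entry point has to validate at runtime, while the enum-typed function cannot be called with an invalid value at all.

```rust
/// Closed cardinality: exactly three inhabitants, so no validation is needed
/// once a value of this type exists.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrimaryColor {
    Red,
    Yellow,
    Blue,
}

/// Stringly-typed boundary: every `&str` is accepted by the signature,
/// so the check has to happen at runtime and can fail.
pub fn parse_primary(input: &str) -> Option<PrimaryColor> {
    match input {
        "red" => Some(PrimaryColor::Red),
        "yellow" => Some(PrimaryColor::Yellow),
        "blue" => Some(PrimaryColor::Blue),
        _ => None,
    }
}

/// Exhaustive match: adding a fourth variant is a compile error here
/// until this function handles it.
pub fn hex(color: PrimaryColor) -> &'static str {
    match color {
        PrimaryColor::Red => "#ff0000",
        PrimaryColor::Yellow => "#ffff00",
        PrimaryColor::Blue => "#0000ff",
    }
}
```

Parsing happens once at the edge (`parse_primary("green")` yields `None`); everything downstream takes `PrimaryColor` and needs no further checks.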

### §2.2 — From the references

Grouping by what they imply for *this* redesign:

#### Will Crichton — *Type-Driven API Design* + *Rust API Type Patterns*

- **Typestate.** Encode "phase of an operation" in the type
  parameter. A `ProgressBar<Bounded>` exposes `.with_eta()`; a
  `ProgressBar<Unbounded>` doesn't. The contradictory call doesn't
  compile.
- Direct application: **`FleetDeviceAuth` mixes phases.** The
  `ZitadelEnroll` arm is unresolved, the `ZitadelJwt` arm is
  resolved, the `TomlShared` arm doesn't even need resolution. A
  typestate would model these as distinct types; only one of them
  has `agent.write_to_disk()`.
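
A minimal typestate sketch of that idea, under stated assumptions: `ZitadelAuth<S>`, `Unresolved`, `Resolved`, `resolve`, and `write_keyfile` are hypothetical names chosen for illustration, and the minting step is passed in as a closure so the sketch stays free of any real Zitadel call. Only the resolved phase exposes the method that writes key material.

```rust
use std::io::{self, Write};

/// Phase marker: admin credentials that have not been exchanged yet.
pub struct Unresolved {
    admin_token: String,
}

/// Phase marker: a minted device key, ready to be written out.
pub struct Resolved {
    key_json: String,
}

/// Hypothetical auth value whose phase is tracked in the type parameter.
pub struct ZitadelAuth<S> {
    issuer_url: String,
    state: S,
}

impl ZitadelAuth<Unresolved> {
    pub fn new(issuer_url: impl Into<String>, admin_token: impl Into<String>) -> Self {
        ZitadelAuth {
            issuer_url: issuer_url.into(),
            state: Unresolved { admin_token: admin_token.into() },
        }
    }

    /// Consumes the unresolved value. `mint` stands in for the identity
    /// provider call and receives (issuer_url, admin_token).
    pub fn resolve<F>(self, mint: F) -> Result<ZitadelAuth<Resolved>, String>
    where
        F: FnOnce(&str, &str) -> Result<String, String>,
    {
        let key_json = mint(&self.issuer_url, &self.state.admin_token)?;
        Ok(ZitadelAuth {
            issuer_url: self.issuer_url,
            state: Resolved { key_json },
        })
    }
}

impl ZitadelAuth<Resolved> {
    /// Exists only in the resolved phase; calling it on
    /// `ZitadelAuth<Unresolved>` does not compile.
    pub fn write_keyfile<W: Write>(&self, mut out: W) -> io::Result<()> {
        out.write_all(self.state.key_json.as_bytes())
    }
}

impl<S> ZitadelAuth<S> {
    /// Available in every phase.
    pub fn issuer_url(&self) -> &str {
        &self.issuer_url
    }
}
```

Because `resolve` takes `self` by value, the unresolved value is gone after resolution, so there is no way to keep using stale admin credentials by accident.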

#### Richard Feldman — *Making Impossible States Impossible*

- Slogan-as-tool. Look at every `Option<T>` and ask *"can two of
  these be inconsistent at once?"* If yes, that's an impossible
  state — refactor.
- Direct application: `FleetDeviceSetupConfig` has `auth:
  FleetDeviceAuth` AND `agent_binary_path: PathBuf`. Today nothing
  prevents `auth = TomlShared` (no Zitadel) with
  `agent_binary_path` pointing at the wrong-arch binary. We could
  encode the agent binary's target arch as a typestate parameter
  and refuse to deploy to a device whose inventory reports a
  different arch.

#### Sandy Maguire — *Protos Are Wrong*

- Protocol buffers throw away information real type systems
  preserve. Sum types, exhaustiveness, parametric polymorphism,
  Maybe/Result — protos can't express any of them precisely. The
  "loose contract" sells you weak invariants.
- Direct application: `harmony-reconciler-contracts` is JSON-shaped
  at the wire (matched on `type` tag for `ReconcileScore`).
  We're already paying the proto-class tax: any new variant
  requires both ends to know about it; the wire format doesn't
  enforce a schema; old agents see new variants as parse errors.
  This is an honest constraint — wire formats need to be permissive
  by design — but it argues for keeping the **wire types small and
  obviously evolvable** while letting in-memory types be
  cardinality-matched.

#### Sean Goedecke — *Invalid States*

- The skeptic's case: making impossible states impossible *can be
  over-applied*. Sometimes a `String` is the right cardinality
  even when an enum exists, because the enum binds you to a
  closed world.
- Direct application: **Don't make `device_id` a closed enum.**
  The newtype + RFC1123 validation we just added is the right
  cardinality match: it's string-like, but admits only valid
  strings. Over-modeling would have us build `enum DeviceId {
  Pi(PiSerial), Vm(VmName), …}` — closed world, breaks the first
  time a customer plugs in an x86 box.
- Useful guardrail: **type-driven** ≠ **type-everything**. The
  question to ask each time is "what's the cardinality of this
  concept in reality" — not "can I model this".
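
The "newtype + validation" shape can be sketched as follows. This is not the project's actual type: `DeviceId`, `InvalidDeviceId`, and `parse` are illustrative names, and the rule used here is the RFC 1123 *label* form as Kubernetes applies it to object names (at most 63 characters, lowercase alphanumerics or `-`, alphanumeric at both ends). Whether the real code validates labels or full subdomains is an assumption.

```rust
/// Open-world identifier: any string is allowed as long as it is a valid
/// RFC 1123 label. The private field means the only way in is `parse`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DeviceId(String);

/// Hypothetical error type for the sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InvalidDeviceId {
    Empty,
    TooLong(usize),
    BadChar(char),
    BadEdge,
}

impl DeviceId {
    /// Maximum length of an RFC 1123 label.
    pub const MAX_LEN: usize = 63;

    pub fn parse(raw: &str) -> Result<Self, InvalidDeviceId> {
        if raw.is_empty() {
            return Err(InvalidDeviceId::Empty);
        }
        if raw.len() > Self::MAX_LEN {
            return Err(InvalidDeviceId::TooLong(raw.len()));
        }
        // Report the first character outside [a-z0-9-].
        if let Some(bad) = raw
            .chars()
            .find(|c| !(c.is_ascii_lowercase() || c.is_ascii_digit() || *c == '-'))
        {
            return Err(InvalidDeviceId::BadChar(bad));
        }
        // A label must start and end with an alphanumeric character.
        if raw.starts_with('-') || raw.ends_with('-') {
            return Err(InvalidDeviceId::BadEdge);
        }
        Ok(DeviceId(raw.to_owned()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

The set of valid values stays open (a new hardware family needs no code change), while every `DeviceId` that exists is known to be well-formed.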

#### Martin Fowler — *Harness Engineering* (April 2026)

- Computational sensors (compilers, type checkers, linters) over
  inferential ones (tests, code review). Compiler runs on every
  change; tests don't.
- Direct application: prefer compiler-checked invariants over
  doc-comment invariants. If the docs say "this Score's `auth`
  field must be resolved at the call site of `execute()`", the
  compiler should enforce it.

### §2.3 — From harmony's own ADRs

Reading the existing ADRs *as design language already in use* —
what vocabulary should the new fleet shape stay consistent with?

#### ADR-002 (hexagonal architecture)

- "Domain isolated from adapters." Domain types own the
  vocabulary; adapters (k8s client, NATS, helm) translate at the
  edge.
- **Implication for fleet:** the *domain* is identity + desired
  state + observed state. The *adapters* are NATS-KV, kube-CRD,
  helm-chart, ansible-over-SSH. The current
  `harmony::modules::fleet` mixes both. Pulling adapters out is the
  refactor.

#### ADR-003 (infrastructure abstractions)

- "Abstractions at domain level, not provider level. `DnsServer`
  not `OPNsenseDns`."
- **Implication for fleet:** capability traits like
  `DeviceRegistry`, `DesiredStatePublisher`, `ObservedStateConsumer`
  — each a standard infrastructure need that NATS-KV happens to
  fulfill today, that another transport (gRPC streaming, MQTT,
  Redis streams) could fulfill tomorrow.

#### ADR-015 (higher-order topologies)

- Higher-order topologies (`FailoverTopology<T>`,
  `DecentralizedTopology<T>`) compose via blanket trait impls.
  `T: PostgreSQL` ⇒ `FailoverTopology<T>: PostgreSQL`. Zero
  boilerplate.
- **Implication for fleet:** `FleetTopology<T>` could compose with
  a base `K8sTopology<T>` rather than being a parallel concept.
  "A fleet is a thing that is *both* a kube cluster *and* a
  device registry."
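
The `T: PostgreSQL` ⇒ `FailoverTopology<T>: PostgreSQL` rule can be sketched with a generic impl. The trait and type names come from the text above, but their contents here are assumptions: the real `PostgreSQL` capability trait almost certainly has different methods, and `connection_url`, `LocalPg`, and `describe` are hypothetical stand-ins.

```rust
/// Stand-in capability trait; the single method is an assumption.
pub trait PostgreSQL {
    fn connection_url(&self) -> String;
}

/// Higher-order topology: wraps two instances of any base topology.
pub struct FailoverTopology<T> {
    pub primary: T,
    pub replica: T,
}

/// One generic impl covers every base topology that already provides
/// PostgreSQL, so no per-topology boilerplate is needed.
impl<T: PostgreSQL> PostgreSQL for FailoverTopology<T> {
    fn connection_url(&self) -> String {
        // Simplification for the sketch: always route to the primary.
        self.primary.connection_url()
    }
}

/// Hypothetical base topology used only to exercise the impl.
pub struct LocalPg {
    pub host: String,
}

impl PostgreSQL for LocalPg {
    fn connection_url(&self) -> String {
        format!("postgres://{}:5432", self.host)
    }
}

/// Callers are written once against the capability, not the concrete type.
pub fn describe<T: PostgreSQL>(topology: &T) -> String {
    topology.connection_url()
}
```

`describe` accepts both `LocalPg` and `FailoverTopology<LocalPg>` unchanged, which is the property a `FleetTopology<T>` would rely on.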

#### ADR-016 (Harmony Agent + Global Mesh)

- Agents are processes that observe + reconcile per a desired
  state published into a NATS mesh. Mesh is the reliable hop;
  agents are stateless processors at the edge.
- **Implication for fleet:** the IoT fleet is a *specialization*
  of the agent + mesh ADR — devices are agents, the operator is
  a coordinator. The fleet domain types should fit ADR-016's
  vocabulary, not invent a parallel one.

#### ADR-017 (NATS clusters interconnection)

- Trust topology: per-cluster account isolation, gateway-mediated
  cross-cluster traffic. Per-device permissions are a
  specialization of per-account.
- **Implication for fleet:** the auth callout's per-device permission
  templates should compose with the cluster-interconnection
  account model — currently they're treated as orthogonal, which
  is fine until we actually cross fleets.

#### ADR-018 (template hydration)

- Hydrating templates at the edge of the framework, not in the
  middle. Same pattern as our generated chart YAML: render once,
  apply via typed code.
- **Implication for fleet:** chart-rendering helpers
  (`build_operator_deployment` et al.) are template-hydration
  edges. They *should* be hidden from domain code. Today they're
  `pub` — visible to consumers like `fleet_staging_install` who
  reach in and grab `operator_secret(opts)`. That's adapter
  leakage.

### §2.4 — Synthesis: principles for the redesign

A short list, ordered. Each line is something the new shape
should satisfy:

1. **Domain types in `harmony-reconciler-contracts` (or a sibling
   crate)**, with no dependency on `harmony` framework types.
2. **Resolved types only at the API surface.** Pre-resolution
   intent is a separate type, used only by the resolver.
3. **Capabilities as traits**, not concrete types. `DeviceRegistry`,
   `DesiredStatePublisher`, etc. The NATS-backed impl is one of
   several allowed.
4. **Closed cardinality where reality is closed; open where reality
   is open.** Goedecke's check, not Feldman's.
5. **Higher-order topology, not parallel topology.** A fleet is a
   `FleetTopology<T>` over a base K8s topology, not a separate
   capability hierarchy.
6. **Adapters hidden behind capabilities.** Helm chart rendering,
   k8s resource apply, NATS subjects — none of these surface from
   the fleet's public API.
7. **No yaml in framework code paths.** Existing principle from
   v0_1; keep.
8. **Keep wire types minimal + permissive.** Not because they're
   the canonical model, but because they're the
   evolvability seam (Maguire's protos critique applies in
   reverse — *embrace* the loose contract on the wire, *reject* it
   in-memory).

---

## §3 — Design problems with the current shape

Concrete issues the redesign needs to fix. Not "bugs" — *shape*
problems. Each numbered so we can refer back when comparing
alternatives.

- **P1. `harmony/modules/fleet/` is in the wrong crate.** It pulls
  framework dependencies (`HelmChartScore`, `K8sResourceScore`,
  `K8sAnywhereTopology`, `harmony_secret`, etc.) and the runtime
  daemons import *from it*. This makes the operator/agent depend
  transitively on every harmony module — including the OPNsense
  XML codegen, OKD bootstrap stuff, etc. Compile times suffer; the
  release surface is wrong (you can't `cargo install
  harmony-fleet-operator` without all of harmony).
- **P2. `FleetDeviceAuth` mixes resolved + unresolved states.**
  `ZitadelEnroll` is pre-resolution intent; `ZitadelJwt` is
  post-resolution credential. A single match arm has to handle
  both. The "render TOML for both" hack we wrote works but is a
  symptom — the TOML for an unresolved auth should be undefined,
  not "same as resolved".
- **P3. `setup_score.rs` is a 1053-LOC monolith.** Eight responsibilities
  in one file: ssh-vs-local connection, ansible orchestration,
  systemd unit text, hosts-file merging, podman package install,
  fleet-agent user provisioning, keyfile writing, agent restart.
  Readability is poor; testability is per-orchestration, not
  per-step.
- **P4. CRD types live in the framework crate.** `Deployment` and
  `Device` CRDs are defined in
  `harmony::modules::fleet::operator::crd`. The runtime operator
  crate (`harmony-fleet-operator`) imports them from there. This
  is the most visible symptom of P1.
- **P5. `ReconcileScore` polymorphism is anemic.** Today there's
  exactly one variant, `PodmanV0`. The wire format is set up for
  evolution but no second variant exists, and the cross-crate
  import from `harmony::modules::podman` makes adding one
  expensive (re-export dance).
- **P6. Adapter leakage from chart rendering.**
  `build_operator_deployment`, `operator_secret`, `build_chart`
  are `pub`. Consumers in `examples/` reach in to compose helm
  releases by hand. Domain code should not see "what does the
  operator's helm chart look like".
- **P7. Composed scores wrap composed scores wrap composed scores.**
  `FleetServerScore` wraps {ZitadelScore, ZitadelSetupScore,
  NatsK8sScore, NatsAuthCalloutScore, FleetOperatorScore}. Each
  of those does its own k8s resource apply + helm install.
  Failure modes are deep: a problem in one score's interpret
  surfaces wrapped through five layers of "context()". Hard to
  debug; hard to reason about ordering.
- **P8. Topology assumptions are everywhere.** Every `Score`
  bound is a hand-rolled union of capability traits — `T:
  Topology + HelmCommand + K8sclient + TlsRouter + 'static`. Add
  a new capability and every callsite has to be updated. Higher-
  order topology composition (ADR-015) would let us name "a
  thing that is a fleet-capable cluster" once.
- **P9. `Id` is overloaded.** Same type for device IDs, machine
  user IDs, deployment IDs, topology names. Newtype-ing each
  would catch arg-order swaps at compile time.
- **P10. Configuration is a staircase.** Operator workstation has
  `ZitadelClientConfig` cache file. Operator pod has env-var-from-
  Secret. Agent has TOML on disk. Three different shapes for
  fundamentally the same data (issuer URL, audience, key
  material). Maguire's protos critique applies internally — we're
  using *several* loose-contract serializations of the same
  domain object.
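
For P8, one way to name "a thing that is a fleet-capable cluster" once is a supertrait with a blanket impl. The sketch below is an assumption-heavy illustration: the three capability traits borrow their names from the bound quoted in P8, but their methods are invented stand-ins, and `FleetCapable`, `plan_operator_install`, and `DemoCluster` are hypothetical.

```rust
// Stand-in capability traits. The real harmony traits have different
// (and larger) method sets; these exist only so the sketch compiles.
pub trait Topology {
    fn name(&self) -> String;
}
pub trait HelmCommand {
    fn helm_binary(&self) -> String;
}
pub trait K8sclient {
    fn kube_context(&self) -> String;
}

/// The union of capabilities, named once. Adding a capability later means
/// editing this line and the blanket impl, not every call site.
pub trait FleetCapable: Topology + HelmCommand + K8sclient {}

/// Blanket impl: anything with all three capabilities is fleet-capable.
impl<T: Topology + HelmCommand + K8sclient> FleetCapable for T {}

/// Call sites carry one bound instead of a hand-rolled union.
pub fn plan_operator_install<T: FleetCapable>(topology: &T) -> String {
    format!(
        "install on {} using {} against context {}",
        topology.name(),
        topology.helm_binary(),
        topology.kube_context()
    )
}

/// Hypothetical topology used only to exercise the bound.
pub struct DemoCluster;

impl Topology for DemoCluster {
    fn name(&self) -> String {
        "demo".to_string()
    }
}
impl HelmCommand for DemoCluster {
    fn helm_binary(&self) -> String {
        "helm".to_string()
    }
}
impl K8sclient for DemoCluster {
    fn kube_context(&self) -> String {
        "kind-demo".to_string()
    }
}
```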

---

## §4 — Design alternatives

Five sketches. The first three are increasingly principled
cleanups; the last two are deliberately weird, included to force
us to recognize where the *core* of the domain actually is.

For each: one paragraph of premise, the resulting top-level types,
how it answers each of P1–P10 (✓ / ✗ / partial), and the
honest pros + cons.

### Alternative A — Move + thin façade (the conservative cleanup)

**Premise:** the existing types are mostly right; the location is
wrong and the façade leaks. Move `harmony/modules/fleet/` to
`fleet/harmony-fleet/`. Re-export only what's intended public.
Don't redesign types.

**Top-level types:** unchanged. `FleetDeviceSetupScore`,
`FleetServerScore`, `FleetOperatorScore`, `FleetDeviceAuth`,
`AdminAuth`, `Deployment` CRD, `Device` CRD. Same shapes, new
location.

**P1 ✓** (location fix is the goal). **P2 ✗** (auth still mixes
resolved/unresolved). **P3 ✗** (monolith preserved). **P4 ✓**
(CRDs co-located with operator). **P5 ✗**. **P6 partial** (we
can `pub(crate)` the chart helpers but the underlying coupling
remains). **P7 ✗**. **P8 ✗**. **P9 ✗**. **P10 ✗**.

**Pros:** small, safe, mechanical. Two days of work. No customer-
visible breakage. Unblocks P4 cleanup naturally.

**Cons:** doesn't actually fix the shape. We'd be back here in
six weeks. JG's review already said this isn't enough. Not the
right answer for v0.1 timing — *would* be the right answer if
we'd already shipped to two customers and couldn't break their
code.

### Alternative B — Resolved-only at boundaries + capability traits (the principled cleanup)

**Premise:** Crichton's typestate + ADR-003's domain capabilities
applied to the existing shape. Split resolved vs. unresolved
auth into separate types. Define capability traits for the
adapters. Move into the right crate. **No wholesale rewrite.**

**Top-level types:**

- New crate `harmony-fleet/` (sibling to `harmony-fleet-operator`,
  -agent, -auth). Domain types live here.
- `FleetIdentity`, `FleetDevice`, `FleetDeployment` — domain
  records. Plain data.
- `DeviceCredential` — *resolved* only (a JSON keyfile + issuer
  URL + audience). Replaces `FleetDeviceAuth::ZitadelJwt`.
- `EnrollmentIntent` — pre-resolution. Carries `AdminAuth` and
  what to mint. Method `resolve(&self) -> Result<DeviceCredential>`.
- `Score`s become small + single-responsibility:
  - `EnrollDeviceScore` — runs `EnrollmentIntent::resolve` then
    publishes to NATS.
  - `InstallAgentScore` — drops binary + config + systemd unit.
    Takes a `DeviceCredential`. Doesn't know about Zitadel.
  - `InstallOperatorScore` — helm chart + Secret. Doesn't know
    about devices.
  - `BringUpFleetScore` — composes the above. Single layer of
    composition, not five.
- Capability traits:
  - `DeviceRegistry` — list/get/upsert/delete a `FleetDevice`.
    Implementations: `NatsKvDeviceRegistry`,
    (later) `RedisStreamsDeviceRegistry`.
  - `DesiredStatePublisher`, `ObservedStateConsumer` — same
    shape.
  - `IdentityProvider` — mint a device credential, issue an
    admin token. Today: Zitadel. Tomorrow: something else.
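
A rough sketch of how the types listed above could fit together, under several assumptions. The names `FleetDevice`, `DeviceRegistry`, `DeviceCredential`, `EnrollmentIntent`, and `IdentityProvider` come from the list; their fields and method signatures are guesses. In particular, `resolve` takes the identity provider as a parameter here (the list gives `resolve(&self)`), everything is synchronous for brevity where the real traits would likely be async, and `InMemoryDeviceRegistry` and `StaticIdp` are hypothetical test doubles standing in for the NATS-KV and Zitadel adapters.

```rust
use std::collections::BTreeMap;

/// Domain record: plain data, no transport or framework types.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FleetDevice {
    pub id: String,
    pub labels: BTreeMap<String, String>,
}

/// Capability trait: what the domain needs, not how NATS-KV provides it.
pub trait DeviceRegistry {
    fn upsert(&mut self, device: FleetDevice);
    fn get(&self, id: &str) -> Option<FleetDevice>;
    fn list(&self) -> Vec<FleetDevice>;
    fn delete(&mut self, id: &str) -> bool;
}

/// Hypothetical in-memory adapter, e.g. for unit tests.
#[derive(Default)]
pub struct InMemoryDeviceRegistry {
    devices: BTreeMap<String, FleetDevice>,
}

impl DeviceRegistry for InMemoryDeviceRegistry {
    fn upsert(&mut self, device: FleetDevice) {
        self.devices.insert(device.id.clone(), device);
    }
    fn get(&self, id: &str) -> Option<FleetDevice> {
        self.devices.get(id).cloned()
    }
    fn list(&self) -> Vec<FleetDevice> {
        self.devices.values().cloned().collect()
    }
    fn delete(&mut self, id: &str) -> bool {
        self.devices.remove(id).is_some()
    }
}

/// Resolved credential only: there is no "not minted yet" variant.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DeviceCredential {
    pub issuer_url: String,
    pub audience: String,
    pub key_json: String,
}

/// Capability trait for the identity provider (Zitadel today).
pub trait IdentityProvider {
    fn mint_device_key(&self, device_id: &str) -> Result<String, String>;
}

/// Pre-resolution intent; only the resolver ever sees this type.
pub struct EnrollmentIntent {
    pub device_id: String,
    pub issuer_url: String,
    pub audience: String,
}

impl EnrollmentIntent {
    pub fn resolve(&self, idp: &dyn IdentityProvider) -> Result<DeviceCredential, String> {
        let key_json = idp.mint_device_key(&self.device_id)?;
        Ok(DeviceCredential {
            issuer_url: self.issuer_url.clone(),
            audience: self.audience.clone(),
            key_json,
        })
    }
}

/// Hypothetical stub provider that returns a fixed-shape key.
pub struct StaticIdp;

impl IdentityProvider for StaticIdp {
    fn mint_device_key(&self, device_id: &str) -> Result<String, String> {
        Ok(format!("{{\"device\":\"{}\"}}", device_id))
    }
}
```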
|
||

**P1 ✓ P2 ✓ P3 ✓** (split into 4–5 small Scores). **P4 ✓ P5 ✓**
(resolve in the runtime crate, contracts stay neutral).
**P6 ✓** (chart helpers `pub(crate)`, surfaced via the
`IdentityProvider` and `DeploymentReleaseManager` traits).
**P7 ✓** (one composer, not five). **P8 partial** (capability
traits defined but bound unions still get long). **P9 ✓** with
newtypes. **P10 partial** (still three on-disk shapes for
credentials, but unified by trait).

**Pros:** highest-leverage incremental redesign. Buys us most of
the principles without rebuilding plumbing. Customer-visible
breakage is contained to public API renames + import path
moves — no behavior change. Three days is realistic.

**Cons:** we still have a `Score`-shaped mental model where the
*unit of execution* is "a Score". If the right primitive turns
out to be smaller (an effect, an event, a capability call), this
choice wastes some leverage.

### Alternative C — The dataflow reframe (events in, state out)

**Premise:** the fleet platform is, in essence, a **stream
processor**. Events flow in (heartbeats, intent CR creates,
agent reconcile reports). State materializes out (Device CRs,
DeploymentAggregate counters, KV desired-state writes). Today
we model it imperatively as a series of `Score`s; the dataflow
shape is fighting that.

**Top-level types:**

- `FleetEvent` — sum type. `DeviceHeartbeat | DeviceFirstSeen |
  DeploymentDesired | DeploymentObserved | DeploymentDeleted | …`
- `FleetStateSnapshot` — what the operator currently knows. Pure
  data, derivable.
- `Reducer` — `(state, event) → state`. Pure function. Trivial to
  test.
- `Effect` — sum type of side-effects the reducer wants done:
  `WriteKv(bucket, key, value) | UpsertCr(cr) | EmitMetric(...)`.
  With effects, the reducer returns `(new_state, Vec<Effect>)`.
- `EffectRunner` — adapter that performs effects. The only thing
  that touches NATS / kube. One implementation per environment.
- The operator pod's main loop: `for event in stream { (state,
  effects) = reduce(state, event); runner.run_all(effects) }`.
  ~50 lines.

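A minimal sketch of the reducer/effect split described above, assuming simplified payloads. The variant fields, the `"desired-state"` bucket name, the `devices_seen` metric, `RecordingRunner`, and `run_loop` are all hypothetical; only the type names and the `(state, event) → (state, effects)` shape come from the list. Only three of the event variants are modelled to keep it short.

```rust
// Hypothetical sketch: payloads, bucket and metric names are assumptions.
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub enum FleetEvent {
    DeviceFirstSeen { device: String },
    DeviceHeartbeat { device: String, at: u64 },
    DeploymentDesired { name: String, spec: String },
}

/// What the operator currently knows. Pure data.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct FleetStateSnapshot {
    pub last_seen: BTreeMap<String, u64>,
    pub desired: BTreeMap<String, String>,
}

/// Side effects the reducer asks for but never performs itself.
#[derive(Debug, Clone, PartialEq)]
pub enum Effect {
    WriteKv { bucket: String, key: String, value: String },
    UpsertCr(String),
    EmitMetric { name: String, value: u64 },
}

/// Pure function: no I/O, so it can be tested with plain values.
pub fn reduce(mut state: FleetStateSnapshot, event: FleetEvent) -> (FleetStateSnapshot, Vec<Effect>) {
    match event {
        FleetEvent::DeviceFirstSeen { device } => {
            state.last_seen.entry(device.clone()).or_insert(0);
            (state, vec![Effect::UpsertCr(device)])
        }
        FleetEvent::DeviceHeartbeat { device, at } => {
            state.last_seen.insert(device, at);
            let count = state.last_seen.len() as u64;
            (state, vec![Effect::EmitMetric { name: "devices_seen".to_string(), value: count }])
        }
        FleetEvent::DeploymentDesired { name, spec } => {
            state.desired.insert(name.clone(), spec.clone());
            let write = Effect::WriteKv { bucket: "desired-state".to_string(), key: name, value: spec };
            (state, vec![write])
        }
    }
}

/// The only component allowed to touch NATS / kube.
pub trait EffectRunner {
    fn run_all(&mut self, effects: Vec<Effect>);
}

/// Test double that records effects instead of performing them.
#[derive(Default)]
pub struct RecordingRunner {
    pub log: Vec<Effect>,
}

impl EffectRunner for RecordingRunner {
    fn run_all(&mut self, effects: Vec<Effect>) {
        self.log.extend(effects);
    }
}

/// The main loop: fold events through the reducer, hand effects to the runner.
pub fn run_loop(events: impl IntoIterator<Item = FleetEvent>, runner: &mut impl EffectRunner) -> FleetStateSnapshot {
    let mut state = FleetStateSnapshot::default();
    for event in events {
        let (next, effects) = reduce(state, event);
        state = next;
        runner.run_all(effects);
    }
    state
}

fn main() {
    let mut runner = RecordingRunner::default();
    let events = vec![
        FleetEvent::DeviceFirstSeen { device: "pi-01".to_string() },
        FleetEvent::DeviceHeartbeat { device: "pi-01".to_string(), at: 42 },
    ];
    let state = run_loop(events, &mut runner);
    println!("{:?}", state);
    println!("{} effects recorded", runner.log.len());
}
```

Because `reduce` matches exhaustively on `FleetEvent`, adding a variant fails compilation until the new arm exists, which is the property the Pros paragraph leans on.
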

**P1 ✓ P2 ✓ P3 ✓ P4 ✓ P5 ✓ P6 ✓ P7 ✓ P8 ✓** (capabilities
collapse into the `EffectRunner` trait). **P9 ✓ P10 partial**.

**Pros:** dramatically simpler operator code. Reducer is pure →
property-test-friendly. The dataflow is the platform. Aligns
with how Kafka / Materialize / Flink-class systems are
structured. Easy to add a new event type — the compiler shows
you every reducer arm to update.

**Cons:** large rewrite of the operator. Three days is
unrealistic. The current `fleet_aggregator.rs` (833 LOC) already
roughly does this but in a less disciplined shape — maybe the
incremental version of this is "make `apply_state` a real
reducer and split `compute_aggregate` into pure pieces". That's
more like Alternative B with extra discipline. The full
effect-typed version is a nice end-state but not a sprint goal.

**Cite:** Materialize's dataflow paper; Kent Beck's *Augmented
Coding* on factoring; Gergely Orosz on event-sourcing; the talk's
"good Lego bricks" framing applies — *events* are the bricks.

### Alternative D — The fleet as a **kube control plane**, period (deliberately weird)

**Premise:** strip the design to one observation. **A fleet is a
Kubernetes cluster whose Nodes happen to be devices, not
servers.** Stop modelling Devices and Deployments separately
from kube primitives. Use Kubernetes itself as the data model.
The operator is one CRD reconciler. NATS is just the transport
between the API server (in the cluster) and the device-side
kubelet-equivalent.

**Top-level types:**

- `Device` is a Node CR. Already exists; we stop wrapping it.
- `Deployment` is a `DaemonSet` (one pod per matching node) or a
  `Deployment` (count: N targeted nodes). We stop inventing a
  CRD; we use the standard one.
- `DeviceInfo` is the Node's `.status` (capacity, allocatable,
  conditions). We stop publishing parallel data; we update
  Node status from the agent's NATS messages.
- The agent on the device is a custom kubelet that speaks NATS to
  the operator instead of HTTPS to the API server.
- The auth callout still exists; it gates NATS access.
- No `harmony-fleet-operator`-specific CRDs. No `Deployment` /
  `Device` CRs of our own.

**P1 ✓ P2 ✓ P3 ✓ P4 N/A** (no CRDs of our own to misplace).
**P5 ✓ P6 ✓ P7 ✓ P8 ✓ P9 ✓ P10 ✓**.

**Pros:** the simplest *conceptual* answer. We stop fighting kube
and inventing parallel concepts. Customers already understand
DaemonSets, Node selectors, and `kubectl get nodes`. The agent
becomes a known kind of thing (a kubelet variant) with shoulders
to stand on (k3s-iot, kine, virtual-kubelet projects already
prove this works).

**Cons:** *a lot* of plumbing changes. Devices need to register
as Nodes (which means either a real kubelet on each Pi, or a
virtual-kubelet façade). The agent's reconcile loop becomes
"watch a CR via NATS, render manifests, run pods" — bigger than
"watch a KV value, run podman". JetStream KV becomes redundant
with the kube API server. **Probably the right end-state for
v2.0, wrong for v0.1.** Worth noting, though, because comparing
A/B/C to D pulls out which of our current invented concepts are
load-bearing (very few — `DeviceInfo` is mostly just `Node.status`;
`DeploymentAggregate` is mostly just kube's
`.status.observedGeneration` / `.status.conditions` stuff).

**Cite:** virtual-kubelet, k3s-iot, KubeEdge, OpenYurt. They've
walked this path; the lessons are public.

### Alternative E — Algebra of fleets (deliberately weird, mathematical)

**Premise:** model the platform as a small algebra. A fleet is a
**set of devices** + an **assignment function** (selector → set
of deployments). Operations on fleets are set-theoretic +
function composition. Treat the API as a query language over
this algebra.

**Top-level types:**

- `Fleet` ::= `Set<Device>`. With operations: union, intersection,
  filter-by-selector, partition.
- `Selector` ::= a pure predicate `Device → bool`. Built from
  primitives `label("k") = "v"`, `arch = aarch64`, …, combined
  with `&`, `|`, `!`.
- `Assignment` ::= `Selector → Set<Deployment>`. Pure function.
- `World` ::= `(Fleet, Assignment)`. Pure data. The operator's job
  is to make reality match the World.
- `Diff(World, Reality) → Vec<Action>`. Pure function. Closed
  form — given the algebra, you can prove what actions are
  *necessary* and *sufficient*.

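To make the algebra concrete, here is a small sketch under simplifying assumptions. It models `Assignment` as a list of `(Selector, deployment name)` pairs rather than a function, represents both desired and observed placement as sets of `(device, deployment)` pairs, and reduces `Diff` to two set differences. The `Device` fields, the `Action` variants, and the `Placement` alias are hypothetical.

```rust
// Hypothetical sketch: Device fields, Action variants and Placement are assumptions.
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Device {
    pub name: String,
    pub arch: String,
    pub labels: BTreeMap<String, String>,
}

/// A pure predicate over devices, built from primitives and combinators.
#[derive(Debug, Clone)]
pub enum Selector {
    Label(String, String),
    Arch(String),
    And(Box<Selector>, Box<Selector>),
    Or(Box<Selector>, Box<Selector>),
    Not(Box<Selector>),
}

impl Selector {
    pub fn matches(&self, d: &Device) -> bool {
        match self {
            Selector::Label(k, v) => d.labels.get(k) == Some(v),
            Selector::Arch(a) => &d.arch == a,
            Selector::And(l, r) => l.matches(d) && r.matches(d),
            Selector::Or(l, r) => l.matches(d) || r.matches(d),
            Selector::Not(s) => !s.matches(d),
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    Start { device: String, deployment: String },
    Stop { device: String, deployment: String },
}

/// Set of (device name, deployment name) pairs.
pub type Placement = BTreeSet<(String, String)>;

/// Desired placement: every pair where the selector matches the device.
pub fn desired(fleet: &BTreeSet<Device>, assignment: &[(Selector, String)]) -> Placement {
    let mut out = Placement::new();
    for device in fleet {
        for (selector, deployment) in assignment {
            if selector.matches(device) {
                out.insert((device.name.clone(), deployment.clone()));
            }
        }
    }
    out
}

/// Diff as two set differences: starts for what is missing, stops for what is extra.
pub fn diff(desired: &Placement, reality: &Placement) -> Vec<Action> {
    let starts = desired
        .difference(reality)
        .map(|(d, p)| Action::Start { device: d.clone(), deployment: p.clone() });
    let stops = reality
        .difference(desired)
        .map(|(d, p)| Action::Stop { device: d.clone(), deployment: p.clone() });
    starts.chain(stops).collect()
}

fn main() {
    let pi = Device { name: "pi-01".to_string(), arch: "aarch64".to_string(), labels: BTreeMap::new() };
    let fleet: BTreeSet<Device> = [pi].into_iter().collect();
    let assignment = vec![(Selector::Arch("aarch64".to_string()), "web".to_string())];
    let actions = diff(&desired(&fleet, &assignment), &Placement::new());
    println!("{:?}", actions);
}
```

In this toy model, "necessary and sufficient" is simply the symmetric difference of the two sets. The dirty edges named in the Cons (partitions, half-applied state) are exactly what the `Placement` type cannot express.
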

**P1–P10 ✓** (in principle). **Code volume probably 30% of
current.**

**Pros:** clarity. Properties become provable: "no device gets
an unassigned deployment", "removing a label removes the
assignment", "two operators can edit independently and the merge
is well-defined" (because functions compose). The "make
impossible states impossible" principle, applied to the *fleet
shape itself*, not to individual types.

**Cons:** **almost certainly an over-fit.** The real platform has
dirty edges (devices that fail, network partitions, half-applied
state) that don't sit naturally in a pure algebra. Most teams
that go down this road end up bolting "real-world" escape hatches
back on, ending up with the original design plus extra category
theory. **Useful as a north star** for the cardinality choices,
**not as the platform's actual shape.**

**Cite:** Hillel Wayne *Using Formal Methods at Work*; Conal
Elliott on functional reactive programming; the classic "set
theory for systems people" talks.

### Comparison matrix

| Criterion | A. Move | B. Capabilities | C. Dataflow | D. Kube-native | E. Algebra |
|---|---|---|---|---|---|
| Fixes P1 (location) | ✓ | ✓ | ✓ | ✓ | ✓ |
| Fixes P2 (auth states) | ✗ | ✓ | ✓ | ✓ | ✓ |
| Fixes P3 (monolith) | ✗ | ✓ | ✓ | ✓ | ✓ |
| Fixes P4 (CRD placement) | ✓ | ✓ | ✓ | N/A | N/A |
| Fixes P5 (anemic enum) | ✗ | ✓ | ✓ | N/A | partial |
| Fixes P6 (adapter leak) | partial | ✓ | ✓ | ✓ | ✓ |
| Fixes P7 (deep wrap) | ✗ | ✓ | ✓ | ✓ | ✓ |
| Fixes P8 (trait union) | ✗ | partial | ✓ | ✓ | ✓ |
| Fixes P9 (Id overload) | ✗ | ✓ | ✓ | ✓ | ✓ |
| Fixes P10 (config staircase) | ✗ | partial | partial | ✓ | partial |
| Fits 3-day window | ✓ | ✓ (tight) | ✗ | ✗ | ✗ |
| Customer-visible breakage | low | medium | medium | very high | high |
| Risk to demo schedule | very low | low | medium | very high | high |
| Long-term ceiling | low | high | high | very high | very high |

---

## §5 — Recommendation (preliminary)

Read the matrix as: **B is the right answer for now**, with
**explicit awareness of D as the v2.0 destination**.

- A is too little. We'd be back here.
- C and E are right in shape but wrong in timing — we don't have a
  week to rebuild the operator's reconcile loop, and the platform
  isn't in production yet, so there's no urgent "we have to
  refactor anyway" pressure.
- D is conceptually the cleanest, but a v0.1 production push
  is the wrong moment to start running custom kubelets.
- B captures most of the leverage of C/D within the 3-day window,
  with a clean migration path to either of them later (the
  capability traits are the seam — swap the implementation, not the
  callers).

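The "swap the implementation, not the callers" claim can be illustrated with a synchronous, infallible simplification of `DeviceRegistry`. The real trait would presumably be async and return errors; `InMemoryDeviceRegistry`, `register_if_new`, and the `FleetDevice` fields are hypothetical names for this sketch only.

```rust
// Hypothetical sketch: a simplified DeviceRegistry with an in-memory adapter.
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct FleetDevice {
    pub id: String,
    pub arch: String,
}

/// The seam: callers see this trait, never NATS or Redis types.
pub trait DeviceRegistry {
    fn list(&self) -> Vec<FleetDevice>;
    fn get(&self, id: &str) -> Option<FleetDevice>;
    fn upsert(&mut self, device: FleetDevice);
    fn delete(&mut self, id: &str) -> bool;
}

/// In-memory adapter, standing in for a NATS-KV-backed one.
#[derive(Default)]
pub struct InMemoryDeviceRegistry {
    devices: BTreeMap<String, FleetDevice>,
}

impl DeviceRegistry for InMemoryDeviceRegistry {
    fn list(&self) -> Vec<FleetDevice> {
        self.devices.values().cloned().collect()
    }
    fn get(&self, id: &str) -> Option<FleetDevice> {
        self.devices.get(id).cloned()
    }
    fn upsert(&mut self, device: FleetDevice) {
        self.devices.insert(device.id.clone(), device);
    }
    fn delete(&mut self, id: &str) -> bool {
        self.devices.remove(id).is_some()
    }
}

/// Caller code is written against the trait and does not change when the
/// adapter behind it does.
pub fn register_if_new(registry: &mut impl DeviceRegistry, device: FleetDevice) -> bool {
    if registry.get(&device.id).is_some() {
        return false;
    }
    registry.upsert(device);
    true
}

fn main() {
    let mut registry = InMemoryDeviceRegistry::default();
    let device = FleetDevice { id: "pi-01".to_string(), arch: "aarch64".to_string() };
    println!("registered: {}", register_if_new(&mut registry, device.clone()));
    println!("registered again: {}", register_if_new(&mut registry, device));
}
```

A later move toward C or D would replace the adapter behind the trait while `register_if_new` and its callers stay untouched.
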

**One concrete shape** to pursue under Alternative B (worth
sketching as the strawman ADR):

- New crate `harmony-fleet/` (the domain crate). Depends on
  `harmony-reconciler-contracts` only.
  - Domain records: `FleetDevice`, `FleetDeployment`, `FleetState`.
  - Capability traits: `DeviceRegistry`, `DesiredStatePublisher`,
    `ObservedStateConsumer`, `IdentityProvider`,
    `AgentLifecycle`.
- `harmony-fleet-adapters-nats/` — `NatsDeviceRegistry`,
  `NatsDesiredStatePublisher`, etc. NATS-specific.
- `harmony-fleet-adapters-zitadel/` — `ZitadelIdentityProvider`.
- `harmony-fleet-adapters-kube/` — `KubeFleetReflector` (writes
  `Device` and `Deployment` CRs as a *reflection* of the domain
  state, not as the source of truth).
- `harmony-fleet-operator/` — daemon. Wires adapters together.
- `harmony-fleet-agent/` — daemon. Wires adapters together.
- `harmony-fleet-cli/` — tomorrow's `harmony-fleet` plugin.
- The implementation under `harmony/modules/fleet/` is **deleted**.
  The framework `harmony` crate keeps only a thin
  `harmony::modules::fleet` *re-export* module that points at
  `harmony-fleet`. After v0.2 ships, the re-export module goes
  away too.

CRDs (`Deployment`, `Device`) move to
`harmony-fleet-adapters-kube/` because they're a kube-specific
projection of the domain, not the domain itself. The agent
imports `harmony-fleet`'s domain types, not the CRDs.

The setup-side scores stay in `harmony` (because they need the
framework's `HelmCommand`, `K8sclient`, etc.) but they consume
`harmony-fleet`'s domain types. The fleet's *domain* doesn't
depend on the framework; the framework's *deploy procedures*
depend on the fleet's domain. Direction of dependency is the
inverse of today.

## §6 — Open questions before we lock this

These are real questions; pulling them out so JG's review has
something concrete to react to:

- **Q1.** Is `IdentityProvider` the right capability name, or is
  it more honest to name it after what we actually need
  (`DeviceCredentialMinter`, `OperatorTokenProvider`)? The talk
  argues against generic names — if reality has two distinct
  concerns, two traits.
- **Q2.** Should `Device` CRD live in adapters-kube, or should it
  not exist at all (replaced by reading kube-API node info, per
  alternative D)? The middle ground (own CRD that mirrors kube
  Node) is what we have today, and it's the worst of both.
- **Q3.** The agent's wire-format for `ReconcileScore` —
  externally tagged enum, today only `PodmanV0`. Move it to
  `harmony-reconciler-contracts` (canonical wire seam) and let
  *both* the agent and the operator import only that crate. This
  removes the `harmony::modules::podman` cross-crate dependency.
  Worth doing in any of A/B/C.
- **Q4.** Does the v0.1 prod push wait for this redesign, or does
  it ship on the current shape with the redesign happening in
  v0.2? Tradeoff: shipping now means committing to *some* public
  API; shipping after means slipping the customer date.
  Recommendation: **ship the redesign first, slip 3 days**, on
  the grounds that public API churn after a customer is on it
  costs more than a 3-day delay before they're on it.
- **Q5.** Where do the *runtime tools* (the `harmony-fleet` CLI
  plugin, future frontend) sit in the dependency graph? If they
  depend on `harmony-fleet`'s domain crate only, we can build
  them without pulling in helm / kube / ansible at compile time.
  This is what we want for the device-side enrollment binary too
  (already feature-gated; the redesign should make the gate
  unnecessary).

---

## §7 — Next steps

1. Sit with this document. Walk away from it for an hour.
2. Round-table on §3 — do P1–P10 capture *the* problems, or are
   we missing one?
3. Round-table on §4 — does the comparison matrix feel honest,
   or is it tilted?
4. Pick one alternative as the working hypothesis.
5. Spike: take one slice through the chosen alternative
   (suggested: `EnrollmentIntent::resolve` + `DeviceCredential` +
   the `IdentityProvider` trait — the smallest end-to-end shape
   that touches every layer). Commit it on a branch. Eyeball:
   does the resulting code feel better?
6. Either: commit to the alternative as ADR-023, or back out
   and try another.

This document gets updated as we go. It is NOT meant to be
locked at first draft.