<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width,initial-scale=1">
<title>Harmony Fleet — Architecture</title>
<script src="https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.min.js"></script>
<script>
  mermaid.initialize({
    startOnLoad: true,
    theme: "base",
    themeVariables: {
      fontFamily: "ui-sans-serif, -apple-system, Segoe UI, Inter, sans-serif",
      primaryColor: "#eef3fb",
      primaryBorderColor: "#7a93b7",
      primaryTextColor: "#1f2937",
      lineColor: "#5b6b80",
      tertiaryColor: "#fafbfd",
      clusterBkg: "#f6f8fc",
      clusterBorder: "#c6d2e2",
      noteBkgColor: "#fff8e1",
      noteTextColor: "#3a2f00",
      actorBkg: "#eef3fb",
      actorBorder: "#7a93b7",
      sequenceNumberColor: "#1f2937"
    }
  });
</script>
<style>
:root {
  --ink: #1f2937;
  --ink-soft: #4b5563;
  --paper: #ffffff;
  --paper-tint: #f6f8fc;
  --rule: #e3e8ef;
  --accent: #2c5282;
  --accent-soft: #ebf2fb;
  --warn: #b7791f;
  --warn-soft: #fff8e1;
  --mono: ui-monospace, SFMono-Regular, "JetBrains Mono", Menlo, Consolas, monospace;
  --sans: ui-sans-serif, -apple-system, "Segoe UI", Inter, system-ui, sans-serif;
}
* { box-sizing: border-box; }
html, body {
  margin: 0;
  background: var(--paper);
  color: var(--ink);
  font-family: var(--sans);
  line-height: 1.6;
  font-size: 16px;
}
main {
  max-width: 880px;
  margin: 0 auto;
  padding: 4rem 1.5rem 6rem;
}
header.hero {
  margin-bottom: 3rem;
  border-bottom: 1px solid var(--rule);
  padding-bottom: 2rem;
}
header.hero h1 {
  font-size: 2.4rem;
  line-height: 1.15;
  letter-spacing: -0.02em;
  margin: 0 0 1rem;
  color: var(--ink);
}
header.hero p.subtitle {
  margin: 0;
  color: var(--ink-soft);
  font-size: 1.1rem;
}
header.hero p.subtitle b { color: var(--ink); font-weight: 600; }
h2 {
  margin-top: 3.5rem;
  margin-bottom: 1rem;
  font-size: 1.55rem;
  letter-spacing: -0.01em;
  color: var(--ink);
  display: flex;
  align-items: baseline;
  gap: 0.75rem;
}
h2 .layer {
  font-size: 0.7rem;
  text-transform: uppercase;
  letter-spacing: 0.08em;
  color: var(--accent);
  background: var(--accent-soft);
  padding: 0.15rem 0.55rem;
  border-radius: 999px;
  font-weight: 600;
  line-height: 1.6;
  flex-shrink: 0;
}
h3 {
  margin-top: 2rem;
  font-size: 1.1rem;
  color: var(--ink);
}
p, li { color: var(--ink); }
a { color: var(--accent); text-decoration: none; border-bottom: 1px solid transparent; }
a:hover { border-bottom-color: var(--accent); }
code {
  font-family: var(--mono);
  font-size: 0.92em;
  background: var(--paper-tint);
  padding: 0.08em 0.35em;
  border-radius: 4px;
  border: 1px solid var(--rule);
}
pre {
  background: var(--paper-tint);
  border: 1px solid var(--rule);
  border-radius: 8px;
  padding: 1rem 1.2rem;
  overflow-x: auto;
  font-family: var(--mono);
  font-size: 0.88rem;
  line-height: 1.5;
}
pre code {
  background: none;
  border: none;
  padding: 0;
}
blockquote {
  margin: 1.5rem 0;
  padding: 0.6rem 1.2rem;
  border-left: 3px solid var(--accent);
  background: var(--accent-soft);
  color: var(--ink);
  border-radius: 0 6px 6px 0;
}
blockquote p { margin: 0.3rem 0; }
.callout {
  margin: 1.5rem 0;
  padding: 0.8rem 1.2rem;
  border-left: 3px solid var(--warn);
  background: var(--warn-soft);
  border-radius: 0 6px 6px 0;
  color: #4a3c10;
  font-size: 0.95rem;
}
.callout b { color: #3a2f00; }
table {
  border-collapse: collapse;
  width: 100%;
  margin: 1.5rem 0;
  font-size: 0.95rem;
}
th, td {
  text-align: left;
  padding: 0.6rem 0.8rem;
  border-bottom: 1px solid var(--rule);
  vertical-align: top;
}
th {
  background: var(--paper-tint);
  font-weight: 600;
  color: var(--ink);
  border-bottom: 2px solid var(--rule);
}
tr:hover td { background: var(--paper-tint); }
details {
  margin: 1.2rem 0;
  border: 1px solid var(--rule);
  border-radius: 8px;
  background: var(--paper-tint);
  padding: 0;
  overflow: hidden;
}
details summary {
  cursor: pointer;
  padding: 0.75rem 1.1rem;
  font-weight: 600;
  color: var(--ink);
  list-style: none;
  user-select: none;
  display: flex;
  align-items: center;
  gap: 0.5rem;
  transition: background 80ms ease;
}
details summary::-webkit-details-marker { display: none; }
details summary::before {
  content: "▸";
  color: var(--accent);
  transition: transform 120ms ease;
  display: inline-block;
  font-size: 0.85em;
}
details[open] summary::before { transform: rotate(90deg); }
details summary:hover { background: rgba(0,0,0,0.02); }
details > *:not(summary) {
  padding: 0 1.1rem;
}
details > *:not(summary):last-child {
  padding-bottom: 1rem;
}
details[open] summary {
  border-bottom: 1px solid var(--rule);
}
.mermaid {
  background: var(--paper);
  border: 1px solid var(--rule);
  border-radius: 8px;
  padding: 1.2rem;
  margin: 1.5rem 0;
  text-align: center;
  overflow-x: auto;
}
.diagram {
  background: var(--paper);
  border: 1px solid var(--rule);
  border-radius: 8px;
  padding: 1.2rem;
  margin: 1.5rem 0;
  overflow-x: auto;
}
.diagram img {
  display: block;
  width: 100%;
  height: auto;
}
hr {
  border: none;
  border-top: 1px solid var(--rule);
  margin: 3rem 0;
}
ul, ol { padding-left: 1.4rem; }
ul li, ol li { margin: 0.25rem 0; }
.stop-here {
  margin: 2rem 0;
  text-align: center;
  color: var(--ink-soft);
  font-style: italic;
  font-size: 0.95rem;
}
.stop-here::before, .stop-here::after {
  content: " — ";
  color: var(--rule);
}
footer {
  margin-top: 5rem;
  padding-top: 2rem;
  border-top: 1px solid var(--rule);
  color: var(--ink-soft);
  font-size: 0.9rem;
}
</style>
</head>
<body>
<main>

<header class="hero">
<h1>Harmony Fleet — Architecture</h1>
<p class="subtitle">
An operator declares <b>what</b> to run, in Kubernetes.
Agents on devices make it real, in their own containers.
NATS is the bus between them. Zitadel signs the agent's passport.
</p>
</header>

<p>This document walks the system in layers. Read until you stop having questions —
each layer adds one idea on top of the previous one.</p>

<hr>

<h2><span class="layer">Layer 0</span> One picture</h2>

<p>Physical topology. Everything that is not a human or an edge device
runs <b>inside the Kubernetes cluster</b>: Fleet Operator, NATS +
JetStream, the auth callout, and Zitadel — all as pods. Devices
(Raspberry Pi, VM, bare-metal) live outside the cluster and connect to
NATS through the callout-authenticated path. An operator pushes
deployments from the top via the dashboard or <code>kubectl</code>; a
sysadmin enrolls each device once over SSH.</p>

<div class="mermaid">
flowchart TB
classDef actor fill:#fff8e1,stroke:#b7791f,stroke-width:1.5px,color:#3a2f00,font-weight:600
classDef kube fill:#eef3fb,stroke:#7a93b7,color:#1f2937
classDef bus fill:#e6f4ea,stroke:#6aa974,color:#1f2937
classDef auth fill:#f3ecfb,stroke:#9070b8,color:#1f2937
classDef dev fill:#fdecea,stroke:#c97870,color:#1f2937
classDef devLite fill:#fdecea,stroke:#c97870,color:#1f2937,stroke-dasharray:4 3

OPER(["👤 Operator / CI<br/>uses dashboard / kubectl"]):::actor

subgraph CLUSTER [Kubernetes cluster k3d / OKD / any]
direction LR
KAPI["Kubernetes Control Plane<br/><i>Deployment</i> and <i>Device</i> CRDs"]:::kube
OP["Harmony Fleet Operator<br/>reconciles CRDs ↔ KV"]:::kube
KV[("NATS JetStream<br/>desired-state · device-info<br/>device-state · device-heartbeat")]:::bus
CALLOUT["NATS Auth Callout"]:::auth
ZITADEL["Zitadel<br/>OIDC + machine users"]:::auth

KAPI --- OP
OP -- writes / watches KV --> KV
KV -- NATS delegates auth --> CALLOUT
CALLOUT -- validates JWT --> ZITADEL
end

subgraph FLEET [Fleet of devices Raspberry Pi · VM · bare-metal]
direction LR
AG1["Fleet Agent + Podman<br/>apply / delete deployments<br/>report status · respond to commands"]:::dev
AG2["Fleet Agent + Podman<br/>· · ·"]:::devLite
end

SYS(["🧑🔧 Sysadmin<br/>one-time SSH enrollment per device"]):::actor

OPER -- kubectl / dashboard --> KAPI

AG1 -- connect with JWT, watch / publish KV --> KV
AG2 -.-> KV

SYS -- SSH device-setup --> AG1
SYS -- SSH device-setup --> AG2
</div>

<p>That's it. The rest of the document explains the boxes.</p>

<hr>

<h2><span class="layer">Layer 1</span> The three planes</h2>

<p>The fleet system has three planes that are deliberately decoupled:</p>

<table>
<thead><tr><th>Plane</th><th>What lives here</th><th>Why</th></tr></thead>
<tbody>
<tr>
<td><b>Control</b></td>
<td>Kubernetes (k3d, OKD, vanilla — anything) + the <b>Fleet Operator</b></td>
<td>Operators already know how to talk to k8s. <code>kubectl apply</code> is the API.</td>
</tr>
<tr>
<td><b>Bus</b></td>
<td>A NATS server with JetStream + an auth callout that talks to Zitadel</td>
<td>Edge devices come and go; the bus tolerates that. KV gives us last-writer-wins state without bespoke sync.</td>
</tr>
<tr>
<td><b>Edge</b></td>
<td>Each device runs the <b>Fleet Agent</b> binary, which drives <b>Podman</b></td>
<td>Devices don't speak k8s — they speak NATS and run containers locally.</td>
</tr>
</tbody>
</table>

<div class="mermaid">
flowchart LR
subgraph control [Control plane — Kubernetes]
direction TB
API[API Server + etcd]
OP[Fleet Operator]
DASH[/Dashboard — optional, feature-gated/]
API <--> OP
OP --- DASH
end
subgraph bus [Bus — NATS]
direction TB
NATS[NATS + JetStream KV]
CALLOUT[Auth Callout]
ZIT[Zitadel OIDC]
NATS -. token check .-> CALLOUT
CALLOUT -. validate JWT .-> ZIT
end
subgraph edge [Edge — fleet device]
direction TB
AGENT[Fleet Agent]
PODMAN[Podman]
AGENT --> PODMAN
end
OP <-->|KV| NATS
AGENT <-->|KV + commands| NATS
</div>

<div class="stop-here">Stop here if you only needed to know the shape</div>

<hr>

<h2><span class="layer">Layer 2</span> A deployment, end-to-end</h2>

<p>Walk through what happens when an operator runs <code>kubectl apply -f my-deployment.yaml</code>:</p>

<div class="mermaid">
sequenceDiagram
autonumber
actor User as Actor (SRE)
participant K8s as API Server
participant Op as Fleet Operator
participant Bus as NATS KV
participant Ag as Agent (on device)
participant Pm as Podman

User->>K8s: kubectl apply Deployment CR
K8s-->>Op: watch event (Deployment added)
Op->>Op: evaluate spec.targetSelector against Device CR labels
Op->>Bus: PUT desired-state.&lt;dev&gt;.&lt;dep&gt; = ReconcileScore JSON
Bus-->>Ag: KV watch event
Ag->>Ag: deserialize Score, build Interpret
Ag->>Pm: pull image, create/update container
Pm-->>Ag: container Running
Ag->>Bus: PUT device-state.state.&lt;dev&gt;.&lt;dep&gt; = Running
Bus-->>Op: KV watch event
Op->>K8s: PATCH Deployment.status.aggregate
</div>

<p>Things to notice:</p>
<ul>
<li><b>The agent never talks to the API server.</b> Only the operator does. Everything edge-bound flows through NATS.</li>
<li><b>The flow is one-way for desired state, one-way for reported state.</b> The two paths cross at NATS, never at k8s.</li>
<li><b>The aggregator coalesces</b> — status patches fire at 1 Hz, not on every event, so high-frequency churn doesn't beat up the API server.</li>
</ul>
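
<p>The coalescing in the last bullet can be pictured with a minimal sketch. It is not the operator's actual aggregator: <code>Coalescer</code>, <code>mark_dirty</code> and <code>should_flush</code> are hypothetical names, and the only thing taken from this document is the rule "patch at most once per interval, and only if something changed".</p>

```rust
use std::time::{Duration, Instant};

/// Hypothetical coalescer (not the real aggregator): remembers that
/// something changed and lets a flush through at most once per `interval`.
pub struct Coalescer {
    interval: Duration,
    last_flush: Option<Instant>,
    dirty: bool,
}

impl Coalescer {
    pub fn new(interval: Duration) -> Self {
        Self { interval, last_flush: None, dirty: false }
    }

    /// Record that at least one device-state event arrived.
    pub fn mark_dirty(&mut self) {
        self.dirty = true;
    }

    /// Returns true when the caller should patch the CR status now.
    /// Any number of events between two flushes collapse into one patch.
    pub fn should_flush(&mut self, now: Instant) -> bool {
        if !self.dirty {
            return false;
        }
        let due = match self.last_flush {
            None => true,
            Some(prev) => now.duration_since(prev) >= self.interval,
        };
        if due {
            self.last_flush = Some(now);
            self.dirty = false;
        }
        due
    }
}
```

<p>With <code>Duration::from_secs(1)</code> as the interval this gives the 1 Hz ceiling: a burst of KV events sets the dirty flag many times but produces a single status patch on the next tick.</p>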

<details>
<summary>The CRDs in detail</summary>
<p>Group: <code>fleet.nationtech.io</code> · Version: <code>v1alpha1</code></p>
<ul>
<li>
<b><code>Deployment</code></b> (kind), plural <code>deployments</code>, short <code>fleetdep</code>, <b>namespaced</b><br>
Spec: <code>targetSelector: LabelSelector</code>, <code>score: ReconcileScore</code>, <code>rollout: Rollout</code><br>
Status: <code>aggregate: { matchedDeviceCount, succeeded, failed, pending, lastError }</code>
</li>
<li>
<b><code>Device</code></b> (kind), plural <code>devices</code>, short <code>fleetdev</code>, <b>cluster-scoped</b><br>
Spec: <code>inventory: InventorySnapshot</code><br>
Cluster-scoped because devices are infrastructure — the same way <code>Node</code> is cluster-scoped.
</li>
</ul>
<p>Devices in k8s are <b>created by the operator</b> from agent-published <code>device-info</code> KV entries. Agents never touch the API server.</p>
<p>Source: <code>harmony/src/modules/fleet/operator/crd.rs</code></p>
</details>

<hr>

<h2><span class="layer">Layer 3</span> The four KV buckets</h2>

<p>The bus is more granular than "a NATS KV". The fleet contract pins <b>four</b> named buckets, each with its own write/read direction.</p>

<table>
<thead><tr><th>Bucket</th><th>Writer</th><th>Reader(s)</th><th>Key format</th><th>Purpose</th></tr></thead>
<tbody>
<tr>
<td><code>desired-state</code></td>
<td>Operator</td>
<td>Agent (watch)</td>
<td><code>&lt;device&gt;.&lt;deployment&gt;</code></td>
<td>The score the agent should reconcile to</td>
</tr>
<tr>
<td><code>device-state</code></td>
<td>Agent</td>
<td>Operator (watch + aggregator)</td>
<td><code>state.&lt;device&gt;.&lt;deployment&gt;</code></td>
<td>Current reconcile phase per (device, deployment)</td>
</tr>
<tr>
<td><code>device-info</code></td>
<td>Agent</td>
<td>Operator (watches; creates/patches <code>Device</code> CR)</td>
<td><code>info.&lt;device&gt;</code></td>
<td>Routing labels, inventory snapshot, agent version</td>
</tr>
<tr>
<td><code>device-heartbeat</code></td>
<td>Agent</td>
<td>Operator (liveness)</td>
<td><code>heartbeat.&lt;device&gt;</code></td>
<td>Tiny liveness ping every N seconds, kept off the state bucket to avoid churn</td>
</tr>
</tbody>
</table>

<figure class="diagram">
<img
src="harmony-fleet-assets/layer-3-kv-buckets.svg"
alt="Layer 3 KV bucket data flow between the operator, NATS stores, and agent"
>
</figure>

<p>These four bucket names are <b>the contract</b> between agent and operator. They live in one place to keep cross-component drift from happening:</p>

<pre><code>// harmony-reconciler-contracts/src/kv.rs
pub const BUCKET_DESIRED_STATE: &amp;str = "desired-state";
pub const BUCKET_DEVICE_INFO: &amp;str = "device-info";
pub const BUCKET_DEVICE_STATE: &amp;str = "device-state";
pub const BUCKET_DEVICE_HEARTBEAT: &amp;str = "device-heartbeat";</code></pre>

<p>There's also a <b>commands</b> path for request/response RPCs (ping today; logs/exec planned) on core-NATS subjects <code>device-commands.&lt;device-id&gt;.&lt;verb&gt;</code>, separate from JetStream KV.</p>
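
<p>To make the key formats concrete, here is a small sketch of helpers that build them. The function names are hypothetical and are not taken from <code>kv.rs</code>; only the formats come from the table above. The <code>is_safe_token</code> rule is an assumption: it keeps ids to characters that cannot collide with the <code>.</code> separator or the NATS wildcards.</p>

```rust
/// Key in the `desired-state` bucket: `<device>.<deployment>`.
pub fn desired_state_key(device: &str, deployment: &str) -> String {
    format!("{device}.{deployment}")
}

/// Key in the `device-state` bucket: `state.<device>.<deployment>`.
pub fn device_state_key(device: &str, deployment: &str) -> String {
    format!("state.{device}.{deployment}")
}

/// Key in the `device-info` bucket: `info.<device>`.
pub fn device_info_key(device: &str) -> String {
    format!("info.{device}")
}

/// Key in the `device-heartbeat` bucket: `heartbeat.<device>`.
pub fn heartbeat_key(device: &str) -> String {
    format!("heartbeat.{device}")
}

/// Core-NATS subject for a command: `device-commands.<device>.<verb>`.
pub fn command_subject(device: &str, verb: &str) -> String {
    format!("device-commands.{device}.{verb}")
}

/// Wildcard filter covering every desired-state key of one device.
pub fn desired_state_filter(device: &str) -> String {
    format!("{device}.>")
}

/// Assumed validation rule (not from the source): ids are non-empty and use
/// only ASCII letters, digits, `-` and `_`, so they never contain `.`,
/// `*` or `>`.
pub fn is_safe_token(id: &str) -> bool {
    !id.is_empty()
        && id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}
```

<p>Because <code>.</code> separates tokens in both KV keys and subjects, a device id containing a dot would silently change how many tokens a key has. Rejecting such ids up front is one way to keep the formats unambiguous.</p>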

<hr>

<h2><span class="layer">Layer 4</span> Identity &amp; auth</h2>

<p>Agents authenticate to NATS with a <b>Zitadel-signed JWT bearer token</b>. NATS doesn't validate the JWT itself; it delegates to a NATS <b>auth callout</b>, which is just another connected client running our <code>harmony-fleet-auth</code> binary.</p>

<div class="mermaid">
sequenceDiagram
autonumber
participant Ag as Agent
participant Z as Zitadel
participant N as NATS server
participant C as Auth Callout (harmony-fleet-auth)

Note over Ag,Z: One-time bootstrap (or before token expiry)
Ag->>Z: JWT assertion (RFC 7523, signed with device key)
Z-->>Ag: short-lived access token

Note over Ag,N: Every (re)connect
Ag->>N: CONNECT with bearer = access token
N->>C: auth callout request
C->>Z: introspect / validate signature
Z-->>C: token valid, claims = { device_id, ... }
C-->>N: ALLOW, permissions scoped to device_id
N-->>Ag: connection accepted
</div>

<p><b>Per-device scoping</b> — the callout derives NATS subject permissions from the JWT's <code>device_id</code> claim, so a compromised device key can only touch its own subjects.</p>
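
<p>The scoping rule can be illustrated as an ownership check over the Layer 3 key formats. This is a sketch of the rule only, not of how <code>harmony-fleet-auth</code> encodes NATS permissions; <code>key_belongs_to</code> is a hypothetical helper.</p>

```rust
/// True when the token exists and is not empty.
fn nonempty(token: Option<&str>) -> bool {
    token.map_or(false, |t| !t.is_empty())
}

/// Hypothetical check: does `key` in `bucket` belong to `device_id`
/// under the key formats listed in Layer 3?
pub fn key_belongs_to(device_id: &str, bucket: &str, key: &str) -> bool {
    let mut tokens = key.split('.');
    match bucket {
        // `<device>.<deployment>`
        "desired-state" => {
            tokens.next() == Some(device_id)
                && nonempty(tokens.next())
                && tokens.next().is_none()
        }
        // `state.<device>.<deployment>`
        "device-state" => {
            tokens.next() == Some("state")
                && tokens.next() == Some(device_id)
                && nonempty(tokens.next())
                && tokens.next().is_none()
        }
        // `info.<device>`
        "device-info" => {
            tokens.next() == Some("info")
                && tokens.next() == Some(device_id)
                && tokens.next().is_none()
        }
        // `heartbeat.<device>`
        "device-heartbeat" => {
            tokens.next() == Some("heartbeat")
                && tokens.next() == Some(device_id)
                && tokens.next().is_none()
        }
        // Unknown buckets are never owned by a device.
        _ => false,
    }
}
```

<p>The comparison is per token, not per prefix, so device <code>pi-01</code> does not accidentally own keys that belong to <code>pi-010</code>.</p>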

<p><b>Token rotation</b> — <code>async-nats</code> invokes the agent's auth callback on every reconnect; the token cache mints a fresh token whenever the cached one is within 5 minutes of expiry. This is how the "never lose connectivity across token rollovers" guarantee holds.</p>
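
<p>A leeway-based cache of this kind might look like the sketch below. It is an illustration under assumptions, not the code in <code>credentials.rs</code>: <code>TokenCache</code> and the <code>mint</code> closure are hypothetical, and the closure stands in for the Zitadel round trip.</p>

```rust
use std::time::{Duration, Instant};

/// Hypothetical token cache: reuses a token until it is within `leeway`
/// of its expiry, then asks the caller-supplied closure for a new one.
pub struct TokenCache {
    leeway: Duration,
    /// (token, expires_at)
    cached: Option<(String, Instant)>,
}

impl TokenCache {
    pub fn new(leeway: Duration) -> Self {
        Self { leeway, cached: None }
    }

    /// `mint` returns a fresh token and its time-to-live. It is only
    /// called when there is no token or the cached one is about to expire.
    pub fn token<F>(&mut self, now: Instant, mut mint: F) -> String
    where
        F: FnMut() -> (String, Duration),
    {
        if let Some((tok, expires_at)) = &self.cached {
            if now + self.leeway < *expires_at {
                return tok.clone();
            }
        }
        let (tok, ttl) = mint();
        self.cached = Some((tok.clone(), now + ttl));
        tok
    }
}
```

<p>Calling <code>token</code> from the reconnect callback means a reconnect never presents a token that is about to expire, at the cost of minting slightly earlier than strictly necessary.</p>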

<div class="callout">
<b>Today vs. target.</b> The CLI in <code>harmony-fleet-deploy/src/main.rs</code> defaults to <b>user/pass NATS</b> (<code>FleetNatsScore::user_pass</code>) for the v1 walking skeleton. The Zitadel/callout path is wired through <code>FleetServerScore</code>'s optional fields and is the production target — the diagram describes the target, not what the dev <code>main.rs</code> lights up by default.
</div>

<details>
<summary>Where this lives in code</summary>
<ul>
<li>Auth callout binary: <code>fleet/harmony-fleet-auth/src/lib.rs</code></li>
<li>Credential source + JWT minting: <code>fleet/harmony-fleet-auth/src/credentials.rs</code></li>
<li>Composing it into a server install: <code>FleetServerScore { auth_callout: Some(...) }</code> in <code>fleet/harmony-fleet-deploy/src/server.rs</code></li>
</ul>
</details>

<hr>

<h2><span class="layer">Layer 5</span> Device enrollment (one-time setup)</h2>

<p>A device joins the fleet through <code>FleetDeviceSetupScore</code> (in <code>harmony/src/modules/fleet/setup_score.rs</code>). Three flavours, in order of seriousness:</p>

<ol>
<li><b>Dev / lab</b> — <code>FleetDeviceAuth::TomlShared</code>: a shared NATS user/pass baked into config. Zero auth infra. Don't ship this to a real device.</li>
<li><b>Production A</b> — <code>FleetDeviceAuth::ZitadelJwt</code>: an admin pre-creates a Zitadel machine user, exports its key JSON, and drops it at <code>/etc/fleet-agent/zitadel-key.json</code> on the device.</li>
<li><b>Production B (recommended)</b> — <code>FleetDeviceAuth::ZitadelEnroll</code>: the setup score itself talks to Zitadel's management API to mint a per-device machine key. No pre-provisioning. Works either developer-on-device (Zitadel device-code flow opens a browser) or operator-via-SSH.</li>
</ol>

<p>What happens during enrollment, in order (steps 1–3 are the setup score, steps 4–7 are the agent's first boot):</p>
<ol>
<li>Renders <code>/etc/fleet-agent/config.toml</code> (device id, NATS URL, auth credentials).</li>
<li>Drops the agent binary at <code>/usr/local/bin/fleet-agent</code>.</li>
<li>Enables <code>fleet-agent.service</code> (systemd).</li>
<li>Agent boots and connects to NATS with a bearer token obtained with the keyfile.</li>
<li>Agent publishes initial DeviceInfo into the <code>device-info</code> bucket at key <code>info.&lt;device_id&gt;</code>.</li>
<li>Agent starts watching the <code>desired-state</code> bucket for keys matching <code>&lt;device_id&gt;.&gt;</code>.</li>
<li>Agent answers <code>device-commands.&lt;device_id&gt;.ping</code>.</li>
</ol>

<p>After step 5 the operator reflects the agent-published DeviceInfo into a cluster-scoped <code>Device</code> CR. From that moment, a new <code>Deployment</code> CR whose <code>targetSelector</code> matches the Device's labels will land on the device automatically.</p>
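
<p>The selector match that decides where a Deployment lands can be sketched as follows. This covers only equality-based <code>matchLabels</code> semantics and ignores <code>matchExpressions</code>; <code>selector_matches</code> and <code>labels</code> are hypothetical helpers, not the operator's code.</p>

```rust
use std::collections::BTreeMap;

/// Hypothetical equality-only selector check: every selector pair must be
/// present on the device with the same value. An empty selector matches
/// every device, as an empty Kubernetes label selector does.
pub fn selector_matches(
    match_labels: &BTreeMap<String, String>,
    device_labels: &BTreeMap<String, String>,
) -> bool {
    match_labels
        .iter()
        .all(|(key, want)| device_labels.get(key) == Some(want))
}

/// Convenience constructor for label maps, used only in examples.
pub fn labels(pairs: &[(&str, &str)]) -> BTreeMap<String, String> {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

<p>A device labelled <code>site=mtl, arch=arm64</code> matches a selector of <code>site=mtl</code> but not <code>site=qc</code>, and extra labels on the device never prevent a match.</p>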

<hr>

<h2><span class="layer">Layer 6</span> What runs where</h2>

<div class="mermaid">
flowchart TB
subgraph cluster [Kubernetes — fleet-system namespace]
direction TB
OP["Pod: harmony-fleet-operator
watches CRDs, writes desired-state KV,
aggregates device-state into CR status,
optional dashboard on :18080"]
NATS["Pod: NATS + JetStream
4 KV buckets, command subjects"]
CO["Pod: harmony-fleet-auth
NATS auth callout — validates JWTs"]
ZT["Pods: Zitadel + Postgres
OIDC, JWT signing"]
end
subgraph device [Edge — a Raspberry Pi or any podman host]
direction TB
AG["systemd: fleet-agent.service
watches desired-state.&lt;id&gt;.&gt;
writes device-state, device-info, device-heartbeat
handles device-commands.&lt;id&gt;.&lt;verb&gt;"]
PM[podman socket]
AG --> PM
end
AG <-->|NATS over WSS / TLS| NATS
OP <-->|in-cluster NATS| NATS
NATS -. callout .- CO
CO -. JWT introspect .- ZT
OP -. dashboard SSO / JWKS .- ZT
</div>

<hr>

<h2>Cheat sheet — where to start reading</h2>

<table>
<thead><tr><th>If you want to understand…</th><th>Open this file</th></tr></thead>
<tbody>
<tr><td>What a Deployment / Device CR looks like</td><td><code>harmony/src/modules/fleet/operator/crd.rs</code></td></tr>
<tr><td>The names of the KV buckets and key formats</td><td><code>harmony-reconciler-contracts/src/kv.rs</code></td></tr>
<tr><td>Operator: how CR → KV reconciliation works</td><td><code>fleet/harmony-fleet-operator/src/fleet_aggregator.rs</code></td></tr>
<tr><td>Agent: how KV → Podman reconciliation works</td><td><code>fleet/harmony-fleet-agent/src/reconciler.rs</code></td></tr>
<tr><td>Auth: JWT minting and NATS callout protocol</td><td><code>fleet/harmony-fleet-auth/src/credentials.rs</code></td></tr>
<tr><td>Deploying the whole server-side stack</td><td><code>fleet/harmony-fleet-deploy/src/server.rs</code></td></tr>
<tr><td>One-time device enrollment</td><td><code>harmony/src/modules/fleet/setup_score.rs</code></td></tr>
<tr><td>Why it's shaped this way (philosophy)</td><td><code>docs/adr/016-…</code> and <code>docs/adr/023-deploy-architecture.md</code></td></tr>
</tbody>
</table>

<h2>Glossary, for quick reference</h2>

<ul>
<li><b>Score</b> — a Rust struct describing desired state (declarative). <code>ReconcileScore</code> is the variant agents apply.</li>
<li><b>Topology</b> — what the environment can do (capabilities exposed as traits). The agent uses <code>PodmanTopology</code>; the deploy CLI uses <code>K8sAnywhereTopology</code>.</li>
<li><b>Interpret</b> — the glue that drives a Topology to fulfil a Score. Agents call <code>score.create_interpret().execute(&amp;inv, &amp;PodmanTopology)</code>.</li>
<li><b>Auth callout</b> — a NATS feature where the server delegates AuthN to a connected client; here, that client is <code>harmony-fleet-auth</code>.</li>
<li><b>K8sAnywhere</b> — single Topology implementation that targets any reachable cluster (k3d, OKD, vanilla) via the kubeconfig. Today the only topology wired into <code>harmony-fleet-deploy</code>; <code>K8sBareTopology</code> is planned.</li>
</ul>
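
<p>How Score, Topology and Interpret fit together can be shown with a toy model. Everything in it is hypothetical and much simpler than the real Harmony traits: it is synchronous, drops the inventory argument, and takes the topology by <code>&amp;mut</code> so the fake can record calls.</p>

```rust
/// A capability a topology exposes (hypothetical, simplified).
pub trait ContainerRuntime {
    fn run(&mut self, name: &str, image: &str);
}

/// The glue: drives a topology `T` to fulfil one score.
pub trait Interpret<T> {
    fn execute(&self, topology: &mut T) -> Result<(), String>;
}

/// Declarative desired state that knows how to build its interpret.
pub trait Score<T> {
    fn create_interpret(&self) -> Box<dyn Interpret<T>>;
}

/// Toy score: "a container with this name and image should run".
pub struct ContainerScore {
    pub name: String,
    pub image: String,
}

struct ContainerInterpret {
    name: String,
    image: String,
}

impl<T: ContainerRuntime> Interpret<T> for ContainerInterpret {
    fn execute(&self, topology: &mut T) -> Result<(), String> {
        // The interpret only uses the capability, never a concrete runtime.
        topology.run(&self.name, &self.image);
        Ok(())
    }
}

impl<T: ContainerRuntime + 'static> Score<T> for ContainerScore {
    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(ContainerInterpret {
            name: self.name.clone(),
            image: self.image.clone(),
        })
    }
}

/// Stand-in topology that records what it was asked to run.
#[derive(Default)]
pub struct FakeTopology {
    pub started: Vec<(String, String)>,
}

impl ContainerRuntime for FakeTopology {
    fn run(&mut self, name: &str, image: &str) {
        self.started.push((name.to_string(), image.to_string()));
    }
}
```

<p>The point of the split is that the same score can be applied to any topology that offers the required capability, which is why the agent and the deploy CLI can share scores while using different topologies.</p>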

<footer>
Source of truth lives in the repo. This document was validated against
<code>fleet/</code> and <code>harmony/src/modules/fleet/</code> as of the commit on
<code>feat/iot-walking-skeleton</code>. If a layer looks wrong to you, it probably is — open a PR.
</footer>

</main>
</body>
</html>