NationTech/harmony

Jean-Gabriel Gill-Couture 89e5e104dc

feat(fleet): unify deploy config, switch CLI to tracing, fix OCI chart name collision

fleet-deploy:
- Rename harmony-fleet-release binary to harmony-fleet-publish
- Route all deploy settings through ConfigClient (env → OpenBao → prompt)
  instead of bespoke flags; seed FleetDeploySecrets via OpenBao
- Rename HARMONY_SECRET_NAMESPACE to HARMONY_CONFIG_NAMESPACE
- Append -chart to the Helm chart artifact name so it no longer collides
  with the Docker image in Harbor (application/vnd.cncf.helm.config.v1+json)

harmony_cli:
- Switch from log to tracing for structured output
- Defer topology prep so --list and declined runs are no-ops
- Drop ANSI colour codes around log emojis
- Init cli logger in fleet deploy binary

openbao:
- Scope unseal-keys cache file per instance
- Example gains setup capability and updated README

roadmap:
- Add unified CLI design document (ROADMAP/13-unified-cli.md)
- Update v0.3 fleet platform plan

Squashed commit of the following:

commit 36d9d9aaec
Merge: 12c8d9cf e7148aa8
Author: johnride <jg@nationtech.io>
Date:   Mon Jun 1 15:42:56 2026 +0000

    Merge pull request 'fix: fleet operator chart name was conflicting with the container name. Append -chart to the chart name' (#317) from fix/fleet-operator-chart-name into chore/rename-release-to-publish

    Reviewed-on: #317

commit e7148aa85f
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Mon Jun 1 11:35:15 2026 -0400

    fix: fleet operator chart name was conflicting with the container name. Append -chart to the chart name

commit 12c8d9cfa0
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Mon Jun 1 11:12:23 2026 -0400

    feat: Init cli logger in fleet deploy

commit edb62668b6
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sun May 31 12:56:36 2026 -0400

    doc: Roadmap entry for cli design and implementation

commit f2ecccb4ab
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sun May 31 12:32:19 2026 -0400

    refactor(fleet-deploy): rename harmony-fleet-release to harmony-fleet-publish

    Deploy/publish wording is more intuitive than deploy/release.

commit 2e9052b217
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sun May 31 10:12:54 2026 -0400

    fix(openbao): remove extra blank line in example

    Pre-existing formatting issue caught by cargo fmt --check.

commit f7299ebe2b
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sun May 31 09:13:39 2026 -0400

    refactor(fleet-deploy): rename HARMONY_SECRET_NAMESPACE to HARMONY_CONFIG_NAMESPACE

    The env var name was a misnomer — ConfigClient resolves both config and
    secrets, not just secrets. The struct field was already config_namespace.
    Legacy SecretManager keeps the old var; this forces migration to
    ConfigClient for new code.

commit d39aa15152
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sun May 31 09:06:20 2026 -0400

    feat: fleet deploy uses configuration from configclient for all settings, update the 0_3 plan

commit 57d056fced
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sat May 30 11:07:03 2026 -0400

    fix(openbao): scope unseal-keys cache file per instance

    The root token + unseal keys were written to a single fixed
    `~/.local/share/harmony/openbao/unseal-keys.json`, so deploying a second
    OpenBao instance (different namespace/release) overwrote the first's keys —
    after which the first could never be unsealed. Key the file by
    namespace+release (`unseal-keys-<ns>-<release>.json`); `cached_root_token`
    now takes the `OpenbaoInstance` to read the right one.

commit 44aa83199a
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sat May 30 11:05:30 2026 -0400

    fix(harmony_cli): drop ANSI colour codes around log emojis

    `console::style(emoji).green()/.yellow()/.red()/.blue()` embedded raw ANSI
    escapes in the message string. `console` force-emits them off its own TTY
    detection, which disagrees with the tracing writer, so they leaked as literal
    `\x1b[..m` garbage around the emoji. Emit plain emojis — the glyph already
    conveys status and the tracing fmt layer still colours the level.

commit 4fef957edb
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sat May 30 08:40:54 2026 -0400

    feat: Example openbao now can do openbao  setup and better readme

commit af3205d353
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sat May 30 05:55:49 2026 -0400

    refactor(harmony_cli): defer topology prep so --list/declined runs are no-ops

    `Maestro::initialize` (hence `topology.ensure_ready()`) ran before `init`'s
    `--list` / confirmation short-circuits, so merely listing a binary's scores —
    or declining to run them — still prepared the topology (cert-manager install,
    etc.). Build the maestro unprepared and call `prepare_topology()` only once we
    commit to interpreting. Expose `Maestro::prepare_topology`; add tests proving
    `--list` skips prep while the run path triggers it.

commit 199e285e52
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sat May 30 05:04:34 2026 -0400

    feat: Use tracing instead of logger in harmony_cli and work on fleet_staging_install refactor to use harmony_cli properly, still some more work to do

commit fac83d853d
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Fri May 29 22:39:39 2026 -0400

    refactor(fleet-staging): use tracing instead of println for output

    Swap env_logger for tracing_subscriber (its fmt bridges the framework's
    log:: deploy-progress output) and route the install banner + step logs
    through tracing::info! — no raw println.

commit 0400e9d454
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Fri May 29 20:25:22 2026 -0400

    feat(fleet-staging): add OpenBao + seed FleetDeploySecrets; route operator creds through the deploy crate

    fleet_staging_install now deploys OpenBao (co-located in fleet-staging,
    cert-manager TLS at secrets-stg.<base>), configures it (fleet-deployer
    read policy), and seeds the operator's FleetDeploySecrets so the operator
    can be upgraded alone via 'harmony-fleet-deploy --from-tag'. Behavior of
    the existing bring-up is unchanged.

    Credential-TOML construction moved out of the example into
    OperatorCredentials::zitadel_jwt (deploy crate) so all callers share it.
    New openbao::cached_root_token() lets the seed reuse the root token setup
    already cached. Seeding mirrors the harmony_sso port-forward pattern.

2026-06-01 11:51:11 -04:00

Name	Last commit	Date
harmony-fleet-agent	feat: Fleet E2E tests harness improving a lot, firing up a VM and testing agent behavior	2026-05-20 21:42:47 -04:00
harmony-fleet-assets	update html and add asset	2026-05-25 08:36:53 -04:00
harmony-fleet-auth	feat: refactor fleet agent config into a strongly typed struct, remove brittle string processing	2026-05-20 13:41:40 -04:00
harmony-fleet-deploy	feat(fleet): unify deploy config, switch CLI to tracing, fix OCI chart name collision	2026-06-01 11:51:11 -04:00
harmony-fleet-e2e	refactor(fleet): drop deploy-crate dev creds, HARMONY_* env vars, lean docs	2026-05-22 17:54:48 -04:00
harmony-fleet-operator	chore: fix fmt	2026-05-22 17:04:49 -04:00
scripts	fix(docker,build): tighten .dockerignore + multi-stage callout image	2026-05-05 12:34:19 -04:00
ARCHITECTURE.html	update html and add asset	2026-05-25 08:36:53 -04:00
deployment-process.md	feat(fleet): unify deploy config, switch CLI to tracing, fix OCI chart name collision	2026-06-01 11:51:11 -04:00
PLAN_requests_over_nats.md	feat(fleet): request/reply commands over NATS — wire types, agent server, operator client, e2e harness	2026-05-18 09:47:36 -04:00
README.md	refactor(fleet): deploy binary is operator-only — load config, run one Score	2026-05-29 15:03:38 -04:00
requests_over_nats.md	feat(fleet): request/reply commands over NATS — wire types, agent server, operator client, e2e harness	2026-05-18 09:47:36 -04:00

README.md

Harmony Fleet

IoT / decentralized-edge orchestration for harmony. A fleet stack is:

Component	Crate	Role
Operator	`harmony-fleet-operator`	Watches `Deployment` CRs, writes desired state into NATS JetStream KV, aggregates device state back into CR status. Runtime binary; no `harmony` dep.
Agent	`harmony-fleet-agent`	One per device. Watches the desired-state KV, drives the local runtime (podman today), publishes heartbeats + per-deployment state, answers `device-commands.*` request/reply.
Auth	`harmony-fleet-auth`	Shared NATS credential plumbing — `TomlShared` (dev) and `ZitadelJwt` (prod with auth-callout).
Deploy	`harmony-fleet-deploy`	The canonical deploy crate. Imports `harmony` and exposes one `*Score` per component (`FleetOperatorScore`, `FleetAgentScore`, `FleetNatsScore`, `FleetServerScore`). Both the production CLI and the e2e harness compose these — see ADR-023.
E2E harness	`harmony-fleet-e2e`	Brings the stack up in a fresh k3d namespace and runs integration tests against it.

The on-the-wire types both ends agree on (KV bucket names, key formats, command-protocol payloads) live in ../harmony-reconciler-contracts.

Architecture in one line

FleetOperatorScore, FleetAgentScore, etc. are real Rust types with capability-bound Topology parameters. Production deploys, the e2e harness, and any future control-plane tool all compose the same Scores; the only thing that changes is the Topology instance. No handrolled YAML or imperative manifest factories anywhere. Read ADR-023 before adding deploy logic.
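
As a rough illustration of what "capability-bound Topology parameters" can look like, here is a minimal sketch. Every name in it (`HelmCapable`, `Score`, `NatsScoreSketch`, `K3dTopology`, `ProdTopology`) is hypothetical and chosen for this example; none of it is harmony's actual API, and ADR-023 remains the reference for the real design.

```rust
// Hypothetical sketch: these traits and types are illustrative only and are
// not harmony's real API.

/// A capability a topology may offer (assumed name).
trait HelmCapable {
    fn install_chart(&self, chart: &str, namespace: &str) -> String;
}

/// A Score describes desired state and is interpreted against any topology
/// that provides the capabilities it needs.
trait Score<T> {
    fn interpret(&self, topology: &T) -> String;
}

struct NatsScoreSketch {
    namespace: String,
}

// The capability bound: this Score only compiles against topologies that
// implement HelmCapable.
impl<T: HelmCapable> Score<T> for NatsScoreSketch {
    fn interpret(&self, topology: &T) -> String {
        topology.install_chart("nats", &self.namespace)
    }
}

// Two stand-in topologies: the Score stays the same, only the topology changes.
struct K3dTopology;
struct ProdTopology;

impl HelmCapable for K3dTopology {
    fn install_chart(&self, chart: &str, namespace: &str) -> String {
        format!("k3d: helm install {chart} -n {namespace}")
    }
}

impl HelmCapable for ProdTopology {
    fn install_chart(&self, chart: &str, namespace: &str) -> String {
        format!("prod: helm install {chart} -n {namespace}")
    }
}
```

In this shape, a Score handed a topology that lacks a required capability fails at compile time rather than at deploy time.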

Quickstart — run the e2e ping test

The fastest path to a green fleet stack on your laptop. Requires podman, kubectl, and helm on $PATH; everything else (k3d, the NATS chart, all images) is fetched / built on demand.

HARMONY_FLEET_E2E=1 cargo test -p harmony-fleet-e2e --test ping -- --nocapture

What it does, in order:

1. Ensures a fleet-e2e k3d cluster exists (creates one if not). NodePort 30423 on the host forwards to NATS inside the cluster.
2. Builds harmony-fleet-agent in release mode, packages it into localhost/harmony-fleet-agent:e2e, and sideloads the image into the k3d cluster's containerd store.
3. Mints a per-bring-up namespace e2e-<uuid8> and prunes any leftover e2e-* namespaces from prior runs (NodePort 30423 is cluster-scoped, so a stuck Terminating namespace would block the new bring-up — the prune waits up to 90 s for full cleanup before proceeding).
4. Deploys NATS via FleetNatsScore (helm chart, JetStream on, static admin/device users, NodePort Service).
5. Waits for NATS to be reachable from the host on nats://localhost:30423 (admin/e2e-admin).
6. Deploys one FleetAgentScore { target: Pod } — runs with runtime_enabled = false so it skips podman and only runs the command-server + heartbeat loop.
7. Waits for the agent Deployment to be Ready.
8. The test publishes device-commands.<device_id>.ping via FleetCommandsClient::ping and asserts the agent replies with { device_id, agent_version, uptime_s }.
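
The last step can be pictured with a small sketch of the subject format and the reply shape. The field types (`String`, `u64`) and the helper names `ping_subject` and `reply_is_valid` are assumptions made for illustration; the real wire types live in harmony-reconciler-contracts and may differ.

```rust
// Illustrative sketch only: field types and helper names are assumptions,
// not the wire types from harmony-reconciler-contracts.

/// Subject the test publishes on: device-commands.<device_id>.ping
fn ping_subject(device_id: &str) -> String {
    format!("device-commands.{device_id}.ping")
}

/// Shape of the reply the test asserts on.
#[derive(Debug, PartialEq)]
struct PingReply {
    device_id: String,
    agent_version: String,
    uptime_s: u64,
}

/// The kind of check a test could make on a decoded reply: it must come
/// from the device that was pinged and carry a non-empty version.
fn reply_is_valid(reply: &PingReply, expected_device: &str) -> bool {
    reply.device_id == expected_device && !reply.agent_version.is_empty()
}
```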

Cold first run: ~80 s (release build of the agent dominates). Warm: ~25 s.

Useful env knobs

Var	Effect
`HARMONY_FLEET_E2E=1`	Required. Without it the test is skipped — keeps `cargo test --workspace` cheap on machines without k3d.
`FLEET_E2E_KEEP=1`	Skip namespace teardown on Drop. Lets you `kubectl -n e2e-<…> logs deploy/…` after a failure. The next run prunes it.
`RUST_LOG=info`	Or `debug` for the per-message `command dispatch` traces inside `harmony-fleet-agent::command_server`.
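
For readers wondering how an opt-in variable like `HARMONY_FLEET_E2E` keeps the default test run cheap, the sketch below shows one plausible gate. It assumes the check is an exact match on `"1"`; the harness may accept other values, and the function names are made up for this example.

```rust
use std::env;

/// True only when the opt-in value is exactly "1" (assumed semantics).
fn e2e_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Reads the real environment. A test body would return early, reporting
/// success without doing any work, when this is false.
fn e2e_enabled_from_env() -> bool {
    e2e_enabled(env::var("HARMONY_FLEET_E2E").ok().as_deref())
}
```

Splitting the pure check from the environment read keeps the gate itself testable without mutating process-wide state.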

Connecting to NATS while the stack is up

# Host-side, via the NodePort
nats://localhost:30423           # user=admin pass=e2e-admin (full access)
nats://localhost:30423           # user=device pass=e2e-device (device permissions)

# In-cluster, from any Pod in the same namespace
nats://fleet-nats.e2e-<uuid8>.svc.cluster.local:4222

FLEET_E2E_KEEP=1 + the harness's stdout line [e2e] NATS: nats://127.0.0.1:30423 … is the path most tests will take — leave the harness running, point a NATS client at that URL.
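
When scripting against a kept stack, the two URL forms above can be derived from the namespace name. This is a small sketch with hypothetical helper names; the port numbers and the `fleet-nats` Service name are the ones quoted in this README.

```rust
/// NodePort exposed on the host, as stated in this README.
const HOST_NATS_PORT: u16 = 30423;

/// Host-side URL, reachable from the machine running the harness.
fn host_nats_url() -> String {
    format!("nats://localhost:{}", HOST_NATS_PORT)
}

/// In-cluster URL, reachable from any Pod in the given e2e namespace.
fn in_cluster_nats_url(namespace: &str) -> String {
    format!("nats://fleet-nats.{namespace}.svc.cluster.local:4222")
}
```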

Inspecting the agent

# Find your namespace
kubectl get ns -l harmony.io/managed-by=fleet-e2e

# Tail the agent
kubectl -n e2e-<uuid8> logs deploy/fleet-agent-<device-id> -f

# Tail NATS (StatefulSet, not Deployment)
kubectl -n e2e-<uuid8> logs sts/fleet-nats -c nats -f

# Send a ping by hand (requires the `nats` CLI:
#   https://github.com/nats-io/natscli/releases)
nats --server nats://localhost:30423 --user admin --password e2e-admin \
     request "device-commands.vm-device-00-<uuid8>.ping" ""

Or, if you don't want to install the nats binary, alias it to the nats-box container image and run the same request through natsbox:

alias natsbox='podman run --network=host --rm docker.io/natsio/nats-box:latest nats --server nats://localhost:30423 --user admin --password e2e-admin'

You should see something like {"device_id":"vm-device-00-<uuid8>","agent_version":"0.1.0","uptime_s":12}.

Cleaning up

The shared OnceCell in harmony-fleet-e2e lives for the test binary's lifetime, so namespaces survive a cargo test exit (the static is never explicitly dropped). The next cargo test invocation prunes them. To force a manual cleanup:

kubectl delete ns -l harmony.io/managed-by=fleet-e2e
# wipe the whole cluster:
k3d cluster delete fleet-e2e
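
The reason namespaces outlive the test binary is a general Rust property: destructors of values stored in a `static` are not run at process exit. The sketch below demonstrates the pattern with `std::sync::OnceLock` as a stand-in for the harness's OnceCell; `SharedStack` and `shared_stack` are hypothetical names, not the harness's real types.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the shared value was initialised.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the harness's shared stack handle (hypothetical type).
struct SharedStack {
    namespace: String,
}

impl Drop for SharedStack {
    fn drop(&mut self) {
        // Teardown would go here, but Rust does not run destructors for
        // statics at process exit, so this never fires for SHARED below.
        println!("tearing down {}", self.namespace);
    }
}

static SHARED: OnceLock<SharedStack> = OnceLock::new();

/// Every caller gets the same instance; initialisation happens once.
fn shared_stack() -> &'static SharedStack {
    SHARED.get_or_init(|| {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        SharedStack {
            namespace: "e2e-1234abcd".to_string(),
        }
    })
}
```

Because `drop` never runs for the static, cleanup has to happen elsewhere, which matches the prune-on-next-run behaviour described above.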

Production deploys

harmony-fleet-deploy puts the published operator chart on a real cluster (OKD, vanilla k8s, anywhere K8sAnywhereTopology can reach) — the harmony apply / CD path. It loads FleetDeploySecrets from config (Env → OpenBao) and runs one FleetOperatorScore; auth is Zitadel-SSO-only. The full bring-up stack (FleetNatsScore + FleetAgentScore + …) is composed by the e2e harness directly over the same lib Scores, not by this binary.
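
The "Env → OpenBao" order can be read as a first-match lookup over an ordered list of sources. The sketch below illustrates that idea only: `ConfigSource`, `MapSource`, and `resolve` are hypothetical, the in-memory maps stand in for the environment and OpenBao, and none of this is ConfigClient's real interface.

```rust
use std::collections::HashMap;

/// Anything that can answer a key lookup (hypothetical trait).
trait ConfigSource {
    fn get(&self, key: &str) -> Option<String>;
}

/// In-memory stand-in used here for both the environment and OpenBao.
struct MapSource(HashMap<String, String>);

impl ConfigSource for MapSource {
    fn get(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
}

/// Walk the sources in priority order and return the first value found.
fn resolve(sources: &[&dyn ConfigSource], key: &str) -> Option<String> {
    sources.iter().find_map(|source| source.get(key))
}
```

With the environment listed first, a value exported in the shell overrides the stored one, and a key missing from every source resolves to `None`.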

# Deploy a released tag (version parsed from it in Rust):
cargo run -p harmony-fleet-deploy -- \
  --filter FleetOperatorScore \
  --from-tag harmony-fleet-operator-v0.0.2 \
  --namespace fleet-system --yes
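
The comment above says the version is parsed from the tag in Rust. A minimal sketch of such parsing follows, assuming the tag layout `<component>-v<major>.<minor>.<patch>` seen in the example; the function name and the strictness rules are assumptions, and the real parser may accept more forms (pre-release suffixes, for instance).

```rust
/// Extract (major, minor, patch) from a tag such as
/// "harmony-fleet-operator-v0.0.2". Returns None when the tag does not
/// match the assumed "<component>-v<major>.<minor>.<patch>" layout.
fn version_from_tag(tag: &str, component: &str) -> Option<(u64, u64, u64)> {
    let rest = tag.strip_prefix(component)?.strip_prefix("-v")?;
    let mut parts = rest.split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next()?.parse::<u64>().ok()?;
    let patch = parts.next()?.parse::<u64>().ok()?;
    // Reject trailing components such as "0.0.2.1".
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}
```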

See deployment-process.md for the clickable CD workflow and the in-cluster runner bootstrap.

Connecting to the operator

The operator runs as a single-replica Deployment in --namespace (default fleet-system).

# Tail logs
kubectl -n fleet-system logs deploy/harmony-fleet-operator -f

# Port-forward the embedded web dashboard (web-frontend feature)
kubectl -n fleet-system port-forward deploy/harmony-fleet-operator 18080:18080

# Or run the dashboard standalone with seeded fake data — no NATS, no cluster
cargo run -p harmony-fleet-operator --features web-frontend -- serve-web --mock
# browse http://127.0.0.1:18080

Existing manual rehearsal — `examples/fleet_e2e_demo`

examples/fleet_e2e_demo brings up a fuller stack than the e2e harness — real Zitadel, the auth-callout, libvirt VM agents over SSH — at the cost of a 5-min cold start. It's the manual rehearsal flow; not what you want during the dev loop. See the example's RUNBOOK.md.

The harness and the rehearsal will converge: the follow-up PR lifts FleetCalloutScore + a mock-OIDC fixture into harmony-fleet-deploy, at which point the harness can run the full production auth path in ~30 s instead of 5 min, and fleet_e2e_demo thins down to a caller over the same Scores.

What's next

This branch lands the deploy-architecture cleanup (ADR-023), the per-component Scores, and the ping path. Slated immediately after:

- Zitadel + auth callout in harmony-fleet-deploy. New FleetCalloutScore (preset over NatsAuthCalloutScore) plus an in-cluster mock-OIDC fixture so the e2e harness can exercise the real auth-callout code path without paying Zitadel's 5-min cold-start cost. The harness's AuthMode::Callout variant is already on the public API for this.
- Operator pod in the e2e harness. FleetOperatorScore is already in the deploy crate; wiring it into the harness gives integration tests against the actual Deployment / Device reconcile loops.
- Verb::Logs and Verb::Exec — the next two verbs on the device-commands.* protocol. Same harness, same TDD shape as ping.
- CRD types out of harmony core. harmony::modules::fleet::operator::crd is the last fleet-deploy thing still living in harmony. The ReconcileScore payload coupling is the only blocker.
- Smoke-test contract. ADR-023 principle 4 — every Score blocks on a smoke test before deploy returns success. Today the e2e suite plays that role; the trait/companion shape lands once it's been validated in practice.

See PLAN_requests_over_nats.md for the full TDD-style plan this branch implements.
