harmony/ROADMAP/fleet_platform/v0_3_plan.md
Jean-Gabriel Gill-Couture 89e5e104dc
feat(fleet): unify deploy config, switch CLI to tracing, fix OCI chart name collision
fleet-deploy:
- Rename harmony-fleet-release binary to harmony-fleet-publish
- Route all deploy settings through ConfigClient (env → OpenBao → prompt)
  instead of bespoke flags; seed FleetDeploySecrets via OpenBao
- Rename HARMONY_SECRET_NAMESPACE to HARMONY_CONFIG_NAMESPACE
- Append -chart to the Helm chart artifact name so it no longer collides
  with the Docker image in Harbor (application/vnd.cncf.helm.config.v1+json)

harmony_cli:
- Switch from log to tracing for structured output
- Defer topology prep so --list and declined runs are no-ops
- Drop ANSI colour codes around log emojis
- Init cli logger in fleet deploy binary

openbao:
- Scope unseal-keys cache file per instance
- Example gains setup capability and updated README

roadmap:
- Add unified CLI design document (ROADMAP/13-unified-cli.md)
- Update v0.3 fleet platform plan

Squashed commit of the following:

commit 36d9d9aaec
Merge: 12c8d9cf e7148aa8
Author: johnride <jg@nationtech.io>
Date:   Mon Jun 1 15:42:56 2026 +0000

    Merge pull request 'fix: fleet operator chart name was conflicting with the container name. Append -chart to the chart name' (#317) from fix/fleet-operator-chart-name into chore/rename-release-to-publish

    Reviewed-on: #317

commit e7148aa85f
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Mon Jun 1 11:35:15 2026 -0400

    fix: fleet operator chart name was conflicting with the container name. Append -chart to the chart name

commit 12c8d9cfa0
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Mon Jun 1 11:12:23 2026 -0400

    feat: Init cli logger in fleet deploy

commit edb62668b6
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sun May 31 12:56:36 2026 -0400

    doc: Roadmap entry for cli design and implementation

commit f2ecccb4ab
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sun May 31 12:32:19 2026 -0400

    refactor(fleet-deploy): rename harmony-fleet-release to harmony-fleet-publish

    Deploy/publish wording is more intuitive than deploy/release.

commit 2e9052b217
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sun May 31 10:12:54 2026 -0400

    fix(openbao): remove extra blank line in example

    Pre-existing formatting issue caught by cargo fmt --check.

commit f7299ebe2b
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sun May 31 09:13:39 2026 -0400

    refactor(fleet-deploy): rename HARMONY_SECRET_NAMESPACE to HARMONY_CONFIG_NAMESPACE

    The env var name was a misnomer — ConfigClient resolves both config and
    secrets, not just secrets. The struct field was already config_namespace.
    Legacy SecretManager keeps the old var; this forces migration to
    ConfigClient for new code.

commit d39aa15152
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sun May 31 09:06:20 2026 -0400

    feat: fleet deploy uses configuration from configclient for all settings, update the 0_3 plan

commit 57d056fced
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sat May 30 11:07:03 2026 -0400

    fix(openbao): scope unseal-keys cache file per instance

    The root token + unseal keys were written to a single fixed
    `~/.local/share/harmony/openbao/unseal-keys.json`, so deploying a second
    OpenBao instance (different namespace/release) overwrote the first's keys —
    after which the first could never be unsealed. Key the file by
    namespace+release (`unseal-keys-<ns>-<release>.json`); `cached_root_token`
    now takes the `OpenbaoInstance` to read the right one.

commit 44aa83199a
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sat May 30 11:05:30 2026 -0400

    fix(harmony_cli): drop ANSI colour codes around log emojis

    `console::style(emoji).green()/.yellow()/.red()/.blue()` embedded raw ANSI
    escapes in the message string. `console` force-emits them off its own TTY
    detection, which disagrees with the tracing writer, so they leaked as literal
    `\x1b[..m` garbage around the emoji. Emit plain emojis — the glyph already
    conveys status and the tracing fmt layer still colours the level.

commit 4fef957edb
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sat May 30 08:40:54 2026 -0400

    feat: Example openbao now can do openbao setup and better readme

commit af3205d353
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sat May 30 05:55:49 2026 -0400

    refactor(harmony_cli): defer topology prep so --list/declined runs are no-ops

    `Maestro::initialize` (hence `topology.ensure_ready()`) ran before `init`'s
    `--list` / confirmation short-circuits, so merely listing a binary's scores —
    or declining to run them — still prepared the topology (cert-manager install,
    etc.). Build the maestro unprepared and call `prepare_topology()` only once we
    commit to interpreting. Expose `Maestro::prepare_topology`; add tests proving
    `--list` skips prep while the run path triggers it.

commit 199e285e52
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Sat May 30 05:04:34 2026 -0400

    feat: Use tracing instead of logger in harmony_cli and work on fleet_staging_install refactor to use harmony_cli properly, still some more work to do

commit fac83d853d
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Fri May 29 22:39:39 2026 -0400

    refactor(fleet-staging): use tracing instead of println for output

    Swap env_logger for tracing_subscriber (its fmt bridges the framework's
    log:: deploy-progress output) and route the install banner + step logs
    through tracing::info! — no raw println.

commit 0400e9d454
Author: Jean-Gabriel Gill-Couture <jg@nationtech.io>
Date:   Fri May 29 20:25:22 2026 -0400

    feat(fleet-staging): add OpenBao + seed FleetDeploySecrets; route operator creds through the deploy crate

    fleet_staging_install now deploys OpenBao (co-located in fleet-staging,
    cert-manager TLS at secrets-stg.<base>), configures it (fleet-deployer
    read policy), and seeds the operator's FleetDeploySecrets so the operator
    can be upgraded alone via 'harmony-fleet-deploy --from-tag'. Behavior of
    the existing bring-up is unchanged.

    Credential-TOML construction moved out of the example into
    OperatorCredentials::zitadel_jwt (deploy crate) so all callers share it.
    New openbao::cached_root_token() lets the seed reuse the root token setup
    already cached. Seeding mirrors the harmony_sso port-forward pattern.
2026-06-01 11:51:11 -04:00


# Fleet Platform v0.3 — Staging to production-ready
Written 2026-05-31. Picks up after OpenBao, Zitadel, NATS, the auth callout, and the operator were deployed and made functional on staging (the running versions are 2-3 weeks old).
## Current state
- [x] OpenBao running at `secrets-stg.cb1.nationtech.io`
- [x] Zitadel running at `sso-stg.cb1.nationtech.io`
- [x] NATS + auth callout deployed in `fleet-staging` namespace
- [x] Operator deployed (older version, 2-3 weeks old)
- [x] Config-driven OpenBao installer (`examples/openbao`)
- [x] `harmony-fleet-deploy` binary reads `FleetDeployConfig` + `FleetDeploySecrets` from OpenBao
## Immediate next steps
### 1. Provision operator credentials in OpenBao
- [ ] Fetch existing creds from the running cluster:
```bash
oc -n fleet-staging get secret harmony-fleet-operator-secrets -o jsonpath='{.data.credentials\.toml}' | base64 -d
```
- [ ] Seed into OpenBao at `secret/data/fleet-staging/FleetDeploySecrets`:
```bash
export VAULT_ADDR=https://secrets-stg.cb1.nationtech.io
export VAULT_TOKEN=<root token>
oc -n fleet-staging get secret harmony-fleet-operator-secrets -o jsonpath='{.data.credentials\.toml}' | base64 -d \
| jq -Rs '{value: ({operator_credentials_toml: .} | tojson)}' \
| bao kv put secret/fleet-staging/FleetDeploySecrets -
```
- [ ] Verify the secret is readable: `bao kv get secret/fleet-staging/FleetDeploySecrets`
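
The seed pipeline stores the TOML as a JSON-encoded string under a single `value` key. The sketch below is a matching wrap/unwrap pair. It is handy for checking that what OpenBao returns is exactly what the cluster Secret holds. It assumes `jq` 1.5 or newer and that `bao kv get -field=value` prints the stored string unchanged. The function names are hypothetical.

```bash
# Hypothetical helpers mirroring the seed pipeline in step 1.
# Requires jq >= 1.5 (for -j).

# stdin:  raw credentials.toml
# stdout: JSON body for `bao kv put secret/fleet-staging/FleetDeploySecrets -`
wrap_fleet_deploy_secrets() {
  jq -Rs '{value: ({operator_credentials_toml: .} | tojson)}'
}

# stdin:  output of `bao kv get -field=value secret/fleet-staging/FleetDeploySecrets`
# stdout: raw credentials.toml, byte for byte (-j adds no trailing newline)
unwrap_fleet_deploy_secrets() {
  jq -j '.operator_credentials_toml'
}

# Usage (bash, for process substitution): empty diff output means the seed matches.
# diff <(oc -n fleet-staging get secret harmony-fleet-operator-secrets \
#          -o jsonpath='{.data.credentials\.toml}' | base64 -d) \
#      <(bao kv get -field=value secret/fleet-staging/FleetDeploySecrets \
#          | unwrap_fleet_deploy_secrets)
```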
### 2. Private repo deploy script
- [ ] Create `.envrc` with minimal env:
```bash
export OPENBAO_URL=https://secrets-stg.cb1.nationtech.io
export HARMONY_CONFIG_NAMESPACE=fleet-staging
# export OPENBAO_TOKEN=<root token for now; SSO later>
```
- [ ] Write the deploy invocation (a shell script, or just a direct `harmony-fleet-deploy` call):
```bash
harmony-fleet-deploy --from-tag harmony-fleet-operator-vX.Y.Z --yes
```
- [ ] Commit `.envrc` + script to private repo (shared with teammates)
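
A thin wrapper keeps the committed script honest about its inputs. It refuses to run without the `.envrc` variables, and it rejects tags that do not look like `harmony-fleet-operator-vX.Y.Z`. This is a sketch meant to be sourced. The file name `deploy-operator.sh` and the function names are hypothetical, and the tag check is a deliberately loose glob.

```bash
# deploy-operator.sh (hypothetical): source it, then call deploy_operator <tag>.
# Assumes .envrc has been loaded (e.g. by direnv) and harmony-fleet-deploy is on PATH.

# Report every missing variable, then fail if any were missing.
check_deploy_env() {
  missing=0
  if [ -z "${OPENBAO_URL:-}" ]; then
    echo "OPENBAO_URL is not set (load .envrc)" >&2
    missing=1
  fi
  if [ -z "${HARMONY_CONFIG_NAMESPACE:-}" ]; then
    echo "HARMONY_CONFIG_NAMESPACE is not set (load .envrc)" >&2
    missing=1
  fi
  return "$missing"
}

# Loose glob: three dot-separated parts after "v", each only has to start with a digit.
is_operator_tag() {
  case "$1" in
    harmony-fleet-operator-v[0-9]*.[0-9]*.[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}

deploy_operator() {
  tag=${1:-}
  check_deploy_env || return 1
  if ! is_operator_tag "$tag"; then
    echo "unexpected tag '$tag' (want harmony-fleet-operator-vX.Y.Z)" >&2
    return 1
  fi
  harmony-fleet-deploy --from-tag "$tag" --yes
}
```

Usage: `. ./deploy-operator.sh && deploy_operator harmony-fleet-operator-vX.Y.Z`, with the real version substituted.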
### 3. Execute operator upgrade
- [ ] Run the deploy script from the private repo
- [ ] Verify operator pod starts and connects to NATS
- [ ] Verify operator reconciles existing CRs (check logs)
- [ ] Confirm no regression in existing fleet functionality
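
The checks in step 3 can be scripted roughly as below. This is a sketch. The Deployment name `harmony-fleet-operator` is an assumption borrowed from the Secret name above and should be checked against the chart. The `grep` only narrows the logs for a human to read; it does not know the operator's actual log messages.

```bash
# Hypothetical post-upgrade check; override the defaults if the names differ.
OPERATOR_NS=${OPERATOR_NS:-fleet-staging}
OPERATOR_DEPLOY=${OPERATOR_DEPLOY:-harmony-fleet-operator}

verify_operator_rollout() {
  # Wait for the new pods to become available, failing after 3 minutes.
  oc -n "$OPERATOR_NS" rollout status "deployment/$OPERATOR_DEPLOY" --timeout=180s || return 1
  # Show recent NATS / reconcile lines for manual review; no match is not an error.
  oc -n "$OPERATOR_NS" logs "deployment/$OPERATOR_DEPLOY" --since=10m \
    | grep -iE 'nats|reconcil' || true
}
```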
### 4. Operator UI ingress (trivial)
- [ ] Expose operator UI with TLS ingress on `fleet-stg.<base_domain>`
- [ ] Verify the UI loads and serves the SPA
- [ ] Confirm no auth gate yet (SSO is next)
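
The step 4 checks reduce to a couple of `curl` calls. The sketch below classifies the status code and checks that the body is an HTML page. `<base_domain>` is left as a placeholder. The mapping of codes to verdicts is a rough heuristic of this sketch, not something the operator defines.

```bash
# Print the HTTP status code for the UI root. No -k: a bad certificate should fail
# (curl then reports 000).
ui_status() {
  curl -s -o /dev/null -w '%{http_code}' "https://$1/"
}

# Turn a status code into a verdict for the step 4 checklist.
classify_ui_status() {
  case "$1" in
    200) echo "open" ;;                      # served, no auth gate (expected for now)
    301|302|303|307|308) echo "redirect" ;;  # possibly an auth gate already
    401|403) echo "auth-required" ;;
    *) echo "error" ;;                       # includes 000 (TLS/DNS/connect failure)
  esac
}

# stdin: response body. Succeeds if it contains an HTML doctype, which most
# SPA index pages start with (a heuristic, not a guarantee).
looks_like_html() {
  grep -qi '<!doctype html'
}

# Usage:
# classify_ui_status "$(ui_status fleet-stg.<base_domain>)"
# curl -fsS https://fleet-stg.<base_domain>/ | looks_like_html && echo "SPA shell served"
```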
### 5. SSO login flow
- [ ] Wire operator UI to Zitadel SSO at `sso-stg.<base_domain>`
- [ ] Test login/logout flow end-to-end
- [ ] Verify session persistence across page reloads
- [ ] Confirm RBAC: only authorized Zitadel users can access the UI
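
One end-to-end signal for step 5 needs no browser: an unauthenticated request to the UI should now be redirected to Zitadel. The sketch assumes the UI issues a server-side redirect straight to the SSO host. A client-side redirect, or a local `/login` page, would not show up this way. On the SSO side, Zitadel's OIDC discovery document at `/.well-known/openid-configuration` is a useful sanity check. Function names are hypothetical.

```bash
# Print where an unauthenticated GET of the UI root is redirected (empty if none).
ui_redirect_target() {
  curl -s -o /dev/null -w '%{redirect_url}' "https://$1/"
}

# Succeed if the redirect target lives on the SSO host.
redirects_to_sso() {
  target=$1 sso_host=$2
  case "$target" in
    "https://$sso_host/"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
# redirects_to_sso "$(ui_redirect_target fleet-stg.<base_domain>)" sso-stg.<base_domain> \
#   && echo "auth gate points at Zitadel"
# curl -fsS https://sso-stg.<base_domain>/.well-known/openid-configuration
```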
### 6. Real data in UI
- [ ] Replace mock device list with live `device-info` KV data
- [ ] Replace mock deployment list with live `Deployment` CR data
- [ ] Wire per-device drilldown to real `DeviceInfo` + last-heartbeat + agent version
- [ ] NATS tail panel: SSE stream of `device-info` and `device-state` updates (plain text)
- [ ] Verify data refreshes without manual reload
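
For the per-device drilldown, "last-heartbeat" is most useful with a freshness verdict next to it. The sketch below assumes the heartbeat is available as Unix seconds. The 90-second default threshold is a placeholder, not a value from the agent. The `nats kv` commands in the comments are the stock NATS CLI, handy for comparing what the UI shows against the raw bucket.

```bash
# Classify a heartbeat as fresh or stale.
# $1 = last heartbeat (Unix seconds), $2 = now (Unix seconds),
# $3 = threshold in seconds (placeholder default: 90)
heartbeat_state() {
  last=$1 now=$2 threshold=${3:-90}
  age=$((now - last))
  if [ "$age" -le "$threshold" ]; then
    echo "fresh"
  else
    echo "stale"
  fi
}

# Usage:
# heartbeat_state "$last_heartbeat" "$(date +%s)"
# nats kv watch device-info      # raw feed the NATS tail panel should match
# nats kv get device-info <key>  # one entry, to compare with the drilldown
```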
## Configuration model
### Environment (minimal, committed in private repo)
```bash
export OPENBAO_URL=https://secrets-stg.cb1.nationtech.io
export HARMONY_CONFIG_NAMESPACE=fleet-staging
# SSO auth or root token (SSO is the goal)
```
### OpenBao (read via ConfigClient)
- `FleetDeployConfig` (k8s namespaces, NATS URL, chart coords) at `secret/data/fleet-staging/FleetDeployConfig`
- `FleetDeploySecrets` (operator creds) at `secret/data/fleet-staging/FleetDeploySecrets`
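
Each entry above has two spellings. One is the KV v2 HTTP API path (with `data/`) listed here; the other is the shorter path that the `bao kv` CLI takes. The sketch below shows the mapping. It assumes the KV v2 engine is mounted at `secret/`, as in step 1. The helper names are hypothetical.

```bash
# $1 = config namespace (HARMONY_CONFIG_NAMESPACE), $2 = struct name
config_cli_path() { printf 'secret/%s/%s\n' "$1" "$2"; }       # for `bao kv get/put`
config_api_path() { printf 'secret/data/%s/%s\n' "$1" "$2"; }  # for GET /v1/<path>

# Usage: inspect what ConfigClient will read for this environment.
# for name in FleetDeployConfig FleetDeploySecrets; do
#   bao kv get "$(config_cli_path "$HARMONY_CONFIG_NAMESPACE" "$name")"
# done
```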
## Missing features (post-UI)
### Auth & credentials
- [ ] Per-device OpenBao policies (templated policies, one role per device type)
- [ ] Device identity claim in JWT (Zitadel `client_id` with `device-` prefix)
- [ ] OpenBao JWT auth role granularity (extend `OpenbaoJwtAuth` to list of roles)
- [x] Move k8s namespaces + chart coords into the `ConfigClient` config struct (env now carries only the namespace identifier and auth)
### Operator capabilities
- [ ] Agent upgrade path (ADR-022 exists; implementation pending)
- [ ] Device enrollment flow (operator-facing runbook)
- [ ] Revoke device / rotate key operations
- [ ] Fleet-wide rollout strategies (canary, %-based) on top of agent-upgrade primitive
### Observability
- [ ] Operator logs every CR it acquires (verify output reads well)
- [ ] NATS debugging one-liners in hand-off menu
- [ ] Journald log streaming (currently only `.status.aggregate.lastError`)
- [ ] Metrics dashboard (deferred until >100 devices)
### Quality & hardening
- [ ] Agent config-driven labels (`[labels]` in agent toml → DeviceInfo)
- [ ] `matchExpressions` in selectors (currently `matchLabels` only)
- [ ] `Device.status.conditions` populated from heartbeat staleness
- [ ] Operator graceful degradation on bad device_id (log + skip, don't restart-loop)
- [ ] Persist `nats_auth_pass` and the issuer NKey via `harmony_secret` (both are currently regenerated on every run, a footgun)
### Refactors (deferred, non-blocking)
- [ ] Decompose `FleetServerScore` into independent, ConfigClient-glued Scores
- [ ] Move `harmony/modules/fleet/` → `fleet/harmony-fleet/` (ADR-021 pending)
- [ ] Delete `examples/fleet_staging_deploy` (superseded by `fleet_staging_install`)
- [ ] Drop `K8sAnywhereTopology` for ad-hoc Score execution; introduce `K8sBareTopology`
## Principles (carried forward)
- No yaml in framework code paths
- Scores describe desired state; topologies expose capabilities
- Cross-boundary wire types in `harmony-reconciler-contracts`
- Never ship untested code
- Prove claims about upstream before blaming upstream
- Design the brick before moving the brick