harmony

NationTech/harmony

Jean-Gabriel Gill-Couture 56602b505c

Run Check Script / check (pull_request) Successful in 2m35s

feat(fleet-operator): aggregator recovery signal + orphan GC + recovery e2e (Ch2)

Operator restart + aggregator recovery (v0.3 plan Ch2). The aggregator already
cold-rebuilds from NATS KV + CR watches; this change makes recovery observable,
closes a gap where orphaned desired state could survive, and pins each failure
shape with a regression test.

- OperatorLiveness: a shared in-process latch (Recovering → Converged) that the
  aggregator flips once all three cold-start sources have replayed (Deployment
  and Device watcher InitDone, device-state KV seen_current; an empty bucket
  short-circuits the KV wait). The in-process dashboard reads it and shows a
  self-clearing banner via an HTMX self-poll (/__recovery), so the customer
  sees recovery progress instead of a blank page.
- gc_orphaned_desired_state: at convergence, purge desired state whose
  Deployment CR no longer exists (force-deleted while the operator was down,
  bypassing the finalizer). A belt-and-suspenders backstop to the controller
  finalizer.
- run() now owns its watchers in a JoinSet, so cancelling the aggregator
  aborts its children and no orphan tasks outlive a restart (this matters for
  the restart-simulation tests and for clean process teardown). run() is also
  Send now (an .await was hoisted out of a tracing macro), so it can be spawned.
- docs/fleet-operator-recovery-scenarios.md enumerates the failure shapes and
  maps each to its test.
- harmony-fleet-e2e/tests/operator_recovery.rs: one regression test per
  scenario (cold restart converges from KV; orphan GC; two operators write
  identical bytes; chaos kill under write load converges in under 30 s), plus
  AdminKv::put_device_state.
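The OperatorLiveness latch described above can be sketched as a one-way flag over three "replay done" bits, plus the banner fragment the dashboard would serve from /__recovery. This is a minimal illustration, not the crate's code: the `Source` enum, `mark_done`, `state`, `recovery_banner`, the 2 s poll interval and the banner markup are all hypothetical. It assumes the usual htmx behaviour where an `outerHTML` swap of an empty 200 response removes the polling element, which is what makes the banner self-clearing.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical names for the three cold-start sources.
#[derive(Clone, Copy, Debug)]
pub enum Source {
    DeploymentWatcher, // Deployment watcher InitDone
    DeviceWatcher,     // Device watcher InitDone
    DeviceStateKv,     // device-state KV seen_current (or empty bucket)
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Liveness {
    Recovering,
    Converged,
}

/// One-way latch: starts Recovering, flips to Converged once every source
/// has reported, and nothing ever resets it.
#[derive(Default)]
pub struct OperatorLiveness {
    deployment_done: AtomicBool,
    device_done: AtomicBool,
    kv_done: AtomicBool,
    converged: AtomicBool,
}

impl OperatorLiveness {
    pub fn new() -> Arc<Self> {
        Arc::new(Self::default())
    }

    /// Record that `source` finished its initial replay. SeqCst ensures two
    /// sources finishing concurrently cannot both miss each other's flag.
    pub fn mark_done(&self, source: Source) {
        let flag = match source {
            Source::DeploymentWatcher => &self.deployment_done,
            Source::DeviceWatcher => &self.device_done,
            Source::DeviceStateKv => &self.kv_done,
        };
        flag.store(true, Ordering::SeqCst);
        if self.deployment_done.load(Ordering::SeqCst)
            && self.device_done.load(Ordering::SeqCst)
            && self.kv_done.load(Ordering::SeqCst)
        {
            self.converged.store(true, Ordering::SeqCst);
        }
    }

    pub fn state(&self) -> Liveness {
        if self.converged.load(Ordering::SeqCst) {
            Liveness::Converged
        } else {
            Liveness::Recovering
        }
    }
}

/// Fragment served at /__recovery (hypothetical markup). While recovering it
/// re-polls itself; once converged the body is empty, so the outerHTML swap
/// removes the element and the polling stops.
pub fn recovery_banner(state: Liveness) -> String {
    match state {
        Liveness::Recovering => concat!(
            r#"<div hx-get="/__recovery" hx-trigger="every 2s" hx-swap="outerHTML">"#,
            "Operator recovering: rebuilding state from NATS KV and CR watches...",
            "</div>"
        )
        .to_string(),
        Liveness::Converged => String::new(),
    }
}
```

An empty KV bucket never produces a "seen current" event, so in this sketch the caller would call `mark_done(Source::DeviceStateKv)` directly when it finds the bucket empty.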
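The orphan GC step reduces to a set difference: desired-state keys minus the Deployment CRs that still exist. The sketch below uses an in-memory `BTreeMap` as a stand-in for the NATS KV bucket and keys entries by Deployment name; both choices, and the helpers `orphaned_keys` and `purge_orphans`, are hypothetical and not the real `gc_orphaned_desired_state` signature.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Desired-state keys whose Deployment CR no longer exists, in sorted order.
pub fn orphaned_keys(
    desired_state: &BTreeMap<String, Vec<u8>>,
    live_deployments: &BTreeSet<String>,
) -> Vec<String> {
    desired_state
        .keys()
        .filter(|name| !live_deployments.contains(name.as_str()))
        .cloned()
        .collect()
}

/// Remove orphaned entries in place and return how many were purged.
/// Only safe after convergence: before the Deployment watcher has replayed,
/// a live Deployment would be missing from `live_deployments` and look deleted.
pub fn purge_orphans(
    desired_state: &mut BTreeMap<String, Vec<u8>>,
    live_deployments: &BTreeSet<String>,
) -> usize {
    let orphans = orphaned_keys(desired_state, live_deployments);
    for key in &orphans {
        desired_state.remove(key);
    }
    orphans.len()
}
```

Gating the purge on the Converged latch is the design point here: running the same difference against a partially replayed CR cache would delete desired state for Deployments that simply have not been seen yet.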

Writes stay idempotent and byte-deterministic, so two racing operators agree
without leader election (operator HA is D3, deferred).
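Why byte-determinism removes the need for leader election can be shown with a toy encoding: if a record's bytes depend only on its contents, two operators rendering the same state produce identical values, and a compare-before-put turns the second write into a no-op. The `key=value` line format, `render_desired_state`, `put_if_changed` and the in-memory map standing in for the KV bucket are all hypothetical; the operator's real serialization is not shown in this commit.

```rust
use std::collections::BTreeMap;

/// Toy encoding of one desired-state record. BTreeMap iterates in key order,
/// so the bytes depend only on the contents, not on insertion order or on
/// which operator instance rendered them.
pub fn render_desired_state(fields: &BTreeMap<String, String>) -> Vec<u8> {
    let mut out = String::new();
    for (key, value) in fields {
        out.push_str(key);
        out.push('=');
        out.push_str(value);
        out.push('\n');
    }
    out.into_bytes()
}

/// Idempotent put into an in-memory stand-in for a KV bucket. Returns true
/// only when the stored bytes actually changed, so a second operator writing
/// the same record leaves the bucket untouched.
pub fn put_if_changed(kv: &mut BTreeMap<String, Vec<u8>>, key: &str, bytes: Vec<u8>) -> bool {
    if kv.get(key) == Some(&bytes) {
        return false;
    }
    kv.insert(key.to_string(), bytes);
    true
}
```

For example, one operator inserting `image` then `replicas` and another inserting them in the opposite order both render `image=v2\nreplicas=3\n`, so whichever write lands last, the bucket holds the same value.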

2026-06-05 15:26:00 -04:00

operator_recovery.rs

feat(fleet-operator): aggregator recovery signal + orphan GC + recovery e2e (Ch2)

2026-06-05 15:26:00 -04:00

operator.rs

add test for operator and update README

2026-05-22 14:57:45 -04:00

ping.rs

refactor(fleet): deploy-architecture cleanup per ADR-023 — Scores everywhere, deploy crate, principles in CLAUDE.md

2026-05-18 22:54:50 -04:00

vm_deploy_lifecycle.rs

feat: fleet e2e x86 vm support