harmony/Cargo.toml
Jean-Gabriel Gill-Couture 76ecf6da42
feat(fleet): agent self-upgrade + auto-rollback protocol, ADR-022 (Ch4)
Full ADR-022 protocol end to end. The state-machine brain and the operator's
commit decision are exhaustively unit-tested; OS side-effects sit behind a seam
so they're faked in tests and real on-device.

Contracts (harmony-reconciler-contracts):
- agent-upgrade marker + status KV buckets, AgentUpgradePhase, agent_version on
  the heartbeat, Verb::UpgradeStop on the command protocol.

Shared (new crate harmony_downloadable_asset):
- download + SHA-256 verify, lifted from k3d's pub(crate) copy; k3d now depends
  on it (DRY — second consumer is the agent). Tested with httptest.

Agent (harmony-fleet-agent):
- `drive`: Staging -> Verifying -> CutoverReady -> wait-for-operator-stop, with
  heartbeat-timeout revert. 6 unit tests incl. every failure/rollback path.
- UpgradeExecutor seam + real SystemdUpgradeExecutor: download+verify,
  `--self-test`, atomic symlink swap, systemd-run transient unit, revert. The
  executor self-heals the on-disk layout so first-upgrade rollback is safe even
  before M1 (preserves the running binary at its versioned path).
- `--self-test` flag; Verb::UpgradeStop handling gated by an armed
  UpgradeStopSignal so only the cutover-waiting old agent acts (both agents are
  subscribed). The agent never self-stops.
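
The phase progression described for `drive` can be sketched as a pure transition function. This is a minimal illustration, not the agent's actual code: the `Event` type, the `Committed` and `Reverted` terminal phases, and the `next_phase` and `run` helpers are hypothetical names introduced here, and it assumes that any step failure or heartbeat timeout reverts.

```rust
/// Phases named in the commit message, plus two hypothetical terminal
/// phases (`Committed`, `Reverted`) added for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Staging,
    Verifying,
    CutoverReady,
    WaitingForOperatorStop,
    Committed,
    Reverted,
}

/// Hypothetical inputs that move the state machine forward.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    StepOk,
    StepFailed,
    OperatorStop,
    HeartbeatTimeout,
}

/// Pure transition function: no I/O, so every path is unit-testable.
fn next_phase(phase: Phase, event: Event) -> Phase {
    use Event::*;
    use Phase::*;
    match (phase, event) {
        // Terminal phases absorb every event.
        (Committed, _) | (Reverted, _) => phase,
        (Staging, StepOk) => Verifying,
        (Verifying, StepOk) => CutoverReady,
        (CutoverReady, StepOk) => WaitingForOperatorStop,
        // Only the operator's stop commits the upgrade.
        (WaitingForOperatorStop, OperatorStop) => Committed,
        // Assumption: a failed step or a heartbeat timeout always reverts.
        (_, StepFailed) | (_, HeartbeatTimeout) => Reverted,
        // Anything else (e.g. a stop before cutover) is ignored.
        _ => phase,
    }
}

/// Folds a sequence of events starting from `Staging`.
fn run(events: &[Event]) -> Phase {
    events.iter().fold(Phase::Staging, |p, e| next_phase(p, *e))
}
```

Keeping the transitions free of side effects is what lets the OS-facing work sit behind a seam: a test can feed events directly, while the on-device executor produces them from real downloads and systemd calls.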

Operator (harmony-fleet-operator):
- upgrade_coordinator: sends the stop ONLY after independently observing the new
  version's heartbeat (single source of truth); reflects currentVersion + the
  upgrade phase onto the Device CR. 2 unit tests on the commit decision.
- FleetCommandsClient::upgrade_stop; Device.status.{currentVersion, upgrade}.
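
The commit decision above reduces to a small predicate. The sketch below is illustrative only: `should_send_upgrade_stop` and its parameters are hypothetical names, and it assumes the operator compares the version reported on the latest heartbeat against the target version of the upgrade.

```rust
/// Hypothetical commit decision: send the stop only once a heartbeat
/// carrying the target version has been observed by the operator itself.
///
/// `heartbeat_version` is `None` when no heartbeat has been seen yet.
fn should_send_upgrade_stop(target_version: &str, heartbeat_version: Option<&str>) -> bool {
    // The heartbeat is the single source of truth: nothing the agent
    // claims about itself through another channel is consulted here.
    heartbeat_version == Some(target_version)
}
```

Under these assumptions, `should_send_upgrade_stop("1.1.0", Some("1.0.0"))` and `should_send_upgrade_stop("1.1.0", None)` both return `false`, so the old agent keeps running until the new one has proven it is alive.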

Deviations + flagged follow-ups (M1 clean install, libvirt vX->vX+1 e2e) in
ROADMAP/fleet_platform/ch4-agent-upgrade-status.md. Marker/status ride NATS KV
(survive operator restart, per Ch2).
2026-06-05 15:26:38 -04:00


[workspace]
resolver = "2"
members = [
"examples/*",
"private_repos/*",
"harmony",
"harmony_zitadel_auth",
"harmony_downloadable_asset",
"harmony_types",
"harmony_macros",
"harmony_tui",
"harmony_execution",
"opnsense-config",
"opnsense-config-xml",
"harmony_cli",
"k3d",
"harmony_composer",
"harmony_inventory_agent",
"harmony_secret_derive",
"harmony_secret",
"network_stress_test",
"examples/kvm_okd_ha_cluster",
"examples/example_linux_vm",
"harmony_i18n",
"harmony_config_derive",
"harmony_config",
"brocade",
"harmony_agent",
"harmony_agent/deploy",
"harmony_node_readiness",
"harmony-k8s",
"harmony_assets",
"opnsense-codegen",
"opnsense-api",
"fleet/harmony-fleet-operator",
"fleet/harmony-fleet-agent",
"fleet/harmony-fleet-auth",
"fleet/harmony-fleet-deploy",
"fleet/harmony-fleet-e2e",
"harmony-reconciler-contracts",
"examples/fleet_server_install",
"examples/fleet_staging_install",
"nats/jwt",
"nats/callout",
"nats/integration-test-callout",
]
[workspace.package]
version = "0.1.0"
readme = "README.md"
license = "GNU AGPL v3"
[workspace.dependencies]
log = { version = "0.4", features = ["kv"] }
env_logger = "0.11"
derive-new = "0.7"
async-trait = "0.1"
tokio = { version = "1.40", features = [
"io-std",
"io-util",
"fs",
"macros",
"net",
"rt-multi-thread",
] }
tokio-retry = "0.3.0"
tokio-util = "0.7.15"
cidr = { features = ["serde"], version = "0.2" }
russh = "0.45"
russh-keys = "0.45"
rand = "0.9"
url = "2.5"
kube = { version = "1.1.0", features = [
"config",
"client",
"runtime",
"rustls-tls",
"ws",
"jsonpatch",
] }
k8s-openapi = { version = "0.25", features = ["v1_30", "schemars"] }
# TODO: replace serde_yaml, which is deprecated, with serde-saphyr (https://github.com/bourumir-wyngs/serde-saphyr); see also https://github.com/sebastienrousseau/serde_yml
serde_yaml = "0.9"
serde-value = "0.7"
http = "1.2"
inquire = "0.7"
convert_case = "0.8"
chrono = "0.4"
similar = "2"
uuid = { version = "1.11", features = ["v4", "fast-rng", "macro-diagnostics"] }
pretty_assertions = "1.4.1"
tempfile = "3.20.0"
bollard = "0.19.1"
base64 = "0.22.1"
tar = "0.4.44"
lazy_static = "1.5.0"
directories = "6.0.0"
futures-util = "0.3"
thiserror = "2.0.14"
serde = { version = "1.0.209", features = ["derive", "rc"] }
serde_json = "1.0.127"
askama = "0.14"
sqlx = { version = "0.8", features = ["runtime-tokio", "sqlite"] }
reqwest = { version = "0.12", features = [
"blocking",
"stream",
"rustls-tls",
"http2",
"json",
], default-features = false }
assertor = "0.0.4"
tokio-test = "0.4"
anyhow = "1.0"
clap = { version = "4", features = ["derive", "env"] }
# `websockets` enables `ws://` / `wss://` URL schemes. Without it the
# connector parses the URL but treats it as a raw TCP connect (no TLS,
# no HTTP Upgrade), so the agent against the OKD edge-TLS Route hangs
# 30s on `expected INFO, got nothing` because the router only speaks
# TLS+HTTPS on 443. The operator works without this feature because
# it talks to NATS in-cluster on `nats://...:4222` (raw TCP).
async-nats = { version = "0.45.0", features = ["websockets"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
toml = "0.8"