Files
harmony/harmony-reconciler-contracts/src/status.rs
Jean-Gabriel Gill-Couture 8a6a9f1a03 refactor(iot): Deployment.targetSelector + Device CRD (DaemonSet-like)
Kills the "CRD owns a list of device ids" smell. Deployment CR now
carries a standard K8s LabelSelector; Device is a first-class cluster-
scoped CR (like Node). Matching, desired-state KV writes, and status
aggregation all run off selector evaluation against the Device cache
— no list of device ids anywhere in the CRD spec.

Cross-resource model:
- Agent publishes DeviceInfo (with labels) to NATS `device-info` KV.
- device_reconciler watches that bucket → server-side-applies a
  cluster-scoped Device CR with metadata.labels + spec.inventory.
- Deployment controller is now just validation + finalizer cleanup.
- fleet_aggregator watches Deployment CRs + Device CRs + device-state
  KV, maintains in-memory selector → target device sets, writes/deletes
  `desired-state.<device>.<deployment>` KV on match changes, patches
  `.status.aggregate` at 1 Hz with matchedDeviceCount + phase counters.

Applied CRD shape verified on a live k3d cluster:
  kubectl get crd deployments.iot.nationtech.io -o json
    .spec.versions[0].schema.openAPIV3Schema.properties.spec
      → rollout / score / targetSelector (matchLabels + matchExpressions)
    .spec.versions[0].schema.openAPIV3Schema.properties.status.aggregate
      → matchedDeviceCount / succeeded / failed / pending / lastError
  kubectl get crd devices.iot.nationtech.io -o json
    .spec.scope = "Cluster"
    .spec.versions[0].schema.openAPIV3Schema.properties.spec
      → inventory (nullable, camelCased fields)

Load-test run: DEVICES=20 GROUP_SIZES=10,5,5 DURATION=20
  all 3 CRs hit expected matched=N / succeeded+failed+pending=N.

Other changes:
- k8s-openapi gets the `schemars` feature so LabelSelector derives JsonSchema.
- InventorySnapshot uses `#[serde(rename_all = "camelCase")]` for consistency with the rest of the CRD schema.
- agent publishes `device-id=<id>` as a default label so the
  example_iot_apply_deployment `--target-device <id>` shorthand
  works out-of-the-box (implemented as `--selector device-id=<id>`).
- example_iot_apply_deployment gains `--selector key=value` repeatable flag.
- load-test.sh explore banner exposes Device CR commands + new
  matchedDeviceCount column.
2026-04-22 22:55:38 -04:00

38 lines
1.3 KiB
Rust

//! Shared status primitives reused across the fleet wire format.
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
/// Coarse state of a single reconcile on one device.
///
/// Deliberately coarse — richer granularity (ImagePulling,
/// ContainerCreating, …) is agent-internal; the operator's
/// aggregation only needs success/failure/pending counts.
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq, JsonSchema)]
pub enum Phase {
/// Agent has applied the Score and the container is up.
Running,
/// Reconcile hit an error. See paired `last_error` for the message.
Failed,
/// Reconcile is in flight or waiting on an external dependency
/// (image pull, network, etc.). Agents may also report this
/// between a CR apply and the first reconcile tick.
Pending,
}
/// Static-ish facts about the device. Embedded in
/// [`crate::DeviceInfo`]; republished on change.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct InventorySnapshot {
pub hostname: String,
pub arch: String,
pub os: String,
pub kernel: String,
pub cpu_cores: u32,
pub memory_mb: u64,
/// Agent semver (e.g. `"0.1.0"`). Lets the operator flag
/// agents that are behind the current release.
pub agent_version: String,
}