Kills the "CRD owns a list of device ids" smell. Deployment CR now
carries a standard K8s LabelSelector; Device is a first-class cluster-
scoped CR (like Node). Matching, desired-state KV writes, and status
aggregation all run off selector evaluation against the Device cache
— no list of device ids anywhere in the CRD spec.
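
The selector evaluation described above can be sketched as follows. This is a minimal illustration assuming standard Kubernetes LabelSelector semantics (matchLabels and matchExpressions are ANDed; a missing key satisfies NotIn). `Selector`, `Requirement`, and `selector_matches` are hypothetical stand-ins written for this example; the operator itself uses the LabelSelector type from k8s-openapi and may evaluate it differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one `matchExpressions` entry.
enum Requirement {
    In(String, Vec<String>),
    NotIn(String, Vec<String>),
    Exists(String),
    DoesNotExist(String),
}

/// Hypothetical stand-in for a Kubernetes LabelSelector.
struct Selector {
    match_labels: BTreeMap<String, String>,
    match_expressions: Vec<Requirement>,
}

type Labels = BTreeMap<String, String>;

/// Returns true when every matchLabels pair and every expression holds.
/// An empty selector matches every device, as in Kubernetes.
fn selector_matches(sel: &Selector, labels: &Labels) -> bool {
    let labels_ok = sel
        .match_labels
        .iter()
        .all(|(k, v)| labels.get(k) == Some(v));
    let exprs_ok = sel.match_expressions.iter().all(|req| match req {
        Requirement::In(k, vals) => labels.get(k).map_or(false, |v| vals.contains(v)),
        Requirement::NotIn(k, vals) => labels.get(k).map_or(true, |v| !vals.contains(v)),
        Requirement::Exists(k) => labels.contains_key(k),
        Requirement::DoesNotExist(k) => !labels.contains_key(k),
    });
    labels_ok && exprs_ok
}
```

Because matching depends only on the device's labels, relabeling a Device is enough to move it in or out of a Deployment's target set without touching the Deployment spec.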

Cross-resource model:
- Agent publishes DeviceInfo (with labels) to NATS `device-info` KV.
- device_reconciler watches that bucket → server-side-applies a
  cluster-scoped Device CR with metadata.labels + spec.inventory.
- Deployment controller is now just validation + finalizer cleanup.
- fleet_aggregator watches Deployment CRs + Device CRs + device-state
  KV, maintains in-memory selector → target device sets, writes/deletes
  `desired-state.<device>.<deployment>` KV on match changes, patches
  `.status.aggregate` at 1 Hz with matchedDeviceCount + phase counters.
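
The "writes/deletes on match changes" step can be pictured as a set difference between the previous and current target sets. The sketch below uses only the key layout named above; `KvOp`, `desired_state_key`, and `diff_targets` are hypothetical names for illustration, and the real aggregator talks to NATS KV rather than returning a list.

```rust
use std::collections::BTreeSet;

/// A KV operation the aggregator would issue (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum KvOp {
    Put(String),
    Delete(String),
}

/// Key layout taken from the commit message:
/// `desired-state.<device>.<deployment>`.
fn desired_state_key(device: &str, deployment: &str) -> String {
    format!("desired-state.{device}.{deployment}")
}

/// Diffs the previous and current target sets for one deployment.
/// Newly matched devices get a Put, no-longer-matched devices a Delete;
/// devices present in both sets are left untouched.
fn diff_targets(
    deployment: &str,
    previous: &BTreeSet<String>,
    current: &BTreeSet<String>,
) -> Vec<KvOp> {
    let puts = current
        .difference(previous)
        .map(|d| KvOp::Put(desired_state_key(d, deployment)));
    let deletes = previous
        .difference(current)
        .map(|d| KvOp::Delete(desired_state_key(d, deployment)));
    puts.chain(deletes).collect()
}
```

This sketch covers only membership changes. A change to the Deployment spec itself would presumably also require rewriting the keys of devices that stay matched, which is not shown here.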

Applied CRD shape verified on a live k3d cluster:
  kubectl get crd deployments.iot.nationtech.io -o json
    .spec.versions[0].schema.openAPIV3Schema.properties.spec
      → rollout / score / targetSelector (matchLabels + matchExpressions)
    .spec.versions[0].schema.openAPIV3Schema.properties.status.aggregate
      → matchedDeviceCount / succeeded / failed / pending / lastError
  kubectl get crd devices.iot.nationtech.io -o json
    .spec.scope = "Cluster"
    .spec.versions[0].schema.openAPIV3Schema.properties.spec
      → inventory (nullable, camelCased fields)

Load-test run: DEVICES=20 GROUP_SIZES=10,5,5 DURATION=20
  All 3 CRs hit the expected matched=N and succeeded+failed+pending=N.
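
The invariant checked by the load test (phase counters sum to the matched count) follows from counting exactly one phase per matched device. A minimal sketch, under two assumptions not stated in the commit: `Running` is counted as `succeeded`, and a matched device with no reported phase yet is counted as `pending`. `Aggregate` and `aggregate` are hypothetical names, and `lastError` is omitted.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Local copy of the wire-format phase enum, for a self-contained example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Running,
    Failed,
    Pending,
}

/// Hypothetical in-memory mirror of `.status.aggregate` (lastError omitted).
#[derive(Debug, Default, PartialEq, Eq)]
struct Aggregate {
    matched_device_count: u32,
    succeeded: u32,
    failed: u32,
    pending: u32,
}

/// Counts phases over the matched devices only.
/// Assumption: Running counts as succeeded, and a matched device with
/// no reported phase yet counts as pending.
fn aggregate(matched: &BTreeSet<String>, reported: &BTreeMap<String, Phase>) -> Aggregate {
    let mut agg = Aggregate::default();
    for device in matched {
        agg.matched_device_count += 1;
        match reported.get(device).copied().unwrap_or(Phase::Pending) {
            Phase::Running => agg.succeeded += 1,
            Phase::Failed => agg.failed += 1,
            Phase::Pending => agg.pending += 1,
        }
    }
    agg
}
```

Phases reported by devices outside the matched set are ignored, so the three counters always add up to `matched_device_count`.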

Other changes:
- k8s-openapi gets the `schemars` feature so LabelSelector derives
  JsonSchema.
- InventorySnapshot uses `#[serde(rename_all = "camelCase")]` for
  consistency with the rest of the CRD schema.
- Agent publishes `device-id=<id>` as a default label so the
  example_iot_apply_deployment `--target-device <id>` shorthand
  works out of the box (implemented as `--selector device-id=<id>`).
- example_iot_apply_deployment gains a repeatable `--selector key=value`
  flag.
- load-test.sh explore banner exposes Device CR commands and the new
  matchedDeviceCount column.
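
How the repeatable `--selector key=value` flag and the `--target-device <id>` shorthand could fold into one matchLabels map is sketched below. The function names `parse_selector` and `build_match_labels` are hypothetical; the actual example binary's argument parsing is not shown in this commit and may differ.

```rust
use std::collections::BTreeMap;

/// Parses one `key=value` argument; returns None when `=` is missing
/// or the key is empty.
fn parse_selector(arg: &str) -> Option<(String, String)> {
    let (key, value) = arg.split_once('=')?;
    if key.is_empty() {
        return None;
    }
    Some((key.to_string(), value.to_string()))
}

/// Builds matchLabels from repeated `--selector` values plus the optional
/// `--target-device` shorthand, which expands to `device-id=<id>`.
fn build_match_labels(
    selectors: &[&str],
    target_device: Option<&str>,
) -> Result<BTreeMap<String, String>, String> {
    let mut labels = BTreeMap::new();
    for arg in selectors {
        let (k, v) = parse_selector(arg)
            .ok_or_else(|| format!("invalid selector `{arg}`, expected key=value"))?;
        labels.insert(k, v);
    }
    if let Some(id) = target_device {
        labels.insert("device-id".to_string(), id.to_string());
    }
    Ok(labels)
}
```

Since the agent publishes `device-id=<id>` by default, targeting a single device needs no special case in the operator: it is just a one-label selector.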
38 lines
1.3 KiB
Rust
//! Shared status primitives reused across the fleet wire format.

use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

/// Coarse state of a single reconcile on one device.
///
/// Deliberately coarse: richer granularity (ImagePulling,
/// ContainerCreating, …) is agent-internal; the operator's
/// aggregation only needs success/failure/pending counts.
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq, JsonSchema)]
pub enum Phase {
    /// Agent has applied the Score and the container is up.
    Running,
    /// Reconcile hit an error. See paired `last_error` for the message.
    Failed,
    /// Reconcile is in flight or waiting on an external dependency
    /// (image pull, network, etc.). Agents may also report this
    /// between a CR apply and the first reconcile tick.
    Pending,
}

/// Static-ish facts about the device. Embedded in
/// [`crate::DeviceInfo`]; republished on change.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct InventorySnapshot {
    pub hostname: String,
    pub arch: String,
    pub os: String,
    pub kernel: String,
    pub cpu_cores: u32,
    pub memory_mb: u64,
    /// Agent semver (e.g. `"0.1.0"`). Lets the operator flag
    /// agents that are behind the current release.
    pub agent_version: String,
}