Newtypes (review point #3) were the entry. Introducing them forced the
event-payload redesign, and the redesign made the other two bugs
obvious + trivial to fix.

New contract types (harmony-reconciler-contracts::fleet):
- DeploymentName: validated newtype. Rejects empty, > 253 bytes, '.'
  (alias an extra NATS subject token), NATS wildcards, and whitespace.
  Serde impl validates on deserialize so a malformed payload is
  rejected at the wire, not later.
- AgentEpoch(u64): random-per-process. Prefixes every sequence.
- Revision { agent_epoch, sequence } with lexicographic Ord.
- LifecycleTransition enum:
    Applied { from, to, last_error } | Removed { from }.
  Replaces (from: Option<Phase>, to: Phase) so deletion is modeled
  explicitly in the wire format.

Bug fixes that fell out of the redesign:

#1 (drop_phase was silent on the wire): `drop_phase` now produces a
RecordedTransition with Removed { from }, which the publisher
serializes into a StateChangeEvent. Operator applies the Removed
variant by decrementing `from` without a paired increment. Counters
no longer over-count after deletions.

#2 (sequence reset on agent restart): (agent_epoch, sequence)
lexicographic ordering means the first post-restart event (seq=1
under a fresh epoch) outranks any pre-restart event the operator had
applied. No more silently-dropped events after an agent crash.

Split recommended in review point #4:
- `record_apply` / `record_remove`: pure in-memory state updates
  returning Option<RecordedTransition>.
- `publish_transition`: side-effectful wire emission.
- `apply_phase` / `drop_phase`: thin composite helpers the hot path
  uses.

Typed keys in the operator:
- DevicePair { device_id, deployment: DeploymentName } replaces
  (String, String) so the two identifiers can't be swapped.
- FleetState.deployment_namespace is keyed by DeploymentName.
- Controller's kv_key signature takes &DeploymentName; invalid CR
  names surface as a clear Error rather than corrupting KV.
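
A minimal sketch of the two fixes described above, for illustration
only. It assumes a derived Ord on Revision (field order gives the
lexicographic comparison) and a per-phase counter map on the operator
side. The Phase variants, the field types, and the `apply_transition`
helper are hypothetical and are not taken from the crate. Note that a
plain derived ordering ranks the post-restart event higher only when
the fresh epoch compares greater than the old one.

```rust
use std::collections::HashMap;

/// Random per agent process (shape assumed from the description above).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct AgentEpoch(pub u64);

/// Field order matters: the derived Ord compares `agent_epoch` first,
/// then `sequence`, which is the lexicographic ordering.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Revision {
    pub agent_epoch: AgentEpoch,
    pub sequence: u64,
}

/// Hypothetical phase set; the real variants are not listed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Phase {
    Pending,
    Running,
    Failed,
}

/// Field types are assumptions; only the variant names and field
/// names come from the description above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LifecycleTransition {
    Applied {
        from: Option<Phase>,
        to: Phase,
        last_error: Option<String>,
    },
    Removed {
        from: Phase,
    },
}

/// Hypothetical operator-side helper: update per-phase counters.
pub fn apply_transition(counts: &mut HashMap<Phase, u64>, t: &LifecycleTransition) {
    match t {
        LifecycleTransition::Applied { from, to, .. } => {
            if let Some(prev) = from {
                decrement(counts, *prev);
            }
            *counts.entry(*to).or_insert(0) += 1;
        }
        // Deletion: decrement `from` with no paired increment.
        LifecycleTransition::Removed { from } => decrement(counts, *from),
    }
}

fn decrement(counts: &mut HashMap<Phase, u64>, phase: Phase) {
    if let Some(n) = counts.get_mut(&phase) {
        *n = n.saturating_sub(1);
    }
}
```

With these definitions, Revision { agent_epoch: AgentEpoch(8),
sequence: 1 } compares greater than Revision { agent_epoch:
AgentEpoch(7), sequence: 500 }, and applying Removed { from } after
an Applied into the same phase brings that phase's counter back to
zero.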

Tests:
- 27 contract tests (roundtrip every payload shape, including
  forward-compat parsing; validate DeploymentName rejection paths;
  assert Revision ordering across epochs).
- 19 operator fleet_aggregator tests, including regression guards
  named for the specific bugs:
    removed_transition_decrements_without_paired_increment (#1)
    revision_ordering_handles_agent_restart (#2)
- 8 agent reconciler tests (record_apply/record_remove purity,
  sequence monotonicity, agent_epoch stamping, ring buffer cap).

Agent main wires a fresh AgentEpoch via rand::random::<u64>() at
startup; FleetPublisher::connect takes it and includes it in every
DeviceInfo + state-change event.
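
The DeploymentName rejection paths exercised by the contract tests
could look roughly like the sketch below. It is std-only and leaves
out the serde integration. The error enum, its variant names, and the
`new` / `as_str` methods are hypothetical; only the rejection rules
(empty, > 253 bytes, '.', the NATS wildcards '*' and '>', whitespace)
come from the description above.

```rust
use std::fmt;

/// Validated newtype: the inner String can only be set through `new`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DeploymentName(String);

/// Hypothetical error type; the crate's real error shape may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InvalidDeploymentName {
    Empty,
    TooLong(usize),
    ForbiddenChar(char),
}

impl fmt::Display for InvalidDeploymentName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Empty => write!(f, "deployment name is empty"),
            Self::TooLong(n) => write!(f, "deployment name is {n} bytes, max is 253"),
            Self::ForbiddenChar(c) => write!(f, "deployment name contains {c:?}"),
        }
    }
}

impl std::error::Error for InvalidDeploymentName {}

impl DeploymentName {
    pub const MAX_BYTES: usize = 253;

    pub fn new(raw: &str) -> Result<Self, InvalidDeploymentName> {
        if raw.is_empty() {
            return Err(InvalidDeploymentName::Empty);
        }
        // `str::len` counts bytes, matching the byte limit.
        if raw.len() > Self::MAX_BYTES {
            return Err(InvalidDeploymentName::TooLong(raw.len()));
        }
        // '.' would add a subject token; '*' and '>' are NATS wildcards.
        let forbidden = raw
            .chars()
            .find(|&c| matches!(c, '.' | '*' | '>') || c.is_whitespace());
        if let Some(c) = forbidden {
            return Err(InvalidDeploymentName::ForbiddenChar(c));
        }
        Ok(Self(raw.to_owned()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

A Deserialize impl that routes through the same constructor would
give the reject-at-the-wire behaviour; that part is omitted here to
keep the sketch free of external crates.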
[package]
name = "harmony-reconciler-contracts"
version = "0.1.0"
edition = "2024"
license.workspace = true

# Cross-boundary types shared between a harmony operator (central,
# writing desired state to NATS JetStream KV) and a harmony agent
# (on-host, watching KV and reconciling). Deliberately lean: pure
# serde data types, bucket/key constants, small helpers. No tokio,
# no async-nats, no harmony. The on-device agent build pulls this
# in alongside a minimal async-nats client; the operator pulls it
# alongside kube-rs; harmony itself treats it as just another
# module. None of those consumers should pay for the others' deps.

[dependencies]
chrono = { workspace = true, features = ["serde"] }
harmony_types = { path = "../harmony_types" }
serde = { workspace = true, features = ["derive"] }
serde_json = { workspace = true }
thiserror = { workspace = true }