harmony/harmony-reconciler-contracts/Cargo.toml
Jean-Gabriel Gill-Couture 2f08643aa0 refactor(iot): DeploymentName + Revision newtypes; LifecycleTransition models deletion; fixes bugs #1 and #2 from the review
Newtypes (review point #3) were the entry point. Introducing them
forced the event-payload redesign, and the redesign made the other
two bugs obvious and trivial to fix.

New contract types (harmony-reconciler-contracts::fleet):
  - DeploymentName: validated newtype. Rejects empty names, names
    longer than 253 bytes, '.' (which would alias an extra NATS
    subject token), NATS wildcards, and whitespace. The serde impl
    validates on deserialize, so a malformed payload is rejected at
    the wire, not later.
  - AgentEpoch(u64): random-per-process. Prefixes every sequence.
  - Revision { agent_epoch, sequence } with lexicographic Ord.
  - LifecycleTransition enum: Applied { from, to, last_error } |
    Removed { from }. Replaces (from: Option<Phase>, to: Phase) so
    deletion is modeled explicitly in the wire format.
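
A minimal sketch of the DeploymentName rules listed above, std-only.
The NameError enum, the `new` constructor, and `as_str` are
hypothetical names for illustration; the real crate's error type and
API are not shown here. The wildcard set is assumed to be the two
NATS subject wildcards, '*' and '>'.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum NameError {
    Empty,
    TooLong(usize),
    ForbiddenChar(char),
}

/// Validated newtype: a value of this type has already passed every check.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct DeploymentName(String);

impl DeploymentName {
    /// Upper bound from the commit message, measured in bytes.
    pub const MAX_BYTES: usize = 253;

    pub fn new(raw: &str) -> Result<Self, NameError> {
        if raw.is_empty() {
            return Err(NameError::Empty);
        }
        if raw.len() > Self::MAX_BYTES {
            return Err(NameError::TooLong(raw.len()));
        }
        // '.' would add a subject token; '*' and '>' are NATS wildcards.
        let forbidden = raw
            .chars()
            .find(|&c| matches!(c, '.' | '*' | '>') || c.is_whitespace());
        if let Some(c) = forbidden {
            return Err(NameError::ForbiddenChar(c));
        }
        Ok(Self(raw.to_owned()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

With this shape, `DeploymentName::new("web-frontend")` succeeds and
`DeploymentName::new("a.b")` fails on the dot. One way to get the
validate-on-deserialize behavior would be serde's
`#[serde(try_from = "String")]` container attribute backed by a
`TryFrom<String>` impl; whether the crate does it that way is an
assumption.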

Bug fixes that fell out of the redesign:

  #1 (drop_phase was silent on the wire): `drop_phase` now
     produces a RecordedTransition with Removed { from }, which
     the publisher serializes into a StateChangeEvent. Operator
     applies the Removed variant by decrementing `from` without
     a paired increment. Counters no longer over-count after
     deletions.

  #2 (sequence reset on agent restart): (agent_epoch, sequence)
     lexicographic ordering means the first post-restart event
     (seq=1 under a fresh epoch) outranks any pre-restart event
     the operator had applied. No more silently-dropped events
     after an agent crash.
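
Both fixes can be sketched together on the operator side, under
stated assumptions: the Phase variants, the Counters struct, and its
`apply` / `count` methods are hypothetical, `last_error` is assumed
to be an Option<String>, and a single last-applied Revision stands in
for whatever per-device tracking the operator really does. The
derived Ord on Revision compares agent_epoch first, then sequence,
which is the lexicographic order described above. The sketch only
exercises a restart where the new epoch compares higher than the old
one; how a random epoch that compares lower is handled is not covered
here.

```rust
use std::collections::HashMap;

/// Hypothetical phase set for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Phase {
    Pending,
    Running,
    Failed,
}

/// Field order matters: derived Ord compares agent_epoch, then sequence.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Revision {
    pub agent_epoch: u64,
    pub sequence: u64,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LifecycleTransition {
    Applied { from: Option<Phase>, to: Phase, last_error: Option<String> },
    Removed { from: Phase },
}

/// Hypothetical operator-side aggregate.
#[derive(Debug, Default)]
pub struct Counters {
    per_phase: HashMap<Phase, u64>,
    last_applied: Option<Revision>,
}

impl Counters {
    pub fn count(&self, phase: Phase) -> u64 {
        self.per_phase.get(&phase).copied().unwrap_or(0)
    }

    /// Applies the transition unless its revision is stale.
    /// Returns true when the event was applied.
    pub fn apply(&mut self, revision: Revision, transition: &LifecycleTransition) -> bool {
        if self.last_applied.is_some_and(|seen| revision <= seen) {
            return false;
        }
        match transition {
            LifecycleTransition::Applied { from, to, .. } => {
                if let Some(prev) = from {
                    self.decrement(*prev);
                }
                *self.per_phase.entry(*to).or_insert(0) += 1;
            }
            // Bug #1: a removal decrements `from` with no paired increment.
            LifecycleTransition::Removed { from } => self.decrement(*from),
        }
        self.last_applied = Some(revision);
        true
    }

    fn decrement(&mut self, phase: Phase) {
        if let Some(n) = self.per_phase.get_mut(&phase) {
            *n = n.saturating_sub(1);
        }
    }
}
```

Under a bare sequence number, a post-restart event with seq=1 would
compare below an already-applied seq=2 and be dropped; with the
epoch as the leading field, Revision { agent_epoch: 11, sequence: 1 }
compares above Revision { agent_epoch: 10, sequence: 2 }.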

Split recommended in review point #4:
  - `record_apply` / `record_remove`: pure in-memory state
    updates returning Option<RecordedTransition>.
  - `publish_transition`: side-effectful wire emission.
  - `apply_phase` / `drop_phase`: thin composite helpers the
    hot path uses.
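
A rough sketch of that split, not the agent's actual code. The
Reconciler struct, the RecordedTransition fields, and the Vec that
stands in for the NATS publisher are all assumptions; so is the
choice to return None when the phase is unchanged or the deployment
is unknown.

```rust
use std::collections::HashMap;

/// Hypothetical phase set for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Pending,
    Running,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RecordedTransition {
    Applied { deployment: String, from: Option<Phase>, to: Phase, sequence: u64 },
    Removed { deployment: String, from: Phase, sequence: u64 },
}

#[derive(Debug, Default)]
pub struct Reconciler {
    phases: HashMap<String, Phase>,
    sequence: u64,
}

impl Reconciler {
    /// Pure in-memory update: no I/O. None means nothing changed.
    pub fn record_apply(&mut self, deployment: &str, to: Phase) -> Option<RecordedTransition> {
        let from = self.phases.get(deployment).copied();
        if from == Some(to) {
            return None;
        }
        self.phases.insert(deployment.to_owned(), to);
        self.sequence += 1;
        Some(RecordedTransition::Applied {
            deployment: deployment.to_owned(),
            from,
            to,
            sequence: self.sequence,
        })
    }

    /// Pure in-memory removal. None means the deployment was not tracked.
    pub fn record_remove(&mut self, deployment: &str) -> Option<RecordedTransition> {
        let from = self.phases.remove(deployment)?;
        self.sequence += 1;
        Some(RecordedTransition::Removed {
            deployment: deployment.to_owned(),
            from,
            sequence: self.sequence,
        })
    }

    /// Thin composite: record, then publish only if something changed.
    pub fn apply_phase(&mut self, wire: &mut Vec<RecordedTransition>, deployment: &str, to: Phase) {
        if let Some(t) = self.record_apply(deployment, to) {
            publish_transition(wire, t);
        }
    }

    /// Thin composite for deletion, so removals reach the wire too.
    pub fn drop_phase(&mut self, wire: &mut Vec<RecordedTransition>, deployment: &str) {
        if let Some(t) = self.record_remove(deployment) {
            publish_transition(wire, t);
        }
    }
}

/// Side-effectful emission; the Vec is a stand-in for the real publisher.
pub fn publish_transition(wire: &mut Vec<RecordedTransition>, transition: RecordedTransition) {
    wire.push(transition);
}
```

Keeping the record_* functions free of I/O is what lets tests check
purity and sequence monotonicity without a NATS connection.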

Typed keys in the operator:
  - DevicePair { device_id, deployment: DeploymentName } replaces
    (String, String) so the two identifiers can't be swapped.
  - FleetState.deployment_namespace is keyed by DeploymentName.
  - Controller's kv_key signature takes &DeploymentName; invalid
    CR names surface as a clear Error rather than corrupting KV.
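
To illustrate why the typed keys help, here is a small sketch. The
DeviceId newtype, the String namespace value, and the
"<device>.<deployment>" key layout are assumptions made for the
example; DeploymentName validation is omitted to keep it short.

```rust
use std::collections::HashMap;

/// Hypothetical device identifier newtype.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DeviceId(pub String);

/// Stand-in for the validated newtype; validation is omitted here.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DeploymentName(pub String);

/// Replaces a (String, String) tuple: the fields have distinct types,
/// so passing them in the wrong order fails to compile.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DevicePair {
    pub device_id: DeviceId,
    pub deployment: DeploymentName,
}

/// Namespace lookup keyed by the typed name (value type is an assumption).
#[derive(Debug, Default)]
pub struct FleetState {
    pub deployment_namespace: HashMap<DeploymentName, String>,
}

/// Hypothetical key layout. Because a valid DeploymentName contains no
/// '.', the deployment part can never introduce an extra token.
pub fn kv_key(device_id: &DeviceId, deployment: &DeploymentName) -> String {
    format!("{}.{}", device_id.0, deployment.0)
}
```

Calling `kv_key(&pair.deployment, &pair.device_id)` is a type error
in this sketch, whereas the same mistake with two Strings would
compile and write to the wrong key.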

Tests:
  - 27 contract tests (roundtrip every payload shape, including
    forward-compat parsing; validate DeploymentName rejection
    paths; assert Revision ordering across epochs).
  - 19 operator fleet_aggregator tests, including regression
    guards named for the specific bugs:
      removed_transition_decrements_without_paired_increment  (#1)
      revision_ordering_handles_agent_restart                 (#2)
  - 8 agent reconciler tests (record_apply/record_remove purity,
    sequence monotonicity, agent_epoch stamping, ring buffer
    cap).

Agent main wires a fresh AgentEpoch via rand::random::<u64>() at
startup; FleetPublisher::connect takes it and includes it in every
DeviceInfo + state-change event.
2026-04-22 17:42:42 -04:00


[package]
name = "harmony-reconciler-contracts"
version = "0.1.0"
edition = "2024"
license.workspace = true
# Cross-boundary types shared between a harmony operator (central,
# writing desired state to NATS JetStream KV) and a harmony agent
# (on-host, watching KV and reconciling). Deliberately lean: pure
# serde data types, bucket/key constants, small helpers. No tokio,
# no async-nats, no harmony. The on-device agent build pulls this
# in alongside a minimal async-nats client; the operator pulls it
# alongside kube-rs; harmony itself treats it as just another
# module. None of those consumers should pay for the others' deps.
[dependencies]
chrono = { workspace = true, features = ["serde"] }
harmony_types = { path = "../harmony_types" }
serde = { workspace = true, features = ["derive"] }
serde_json = { workspace = true }
thiserror = { workspace = true }