Compare commits

...

1 Commits

Author SHA1 Message Date
0b9499bc97 doc: adr 019 2026-02-06 15:32:50 -05:00

View File

@@ -0,0 +1,86 @@
Initial Date: 2025-02-06
## Status
Proposed
## Context
The Harmony Agent requires a persistent connection to the NATS Supercluster to perform Key-Value (KV) operations (Read/Write/Watch).
Service Requirements: The agent must authenticate with sufficient privileges to manage KV buckets and interact with the JetStream API.
Infrastructure: NATS is deployed as a multi-site Supercluster. Authentication must be consistent across sites to allow for agent failover and data replication.
https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro
Technical Constraint: In NATS, JetStream functionality is not global by default; it must be explicitly enabled and capped at the Account level to allow KV bucket creation and persistence.
## Issues
1. The "System Account" Trap
The Hole: Using the system account for the Harmony Agent.
The Risk: The NATS System Account is for server heartbeat and monitoring. It cannot (and should not) own JetStream KV buckets.
2. Multi-Site Authorization Sync
The Hole: Defining users in local nats.conf files via Helm.
The Risk: If an agent at Site-2 fails over to Site-3, but Site-3s local configuration doesn't have the testUser credentials, the agent will be locked out during an outage.
3. KV Replication Factor
The Hole: Not specifying the Replicas count for the KV bucket.
The Risk: If you create a KV bucket with the default (1 replica), it only exists at the site where it was created. If that site goes down, the data is lost despite having a Supercluster.
4. Subject-Level Permissions
The Hole: Only granting TEST.* permissions.
The Risk: NATS KV uses internal subjects (e.g., $KV.<bucket_name>.>). Without access to these, the agent will get an "Authorization Violation" even if it's logged in.
## Proposed Solution
To enable reliable, secure communication between the Harmony Agent and the NATS Supercluster, we will implement Account-isolated JetStream using NKey Authentication (or mTLS).
1. Dedicated Account Architecture
We will move away from the "Global/Default" account. A dedicated HARMONY account will be defined identically across all sites in the Supercluster. This ensures that the metadata for the KV bucket can replicate across the gateways.
System Account: Reserved for NATS internal health and Supercluster routing.
Harmony Account: Dedicated to Harmony Agent data, with JetStream explicitly enabled.
2. Authentication: Use harmony secret store mounted into nats container
Take advantage of currently implemented solution
3. JetStream & KV Configuration
To ensure the KV bucket is available across the Supercluster, the following configuration must be applied:
Replication Factor (R=3): KV buckets will be created with a replication factor of 3 to ensure data persists across Site-1, Site-2, and Site-3.
Permissions: The agent will be granted scoped access to:
$KV.HARMONY.> (Data operations)
$JS.API.CONSUMER.> and $JS.API.STREAM.> (Management operations)
## Consequence of Decision
Pros
Resilience: Agents can fail over to any site in the Supercluster and find their credentials and data.
Security: By using a dedicated account, the Harmony Agent cannot see or interfere with NATS system traffic.
Scalability: We can add Site-4 or Site-5 simply by copying the HARMONY account definition.
Cons / Risks
Configuration Drift: If one site's ConfigMap is updated without the others, authentication will fail during a site failover.
Complexity: Requires a "Management" step to ensure the account exists on all NATS instances before the agent attempts to connect.