Initial Date: 2025-02-06
Status
Proposed
Context
The Harmony Agent requires a persistent connection to the NATS Supercluster to perform Key-Value (KV) operations (Read/Write/Watch).
Service Requirements: The agent must authenticate with sufficient privileges to manage KV buckets and interact with the JetStream API.
Infrastructure: NATS is deployed as a multi-site Supercluster. Authentication must be consistent across sites to allow for agent failover and data replication.
https://docs.nats.io/running-a-nats-service/configuration/securing_nats/auth_intro
Technical Constraint: In NATS, JetStream is not available to every account by default; it must be explicitly enabled (and can be capped with resource limits) at the Account level before KV buckets can be created and persisted.
Issues
-
The "System Account" Trap
The Hole: Using the system account for the Harmony Agent.
The Risk: The NATS System Account is reserved for server heartbeats, monitoring, and other internal traffic. JetStream cannot be enabled on it, so it cannot (and should not) own JetStream KV buckets.
-
Multi-Site Authorization Sync
The Hole: Defining users in local nats.conf files via Helm.
The Risk: If an agent at Site-2 fails over to Site-3, but Site-3’s local configuration doesn't have the testUser credentials, the agent will be locked out during an outage.
-
KV Replication Factor
The Hole: Not specifying the Replicas count for the KV bucket.
The Risk: If a KV bucket is created with the default of 1 replica, it is stored on a single server at the site where it was created. If that server or site goes down, the data becomes unavailable, and may be lost, despite having a Supercluster.
-
Subject-Level Permissions
The Hole: Only granting TEST.* permissions.
The Risk: NATS KV uses internal subjects (e.g., $KV.<bucket_name>.>). Without publish and subscribe access to these, the agent will get a "Permissions Violation" error even though it authenticated successfully.
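The subject-level gap can be illustrated with a minimal sketch of NATS wildcard matching, where `*` matches exactly one token and `>` matches one or more trailing tokens. The helper names, the bucket name HARMONY, and the key `site2.state` are assumptions made for illustration; this is not the server's implementation.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS subject matches a permission pattern.

    '*' matches exactly one token; '>' matches one or more trailing tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last token and needs at least one token left to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def is_allowed(allow: list[str], subject: str) -> bool:
    """Return True if any pattern in the allow list covers the subject."""
    return any(subject_matches(p, subject) for p in allow)


# Hypothetical subject used when writing key "site2.state" to a bucket named HARMONY.
kv_subject = "$KV.HARMONY.site2.state"

narrow = ["TEST.*"]                     # the permission set described as the hole
widened = ["TEST.*", "$KV.HARMONY.>"]   # adds the bucket's internal subject space

print(is_allowed(narrow, kv_subject))   # False
print(is_allowed(widened, kv_subject))  # True
```

The same check applies to the JetStream API subjects: a permission set that omits them would reject bucket management requests in the same way.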
Proposed Solution
To enable reliable, secure communication between the Harmony Agent and the NATS Supercluster, we will implement Account-isolated JetStream using NKey Authentication (or mTLS).
-
Dedicated Account Architecture
We will move away from the "Global/Default" account. A dedicated HARMONY account will be defined identically across all sites in the Supercluster. This ensures that the metadata for the KV bucket can replicate across the gateways.
System Account: Reserved for NATS internal health and Supercluster routing.
Harmony Account: Dedicated to Harmony Agent data, with JetStream explicitly enabled.
-
Authentication: Use the Harmony secret store, mounted into the NATS container.
This takes advantage of the currently implemented solution.
-
JetStream & KV Configuration
To ensure the KV bucket is available across the Supercluster, the following configuration must be applied:
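Replication Factor (R=3): KV buckets will be created with a replication factor of 3, with the goal of keeping data available across Site-1, Site-2, and Site-3. Note that JetStream places the replicas of a stream on servers within a single cluster, so whether R=3 alone yields copies at all three sites depends on how the sites map to clusters; cross-cluster copies may additionally require mirrored or sourced buckets, and this should be verified against the deployment.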
Permissions: The agent will be granted scoped access to:
$KV.HARMONY.> (Data operations)
$JS.API.CONSUMER.> and $JS.API.STREAM.> (Management operations)
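The choice of R=3 over the default of 1 can be checked with simple majority arithmetic. JetStream replicates a stream through a Raft group, which stays writable only while a majority of its replicas is reachable. The function names below are illustrative, not part of any NATS API.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica group of the given size."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Number of replicas that can be lost while a majority remains."""
    return replicas - quorum(replicas)


for r in (1, 3, 5):
    print(f"R={r}: quorum={quorum(r)}, tolerated failures={tolerated_failures(r)}")
# R=1: quorum=1, tolerated failures=0
# R=3: quorum=2, tolerated failures=1
# R=5: quorum=3, tolerated failures=2
```

With R=1 there is no tolerance for failure at all, while R=3 survives the loss of one replica.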
Consequence of Decision
Pros
Resilience: Agents can fail over to any site in the Supercluster and find their credentials and data.
Security: By using a dedicated account, the Harmony Agent cannot see or interfere with NATS system traffic.
Scalability: We can add Site-4 or Site-5 simply by copying the HARMONY account definition.
Cons / Risks
Configuration Drift: If one site's ConfigMap is updated without the others, authentication will fail during a site failover.
Complexity: Requires a "Management" step to ensure the account exists on all NATS instances before the agent attempts to connect.
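One way to detect the configuration drift described above is to fingerprint each site's HARMONY account definition and compare the results before relying on failover. The sketch below assumes the definitions have already been parsed into dictionaries; the site names, field names, and helper functions are hypothetical and do not reflect the actual ConfigMap layout.

```python
import hashlib
import json


def fingerprint(account_def: dict) -> str:
    """Hash a canonical JSON form so that key order does not affect the result."""
    canonical = json.dumps(account_def, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(site_defs: dict[str, dict]) -> list[str]:
    """Return the sites whose definition differs from the most common one."""
    prints = {site: fingerprint(d) for site, d in site_defs.items()}
    counts: dict[str, int] = {}
    for fp in prints.values():
        counts[fp] = counts.get(fp, 0) + 1
    reference = max(counts, key=counts.get)
    return sorted(site for site, fp in prints.items() if fp != reference)


# Hypothetical parsed account definition; the field names are illustrative only.
harmony = {
    "name": "HARMONY",
    "jetstream": True,
    "publish_allow": ["$KV.HARMONY.>", "$JS.API.STREAM.>", "$JS.API.CONSUMER.>"],
}
sites = {
    "site-1": harmony,
    "site-2": dict(harmony),
    "site-3": {**harmony, "jetstream": False},  # simulated drift
}

print(find_drift(sites))  # ['site-3']
```

List order is part of the fingerprint here, so permission lists would need to be sorted first if their order is not meaningful. A check like this could run as part of the management step before an agent is allowed to connect.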