# Architecture Decision Record: Network Bonding Configuration via External Automation
Initial Author: Jean-Gabriel Gill-Couture & Sylvain Tremblay
Initial Date: 2026-02-13
Last Updated Date: 2026-02-13
## Status
Accepted
## Context
We need to configure LACP bonds on 10GbE interfaces across all worker nodes in the OpenShift cluster. A significant challenge is that interface names (e.g., `enp1s0f0` vs `ens1f0`) vary across different hardware nodes.
The standard OpenShift mechanism (MachineConfig) applies identical configurations to all nodes in a MachineConfigPool. Since the interface names differ, a single static MachineConfig cannot target specific physical devices across the entire cluster without complex workarounds.
## Decision
We will use the existing "Harmony" automation tool to generate and apply host-specific NetworkManager configuration files directly to the nodes.
1. Harmony will generate the specific `.nmconnection` files for the bond and slaves based on its inventory of interface names.
2. Files will be pushed to `/etc/NetworkManager/system-connections/` on each node.
3. Configuration will be applied via `nmcli` reload or a node reboot.
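As a rough sketch of step 1, the per-host generation might look like the following. The inventory structure, templates, and the `render_connections()` helper are illustrative assumptions, not Harmony's actual implementation:

```python
# Illustrative sketch of step 1: rendering host-specific NetworkManager
# keyfiles from an inventory of interface names. Templates and helper
# names are hypothetical, not Harmony's real code.

BOND_TEMPLATE = """\
[connection]
id=bond0
type=bond
interface-name=bond0

[bond]
mode=802.3ad
"""

SLAVE_TEMPLATE = """\
[connection]
id=bond0-slave-{iface}
type=ethernet
interface-name={iface}
master=bond0
slave-type=bond
"""


def render_connections(host_interfaces):
    """Return {filename: contents} for one host's bond and slave profiles."""
    files = {"bond0.nmconnection": BOND_TEMPLATE}
    for iface in host_interfaces:
        files[f"bond0-slave-{iface}.nmconnection"] = SLAVE_TEMPLATE.format(iface=iface)
    return files


# Inventory says e.g. worker-01 uses enp1s0f0/enp1s0f1 while another
# host uses ens1f0/ens1f1; each host gets files rendered from its own list.
files = render_connections(["enp1s0f0", "enp1s0f1"])
```

The rendered files would then be pushed to `/etc/NetworkManager/system-connections/` (step 2); note that NetworkManager ignores keyfiles there unless they are owned by root with mode 0600. A subsequent `nmcli connection reload` (step 3) picks them up without a reboot.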
## Rationale
* **Inventory Awareness:** Harmony already possesses the specific interface mapping data for each host.
* **Persistence:** Fedora CoreOS/SCOS allows writing to `/etc`, and these files persist across reboots and OS upgrades (rpm-ostree updates).
* **Avoids Complexity:** This approach avoids the operational overhead of creating unique MachineConfigPools for every single host or hardware variant.
* **Safety:** Unlike wildcard matching, this ensures explicit interface selection, preventing accidental bonding of reserved interfaces (e.g., future separation of Ceph storage traffic).
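The safety point can be made concrete with a toy example: glob matching captures any interface that happens to fit the pattern, while an explicit inventory list cannot. The interface names below are hypothetical:

```python
from fnmatch import fnmatch

# Hypothetical host: two NICs meant for the bond, one reserved
# (e.g. for future Ceph storage traffic).
host_interfaces = ["enp1s0f0", "enp1s0f1", "enp2s0"]
reserved = "enp2s0"

# Wildcard selection, as a pattern like interface-name=enp* would behave:
wildcard_selected = [i for i in host_interfaces if fnmatch(i, "enp*")]

# Explicit selection, driven by Harmony's inventory:
inventory_selected = ["enp1s0f0", "enp1s0f1"]

assert reserved in wildcard_selected       # the wildcard silently grabs it
assert reserved not in inventory_selected  # the inventory never lists it
```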
## Consequences
**Pros:**
* Precise, per-host configuration without polluting the Kubernetes API with hundreds of MachineConfigs.
* Standard Linux networking behavior; easy to debug locally.
* Prevents accidental interface capture (unlike wildcards).
**Cons:**
* **Loss of Declarative K8s State:** The network config is not managed by the Machine Config Operator (MCO).
* **Node Replacement Friction:** Newly provisioned nodes (replacements) will boot with default config. Harmony must be run against new nodes manually or via a hook before they can fully join the cluster workload.
## Alternatives Considered
1. **Wildcard Matching in NetworkManager (e.g., `interface-name=enp*`):**
* *Pros:* Single MachineConfig for the whole cluster.
* *Cons:* Rejected because it is too broad. It risks capturing interfaces intended for other purposes (e.g., splitting storage and cluster networks later).
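For reference, the rejected wildcard binding would look roughly like this (a sketch; NetworkManager expresses glob matching through the `[match]` section of a keyfile, and the connection id is made up):

```ini
# Rejected sketch: enslaves ANY interface whose name matches enp*
[connection]
id=bond0-slave-any
type=ethernet
master=bond0
slave-type=bond

[match]
interface-name=enp*
```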
2. **"Kitchen Sink" Configuration:**
* *Pros:* A single static file, listing every possible interface name as a slave, covers the whole cluster.
* *Cons:* "Dirty" configuration; results in many inactive connections on every host; brittle if new naming schemes appear.
3. **Per-Host MachineConfig:**
* *Pros:* Fully declarative within OpenShift.
* *Cons:* Requires a unique `MachineConfigPool` per host, which is an anti-pattern and unmaintainable at scale.
4. **On-boot Generation Script:**
* *Pros:* Dynamic detection.
* *Cons:* Increases boot complexity; harder to debug if the script fails during startup.
## Additional Notes
While `/etc` is writable and persistent on CoreOS, this configuration falls outside the "Day 1" Ignition process. Operational runbooks must be updated to ensure Harmony runs on any node replacement events.