harmony/adr/019-Network-bond-setup.md

Architecture Decision Record: Network Bonding Configuration via External Automation

Initial Authors: Jean-Gabriel Gill-Couture & Sylvain Tremblay

Initial Date: 2026-02-13

Last Updated Date: 2026-02-13

Status

Accepted

Context

We need to configure LACP bonds on 10GbE interfaces across all worker nodes in the OpenShift cluster. A significant challenge is that interface names (e.g., enp1s0f0 vs ens1f0) vary across different hardware nodes.

The standard OpenShift mechanism (MachineConfig) applies identical configurations to all nodes in a MachineConfigPool. Since the interface names differ, a single static MachineConfig cannot target specific physical devices across the entire cluster without complex workarounds.

Decision

We will use the existing "Harmony" automation tool to generate and apply host-specific NetworkManager configuration files directly to the nodes.

  1. Harmony will generate the specific .nmconnection files for the bond and slaves based on its inventory of interface names.
  2. Files will be pushed to /etc/NetworkManager/system-connections/ on each node.
  3. Configuration will be applied via nmcli connection reload (and activating the new connections) or a node reboot.
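As an illustration of step 1, the generated keyfiles might look like the following (a minimal sketch in NetworkManager keyfile syntax; the exact bond options, connection names, and IP settings used by Harmony are assumptions here, not confirmed by this ADR):

```ini
# bond0.nmconnection -- the bond itself
[connection]
id=bond0
type=bond
interface-name=bond0

[bond]
mode=802.3ad
miimon=100

# bond0-enp1s0f0.nmconnection -- one slave, with the host-specific
# interface name substituted per node from Harmony's inventory
[connection]
id=bond0-enp1s0f0
type=ethernet
interface-name=enp1s0f0
master=bond0
slave-type=bond
```

Note that NetworkManager ignores keyfiles in /etc/NetworkManager/system-connections/ unless they are owned by root with mode 0600; after placing them, `nmcli connection reload` followed by `nmcli connection up bond0` applies the configuration without a reboot.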

Rationale

  • Inventory Awareness: Harmony already possesses the specific interface mapping data for each host.
  • Persistence: Fedora CoreOS/SCOS allows writing to /etc, and these files persist across reboots and OS upgrades (rpm-ostree updates).
  • Avoids Complexity: This approach avoids the operational overhead of creating unique MachineConfigPools for every single host or hardware variant.
  • Safety: Unlike wildcard matching, this ensures explicit interface selection, preventing accidental bonding of interfaces reserved for other purposes (e.g., a future separation of Ceph storage traffic onto dedicated links).
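The inventory-driven generation described above can be sketched as follows. This is purely illustrative: Harmony's actual implementation, inventory format, and file names are not documented here, so everything below is a hypothetical stand-in.

```python
# Illustrative sketch of inventory-driven keyfile generation.
# The inventory format and function names are hypothetical, not Harmony's API.

BOND_KEYFILE = """\
[connection]
id=bond0
type=bond
interface-name=bond0

[bond]
mode=802.3ad
miimon=100
"""

def render_keyfiles(slave_ifaces):
    """Return {filename: keyfile content} for one host's bond and its slaves."""
    files = {"bond0.nmconnection": BOND_KEYFILE}
    for iface in slave_ifaces:
        files[f"bond0-{iface}.nmconnection"] = (
            "[connection]\n"
            f"id=bond0-{iface}\n"
            "type=ethernet\n"
            f"interface-name={iface}\n"
            "master=bond0\n"
            "slave-type=bond\n"
        )
    return files

# Same logical bond on every host, different physical interface names.
inventory = {
    "worker-01": ["enp1s0f0", "enp1s0f1"],
    "worker-02": ["ens1f0", "ens1f1"],
}
for host, ifaces in inventory.items():
    # In practice these files would be pushed to
    # /etc/NetworkManager/system-connections/ on each node (mode 0600).
    print(host, sorted(render_keyfiles(ifaces)))
```

The key point is that the templating is trivial once the per-host interface mapping exists; the hard part (maintaining that mapping) is already solved by Harmony's inventory.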

Consequences

Pros:

  • Precise, per-host configuration without polluting the Kubernetes API with hundreds of MachineConfigs.
  • Standard Linux networking behavior; easy to debug locally.
  • Prevents accidental interface capture (unlike wildcards).

Cons:

  • Loss of Declarative K8s State: The network config is not managed by the Machine Config Operator (MCO).
  • Node Replacement Friction: Newly provisioned nodes (replacements) will boot with default config. Harmony must be run against new nodes manually or via a hook before they can fully join the cluster workload.

Alternatives considered

  1. Wildcard Matching in NetworkManager (e.g., interface-name=enp*):

    • Pros: Single MachineConfig for the whole cluster.
    • Cons: Rejected because it is too broad. It risks capturing interfaces intended for other purposes (e.g., splitting storage and cluster networks later).
  2. "Kitchen Sink" Configuration:

    • Pros: Single static MachineConfig for the whole cluster, with one file listing every possible interface name as a slave.
    • Cons: "Dirty" configuration; results in many inactive connections on every host; brittle if new naming schemes appear.
  3. Per-Host MachineConfig:

    • Pros: Fully declarative within OpenShift.
    • Cons: Requires a unique MachineConfigPool per host, which is an anti-pattern and unmaintainable at scale.
  4. On-boot Generation Script:

    • Pros: Dynamic detection.
    • Cons: Increases boot complexity; harder to debug if the script fails during startup.
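For reference, the rejected wildcard approach (alternative 1) would rely on NetworkManager's match settings, roughly as sketched below (illustrative only; this is not a configuration we ship):

```ini
# One profile intended to capture all 10GbE ports as bond slaves.
# connection.multi-connect=multiple lets the profile activate on
# several devices at once -- which is exactly the over-broad
# behavior this ADR rejects.
[connection]
id=bond0-slaves
type=ethernet
multi-connect=multiple
master=bond0
slave-type=bond

[match]
interface-name=enp*
```

Any future interface whose name happens to start with enp* (for example, one reserved for storage traffic) would silently be enslaved to the bond, which is the core reason this option was rejected.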

Additional Notes

While /etc is writable and persistent on CoreOS, this configuration falls outside the "Day 1" Ignition process. Operational runbooks must be updated to ensure Harmony runs after any node replacement event.