Architecture Decision Record: Network Bonding Configuration via External Automation
Initial Author: Jean-Gabriel Gill-Couture & Sylvain Tremblay
Initial Date: 2026-02-13
Last Updated Date: 2026-02-13
Status
Accepted
Context
We need to configure LACP bonds on 10GbE interfaces across all worker nodes in the OpenShift cluster. A significant challenge is that interface names (e.g., enp1s0f0 vs ens1f0) vary across different hardware nodes.
The standard OpenShift mechanism (MachineConfig) applies identical configurations to all nodes in a MachineConfigPool. Since the interface names differ, a single static MachineConfig cannot target specific physical devices across the entire cluster without complex workarounds.
Decision
We will use the existing "Harmony" automation tool to generate and apply host-specific NetworkManager configuration files directly to the nodes.
- Harmony will generate the specific `.nmconnection` files for the bond and slaves based on its inventory of interface names.
- Files will be pushed to `/etc/NetworkManager/system-connections/` on each node.
- Configuration will be applied via `nmcli connection reload` or a node reboot.
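Concretely, the generated keyfiles might look like the following. This is a hedged sketch: the bond name `bond0`, the LACP options, and the slave interface name are illustrative, not Harmony's actual output.

```ini
# /etc/NetworkManager/system-connections/bond0.nmconnection
[connection]
id=bond0
type=bond
interface-name=bond0

[bond]
; 802.3ad is the LACP bonding mode
mode=802.3ad
miimon=100

[ipv4]
method=auto

# /etc/NetworkManager/system-connections/bond0-port-enp1s0f0.nmconnection
[connection]
id=bond0-port-enp1s0f0
type=ethernet
interface-name=enp1s0f0
master=bond0
slave-type=bond
```

NetworkManager only loads keyfiles that are owned by root with mode 0600, so Harmony must set ownership and permissions when pushing the files.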
Rationale
- Inventory Awareness: Harmony already possesses the specific interface mapping data for each host.
- Persistence: Fedora CoreOS/SCOS allows writing to `/etc`, and these files persist across reboots and OS upgrades (rpm-ostree updates).
- Avoids Complexity: This approach avoids the operational overhead of creating unique MachineConfigPools for every single host or hardware variant.
- Safety: Unlike wildcard matching, this ensures explicit interface selection, preventing accidental bonding of reserved interfaces (e.g., future separation of Ceph storage traffic).
Consequences
Pros:
- Precise, per-host configuration without polluting the Kubernetes API with hundreds of MachineConfigs.
- Standard Linux networking behavior; easy to debug locally.
- Prevents accidental interface capture (unlike wildcards).
Cons:
- Loss of Declarative K8s State: The network config is not managed by the Machine Config Operator (MCO).
- Node Replacement Friction: Newly provisioned nodes (replacements) will boot with default config. Harmony must be run against new nodes manually or via a hook before they can fully join the cluster workload.
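The per-host generation step behind the decision above can be sketched as follows. This is a minimal illustration, not Harmony's actual API: the inventory shape, template contents, and function names are assumptions.

```python
# Hypothetical sketch of per-host .nmconnection generation from an
# inventory that maps hostname -> list of physical NIC names.

BOND_TEMPLATE = """[connection]
id=bond0
type=bond
interface-name=bond0

[bond]
mode=802.3ad
"""

SLAVE_TEMPLATE = """[connection]
id=bond0-slave-{iface}
type=ethernet
interface-name={iface}
master=bond0
slave-type=bond
"""

def render_host_configs(inventory: dict, host: str) -> dict:
    """Return {filename: contents} for one host's bond and slave connections."""
    files = {"bond0.nmconnection": BOND_TEMPLATE}
    for iface in inventory[host]:
        files[f"bond0-slave-{iface}.nmconnection"] = SLAVE_TEMPLATE.format(iface=iface)
    return files

# Two hosts with differing NIC names, as described in the Context section:
inventory = {
    "worker-0": ["enp1s0f0", "enp1s0f1"],
    "worker-1": ["ens1f0", "ens1f1"],
}
configs = render_host_configs(inventory, "worker-1")
```

Because the output is pure data, the same renderer can be invoked from a provisioning hook on node replacement, which mitigates the friction noted above.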
Alternatives considered
- Wildcard Matching in NetworkManager (e.g., `interface-name=enp*`):
  - Pros: Single MachineConfig for the whole cluster.
  - Cons: Rejected because it is too broad. It risks capturing interfaces intended for other purposes (e.g., splitting storage and cluster networks later).
- "Kitchen Sink" Configuration:
  - Pros: Single file listing every possible interface name as a slave.
  - Cons: "Dirty" configuration; results in many inactive connections on every host; brittle if new naming schemes appear.
- Per-Host MachineConfig:
  - Pros: Fully declarative within OpenShift.
  - Cons: Requires a unique `MachineConfigPool` per host, which is an anti-pattern and unmaintainable at scale.
- On-boot Generation Script:
  - Pros: Dynamic detection.
  - Cons: Increases boot complexity; harder to debug if the script fails during startup.
Additional Notes
While /etc is writable and persistent on CoreOS, this configuration falls outside the "Day 1" Ignition process. Operational runbooks must be updated to ensure Harmony runs on any node replacement events.
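A runbook verification step for replaced nodes might look like the following sketch. The bond name `bond0` and the helper function are assumptions for illustration; the check parses the terse `nmcli device` listing rather than being tied to a live system, so it can be exercised against captured output.

```shell
# Hypothetical runbook check: confirm the LACP bond is connected before
# admitting a replaced node to cluster workloads.
# $1: terse nmcli device listing, one DEVICE:TYPE:STATE entry per line.
check_bond_active() {
  echo "$1" | awk -F: '$1 == "bond0" && $2 == "bond" && $3 == "connected" { found = 1 } END { exit !found }'
}

# On a live node, feed it real output:
#   check_bond_active "$(nmcli -t -f DEVICE,TYPE,STATE device)" || echo "bond0 not up"
```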