# Architecture Decision Record: Network Bonding Configuration via External Automation
Initial Author: Jean-Gabriel Gill-Couture & Sylvain Tremblay
Initial Date: 2026-02-13
Last Updated Date: 2026-02-13
## Status
Accepted
## Context
We need to configure LACP bonds on 10GbE interfaces across all worker nodes in the OpenShift cluster. A significant challenge is that interface names (e.g., `enp1s0f0` vs `ens1f0`) vary across different hardware nodes.
The standard OpenShift mechanism (MachineConfig) applies identical configurations to all nodes in a MachineConfigPool. Since the interface names differ, a single static MachineConfig cannot target specific physical devices across the entire cluster without complex workarounds.
## Decision
We will use the existing "Harmony" automation tool to generate and apply host-specific NetworkManager configuration files directly to the nodes.
1. Harmony will generate the specific `.nmconnection` files for the bond and slaves based on its inventory of interface names.
2. Files will be pushed to `/etc/NetworkManager/system-connections/` on each node.
3. Configuration will be applied via `nmcli` reload or a node reboot.
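As a rough sketch of step 1, the per-host generation might look like the following. The inventory structure, templates, and the `render_connections()` helper are illustrative assumptions, not Harmony's actual implementation:

```python
# Illustrative sketch of step 1: rendering host-specific NetworkManager
# keyfiles from an inventory of interface names. Templates and helper
# names are hypothetical, not Harmony's real code.

BOND_TEMPLATE = """\
[connection]
id=bond0
type=bond
interface-name=bond0

[bond]
mode=802.3ad
"""

SLAVE_TEMPLATE = """\
[connection]
id=bond0-slave-{iface}
type=ethernet
interface-name={iface}
master=bond0
slave-type=bond
"""


def render_connections(host_interfaces):
    """Return {filename: contents} for one host's bond and slave profiles."""
    files = {"bond0.nmconnection": BOND_TEMPLATE}
    for iface in host_interfaces:
        files[f"bond0-slave-{iface}.nmconnection"] = SLAVE_TEMPLATE.format(iface=iface)
    return files


# Inventory says e.g. worker-01 uses enp1s0f0/enp1s0f1 while another
# host uses ens1f0/ens1f1; each host gets files rendered from its own list.
files = render_connections(["enp1s0f0", "enp1s0f1"])
```

The rendered files would then be pushed to `/etc/NetworkManager/system-connections/` (step 2); note that NetworkManager ignores keyfiles there unless they are owned by root with mode 0600. A subsequent `nmcli connection reload` (step 3) picks them up without a reboot.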
## Rationale
* **Inventory Awareness:** Harmony already possesses the specific interface mapping data for each host.
* **Persistence:** Fedora CoreOS/SCOS allows writing to `/etc`, and these files persist across reboots and OS upgrades (rpm-ostree updates).
* **Avoids Complexity:** This approach avoids the operational overhead of creating unique MachineConfigPools for every single host or hardware variant.
* **Safety:** Unlike wildcard matching, this ensures explicit interface selection, preventing accidental bonding of reserved interfaces (e.g., future separation of Ceph storage traffic).
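The safety point can be made concrete with a toy example: glob matching captures any interface that happens to fit the pattern, while an explicit inventory list cannot. The interface names below are hypothetical:

```python
from fnmatch import fnmatch

# Hypothetical host: two NICs meant for the bond, one reserved
# (e.g. for future Ceph storage traffic).
host_interfaces = ["enp1s0f0", "enp1s0f1", "enp2s0"]
reserved = "enp2s0"

# Wildcard selection, as a pattern like interface-name=enp* would behave:
wildcard_selected = [i for i in host_interfaces if fnmatch(i, "enp*")]

# Explicit selection, driven by Harmony's inventory:
inventory_selected = ["enp1s0f0", "enp1s0f1"]

assert reserved in wildcard_selected       # the wildcard silently grabs it
assert reserved not in inventory_selected  # the inventory never lists it
```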
## Consequences
**Pros:**
* Precise, per-host configuration without polluting the Kubernetes API with hundreds of MachineConfigs.
* Standard Linux networking behavior; easy to debug locally.
* Prevents accidental interface capture (unlike wildcards).
**Cons:**
* **Loss of Declarative K8s State:** The network config is not managed by the Machine Config Operator (MCO).
* **Node Replacement Friction:** Newly provisioned nodes (replacements) will boot with default config. Harmony must be run against new nodes manually or via a hook before they can fully join the cluster workload.
## Alternatives Considered
1. **Wildcard Matching in NetworkManager (e.g., `interface-name=enp*`):**
* *Pros:* Single MachineConfig for the whole cluster.
* *Cons:* Rejected because it is too broad. It risks capturing interfaces intended for other purposes (e.g., splitting storage and cluster networks later).
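For reference, the rejected wildcard binding would look roughly like this (a sketch; NetworkManager expresses glob matching through the `[match]` section of a keyfile, and the connection id is made up):

```ini
# Rejected sketch: enslaves ANY interface whose name matches enp*
[connection]
id=bond0-slave-any
type=ethernet
master=bond0
slave-type=bond

[match]
interface-name=enp*
```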
2. **"Kitchen Sink" Configuration:**
* *Pros:* A single static file, listing every possible interface name as a slave, covers the whole cluster.
* *Cons:* "Dirty" configuration; results in many inactive connections on every host; brittle if new naming schemes appear.
3. **Per-Host MachineConfig:**
* *Pros:* Fully declarative within OpenShift.
* *Cons:* Requires a unique `MachineConfigPool` per host, which is an anti-pattern and unmaintainable at scale.
4. **On-boot Generation Script:**
* *Pros:* Dynamic detection.
* *Cons:* Increases boot complexity; harder to debug if the script fails during startup.
## Additional Notes
While `/etc` is writable and persistent on CoreOS, this configuration falls outside the "Day 1" Ignition process. Operational runbooks must be updated to ensure Harmony runs on any node replacement events.