`physical_hosts.version_id` is `Id::default().to_string()`, which
formats `{masked_hex_timestamp}_{rand7}`. Two problems with using that
as a "latest row" key:
1. `format!("{:x}", v)` is variable-length. When the masked 24-bit
   clock crosses a hex-width boundary (e.g. 0x10000 at ~18h and
   0x100000 at ~12 days into each wrap cycle), lexicographic
   comparison breaks: `'_'` (0x5F) sorts above any decimal digit, so
   "1234_…" sorts HIGHER than "123456_…" even though the latter was
   saved later.
2. The timestamp is masked to 24 bits, wrapping every ~194 days.
Net effect: `MAX(version_id)` in get_all_hosts / get_latest_by_id /
save's dedupe lookup can return an OLD row instead of the most
recently inserted one — exactly the "stale IP in the Select" the user
reported after re-adding a previously-deleted node.
Fix: order by SQLite's implicit `rowid` instead — a monotonically
increasing insert counter that reflects real save order.
physical_hosts is not WITHOUT ROWID, so rowid is always there. The
version_id column
stays on every row (useful in the UI / audit logs) but no longer
drives ranking.
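The new ranking, sketched (column names and the lookup key are
assumptions; the real sqlx query lives in the repo):

  // latest row wins by insert order, not by version_id string compare
  let latest = sqlx::query(
      "SELECT * FROM physical_hosts WHERE id = ?1 \
       ORDER BY rowid DESC LIMIT 1",
  )
  .bind(host_id)
  .fetch_optional(pool)
  .await?;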
Id::default() is untouched; it's used throughout the codebase and
fixing the format is a separate, wider change.
sqlx offline cache regenerated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes to the CSR/Ready loop, both learned the hard way:
1. Transient API errors no longer abort the score.
During a CP reboot the apiserver connection you're talking to can
drop (HAProxy flipped backends, etcd flapped quorum, whatever),
and kube-rs surfaces it as e.g. `ServiceError: client error
(SendRequest)`. Previously that propagated out of the loop with
`?`, the score failed, Harmony exited — while the node was still
mid-ignition, leaving a half-joined cluster no one was watching
the CSRs for. Now get_resource, list_pending_kubelet_csrs, and
approve_csr all warn + fall through to the next tick. Only the
15-minute deadline can stop us.
2. Sleep first, then poll.
The reboot was just dispatched; the node is mid-reboot and can't
possibly be Ready, and no CSRs exist yet. Polling immediately
wastes an API round-trip every run. Swap the loop so the
sleep(30s) happens at the top, giving the node time to boot
before we ask about it.
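Shape of the reshaped loop, as a sketch (helper names like is_ready
and timeout_error are hypothetical; the 30s tick and 15-minute
deadline are from this commit):

  let deadline = Instant::now() + Duration::from_secs(15 * 60);
  loop {
      // sleep FIRST: the reboot was just dispatched, nothing to poll yet
      tokio::time::sleep(Duration::from_secs(30)).await;
      if Instant::now() > deadline {
          return Err(timeout_error()); // the only hard stop
      }
      let node = match client.get_resource::<Node>(&hostname).await {
          Ok(n) => n,
          Err(e) => {
              // CP reboots can drop the apiserver connection mid-poll
              warn!("transient API error, retrying next tick: {e}");
              continue;
          }
      };
      if node.as_ref().is_some_and(is_ready) {
          return Ok(());
      }
      // list_pending_kubelet_csrs / approve_csr get the same
      // warn-and-continue treatment below
  }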
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously, if the SSH reboot command errored (e.g. node already
vanished, network hiccup), we logged a warn and fell straight into
the 15-minute CSR/Ready loop against a node that wasn't rebooting —
guaranteed timeout, 15 useless minutes.
Replace with a Confirm prompt that mirrors the "operator declined
SSH reboot" path: "power-cycle manually and press enter to continue,
or decline to abort". Declining returns an InterpretError; confirming
hands off to the CSR loop as usual, which is correct because by then
the operator has taken over.
Also: cap the SSH connect at 15s in harmony::infra::ssh. Kernel TCP
retries were burning 2+ minutes on dead hosts before we even got to
the failure branch; for Day-2 flows across a LAN that's far too slow
to react. New SshError::ConnectTimeout variant reports the host+port
+budget so the message is actionable.
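The cap, sketched with tokio's timeout (the Io variant and exact
wiring are assumptions; host+port+budget on ConnectTimeout are from
this commit):

  let budget = Duration::from_secs(15);
  let stream = tokio::time::timeout(budget, TcpStream::connect((host, port)))
      .await
      .map_err(|_| SshError::ConnectTimeout {
          host: host.to_string(),
          port,
          budget,
      })? // elapsed: fail fast instead of waiting out kernel TCP retries
      .map_err(SshError::Io)?; // refused/unreachable still surface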
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous command was
sha256sum <path> 2>/dev/null | awk '{print $1}'
which assumes a Bourne-style shell. OPNsense's root shell is FreeBSD
csh, which doesn't understand `2>/dev/null` (csh reads the `2` as an
extra argument and `>/dev/null` as a stdout redirect) and in general
mangles the pipe+awk combination enough that what came back was
unparseable — so
sha256_of's "trim output and require exactly 64 hex chars" check
rejected everything as `None`, and every file on the firewall showed
as `<missing>` even when it was right there and correctly hashed.
Drop the shell tricks: run the bare `sha256sum <path>` (every shell
understands that), capture its output, and parse in Rust by scanning
each line for a first token that looks like a 64-char hex hash.
Robust to stderr noise interleaved into the SSH channel buffer too.
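The parser, roughly (a sketch; the real function plus its 5 tests are
in the commit):

  /// First token of each output line that is a 64-char hex hash.
  fn parse_sha256(output: &str) -> Option<String> {
      output
          .lines()
          .filter_map(|line| line.split_whitespace().next())
          .find(|tok| {
              tok.len() == 64 && tok.chars().all(|c| c.is_ascii_hexdigit())
          })
          .map(str::to_ascii_lowercase)
  }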
Added 5 unit tests for the parser so the regression is locked down.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Validates that /usr/local/http/scos/{scos-live-kernel.x86_64,
scos-live-initramfs.x86_64.img, scos-live-rootfs.x86_64.img} on the
firewall match what the running cluster's openshift-install declares
in `coreos print-stream-json`. If any file is missing, has the wrong
hash, or is backed by a dangling symlink, the score prints an
ssh+curl+ln-sfn bundle the operator can paste to refresh the files
directly on the firewall, then loops on a Confirm until the check is
clean or the operator aborts.
Why the symlinks: stream JSON ships versioned upstream filenames
(rhcos-418.94.xxx-live-kernel-x86_64) but bootstrap.ipxe.j2 asks for
the stable ones. Bundle `curl -L -O`s the versioned payload and
`ln -sfn`s the stable name at it; sha256sum follows symlinks so the
same verifier catches stale-payload-behind-stable-name too.
Score runs standalone (add it to any pipeline) AND is called as step
0 of OKDAddNodeScore::execute before anything else — operators never
have to remember to wire it in. A cluster upgrade between bootstrap
and a Day-2 add is the specific drift this catches.
Requires `oc` on PATH and KUBECONFIG pointed at the cluster (same
assumption the rest of the OKD flow already makes). `oc adm release
extract --command=openshift-install --to=./data/okd/bin` runs fresh
each invocation so we never verify against a stale installer.
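Hash extraction from the stream JSON, roughly (the path follows the
published CoreOS stream schema; serde_json and the variable names are
assumptions):

  let stream: serde_json::Value = serde_json::from_str(&raw_stream_json)?;
  let pxe = &stream["architectures"]["x86_64"]["artifacts"]["metal"]
      ["formats"]["pxe"];
  for key in ["kernel", "initramfs", "rootfs"] {
      let expected = pxe[key]["sha256"].as_str().unwrap_or_default();
      let versioned_url = pxe[key]["location"].as_str().unwrap_or_default();
      // expected feeds the <missing>/wrong-hash check; versioned_url
      // feeds the curl -L -O line of the paste bundle
  }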
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Polling every 5 seconds was copied from K8sClient::wait_for_node_ready
but is overkill here: the kubelet-Ready and CSR-generation transitions
each take minutes, so 5s just hammers the API server and floods the
debug logs. 30s is plenty.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With RUST_LOG=harmony=debug each iteration now reports:
- poll number + elapsed/total timeout
- whether the Node object exists yet, and if so, a one-line summary
of its status.conditions (Ready=False MemoryPressure=False …)
- total pending kubelet CSRs returned by the cluster
- per-CSR accept/skip decisions with the signer and username
- tick summary: approved-this-tick, skipped-this-tick, cumulative
- sleeping X seconds before next poll
Makes it obvious when the loop is stuck vs. when it's making progress
(e.g. "Node not yet present" vs "Node present, Ready=False" vs "found
0 pending CSRs" vs "approved 1 skipped 2"). No behavior change; the
old info logs for "Approved CSR …" and "Node Ready after …" still
fire at info.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous reboot command,
nohup bash -c 'sleep 2 && systemctl reboot' >/dev/null 2>&1 &
looked clever but silently did nothing on the real discovery image:
run_command returned Ok instantly (the backgrounded shell exited
status 0), the log said "Reboot dispatched", and the physical host
stayed up. sshd closes the session the moment the foreground shell
exits, and the backgrounded `nohup bash -c ...` was torn down with it
before it ever reached `systemctl reboot`.
Replace with `systemctl --no-block reboot`: systemd takes ownership,
returns exit 0 immediately, SSH closes cleanly. No bash, no &, no
redirection. Flip the Err branch from debug to warn too — with the
old command an error was expected (session dying mid-reboot), with
the new command an error means the reboot really didn't dispatch and
the operator needs to see it.
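Call-site shape after the change (a sketch; run_command's signature
follows the ssh plumbing commit below):

  match ssh::run_command(&host, 22, &creds, "systemctl --no-block reboot").await {
      Ok(_) => info!("Reboot dispatched to {host}"),
      // with --no-block, a failure means the reboot truly didn't dispatch
      Err(e) => warn!("Reboot dispatch to {host} failed: {e}"),
  }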
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OKD marks control-plane nodes as schedulable workers by default, so
they come back from BOTH `node-role.kubernetes.io/master` and
`node-role.kubernetes.io/worker` label-selector queries. When
OKDAddNodeScore's Day-2 refresh hands those two slices to
services_for_nodes, the port 80/443 "all nodes" pools were built with
a naive `cps.chain(workers).collect()` and ended up with the CPs
listed twice. OPNsense faithfully wrote two `server` stanzas with the
same name per CP, and `haproxy -c` rejected the staging config:
backend 'backend_0.0.0.0_80', another server named
'192.168.40.21_80' was already defined at line 82
Fix: order-preserving dedupe by IP when composing all_nodes. Matches
the OPNsense server-name convention ({ip}_{port}), so no two servers
ever share a name regardless of what hostnames the cluster reports.
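The dedupe, sketched (LogicalHost with a cloneable `ip` field is an
assumption about the repo's types):

  use std::collections::HashSet;

  // first occurrence wins, insertion order preserved
  fn all_nodes(cps: &[LogicalHost], workers: &[LogicalHost]) -> Vec<LogicalHost> {
      let mut seen = HashSet::new();
      cps.iter()
          .chain(workers.iter())
          .filter(|h| seen.insert(h.ip.clone()))
          .cloned()
          .collect()
  }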
Bootstrap path is unaffected — the static topology.control_plane /
topology.workers Vecs in the example files don't overlap, so every
insert succeeds and the output is byte-identical. New unit test pins
the overlap case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves HAProxy backend refresh from post-Ready (old step 13) to
immediately after the DHCP reservation (new step 10.5), paralleling
how DHCP primes the firewall before the node boots.
Why this is safe:
- HAProxy's health checks are designed for exactly this: cp3's
backend is marked DOWN until its kubelet answers /readyz, and
the other CPs keep serving traffic. No routing disruption.
- The only visible effect is a few minutes of health-check failure
logs on OPNsense while cp3 comes up.
Why this is better:
- All firewall-side prep (DHCP + HAProxy) runs together, matching
the mental model "get the firewall ready, then ignite the node".
- If OPNsense rejects the configure_service call, the score fails
before the reboot — operator fixes and retries with no
mid-transition node to babysit.
- The post-reboot phase reduces to the only thing that actually
needs post-reboot state: CSR approval + wait-for-Ready.
Source of truth for the backend list stays the live cluster + the
new (hostname, IP) we just DHCP-reserved — so subsequent add-node
runs still see nodes added by earlier ones. `existing` is already
in scope from step 1, so we only need one extra k8s call (the other
role's nodes for ingress pool completeness). Introduced an
existing_to_logical_hosts helper shared between that local use and
the still-needed cluster_logical_hosts wrapper for the other role.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After CSR approval lands and the new node is Ready, OKDAddNodeScore
now rebuilds the 6 OKD LoadBalancerServices (public 80/443, private
80/443 with node-ready health checks, 22623 CP-only, 6443 CP-only with
/readyz check) from the *live* cluster's node list and runs them
through LoadBalancerScore. opnsense-config's configure_service is
already find-by-name + update-or-create, so existing backend servers
keep their UUIDs and the new IP gets linked into the right pools
without tearing down the config.
To avoid duplicating the 60-line service-build, OKDLoadBalancerScore's
internal builder is promoted to a pub fn:
services_for_nodes(control_plane, workers)
-> (Vec<LoadBalancerService>, Vec<LoadBalancerService>)
OKDLoadBalancerScore::new() becomes a 4-line wrapper; Day-1 bootstrap
output is byte-identical. The two private helpers
(control_plane_to_backend_server, nodes_to_backend_server) collapse
into a single private `backends_for(hosts, port)`; the tests that
referenced the old helpers now assert against services_for_nodes
output directly.
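The collapsed helper, roughly (BackendServer field names are
assumptions; the {ip}_{port} server name matches the OPNsense
convention noted in the dedupe commit above):

  // one server stanza per host, named {ip}_{port}
  fn backends_for(hosts: &[LogicalHost], port: u16) -> Vec<BackendServer> {
      hosts
          .iter()
          .map(|h| BackendServer {
              name: format!("{}_{}", h.ip, port),
              address: h.ip.to_string(),
              port,
          })
          .collect()
  }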
Source of truth: live cluster, not topology.control_plane. The static
topology vecs don't grow past bootstrap — reading from the cluster
means a later add-node run sees the nodes we added in this one.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the SSH reboot step, OKDAddNodeScore now polls the cluster every
5s (up to 15min) until the new Node reports Ready, approving any
pending kubelet CSRs that come up in the meantime. This closes the
last manual step for Day-2 control-plane adds — previously the score's
success footer told the operator to go run `oc adm certificate
approve` themselves.
Two CSR primitives on K8sClient (new harmony-k8s/src/csr.rs):
- list_pending_kubelet_csrs(): CSRs whose signer is kubelet-serving
or kube-apiserver-client-kubelet AND whose status carries no
Approved/Denied condition yet.
- approve_csr(name, reason, message): PATCH /approval with an
Approved=True condition via kube-rs' Api::patch_approval.
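The pending filter, sketched against k8s-openapi's types (the
predicate split is an assumption; the signer names and the
no-condition rule are from this commit):

  use k8s_openapi::api::certificates::v1::CertificateSigningRequest;

  const KUBELET_SIGNERS: [&str; 2] = [
      "kubernetes.io/kubelet-serving",
      "kubernetes.io/kube-apiserver-client-kubelet",
  ];

  fn is_pending_kubelet_csr(csr: &CertificateSigningRequest) -> bool {
      let signer_ok = KUBELET_SIGNERS.contains(&csr.spec.signer_name.as_str());
      // pending == no Approved/Denied condition recorded yet
      let undecided = csr
          .status
          .as_ref()
          .and_then(|s| s.conditions.as_ref())
          .map_or(true, |cs| {
              !cs.iter()
                  .any(|c| matches!(c.type_.as_str(), "Approved" | "Denied"))
          });
      signer_ok && undecided
  }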
The score's approval loop checks Ready *first* each iteration so a
stray CSR appearing after the node is healthy is not approved. CSR
matching:
- kubelet-serving: spec.username == "system:node:<hostname>" (the
kubelet signs this field itself; trust it).
- kube-apiserver-client-kubelet: every pending one is approved.
The node identity lives in the embedded CSR PEM's subject CN, and
parsing that would pull in a new x509 dep for little gain. A Day-2
add is an operator-triggered, single-window operation, and approving
every pending client CSR matches what `oc adm certificate approve`
on the pending set does anyway.
CP success footer now only flags API serving-cert rotation as the
remaining manual step; etcd membership is explicitly noted as
auto-reconciled by cluster-etcd-operator.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After DHCP/byMAC/ignition are published, OKDAddNodeScore now offers to
reboot the newly-discovered node over SSH using the cluster SshKeyPair
(the same secret Harmony already baked into the discovery image's
/root/.ssh/authorized_keys via inventory.kickstart.j2). The operator
still gets a confirm prompt; declining falls back to the old
"power-cycle manually" path with the IP printed.
New plumbing:
- harmony_types::ssh::SshCredentials — plain enum (SshKey with PEM +
passphrase | Password). No derives; not a Harmony Secret. Built
on-the-fly from existing secrets the caller already holds.
- harmony::infra::ssh::run_command(host, port, creds, cmd) — single
russh round-trip, fresh connection per call, no trait or state.
Returns stdout or SshError; caller tolerates NonZeroExit when the
command is expected to sever the session (reboot).
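The new shapes, roughly (field names inside the variants are
assumptions; the commit pins only SshKey-with-PEM+passphrase and
Password):

  pub enum SshCredentials {
      SshKey { pem: String, passphrase: Option<String> },
      Password { username: String, password: String },
  }

  // fresh russh connection per call; no trait, no retained state
  pub async fn run_command(
      host: &str,
      port: u16,
      creds: &SshCredentials,
      cmd: &str,
  ) -> Result<String, SshError> {
      todo!("single russh round-trip; returns stdout or SshError")
  }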
Harmony's Cargo.toml now uses russh/russh-keys via workspace instead of
a pinned 0.45.0 literal to match opnsense-config.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Aligns with the rest of the OKD score namespace (OKDSetup01InventoryScore,
OKDBootstrap02…, OKDLoadBalancerScore, …). The Add prefix made this one
the odd name out. Renames the companion interpret (AddOkdNodeInterpret →
OKDAddNodeInterpret), the Score::name() debug string, and the
InterpretName::Custom tag for symmetry. AddNodeRole stays as-is (not a
Score, and its prefix still reads fine).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small log tweaks that surfaced during Day-2 sttest runs:
- The interpret start log was `Http running...` — ambiguous when several
scores push through the same Http interpret. Match the success/fail
format by prefixing with the score name:
[HttpScore] Http starting...
- CIDR-based inventory discovery logs `Error querying inventory agent on
X:port : ...` at info level for every IP in the subnet that doesn't
answer with a proper agent response. Drop to debug — most IPs on a
/24 are noise; timeouts already log at debug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AddOkdNodeScore identified the just-discovered host by diffing
"mappings before discovery" against "mappings after". On a re-run where
the operator re-picks the same host (DiscoverHostForRoleScore's
overwrite prompt does DELETE+INSERT, keeping the same host_id), the
diff is empty and the score bails with "no new host mapped for this
role".
Add InventoryRepository::get_latest_host_for_role(role) that returns
the highest-id row for a role, and use it after discovery. The
auto-increment id makes this robust whether the host is brand new or a
replacement, and drops the HashSet snapshot plumbing from the caller.
SQLite impl mirrors get_role_mapping's shape (ORDER BY id DESC LIMIT 1,
deserialize NetworkConfig JSON). .sqlx/ offline cache regenerated.
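The SQLite impl, sketched (the table/column names are assumptions
from the commit text; the real schema is in the migrations):

  use sqlx::{Row, SqlitePool};

  async fn get_latest_host_for_role(
      pool: &SqlitePool,
      role: &str,
  ) -> Result<Option<(String, String)>, sqlx::Error> {
      // highest auto-increment id == most recent insert, whether the
      // host is brand new or an overwrite (DELETE+INSERT lands last)
      let row = sqlx::query(
          "SELECT host_id, network_config FROM role_mappings \
           WHERE role = ?1 ORDER BY id DESC LIMIT 1",
      )
      .bind(role)
      .fetch_optional(pool)
      .await?;
      // caller deserializes the NetworkConfig JSON, as get_role_mapping does
      Ok(row.map(|r| (r.get("host_id"), r.get("network_config"))))
  }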
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CaddyConfig::is_installed() used to return bool, collapsing every error —
404, 401, 5xx, timeout, TLS mismatch — into "not installed". That's what
caused the Day-2 sttest run to stall on a 30s GET /api/caddy/General/get
timeout, then fall into a second 30s install_package timeout, and finally
die after ~4 minutes.
Change the signature to Result<bool, Error>:
- Ok(true) on 2xx
- Ok(false) only on HTTP 404 (MVC route doesn't exist -> plugin truly
not installed)
- Err(_) for everything else (can't tell)
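The gate, sketched (assumes a reqwest-style response in hand;
Error::UnexpectedStatus is a hypothetical variant name):

  fn installed_from_status(resp: &reqwest::Response) -> Result<bool, Error> {
      match resp.status().as_u16() {
          200..=299 => Ok(true),  // plugin answered: installed
          404 => Ok(false),       // MVC route missing: truly not installed
          code => Err(Error::UnexpectedStatus(code)), // can't tell, surface it
      }
  }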
The single caller in OPNSenseFirewall::ensure_initialized now matches on
the Result: Ok(true) skips install, Ok(false) runs install_package, and
Err(_) fails the score immediately with a pointed message asking the
operator whether the API is reachable at the configured port. This also
restores the auto-install branch that commit 9b6bdaf removed — that
commit was rejected; this replaces it with a correctly-gated install.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The OPNsense HttpServer impl was calling install_package("os-caddy")
whenever CaddyConfig::is_installed() returned false. is_installed()
maps any API failure (timeout, connection error, wrong port) to
"not installed", so on Day-2 — when caddy is obviously already there
and serving — a flaky `GET /api/caddy/General/get` would trigger a
pointless reinstall attempt that itself stalled on a 30s timeout and
brought the whole score down.
Trust the operator: if caddy isn't installed, Harmony logs a warn and
lets the next upload_files / enable call surface a pointed error.
Anyone bootstrapping OPNsense from scratch installs os-caddy via the
plugin manager once, same as they provision API credentials.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two intertwined updates to the sttest example:
Modernize the OPNsense construction to match affilium2:
- drop the local OPNSenseFirewallConfig that reused one username/password
for both API and SSH
- use the canonical OPNSenseFirewallCredentials + OPNSenseApiCredentials
from harmony::config::secret (each prompted / stored separately)
- extract get_opnsense() -> Arc<OPNSenseFirewall> as a reusable helper
- call OPNSenseFirewall::with_api_port(..., 9443, ...) so the client
hits the OPNsense web GUI on 9443 (HAProxy owns 443)
- trim Cargo.toml to the deps the crate actually imports
Exercise the new Day-2 add-node flow end to end:
- init harmony_cli::cli_logger so the run is debuggable
- replace the lone HarmonyDiscoveryStrategy::MDNS with a SUBNET scan
over 192.168.40.0/24:25000, defined once and threaded through both
OKDInstallationPipeline::get_all_scores(...) and the appended
AddOkdNodeScore { role: ControlPlane, ... }
- env.sh now documents the two OPNsense secrets and reminds the
operator to point KUBECONFIG at
./data/okd/installation_files_sttest0/auth/kubeconfig after the
installer finishes (AddOkdNodeScore needs a reachable k8s API)
The add-node tail only publishes cp3's ignition + byMAC + DHCP; etcd
membership, serving certs, and CSR approval remain manual per the
score's own footer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AddOkdNodeScore was baking /etc/hostname into the stub ignition as
belt-and-suspenders. The initial bootstrap flow (cp0/cp1/cp2) never
did this and it works fine: RHCOS/SCOS NetworkManager honors the
DHCP-provided hostname, and the score already creates a DHCP static
reservation with the chosen name before the operator reboots. Remove
the redundant path (function, call site, and its unit test) so the
add-node flow matches bootstrap and has one less code path to audit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The example was adding a worker; switch it to ControlPlane since that's
the case the operator most often needs a runbook for. AddOkdNodeScore
already prints the CP follow-up caveat (etcd, serving certs, CSR) in
its success output, so this also makes the caveat visible on a plain
`cargo run`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires AddOkdNodeScore into a minimal example that assumes an
already-running OKD cluster and an OPNsense firewall serving PXE. The
topology builder mirrors examples/sttest's shape but leaves
control_plane/workers/bootstrap_host empty — the score reads existing
nodes directly from the cluster via kube-rs, so static topology lists
aren't consulted.
env.sh documents that KUBECONFIG must point at the target cluster's
kubeconfig; the score relies on K8sClient::try_default() to pick it up
when HAClusterTopology.kubeconfig is None.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New score that publishes all the artifacts needed to add a single
control-plane or worker node to an already-running OKD cluster:
- reads existing nodes of the target role from the live cluster
via kube-rs (no oc shell-out)
- derives the next hostname from their naming pattern (cp0,cp1,cp2
-> cp3; cp0-harmony -> cp3-harmony) and the next IPv4 by inferring
a linear seq->last-octet mapping from the existing InternalIPs
- runs DiscoverHostForRoleScore for installation disk + bond + blacklist
- pulls the live user-data secret (master/worker-user-data-managed in
openshift-machine-api) and injects the operator's captured
NetworkManager config (.nmconnection bond master + slaves +
blacklist) plus /etc/hostname straight into the stub ignition JSON
- pushes <role>-<hostname>.ign to the firewall HTTP root, renders a
per-MAC iPXE pointing at it, and creates the DHCP reservation
- returns a detail summary; for ControlPlane, prints a loud reminder
that etcd membership / API serving certs / CSR approval are still
manual follow-ups
Hostname and IP are auto-derived — not user-editable — because that was
the user's explicit decision. Fails loudly if the cluster has zero
existing nodes of the target role or if the IP assignment isn't
linear-stride-1.
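The hostname derivation, sketched under the commit's stated rules
(cp0,cp1,cp2 -> cp3; cp0-harmony -> cp3-harmony); assumes one
homogeneous "<prefix><seq><suffix>" pattern per role:

  fn next_hostname(existing: &[&str]) -> Option<String> {
      let mut parsed: Vec<(String, u32, String)> = existing
          .iter()
          .map(|&name| {
              // split at the first digit run: prefix, seq, suffix
              let start = name.find(|c: char| c.is_ascii_digit())?;
              let digits: String = name[start..]
                  .chars()
                  .take_while(char::is_ascii_digit)
                  .collect();
              let end = start + digits.len();
              Some((
                  name[..start].to_string(),
                  digits.parse().ok()?,
                  name[end..].to_string(),
              ))
          })
          .collect::<Option<Vec<_>>>()?; // any unparsable name fails loudly
      parsed.sort_by_key(|&(_, seq, _)| seq);
      let (prefix, seq, suffix) = parsed.pop()?;
      Some(format!("{prefix}{}{suffix}", seq + 1))
  }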
Includes unit tests for hostname parsing, next-hostname derivation,
IP deriver (standard case + error cases), and ignition injection
(preserves existing storage.files, handles a stub with no storage key,
emits mode 0600 + overwrite).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Was &'static str because bootstrap only ever rendered bootstrap.ign /
master.ign / worker.ign. The upcoming add-node flow generates per-host
filenames like master-cp3.ign, so the lifetime has to be bound to the
caller's String. All existing callers pass string literals, which coerce
through unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls the two Askama bond templates out of OpenShiftNmStateNetworkManager into
a new pub(crate) networkmanager_cfg module and parameterizes the bond master
on LaggProtocol (plus a new disabled-interface builder for blacklisted NICs).
LACP output is kept byte-identical to the previous template so existing
post-install bonding is unchanged; the new builders unblock the upcoming
Day-2 add-node flow which needs to pick modes per operator input.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>