harmony/ROADMAP/08-ha-okd-production.md
Jean-Gabriel Gill-Couture 6c664e9f34 docs(roadmap): add phases 7-8 for OPNsense and HA OKD production
Add Phase 7 (OPNsense & Bare-Metal Network Automation) tracking current
progress on OPNsense Scores, codegen, and Brocade integration. Details
the UpdateHostScore requirement and HostNetworkConfigurationScore rework
needed for LAGG LACP 802.3ad.

Add Phase 8 (HA OKD Production Deployment) describing the target
architecture with LAGG/CARP/multi-WAN/BINAT and validation checklist.

Update current state section to reflect opnsense-codegen branch progress.
2026-03-25 23:20:35 -04:00

Phase 8: HA OKD Production Deployment

Goal

Deploy a production HAClusterTopology OKD cluster in UPI mode with full LAGG LACP 802.3ad, CARP VIP, multi-WAN, and BINAT for customer traffic — entirely automated through Harmony Scores.

Status: Not Started

Prerequisites

  • Phase 7 (OPNsense & Bare-Metal) substantially complete
  • Brocade branch merged and adapted
  • UpdateHostScore implemented and tested

Deployment Stack

Network Layer (OPNsense)

  • LAGG interfaces (802.3ad LACP) for all cluster hosts — redundant links via LaggScore
  • CARP VIPs for high availability — failover IPs via VipScore
  • Multi-WAN configuration — multiple uplinks with gateway groups
  • BINAT for customer-facing IPs — 1:1 NAT via BinatScore
  • Firewall rules per-customer with proper source/dest filtering via FirewallRuleScore
  • Outbound NAT for cluster egress via OutboundNatScore
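
As an illustration of what the BINAT bullet implies, here is a minimal sketch of 1:1 NAT semantics: each external address maps to exactly one internal address, and the translation is applied in both directions. `BinatMapping` and the helper functions are hypothetical names for this sketch only; they are not the actual `BinatScore` API, and the addresses used with them would be placeholders.

```rust
use std::collections::HashSet;
use std::net::Ipv4Addr;

/// Hypothetical description of one BINAT (1:1 NAT) mapping.
/// Not taken from Harmony; for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BinatMapping {
    pub external: Ipv4Addr,
    pub internal: Ipv4Addr,
}

/// Inbound traffic: the destination is rewritten from external to internal.
pub fn translate_inbound(rules: &[BinatMapping], dst: Ipv4Addr) -> Option<Ipv4Addr> {
    rules.iter().find(|r| r.external == dst).map(|r| r.internal)
}

/// Outbound traffic: the source is rewritten from internal to external.
pub fn translate_outbound(rules: &[BinatMapping], src: Ipv4Addr) -> Option<Ipv4Addr> {
    rules.iter().find(|r| r.internal == src).map(|r| r.external)
}

/// Returns true when no external or internal address is used twice,
/// which is the property that makes the mapping truly one-to-one.
pub fn is_one_to_one(rules: &[BinatMapping]) -> bool {
    let mut externals = HashSet::new();
    let mut internals = HashSet::new();
    rules
        .iter()
        .all(|r| externals.insert(r.external) && internals.insert(r.internal))
}
```

A check like `is_one_to_one` could run before any rule is pushed to the firewall, so that a duplicated customer IP is caught at planning time rather than after deployment.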

Switch Layer (Brocade)

  • VLAN per network segment (management, cluster, customer, storage)
  • Port-channels (LACP) matching OPNsense LAGG interfaces
  • Interface speed configuration for 10G/40G links
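
The requirement that port-channels match the OPNsense LAGG interfaces can be expressed as a consistency check. The sketch below assumes a cabling map from firewall NIC to switch port is available; `Lagg`, `PortChannel`, and `port_channel_matches` are hypothetical names, not part of Harmony or the Brocade integration.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical firewall-side aggregate (for example an OPNsense LAGG).
pub struct Lagg {
    pub name: String,
    pub member_nics: Vec<String>,
}

/// Hypothetical switch-side aggregate (for example a Brocade port-channel).
pub struct PortChannel {
    pub id: u16,
    pub member_ports: Vec<String>,
}

/// Checks that the switch ports cabled to the LAGG members are exactly
/// the members of the port-channel. `cabling` maps firewall NIC -> switch port.
pub fn port_channel_matches(
    lagg: &Lagg,
    pc: &PortChannel,
    cabling: &HashMap<String, String>,
) -> bool {
    // Resolve every LAGG member to its switch port; None if any NIC is uncabled.
    let expected: Option<BTreeSet<&str>> = lagg
        .member_nics
        .iter()
        .map(|nic| cabling.get(nic).map(|port| port.as_str()))
        .collect();
    let actual: BTreeSet<&str> = pc.member_ports.iter().map(|p| p.as_str()).collect();
    match expected {
        Some(ports) => ports == actual,
        None => false,
    }
}
```

Comparing sets rather than lists keeps the check independent of the order in which members were declared on either side.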

Host Layer

  • PXE boot via UpdateHostScore (MAC → DHCP → TFTP → iPXE → SCOS)
  • Network bonds (LACP) via reworked HostNetworkConfigurationScore
  • NMState for persistent bond configuration on OKD nodes
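
To make the bond configuration concrete, the following sketch renders an NMState desired-state document for a single LACP bond. The function name `render_lacp_bond` is hypothetical and is not how `HostNetworkConfigurationScore` works; the YAML keys follow the NMState bond schema as commonly documented (`link-aggregation`, `mode`, `port`) and should be checked against the NMState version shipped with the target OKD release, since older versions used `slaves` instead of `port`.

```rust
/// Renders a minimal NMState desired state for one 802.3ad (LACP) bond.
/// Hypothetical helper for illustration; key names are assumed to match
/// the NMState bond schema of the version in use.
pub fn render_lacp_bond(bond: &str, ports: &[&str]) -> String {
    let mut out = String::new();
    out.push_str("interfaces:\n");
    out.push_str(&format!("- name: {}\n", bond));
    out.push_str("  type: bond\n");
    out.push_str("  state: up\n");
    out.push_str("  link-aggregation:\n");
    out.push_str("    mode: 802.3ad\n");
    out.push_str("    port:\n");
    for port in ports {
        // One list entry per physical NIC that joins the bond.
        out.push_str(&format!("    - {}\n", port));
    }
    out
}
```

Calling `render_lacp_bond("bond0", &["eno1", "eno2"])` yields a document with one bond interface and two member ports; the interface names here are placeholders.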

Cluster Layer

  • OKD UPI installation via existing OKDSetup01-04 Scores
  • HAProxy load balancer for API and ingress via LoadBalancerScore
  • DNS via OKDDnsScore
  • Monitoring via NodeExporterScore + Prometheus stack

New Scores Needed

  1. UpdateHostScore — Update MAC in DHCP, configure PXE boot, prepare host network for LAGG LACP
  2. MultiWanScore — Configure OPNsense gateway groups for multi-WAN failover
  3. CustomerBinatScore (optional) — Higher-level Score combining BinatScore + FirewallRuleScore + DnatScore per customer
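
The failover behaviour that MultiWanScore would configure can be sketched as tier selection: traffic uses the online gateways in the lowest-numbered tier and moves to the next tier only when that tier has no online gateway. `Gateway` and `active_gateways` are hypothetical names for this sketch and do not reflect the eventual Score's interface.

```rust
/// Hypothetical member of a gateway group. Lower tier means higher priority.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Gateway {
    pub name: String,
    pub tier: u8,
    pub online: bool,
}

/// Returns the gateways that should carry traffic: every online gateway
/// in the lowest tier that still has at least one online member.
pub fn active_gateways(group: &[Gateway]) -> Vec<&Gateway> {
    // Find the best (lowest) tier among gateways that are currently online.
    let best_tier = group.iter().filter(|g| g.online).map(|g| g.tier).min();
    match best_tier {
        Some(tier) => group
            .iter()
            .filter(|g| g.online && g.tier == tier)
            .collect(),
        None => Vec::new(), // every uplink is down
    }
}
```

Two gateways in the same tier are both returned, which corresponds to load balancing; placing them in different tiers gives pure failover.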

Validation Checklist

  • All hosts PXE boot successfully after MAC update
  • LAGG/LACP active on all host links (verify via /proc/net/bonding/<bond> or nmcli; teamdctl applies only if team devices are used instead of bonds)
  • CARP VIPs fail over within the expected time window
  • BINAT customers reachable from external networks
  • Multi-WAN failover tested (pull one uplink, verify traffic shifts)
  • Full OKD installation completes end-to-end
  • Cluster API accessible via CARP VIP
  • Customer workloads routable via BINAT
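
The LACP item in the checklist could be automated along these lines. This sketch assumes the hosts use the Linux kernel bonding driver, whose status is exposed as text under /proc/net/bonding/, and that the file content has already been read into a string. `BondReport` and `parse_bonding_status` are hypothetical names, not existing Harmony code.

```rust
/// Hypothetical summary of one bond's state.
#[derive(Debug, Default)]
pub struct BondReport {
    pub is_lacp: bool,
    pub ports_up: Vec<String>,
    pub ports_down: Vec<String>,
}

/// Parses the text of a /proc/net/bonding/<bond> file.
/// Only the "Bonding Mode", "Slave Interface" and "MII Status" lines are used.
pub fn parse_bonding_status(text: &str) -> BondReport {
    let mut report = BondReport::default();
    let mut current_port: Option<String> = None;
    for line in text.lines() {
        let line = line.trim();
        if let Some(mode) = line.strip_prefix("Bonding Mode:") {
            report.is_lacp = mode.contains("802.3ad");
        } else if let Some(name) = line.strip_prefix("Slave Interface:") {
            current_port = Some(name.trim().to_string());
        } else if let Some(status) = line.strip_prefix("MII Status:") {
            // The bond-level MII Status appears before any port and is skipped,
            // because no port has been seen yet at that point.
            if let Some(port) = current_port.take() {
                if status.trim() == "up" {
                    report.ports_up.push(port);
                } else {
                    report.ports_down.push(port);
                }
            }
        }
    }
    report
}
```

A host would pass the check when `is_lacp` is true and `ports_down` is empty. This only confirms link state as seen by the host; whether the switch has actually negotiated LACP would still need to be verified on the switch side.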