Files
harmony/ROADMAP/08-ha-okd-production.md
Jean-Gabriel Gill-Couture 6c664e9f34 docs(roadmap): add phases 7-8 for OPNsense and HA OKD production
Add Phase 7 (OPNsense & Bare-Metal Network Automation) tracking current
progress on OPNsense Scores, codegen, and Brocade integration. Details
the UpdateHostScore requirement and HostNetworkConfigurationScore rework
needed for LAGG LACP 802.3ad.

Add Phase 8 (HA OKD Production Deployment) describing the target
architecture with LAGG/CARP/multi-WAN/BINAT and validation checklist.

Update current state section to reflect opnsense-codegen branch progress.
2026-03-25 23:20:35 -04:00

57 lines
2.3 KiB
Markdown

# Phase 8: HA OKD Production Deployment
## Goal
Deploy a production HAClusterTopology OKD cluster in UPI mode with full LAGG LACP 802.3ad, CARP VIP, multi-WAN, and BINAT for customer traffic — entirely automated through Harmony Scores.
## Status: Not Started
## Prerequisites
- Phase 7 (OPNsense & Bare-Metal) substantially complete
- Brocade branch merged and adapted
- UpdateHostScore implemented and tested
## Deployment Stack
### Network Layer (OPNsense)
- **LAGG interfaces** (802.3ad LACP) for all cluster hosts — redundant links via LaggScore
- **CARP VIPs** for high availability — failover IPs via VipScore
- **Multi-WAN** configuration — multiple uplinks with gateway groups
- **BINAT** for customer-facing IPs — 1:1 NAT via BinatScore
- **Firewall rules** per-customer with proper source/dest filtering via FirewallRuleScore
- **Outbound NAT** for cluster egress via OutboundNatScore
### Switch Layer (Brocade)
- **VLAN** per network segment (management, cluster, customer, storage)
- **Port-channels** (LACP) matching OPNsense LAGG interfaces
- **Interface speed** configuration for 10G/40G links
### Host Layer
- **PXE boot** via UpdateHostScore (MAC → DHCP → TFTP → iPXE → SCOS)
- **Network bonds** (LACP) via reworked HostNetworkConfigurationScore
- **NMState** for persistent bond configuration on OpenShift nodes
### Cluster Layer
- OKD UPI installation via existing OKDSetup01-04 Scores
- HAProxy load balancer for API and ingress via LoadBalancerScore
- DNS via OKDDnsScore
- Monitoring via NodeExporterScore + Prometheus stack
## New Scores Needed
1. **UpdateHostScore** — Update MAC in DHCP, configure PXE boot, prepare host network for LAGG LACP
2. **MultiWanScore** — Configure OPNsense gateway groups for multi-WAN failover
3. **CustomerBinatScore** (optional) — Higher-level Score combining BinatScore + FirewallRuleScore + DnatScore per customer
## Validation Checklist
- [ ] All hosts PXE boot successfully after MAC update
- [ ] LAGG/LACP active on all host links (verify via `teamdctl` or `nmcli`)
- [ ] CARP VIPs fail over within expected time window
- [ ] BINAT customers reachable from external networks
- [ ] Multi-WAN failover tested (pull one uplink, verify traffic shifts)
- [ ] Full OKD installation completes end-to-end
- [ ] Cluster API accessible via CARP VIP
- [ ] Customer workloads routable via BINAT