Compare commits

...

8 Commits

Author SHA1 Message Date
bea386a2e2 wip(failover): Started implementation of the FailoverTopology with PostgreSQL capability
Some checks failed
Run Check Script / check (pull_request) Failing after 45s
This is our first Higher Order Topology (see ADR-015)
2025-12-10 17:00:28 -05:00
d39b1957cd feat(k8s_app): OperatorhubCatalogSourceScore can now install the operatorhub catalogsource on a cluster that already has operator lifecycle manager installed 2025-12-10 16:58:58 -05:00
357ca93d90 wip: FailoverTopology implementation for PostgreSQL on the way! 2025-12-10 13:12:53 -05:00
8103932f23 doc: Initial documentation for the MultisitePostgreSQL module 2025-12-10 13:12:53 -05:00
9617e1cfde Merge pull request 'adr: Higher order topologies' (#197) from adr/015-higher-order-topologies into master
Some checks failed
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m54s
Run Check Script / check (push) Successful in 1m47s
Reviewed-on: #197
2025-12-10 18:12:23 +00:00
a953284386 doc: Add note about counter-intuitive behavior of nmstate
Some checks failed
Run Check Script / check (push) Successful in 1m33s
Compile and package harmony_composer / package_harmony_composer (push) Failing after 2m28s
2025-12-09 23:04:15 -05:00
bfde5f58ed adr: Higher order topologies
All checks were successful
Run Check Script / check (pull_request) Successful in 1m33s
These types of Topologies will orchestrate behavior in regular Topologies.

For example, a FailoverTopology is a Higher Order Topology: it orchestrates its capabilities between a primary and a replica topology. A great use case for this is a database deployment. The FailoverTopology will deploy both instances, connect them, and be able to execute the appropriate actions to promote the replica to primary and revert back to the original state.

Other use cases are ShardedTopology, DecentralizedTopology, etc.
2025-12-09 11:23:30 -05:00
83c1cc82b6 fix(host_network): remove extra fields from bond config to prevent clashes (#186)
All checks were successful
Run Check Script / check (push) Successful in 1m36s
Compile and package harmony_composer / package_harmony_composer (push) Successful in 8m16s
Also alias `port` to support both `port` and `ports` as per the nmstate spec.

Reviewed-on: #186
2025-11-11 14:12:56 +00:00
22 changed files with 1035 additions and 4 deletions

View File

@@ -0,0 +1,114 @@
# Architecture Decision Record: Higher-Order Topologies
**Initial Author:** Jean-Gabriel Gill-Couture
**Initial Date:** 2025-12-08
**Last Updated Date:** 2025-12-08
## Status
Implemented
## Context
Harmony models infrastructure as **Topologies** (deployment targets like `K8sAnywhereTopology`, `LinuxHostTopology`) implementing **Capabilities** (tech traits like `PostgreSQL`, `Docker`).
**Higher-Order Topologies** (e.g., `FailoverTopology<T>`) compose/orchestrate capabilities *across* multiple underlying topologies (e.g., primary+replica `T`).
Naive design requires manual `impl Capability for HigherOrderTopology<T>` *per T per capability*, causing:
- **Impl explosion**: N topologies × M capabilities = N×M boilerplate.
- **ISP violation**: Topologies forced to impl unrelated capabilities.
- **Maintenance hell**: New topology needs impls for *all* orchestrated capabilities; new capability needs impls for *all* topologies/higher-order.
- **Barrier to extension**: Users can't easily add topologies without todos/panics.
This makes scaling Harmony impractical as the ecosystem grows.
## Decision
Use **blanket trait impls** on higher-order topologies to *automatically* derive orchestration:
````rust
/// Higher-Order Topology: Orchestrates capabilities across sub-topologies.
pub struct FailoverTopology<T> {
/// Primary sub-topology.
primary: T,
/// Replica sub-topology.
replica: T,
}
/// Automatically provides PostgreSQL failover for *any* `T: PostgreSQL`.
/// Delegates to primary for queries; orchestrates deploy across both.
#[async_trait]
impl<T: PostgreSQL> PostgreSQL for FailoverTopology<T> {
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String> {
// Deploy primary; extract certs/endpoint;
// then deploy the replica with pg_basebackup + TLS passthrough.
// (See the full implementation in failover.rs.)
todo!()
}
// Delegate queries to primary.
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String> {
self.primary.get_replication_certs(cluster_name).await
}
// ...
}
/// Similarly for other capabilities.
#[async_trait]
impl<T: Docker> Docker for FailoverTopology<T> {
// Failover Docker orchestration.
}
````
**Key properties:**
- **Auto-derivation**: `Failover<K8sAnywhere>` gets `PostgreSQL` iff `K8sAnywhere: PostgreSQL`.
- **No boilerplate**: One blanket impl per capability *per higher-order type*.
## Rationale
- **Composition via generics**: Rust trait solver auto-selects impls; zero runtime cost.
- **Compile-time safety**: Missing `T: Capability` → compile error (no panics).
- **Scalable**: O(capabilities) impls per higher-order; new `T` auto-works.
- **ISP-respecting**: Capabilities only surface if the sub-topology provides them.
- **Centralized logic**: Orchestration (e.g., cert propagation) in one place.
**Example usage:**
````rust
// ✅ Works: K8sAnywhere: PostgreSQL → Failover provides failover PG
let pg_failover: FailoverTopology<K8sAnywhereTopology> = ...;
pg_failover.deploy_pg(config).await;
// ✅ Works: LinuxHost: Docker → Failover provides failover Docker
let docker_failover: FailoverTopology<LinuxHostTopology> = ...;
docker_failover.deploy_docker(...).await;
// ❌ Compile fail: K8sAnywhere !: Docker
let invalid: FailoverTopology<K8sAnywhereTopology>;
invalid.deploy_docker(...); // `T: Docker` bound unsatisfied
````
## Consequences
**Pros:**
- **Extensible**: New topology `AWSTopology: PostgreSQL` → instant `Failover<AWSTopology>: PostgreSQL`.
- **Lean**: No useless impls (e.g., no `K8sAnywhere: Docker`).
- **Observable**: Logs trace every step.
**Cons:**
- **Monomorphization**: Generics generate code per T (mitigated: few Ts).
- **Delegation opacity**: Relies on rustdoc/logs for internals.
## Alternatives considered
| Approach | Pros | Cons |
|----------|------|------|
| **Manual per-T impls**<br>`impl PG for Failover<K8s> {..}`<br>`impl PG for Failover<Linux> {..}` | Explicit control | N×M explosion; violates ISP; hard to extend. |
| **Dynamic trait objects**<br>`Box<dyn AnyCapability>` | Runtime flex | Perf hit; type erasure; error-prone dispatch. |
| **Mega-topology trait**<br>All-in-one `OrchestratedTopology` | Simple wiring | Monolithic; poor composition. |
| **Registry dispatch**<br>Runtime capability lookup | Decoupled | Complex; no compile safety; perf/debug overhead. |
**Selected**: Blanket impls leverage Rust generics for safe, zero-cost composition.
## Additional Notes
- Applies to `MultisiteTopology<T>`, `ShardedTopology<T>`, etc.
- `FailoverTopology` in `failover.rs` is the first implementation.

View File

@@ -0,0 +1,153 @@
//! Example of Higher-Order Topologies in Harmony.
//! Demonstrates how `FailoverTopology<T>` automatically provides failover for *any* capability
//! supported by a sub-topology `T` via blanket trait impls.
//!
//! Key insight: No manual impls per T or capability -- scales effortlessly.
//! Users can:
//! - Write new `Topology` (impl capabilities on a struct).
//! - Compose with `FailoverTopology` (gets capabilities if T has them).
//! - Compile fails if capability missing (safety).
use async_trait::async_trait;
use tokio;
/// Capability trait: Deploy and manage PostgreSQL.
#[async_trait]
pub trait PostgreSQL {
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String>;
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String>;
}
/// Capability trait: Deploy Docker.
#[async_trait]
pub trait Docker {
async fn deploy_docker(&self) -> Result<String, String>;
}
/// Configuration for PostgreSQL deployments.
#[derive(Clone)]
pub struct PostgreSQLConfig;
/// Replication certificates.
#[derive(Clone)]
pub struct ReplicationCerts;
/// Concrete topology: Kubernetes Anywhere (supports PostgreSQL).
#[derive(Clone)]
pub struct K8sAnywhereTopology;
#[async_trait]
impl PostgreSQL for K8sAnywhereTopology {
async fn deploy(&self, _config: &PostgreSQLConfig) -> Result<String, String> {
// Real impl: Use k8s helm chart, operator, etc.
Ok("K8sAnywhere PostgreSQL deployed".to_string())
}
async fn get_replication_certs(&self, _cluster_name: &str) -> Result<ReplicationCerts, String> {
Ok(ReplicationCerts)
}
}
/// Concrete topology: Linux Host (supports Docker).
#[derive(Clone)]
pub struct LinuxHostTopology;
#[async_trait]
impl Docker for LinuxHostTopology {
async fn deploy_docker(&self) -> Result<String, String> {
// Real impl: Install/configure Docker on host.
Ok("LinuxHost Docker deployed".to_string())
}
}
/// Higher-Order Topology: Composes multiple sub-topologies (primary + replica).
/// Automatically derives *all* capabilities of `T` with failover orchestration.
///
/// - If `T: PostgreSQL`, then `FailoverTopology<T>: PostgreSQL` (blanket impl).
/// - Same for `Docker`, etc. No boilerplate!
/// - Compile-time safe: Missing `T: Capability` → error.
#[derive(Clone)]
pub struct FailoverTopology<T> {
/// Primary sub-topology.
pub primary: T,
/// Replica sub-topology.
pub replica: T,
}
/// Blanket impl: Failover PostgreSQL if T provides PostgreSQL.
/// Delegates reads to primary; deploys to both.
#[async_trait]
impl<T: PostgreSQL + Send + Sync + Clone> PostgreSQL for FailoverTopology<T> {
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String> {
// Orchestrate: Deploy primary first, then replica (e.g., via pg_basebackup).
let primary_result = self.primary.deploy(config).await?;
let replica_result = self.replica.deploy(config).await?;
Ok(format!("Failover PG deployed: {} | {}", primary_result, replica_result))
}
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String> {
// Delegate to primary (replica follows).
self.primary.get_replication_certs(cluster_name).await
}
}
/// Blanket impl: Failover Docker if T provides Docker.
#[async_trait]
impl<T: Docker + Send + Sync + Clone> Docker for FailoverTopology<T> {
async fn deploy_docker(&self) -> Result<String, String> {
// Orchestrate across primary + replica.
let primary_result = self.primary.deploy_docker().await?;
let replica_result = self.replica.deploy_docker().await?;
Ok(format!("Failover Docker deployed: {} | {}", primary_result, replica_result))
}
}
#[tokio::main]
async fn main() {
let config = PostgreSQLConfig;
println!("=== ✅ PostgreSQL Failover (K8sAnywhere supports PG) ===");
let pg_failover = FailoverTopology {
primary: K8sAnywhereTopology,
replica: K8sAnywhereTopology,
};
let result = pg_failover.deploy(&config).await.unwrap();
println!("Result: {}", result);
println!("\n=== ✅ Docker Failover (LinuxHost supports Docker) ===");
let docker_failover = FailoverTopology {
primary: LinuxHostTopology,
replica: LinuxHostTopology,
};
let result = docker_failover.deploy_docker().await.unwrap();
println!("Result: {}", result);
println!("\n=== ❌ Would fail to compile (K8sAnywhere !: Docker) ===");
// let invalid = FailoverTopology {
// primary: K8sAnywhereTopology,
// replica: K8sAnywhereTopology,
// };
// invalid.deploy_docker().await.unwrap(); // Error: `K8sAnywhereTopology: Docker` not satisfied!
// Very clear error message:
// error[E0599]: the method `deploy_docker` exists for struct `FailoverTopology<K8sAnywhereTopology>`, but its trait bounds were not satisfied
// --> src/main.rs:90:9
// |
// 4 | pub struct FailoverTopology<T> {
// | ------------------------------ method `deploy_docker` not found for this struct because it doesn't satisfy `FailoverTopology<K8sAnywhereTopology>: Docker`
// ...
// 37 | struct K8sAnywhereTopology;
// | -------------------------- doesn't satisfy `K8sAnywhereTopology: Docker`
// ...
// 90 | invalid.deploy_docker(); // `T: Docker` bound unsatisfied
// | ^^^^^^^^^^^^^ method cannot be called on `FailoverTopology<K8sAnywhereTopology>` due to unsatisfied trait bounds
// |
// note: trait bound `K8sAnywhereTopology: Docker` was not satisfied
// --> src/main.rs:61:9
// |
// 61 | impl<T: Docker + Send + Sync> Docker for FailoverTopology<T> {
// | ^^^^^^ ------ -------------------
// | |
// | unsatisfied trait bound introduced here
// note: the trait `Docker` must be implemented
}

View File

@@ -0,0 +1,105 @@
# Design Document: Harmony PostgreSQL Module
**Status:** Draft
**Last Updated:** 2025-12-01
**Context:** Multi-site Data Replication & Orchestration
## 1. Overview
The Harmony PostgreSQL Module provides a high-level abstraction for deploying and managing high-availability PostgreSQL clusters across geographically distributed Kubernetes/OKD sites.
Instead of manually configuring complex replication slots, firewalls, and operator settings on each cluster, users define a single intent (a **Score**), and Harmony orchestrates the underlying infrastructure (the **Arrangement**) to establish a Primary-Replica architecture.
Currently, the implementation relies on the **CloudNativePG (CNPG)** operator as the backing engine.
## 2. Architecture
### 2.1 The Abstraction Model
Following **ADR 003 (Infrastructure Abstraction)**, Harmony separates the *intent* from the *implementation*.
1. **The Score (Intent):** The user defines a `MultisitePostgreSQL` resource. This describes *what* is needed (e.g., "A Postgres 15 cluster with 10GB storage, Primary on Site A, Replica on Site B").
2. **The Interpret (Action):** Harmony's `MultisitePostgreSQLInterpret` processes this Score and orchestrates the deployment on both sites to reach the state defined in the Score.
3. **The Capability (Implementation):** The PostgreSQL Capability is implemented by the K8sTopology, and the Interpret can deploy it, configure it, and fetch information about it. The concrete implementation relies on the mature CloudNativePG operator to manage all the required Kubernetes resources.
### 2.2 Network Connectivity (TLS Passthrough)
One of the critical challenges in multi-site orchestration is secure connectivity between clusters that may have dynamic IPs or strict firewalls.
To solve this, we utilize **OKD/OpenShift Routes with TLS Passthrough**.
* **Mechanism:** The Primary site exposes a `Route` configured for `termination: passthrough`.
* **Routing:** The OpenShift HAProxy router inspects the **SNI (Server Name Indication)** header of the incoming TCP connection to route traffic to the correct PostgreSQL Pod.
* **Security:** SSL is **not** terminated at the ingress router. The encrypted stream is passed directly to the PostgreSQL instance. Mutual TLS (mTLS) authentication is handled natively by CNPG between the Primary and Replica instances.
* **Dynamic IPs:** Because connections are established via DNS hostnames (the Route URL), this architecture is resilient to dynamic IP changes at the Primary site.
#### Traffic Flow Diagram
```text
[ Site B: Replica ]                      [ Site A: Primary ]
        |                                        |
 (CNPG Instance) --[Encrypted TCP]--> (OKD HAProxy Router)
        |              (Port 443)                |
        |                                        |
        |                                [SNI Inspection]
        |                                        |
        |                                        v
        |                            (PostgreSQL Primary Pod)
        |                                  (Port 5432)
```
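For concreteness, the Route on the Primary site looks roughly like the sketch below. The hostname matches the example in section 4; the backing Service name (CNPG's `<cluster>-rw` convention) and port are assumptions, and only `termination: passthrough` is the essential setting.
```yaml
# Hedged sketch of the passthrough Route exposing the primary PostgreSQL instance.
# Service name and port follow common CNPG conventions and are illustrative.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: postgres-finance-db
  namespace: tenant-a
spec:
  host: postgres-finance-db.apps.site-paris.example.com
  to:
    kind: Service
    name: finance-db-rw      # CNPG read-write Service in front of the primary Pod
  port:
    targetPort: 5432         # PostgreSQL port on the Service
  tls:
    termination: passthrough # SSL is NOT terminated here; HAProxy only inspects SNI
```
Because the replica connects using the Route hostname in the SNI header, the router can forward the still-encrypted TCP stream straight to the PostgreSQL Pod.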
## 3. Design Decisions
### Why CloudNativePG?
We selected CloudNativePG because it relies exclusively on standard Kubernetes primitives and uses the native PostgreSQL replication protocol (WAL shipping/Streaming). This aligns with Harmony's goal of being "K8s Native."
### Why TLS Passthrough instead of VPN/NodePort?
* **NodePort:** Requires static IPs and opening non-standard ports on the firewall, which violates our security constraints.
* **VPN (e.g., Wireguard/Tailscale):** While secure, it introduces significant complexity (sidecars, key management) and external dependencies.
* **TLS Passthrough:** Leverages the existing Ingress/Router infrastructure already present in OKD. It requires zero additional software and respects multi-tenancy (Routes are namespaced).
### Configuration Philosophy (YAGNI)
The current design exposes a **generic configuration surface**. Users can configure standard parameters (Storage size, CPU/Memory requests, Postgres version).
**We explicitly do not expose advanced CNPG or PostgreSQL configurations at this stage.**
* **Reasoning:** We aim to keep the API surface small and manageable.
* **Future Path:** We plan to implement a "pass-through" mechanism to allow sending raw config maps or custom parameters to the underlying engine (CNPG) *only when a concrete use case arises*. Until then, we adhere to the **YAGNI (You Ain't Gonna Need It)** principle to avoid premature optimization and API bloat.
## 4. Usage Guide
To deploy a multi-site cluster, apply the `MultisitePostgreSQL` resource to the Harmony Control Plane.
### Example Manifest
```yaml
apiVersion: harmony.io/v1alpha1
kind: MultisitePostgreSQL
metadata:
name: finance-db
namespace: tenant-a
spec:
version: "15"
storage: "10Gi"
resources:
requests:
cpu: "500m"
memory: "1Gi"
# Topology Definition
topology:
primary:
site: "site-paris" # The name of the cluster in Harmony
replicas:
- site: "site-newyork"
```
### What happens next?
1. Harmony detects the CR.
2. **On Site Paris:** It deploys a CNPG Cluster (Primary) and creates a Passthrough Route `postgres-finance-db.apps.site-paris.example.com`.
3. **On Site New York:** It deploys a CNPG Cluster (Replica) configured with `externalClusters` pointing to the Paris Route.
4. Data begins replicating immediately over the encrypted channel.
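Step 3 corresponds roughly to a CNPG `Cluster` resource like the sketch below on the replica site. The field names follow the CNPG v1 API, but the exact spec Harmony generates, in particular the names of the secrets holding the propagated certificates, is illustrative.
```yaml
# Hedged sketch of the replica-site CNPG Cluster described in step 3.
# Secret names and sizing are illustrative, not the exact Harmony output.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: finance-db-replica
  namespace: tenant-a
spec:
  instances: 1
  storage:
    size: 10Gi
  replica:
    enabled: true
    source: finance-db             # follow the external (Paris) cluster
  bootstrap:
    pg_basebackup:
      source: finance-db           # initial copy streamed from the primary
  externalClusters:
    - name: finance-db
      connectionParameters:
        host: postgres-finance-db.apps.site-paris.example.com  # the Passthrough Route
        port: "443"
        user: streaming_replica
        sslmode: verify-ca
      sslCert:
        name: finance-db-replication   # propagated by Harmony from the primary namespace
        key: tls.crt
      sslKey:
        name: finance-db-replication
        key: tls.key
      sslRootCert:
        name: finance-db-ca
        key: ca.crt
```
Once this resource is reconciled, WAL streaming starts over the encrypted channel, which is what step 4 describes.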
## 5. Troubleshooting
* **Connection Refused:** Ensure the Primary site's Route is successfully admitted by the Ingress Controller.
* **Certificate Errors:** CNPG manages mTLS automatically. If errors persist, ensure the CA secrets were correctly propagated by Harmony from Primary to Replica namespaces.

View File

@@ -0,0 +1,18 @@
[package]
name = "example-operatorhub-catalogsource"
edition = "2024"
version.workspace = true
readme.workspace = true
license.workspace = true
publish = false
[dependencies]
harmony = { path = "../../harmony" }
harmony_cli = { path = "../../harmony_cli" }
harmony_types = { path = "../../harmony_types" }
cidr = { workspace = true }
tokio = { workspace = true }
harmony_macros = { path = "../../harmony_macros" }
log = { workspace = true }
env_logger = { workspace = true }
url = { workspace = true }

View File

@@ -0,0 +1,23 @@
use std::str::FromStr;
use harmony::{
inventory::Inventory,
modules::{k8s::apps::OperatorHubCatalogSourceScore, tenant::TenantScore},
topology::{tenant::TenantConfig, K8sAnywhereTopology},
};
use harmony_types::id::Id;
#[tokio::main]
async fn main() {
let operatorhub_catalog = OperatorHubCatalogSourceScore::default();
harmony_cli::run(
Inventory::autoload(),
K8sAnywhereTopology::from_env(),
vec![Box::new(operatorhub_catalog)],
None,
)
.await
.unwrap();
}

View File

@@ -0,0 +1,19 @@
use async_trait::async_trait;
use crate::topology::{PreparationError, PreparationOutcome, Topology};
pub struct FailoverTopology<T> {
pub primary: T,
pub replica: T,
}
#[async_trait]
impl<T: Send + Sync> Topology for FailoverTopology<T> {
fn name(&self) -> &str {
"FailoverTopology"
}
async fn ensure_ready(&self) -> Result<PreparationOutcome, PreparationError> {
todo!()
}
}

View File

@@ -1,5 +1,7 @@
mod ha_cluster;
pub mod ingress;
mod failover;
pub use failover::*;
use harmony_types::net::IpAddress;
mod host_binding;
mod http;

View File

@@ -17,6 +17,12 @@ use crate::{
topology::{HostNetworkConfig, NetworkError, NetworkManager, k8s::K8sClient},
};
/// TODO document properly the non-intuitive "roll forward only" behavior of nmstate in general.
/// It is documented in the official nmstate docs, but worth mentioning here:
///
/// - You create a bond, nmstate will apply it
/// - You delete the bond from nmstate: it will NOT delete it
/// - To delete it, you have to update it with its configuration set to null
pub struct OpenShiftNmStateNetworkManager {
k8s_client: Arc<K8sClient>,
}
@@ -31,6 +37,7 @@ impl std::fmt::Debug for OpenShiftNmStateNetworkManager {
impl NetworkManager for OpenShiftNmStateNetworkManager {
async fn ensure_network_manager_installed(&self) -> Result<(), NetworkError> {
debug!("Installing NMState controller...");
// TODO use operatorhub maybe?
self.k8s_client.apply_url(url::Url::parse("https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.84.0/nmstate.io_nmstates.yaml").unwrap(), Some("nmstate"))
.await?;
@@ -135,8 +142,6 @@ impl OpenShiftNmStateNetworkManager {
description: Some(format!("Member of bond {bond_name}")),
r#type: nmstate::InterfaceType::Ethernet,
state: "up".to_string(),
mtu: Some(switch_port.interface.mtu),
mac_address: Some(switch_port.interface.mac_address.to_string()),
ipv4: Some(nmstate::IpStackSpec {
enabled: Some(false),
..Default::default()
@@ -162,7 +167,7 @@ impl OpenShiftNmStateNetworkManager {
interfaces.push(nmstate::Interface {
name: bond_name.to_string(),
description: Some(format!("Network bond for host {host}")),
description: Some(format!("HARMONY - Network bond for host {host}")),
r#type: nmstate::InterfaceType::Bond,
state: "up".to_string(),
copy_mac_from,

View File

@@ -0,0 +1,157 @@
use std::collections::BTreeMap;
use k8s_openapi::{
api::core::v1::{Affinity, Toleration},
apimachinery::pkg::apis::meta::v1::ObjectMeta,
};
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use serde_json::Value;
#[derive(CustomResource, Deserialize, Serialize, Clone, Debug)]
#[kube(
group = "operators.coreos.com",
version = "v1alpha1",
kind = "CatalogSource",
plural = "catalogsources",
namespaced = true,
schema = "disabled"
)]
#[serde(rename_all = "camelCase")]
pub struct CatalogSourceSpec {
#[serde(skip_serializing_if = "Option::is_none")]
pub address: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub config_map: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub description: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub display_name: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub grpc_pod_config: Option<GrpcPodConfig>,
#[serde(skip_serializing_if = "Option::is_none")]
pub icon: Option<Icon>,
#[serde(skip_serializing_if = "Option::is_none")]
pub image: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub priority: Option<i64>,
#[serde(skip_serializing_if = "Option::is_none")]
pub publisher: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub run_as_root: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub secrets: Option<Vec<String>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub source_type: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub update_strategy: Option<UpdateStrategy>,
}
#[derive(Deserialize, Serialize, Clone, Debug)]
#[serde(rename_all = "camelCase")]
pub struct GrpcPodConfig {
#[serde(skip_serializing_if = "Option::is_none")]
pub affinity: Option<Affinity>,
#[serde(skip_serializing_if = "Option::is_none")]
pub extract_content: Option<ExtractContent>,
#[serde(skip_serializing_if = "Option::is_none")]
pub memory_target: Option<Value>,
#[serde(skip_serializing_if = "Option::is_none")]
pub node_selector: Option<BTreeMap<String, String>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub priority_class_name: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub security_context_config: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub tolerations: Option<Vec<Toleration>>,
}
#[derive(Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct ExtractContent {
pub cache_dir: String,
pub catalog_dir: String,
}
#[derive(Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct Icon {
pub base64data: String,
pub mediatype: String,
}
#[derive(Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct UpdateStrategy {
#[serde(skip_serializing_if = "Option::is_none")]
pub registry_poll: Option<RegistryPoll>,
}
#[derive(Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[serde(rename_all = "camelCase")]
pub struct RegistryPoll {
#[serde(skip_serializing_if = "Option::is_none")]
pub interval: Option<String>,
}
impl Default for CatalogSource {
fn default() -> Self {
Self {
metadata: ObjectMeta::default(),
spec: CatalogSourceSpec {
address: None,
config_map: None,
description: None,
display_name: None,
grpc_pod_config: None,
icon: None,
image: None,
priority: None,
publisher: None,
run_as_root: None,
secrets: None,
source_type: None,
update_strategy: None,
},
}
}
}
impl Default for CatalogSourceSpec {
fn default() -> Self {
Self {
address: None,
config_map: None,
description: None,
display_name: None,
grpc_pod_config: None,
icon: None,
image: None,
priority: None,
publisher: None,
run_as_root: None,
secrets: None,
source_type: None,
update_strategy: None,
}
}
}

View File

@@ -0,0 +1,4 @@
mod catalogsources_operators_coreos_com;
pub use catalogsources_operators_coreos_com::*;

View File

@@ -0,0 +1,4 @@
mod operatorhub;
pub use operatorhub::*;
pub mod crd;

View File

@@ -0,0 +1,107 @@
// OperatorHub CatalogSource score.
// For now this only supports OKD with the default catalog and OperatorHub setup, and does not verify OLM state or anything else. Very opinionated and bare-bones to start.
use k8s_openapi::apimachinery::pkg::apis::meta::v1::ObjectMeta;
use serde::Serialize;
use crate::interpret::Interpret;
use crate::modules::k8s::apps::crd::{
CatalogSource, CatalogSourceSpec, RegistryPoll, UpdateStrategy,
};
use crate::modules::k8s::resource::K8sResourceScore;
use crate::score::Score;
use crate::topology::{K8sclient, Topology};
/// Installs the CatalogSource in a cluster which already has the required services and CRDs installed.
///
/// ```rust
/// use harmony::modules::k8s::apps::OperatorHubCatalogSourceScore;
///
/// let score = OperatorHubCatalogSourceScore::default();
/// ```
///
/// Required services:
/// - catalog-operator
/// - olm-operator
///
/// They are installed by default with OKD/OpenShift.
///
/// **Warning**: this initial implementation does not manage the dependencies. They must already
/// exist in the cluster.
#[derive(Debug, Clone, Serialize)]
pub struct OperatorHubCatalogSourceScore {
pub name: String,
pub namespace: String,
pub image: String,
}
impl OperatorHubCatalogSourceScore {
pub fn new(name: &str, namespace: &str, image: &str) -> Self {
Self {
name: name.to_string(),
namespace: namespace.to_string(),
image: image.to_string(),
}
}
}
impl Default for OperatorHubCatalogSourceScore {
/// This default implementation will create this k8s resource :
///
/// ```yaml
/// apiVersion: operators.coreos.com/v1alpha1
/// kind: CatalogSource
/// metadata:
/// name: operatorhubio-catalog
/// namespace: openshift-marketplace
/// spec:
/// sourceType: grpc
/// image: quay.io/operatorhubio/catalog:latest
/// displayName: Operatorhub Operators
/// publisher: OperatorHub.io
/// updateStrategy:
/// registryPoll:
/// interval: 60m
/// ```
fn default() -> Self {
OperatorHubCatalogSourceScore {
name: "operatorhubio-catalog".to_string(),
namespace: "openshift-marketplace".to_string(),
image: "quay.io/operatorhubio/catalog:latest".to_string(),
}
}
}
impl<T: Topology + K8sclient> Score<T> for OperatorHubCatalogSourceScore {
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
let metadata = ObjectMeta {
name: Some(self.name.clone()),
namespace: Some(self.namespace.clone()),
..ObjectMeta::default()
};
let spec = CatalogSourceSpec {
source_type: Some("grpc".to_string()),
image: Some(self.image.clone()),
display_name: Some("Operatorhub Operators".to_string()),
publisher: Some("OperatorHub.io".to_string()),
update_strategy: Some(UpdateStrategy {
registry_poll: Some(RegistryPoll {
interval: Some("60m".to_string()),
}),
}),
..CatalogSourceSpec::default()
};
let catalog_source = CatalogSource {
metadata,
spec,
};
K8sResourceScore::single(catalog_source, Some(self.namespace.clone())).create_interpret()
}
fn name(&self) -> String {
format!("OperatorHubCatalogSourceScore({})", self.name)
}
}

View File

@@ -2,3 +2,4 @@ pub mod deployment;
pub mod ingress;
pub mod namespace;
pub mod resource;
pub mod apps;

View File

@@ -17,3 +17,4 @@ pub mod prometheus;
pub mod storage;
pub mod tenant;
pub mod tftp;
pub mod postgresql;

View File

@@ -417,6 +417,7 @@ pub struct EthernetSpec {
#[serde(rename_all = "kebab-case")]
pub struct BondSpec {
pub mode: String,
#[serde(alias = "port")]
pub ports: Vec<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub options: Option<BTreeMap<String, Value>>,
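As a hedged illustration of what the alias buys (the surrounding nmstate interface structure is omitted), both of the following snippets deserialize into `BondSpec`:
```yaml
# Accepted: nmstate's singular key, handled by the serde alias
mode: 802.3ad
port:
  - eno1
  - eno2
---
# Also accepted: the field's native plural name
mode: 802.3ad
ports:
  - eno1
  - eno2
```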

View File

@@ -0,0 +1,82 @@
use async_trait::async_trait;
use harmony_types::storage::StorageSize;
use serde::Serialize;
use std::collections::HashMap;
#[async_trait]
pub trait PostgreSQL: Send + Sync {
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String>;
/// Extracts PostgreSQL-specific replication certs (PEM format) from a deployed primary cluster.
/// Abstracts away storage/retrieval details (e.g., secrets, files).
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String>;
/// Gets the internal/private endpoint (e.g., k8s service FQDN:5432) for the cluster.
async fn get_endpoint(&self, cluster_name: &str) -> Result<PostgreSQLEndpoint, String>;
/// Gets the public/externally routable endpoint if configured (e.g., OKD Route:443 for TLS passthrough).
/// Returns None if no public endpoint (internal-only cluster).
/// UNSTABLE: This is opinionated for initial multisite use cases. Networking abstraction is complex
/// (cf. k8s Ingress -> Gateway API evolution); may move to higher-order Networking/PostgreSQLNetworking trait.
async fn get_public_endpoint(&self, cluster_name: &str) -> Result<Option<PostgreSQLEndpoint>, String>;
}
#[derive(Clone, Debug, Serialize)]
pub struct PostgreSQLConfig {
pub cluster_name: String,
pub instances: u32,
pub storage_size: StorageSize,
pub role: PostgreSQLClusterRole,
}
#[derive(Clone, Debug, Serialize)]
pub enum PostgreSQLClusterRole {
Primary,
Replica(ReplicaConfig),
}
#[derive(Clone, Debug, Serialize)]
pub struct ReplicaConfig {
/// Name of the primary cluster this replica will sync from
pub primary_cluster_name: String,
/// Certs extracted from primary via Topology::get_replication_certs()
pub replication_certs: ReplicationCerts,
/// Bootstrap method (e.g., pg_basebackup from primary)
pub bootstrap: BootstrapConfig,
/// External cluster connection details for CNPG spec.externalClusters
pub external_cluster: ExternalClusterConfig,
}
#[derive(Clone, Debug, Serialize)]
pub struct BootstrapConfig {
pub strategy: BootstrapStrategy,
}
#[derive(Clone, Debug, Serialize)]
pub enum BootstrapStrategy {
PgBasebackup,
}
#[derive(Clone, Debug, Serialize)]
pub struct ExternalClusterConfig {
/// Name used in CNPG externalClusters list
pub name: String,
/// Connection params (host/port set by multisite logic, sslmode='verify-ca', etc.)
pub connection_parameters: HashMap<String, String>,
}
#[derive(Clone, Debug, Serialize)]
pub struct ReplicationCerts {
/// PEM-encoded CA cert from primary
pub ca_cert_pem: String,
/// PEM-encoded streaming_replica client cert (tls.crt)
pub streaming_replica_cert_pem: String,
/// PEM-encoded streaming_replica client key (tls.key)
pub streaming_replica_key_pem: String,
}
#[derive(Clone, Debug)]
pub struct PostgreSQLEndpoint {
pub host: String,
pub port: u16,
}

View File

@@ -0,0 +1,125 @@
use async_trait::async_trait;
use log::debug;
use log::info;
use std::collections::HashMap;
use crate::{
modules::postgresql::capability::{
BootstrapConfig, BootstrapStrategy, ExternalClusterConfig, PostgreSQL,
PostgreSQLClusterRole, PostgreSQLConfig, PostgreSQLEndpoint, ReplicaConfig,
ReplicationCerts,
},
topology::FailoverTopology,
};
#[async_trait]
impl<T: PostgreSQL> PostgreSQL for FailoverTopology<T> {
async fn deploy(&self, config: &PostgreSQLConfig) -> Result<String, String> {
info!(
"Starting deployment of failover topology '{}'",
config.cluster_name
);
let primary_config = PostgreSQLConfig {
cluster_name: config.cluster_name.clone(),
instances: config.instances,
storage_size: config.storage_size.clone(),
role: PostgreSQLClusterRole::Primary,
};
info!(
"Deploying primary cluster '{{}}' ({} instances, {:?} storage)",
primary_config.cluster_name, primary_config.storage_size
);
let primary_cluster_name = self.primary.deploy(&primary_config).await?;
info!("Primary cluster '{primary_cluster_name}' deployed successfully");
info!("Retrieving replication certificates for primary '{primary_cluster_name}'");
let certs = self
.primary
.get_replication_certs(&primary_cluster_name)
.await?;
info!("Replication certificates retrieved successfully");
info!("Retrieving public endpoint for primary '{primary_cluster_name}");
let endpoint = self
.primary
.get_public_endpoint(&primary_cluster_name)
.await?
.ok_or_else(|| "No public endpoint configured on primary cluster".to_string())?;
info!(
"Public endpoint '{}:{}' retrieved for primary",
endpoint.host, endpoint.port
);
info!("Configuring replica connection parameters and bootstrap");
let mut connection_parameters = HashMap::new();
connection_parameters.insert("host".to_string(), endpoint.host);
connection_parameters.insert("port".to_string(), endpoint.port.to_string());
connection_parameters.insert("dbname".to_string(), "postgres".to_string());
connection_parameters.insert("user".to_string(), "streaming_replica".to_string());
connection_parameters.insert("sslmode".to_string(), "verify-ca".to_string());
connection_parameters.insert("sslnegotiation".to_string(), "direct".to_string());
debug!("Replica connection parameters: {:?}", connection_parameters);
let external_cluster = ExternalClusterConfig {
name: primary_cluster_name.clone(),
connection_parameters,
};
let bootstrap_config = BootstrapConfig {
strategy: BootstrapStrategy::PgBasebackup,
};
let replica_cluster_config = ReplicaConfig {
primary_cluster_name: primary_cluster_name.clone(),
replication_certs: certs,
bootstrap: bootstrap_config,
external_cluster,
};
let replica_config = PostgreSQLConfig {
cluster_name: format!("{}-replica", primary_cluster_name),
instances: config.instances,
storage_size: config.storage_size.clone(),
role: PostgreSQLClusterRole::Replica(replica_cluster_config),
};
info!(
"Deploying replica cluster '{}' ({} instances, {:?} storage) on replica topology",
replica_config.cluster_name, replica_config.instances, replica_config.storage_size
);
self.replica.deploy(&replica_config).await?;
info!(
"Replica cluster '{}' deployed successfully; failover topology '{}' ready",
replica_config.cluster_name, config.cluster_name
);
Ok(primary_cluster_name)
}
async fn get_replication_certs(&self, cluster_name: &str) -> Result<ReplicationCerts, String> {
self.primary.get_replication_certs(cluster_name).await
}
async fn get_endpoint(&self, cluster_name: &str) -> Result<PostgreSQLEndpoint, String> {
self.primary.get_endpoint(cluster_name).await
}
async fn get_public_endpoint(
&self,
cluster_name: &str,
) -> Result<Option<PostgreSQLEndpoint>, String> {
self.primary.get_public_endpoint(cluster_name).await
}
}

View File

@@ -0,0 +1,7 @@
pub mod capability;
mod score;
pub mod failover;

View File

@@ -0,0 +1,88 @@
use crate::{
domain::{data::Version, interpret::InterpretStatus},
interpret::{Interpret, InterpretError, InterpretName, Outcome},
inventory::Inventory,
modules::postgresql::capability::PostgreSQL,
score::Score,
topology::Topology,
};
use super::capability::*;
use harmony_types::id::Id;
use async_trait::async_trait;
use log::info;
use serde::Serialize;
#[derive(Clone, Debug, Serialize)]
pub struct PostgreSQLScore {
config: PostgreSQLConfig,
}
#[derive(Debug, Clone)]
pub struct PostgreSQLInterpret {
config: PostgreSQLConfig,
version: Version,
status: InterpretStatus,
}
impl PostgreSQLInterpret {
pub fn new(config: PostgreSQLConfig) -> Self {
let version = Version::from("1.0.0").expect("Version should be valid");
Self {
config,
version,
status: InterpretStatus::QUEUED,
}
}
}
impl<T: Topology + PostgreSQL> Score<T> for PostgreSQLScore {
fn name(&self) -> String {
"PostgreSQLScore".to_string()
}
fn create_interpret(&self) -> Box<dyn Interpret<T>> {
Box::new(PostgreSQLInterpret::new(self.config.clone()))
}
}
#[async_trait]
impl<T: Topology + PostgreSQL> Interpret<T> for PostgreSQLInterpret {
fn get_name(&self) -> InterpretName {
InterpretName::Custom("PostgreSQLInterpret")
}
fn get_version(&self) -> crate::domain::data::Version {
self.version.clone()
}
fn get_status(&self) -> InterpretStatus {
self.status.clone()
}
fn get_children(&self) -> Vec<Id> {
todo!()
}
async fn execute(
&self,
_inventory: &Inventory,
topology: &T,
) -> Result<Outcome, InterpretError> {
info!(
"Executing PostgreSQLInterpret with config {:?}",
self.config
);
let cluster_name = topology
.deploy(&self.config)
.await
.map_err(|e| InterpretError::from(e))?;
Ok(Outcome::success(format!(
"Deployed PostgreSQL cluster `{cluster_name}`"
)))
}
}

View File

@@ -1,3 +1,4 @@
pub mod id;
pub mod net;
pub mod switch;
pub mod storage;

View File

@@ -1,6 +1,6 @@
use serde::{Deserialize, Serialize};
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash, Serialize, Deserialize, PartialOrd, Ord)]
#[derive(Copy, Clone, PartialEq, Eq, Hash, Serialize, Deserialize, PartialOrd, Ord)]
pub struct MacAddress(pub [u8; 6]);
impl MacAddress {
@@ -19,6 +19,14 @@ impl From<&MacAddress> for String {
}
}
impl std::fmt::Debug for MacAddress {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_tuple("MacAddress")
.field(&String::from(self))
.finish()
}
}
impl std::fmt::Display for MacAddress {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.write_str(&String::from(self))

View File

@@ -0,0 +1,6 @@
use serde::{Deserialize, Serialize};
#[derive(Copy, Clone, PartialEq, Eq, Hash, Serialize, Deserialize, PartialOrd, Ord, Debug)]
pub struct StorageSize {
size_bytes: u64,
}