diff --git a/adr/000-ADR-Template.md b/adr/000-ADR-Template.md
index b9592cb..87f9479 100644
--- a/adr/000-ADR-Template.md
+++ b/adr/000-ADR-Template.md
@@ -1,6 +1,6 @@
 # Architecture Decision Record: \
 
-Name: \
+Initial Author: \
 
 Initial Date: \
 
diff --git a/adr/009-helm-and-kustomize-handling.md b/adr/009-helm-and-kustomize-handling.md
index dbebb8a..24bb48d 100644
--- a/adr/009-helm-and-kustomize-handling.md
+++ b/adr/009-helm-and-kustomize-handling.md
@@ -1,6 +1,6 @@
 # Architecture Decision Record: Helm and Kustomize Handling
 
-Name: Taha Hawa
+Initial Author: Taha Hawa
 
 Initial Date: 2025-04-15
 
diff --git a/adr/010-monitoring-and-alerting.md b/adr/010-monitoring-and-alerting.md
index a91968b..ed0651e 100644
--- a/adr/010-monitoring-and-alerting.md
+++ b/adr/010-monitoring-and-alerting.md
@@ -1,7 +1,7 @@
 # Architecture Decision Record: Monitoring and Alerting
 
-Proposed by: Willem Rolleman
-Date: April 28 2025
+Initial Author: Willem Rolleman
+Date: April 28, 2025
 
 ## Status
 
diff --git a/adr/011-multi-tenant-cluster.md b/adr/011-multi-tenant-cluster.md
new file mode 100644
index 0000000..73cd824
--- /dev/null
+++ b/adr/011-multi-tenant-cluster.md
@@ -0,0 +1,160 @@
+# Architecture Decision Record: Multi-Tenancy Strategy for Harmony-Managed Clusters
+
+Initial Author: Jean-Gabriel Gill-Couture
+
+Initial Date: 2025-05-26
+
+## Status
+
+Proposed
+
+## Context
+
+Harmony manages production OKD/Kubernetes clusters that serve multiple clients with varying trust levels and operational requirements. We need a multi-tenancy strategy that provides:
+
+1. **Strong isolation** between client workloads while maintaining operational simplicity
+2. **Controlled API access** giving clients self-service capabilities within defined boundaries
+3. **Security-first approach** protecting both the cluster infrastructure and tenant data
+4. **Harmony-native implementation** using our Score/Interpret pattern for automated tenant provisioning
+5. **Scalable management** supporting both small trusted clients and larger enterprise customers
+
+The official Kubernetes multi-tenancy documentation identifies two primary models: namespace-based isolation and virtual control planes per tenant. Given Harmony's focus on operational simplicity, provider-agnostic abstractions (ADR-003), and hexagonal architecture (ADR-002), we must choose an approach that balances security, usability, and maintainability.
+
+Our clients represent a hybrid tenancy model:
+- **Customer multi-tenancy**: Each client operates independently with no cross-tenant trust
+- **Team multi-tenancy**: Individual clients may have multiple team members requiring coordinated access
+- **API access requirement**: Unlike pure SaaS scenarios, clients need controlled Kubernetes API access for self-service operations
+
+This ADR draws heavily on the official Kubernetes multi-tenancy documentation: https://kubernetes.io/docs/concepts/security/multi-tenancy/
+
+## Decision
+
+Implement **namespace-based multi-tenancy** with the following architecture:
+
+### 1. Network Security Model
+- **Private cluster access**: Kubernetes API and OpenShift console accessible only via WireGuard VPN
+- **No public exposure**: Control plane endpoints remain internal to prevent unauthorized access attempts
+- **VPN-based authentication**: Initial access control through per-client WireGuard keys
+
+### 2. Tenant Isolation Strategy
+- **Dedicated namespace per tenant**: Each client receives an isolated namespace with access limited to the resources and operations they require
+- **Complete network isolation**: NetworkPolicies prevent cross-namespace communication while allowing full egress to the public internet
+- **Resource governance**: ResourceQuotas and LimitRanges enforce CPU, memory, and storage consumption limits
+- **Storage access control**: Clients can create PersistentVolumeClaims but cannot directly manipulate PersistentVolumes or access other tenants' storage
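+
+To make the isolation rules concrete, the predicate below restates them as code. This is an illustration only: in the cluster these rules are enforced by NetworkPolicies, and the `Endpoint` and `traffic_allowed` names are assumptions made for this sketch, not part of Harmony.
+
+```rust
+/// Illustration only: the traffic rules implied by the isolation strategy,
+/// expressed as a predicate rather than as NetworkPolicy manifests.
+pub enum Endpoint {
+    /// A pod running inside the named tenant namespace.
+    TenantNamespace(String),
+    /// Anything outside the cluster.
+    PublicInternet,
+}
+
+pub fn traffic_allowed(from: &Endpoint, to: &Endpoint) -> bool {
+    match (from, to) {
+        // Workloads communicate freely within their own namespace only.
+        (Endpoint::TenantNamespace(a), Endpoint::TenantNamespace(b)) => a == b,
+        // Full egress to the public internet is allowed.
+        (Endpoint::TenantNamespace(_), Endpoint::PublicInternet) => true,
+        // Nothing reaches tenant workloads from outside except through the
+        // whitelisted Ingress resources described in section 3.
+        (Endpoint::PublicInternet, _) => false,
+    }
+}
+```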
+
+### 3. Access Control Framework
+- **Principle of Least Privilege**: RBAC grants only the permissions needed within the tenant's namespace
+- **Namespace-scoped self-service**: Clients can create, modify, and delete resources within their own namespace
+- **Cluster-level restrictions**: No access to cluster-wide resources, other namespaces, or sensitive cluster operations
+- **Whitelisted operations**: Controlled self-service capabilities for ingress, secrets, configmaps, and workload management
+
+### 4. Identity Management Evolution
+- **Phase 1**: Manual provisioning of VPN access and Kubernetes ServiceAccounts/Users
+- **Phase 2**: Migration to Keycloak-based identity management (aligning with ADR-006) for centralized authentication and lifecycle management
+
+### 5. Harmony Integration
+- **TenantScore implementation**: Declarative tenant provisioning using Harmony's Score/Interpret pattern
+- **Topology abstraction**: Tenant configuration abstracted from underlying Kubernetes implementation details
+- **Automated deployment**: Complete tenant setup automated through Harmony's orchestration capabilities
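+
+The sketch below illustrates how the Score/Interpret pattern is expected to drive tenant provisioning. The `Interpret` trait shown here is a stand-in with an assumed shape, as are `TenantScoreSketch` and the object-naming convention; the real interfaces are defined by Harmony core and by the TenantScore previewed later in this ADR.
+
+```rust
+/// Stand-in for Harmony's real Interpret interface (assumed shape).
+pub trait Interpret {
+    type Output;
+    fn interpret(&self) -> Self::Output;
+}
+
+/// Minimal declarative description of one tenant (illustrative fields only).
+pub struct TenantScoreSketch {
+    pub tenant_name: String,
+    pub team_members: Vec<String>,
+}
+
+impl Interpret for TenantScoreSketch {
+    // A real interpreter would emit typed Kubernetes resources; plain
+    // strings are used here to keep the sketch self-contained.
+    type Output = Vec<String>;
+
+    fn interpret(&self) -> Self::Output {
+        vec![
+            // Everything a tenant needs lives in, or is scoped to, one namespace.
+            format!("Namespace/{}", self.tenant_name),
+            format!("NetworkPolicy/{}-default-deny-ingress", self.tenant_name),
+            format!("ResourceQuota/{}-quota", self.tenant_name),
+            format!("LimitRange/{}-limits", self.tenant_name),
+            // Namespace-scoped RBAC for the client's team members.
+            format!("Role/{}-tenant-admin", self.tenant_name),
+            format!("RoleBinding/{}-team", self.tenant_name),
+        ]
+    }
+}
+```
+
+Keeping the whole tenant definition in a single declarative value keeps provisioning idempotent and reviewable, which is what the automated deployment point above relies on.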
+
+## Rationale
+
+### Network Security Through VPN Access
+- **Defense in depth**: The VPN requirement adds a critical security layer preventing unauthorized cluster access
+- **Simplified firewall rules**: No need for complex public endpoint protections or rate limiting
+- **Audit capability**: VPN access provides a clear audit trail of cluster connections
+- **Aligns with enterprise practices**: Most enterprise customers already use VPN infrastructure
+
+### Namespace Isolation vs Virtual Control Planes
+Following the official Kubernetes guidance, namespace isolation provides:
+- **Lower resource overhead**: Virtual control planes require a dedicated etcd, API server, and controller manager per tenant
+- **Operational simplicity**: A single control plane to maintain, upgrade, and monitor
+- **Cross-tenant service integration**: Enables future controlled cross-tenant communication if required
+- **Proven stability**: Namespace-based isolation is well-tested and widely deployed
+- **Cost efficiency**: Significantly lower infrastructure costs compared to dedicated control planes
+
+### Hybrid Tenancy Model Suitability
+Our approach addresses both customer and team multi-tenancy requirements:
+- **Customer isolation**: Strong network and RBAC boundaries prevent cross-tenant interference
+- **Team collaboration**: Multiple team members can share namespace access through group-based RBAC
+- **Self-service balance**: Controlled API access enables client autonomy without compromising security
+
+### Harmony Architecture Alignment
+- **Provider agnostic**: TenantScore abstracts multi-tenancy concepts, enabling future support for other Kubernetes distributions
+- **Hexagonal architecture**: Tenant management becomes an infrastructure capability accessed through well-defined ports
+- **Declarative automation**: Tenant lifecycle fully managed through Harmony's Score execution model
+
+## Consequences
+
+### Positive Consequences
+- **Strong security posture**: VPN access plus namespace isolation provides robust tenant separation
+- **Operational efficiency**: Single cluster management with automated tenant provisioning
+- **Client autonomy**: Self-service capabilities reduce the operational support burden
+- **Scalable architecture**: Can support hundreds of tenants per cluster without architectural changes
+- **Future flexibility**: The foundation supports evolution to more sophisticated multi-tenancy models
+- **Cost optimization**: Shared infrastructure maximizes resource utilization
+
+### Negative Consequences
+- **VPN operational overhead**: Requires VPN infrastructure management
+- **Manual provisioning complexity**: Phase 1 manual user management creates an administrative burden
+- **Network policy dependency**: Requires a CNI with NetworkPolicy support (OVN-Kubernetes provides this and is the OKD/OpenShift default)
+- **Cluster-wide resource limitations**: Some advanced Kubernetes features require cluster-wide access
+- **Single point of failure**: A cluster outage affects all tenants simultaneously
+
+### Migration Challenges
+- **Legacy client integration**: Existing clients may need VPN client setup and credential migration
+- **Monitoring complexity**: Per-tenant observability requires careful metric and log segmentation
+- **Backup considerations**: Tenant data backup must respect isolation boundaries
+
+## Alternatives Considered
+
+### Alternative 1: Virtual Control Plane Per Tenant
+**Pros**: Complete control plane isolation, full Kubernetes API access per tenant
+**Cons**: 3-5x higher resource usage, complex cross-tenant networking, operational complexity scales linearly with tenants
+
+**Rejected**: Resource overhead incompatible with cost-effective multi-tenancy goals
+
+### Alternative 2: Dedicated Clusters Per Tenant
+**Pros**: Maximum isolation, independent upgrade cycles, simplified security model
+**Cons**: Operational complexity grows with every additional cluster, prohibitive costs, resource waste
+
+**Rejected**: Operational overhead makes this approach unsustainable for multiple clients
+
+### Alternative 3: Public API with Advanced Authentication
+**Pros**: No VPN requirement, potentially simpler client access
+**Cons**: Larger attack surface, complex rate limiting and DDoS protection, increased security monitoring requirements
+
+**Rejected**: Risk/benefit analysis favors VPN-based access control
+
+### Alternative 4: Service Mesh Based Isolation
+**Pros**: Fine-grained traffic control, encryption, advanced observability
+**Cons**: Significant operational complexity, performance overhead, steep learning curve
+
+**Rejected**: Complexity overhead outweighs the benefits for current requirements; remains an option for future enhancement
+
+## Additional Notes
+
+### Implementation Roadmap
+1. **Phase 1**: Implement VPN access and manual tenant provisioning
+2. **Phase 2**: Deploy TenantScore automation for namespace, RBAC, and NetworkPolicy management
+3. **Phase 3**: Integrate Keycloak for centralized identity management
+4. **Phase 4**: Add advanced monitoring and per-tenant observability
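+
+As a rough illustration of the per-tenant observability targeted in Phase 4, the tenant namespace is the natural partition key for metrics, logs, and alert routing. The type and field names below are assumptions made for this sketch, not an existing Harmony API.
+
+```rust
+/// Illustration only: how observability data could be segmented per tenant.
+pub struct TenantObservability {
+    /// Namespace whose metrics and logs belong to this tenant.
+    pub namespace: String,
+    /// Label selector applied to metric and log queries,
+    /// e.g. namespace="tenant-acme".
+    pub query_selector: String,
+    /// Where tenant-facing alerts are delivered (email, webhook, ...).
+    pub alert_receiver: String,
+}
+
+impl TenantObservability {
+    pub fn for_namespace(namespace: &str, alert_receiver: &str) -> Self {
+        Self {
+            namespace: namespace.to_string(),
+            query_selector: format!("namespace=\"{}\"", namespace),
+            alert_receiver: alert_receiver.to_string(),
+        }
+    }
+}
+```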
+
+### TenantScore Structure Preview
+```rust
+pub struct TenantScore {
+    pub tenant_config: TenantConfig,
+    pub resource_quotas: ResourceQuotaConfig,
+    pub network_isolation: NetworkIsolationPolicy,
+    pub storage_access: StorageAccessConfig,
+    pub rbac_config: RBACConfig,
+}
+```
+
+### Future Enhancements
+- **Cross-tenant service mesh**: For approved inter-tenant communication
+- **Advanced monitoring**: Per-tenant Prometheus/Grafana instances
+- **Backup automation**: Tenant-scoped backup policies
+- **Cost allocation**: Detailed per-tenant resource usage tracking
+
+This ADR establishes the foundation for secure, scalable multi-tenancy in Harmony-managed clusters while maintaining operational simplicity and cost effectiveness. A follow-up ADR will detail the Tenant abstraction and user management mechanisms within the Harmony framework.
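+
+Pending that follow-up ADR, the sketch below shows one possible shape for the supporting types referenced in the preview above. Every name, field, and unit is an assumption intended only to make the preview easier to discuss, not a committed design.
+
+```rust
+/// Illustration only: possible shapes for the types referenced by TenantScore.
+pub struct TenantConfig {
+    /// Tenant name; also used as (or mapped to) the dedicated namespace.
+    pub name: String,
+    /// Team members granted namespace-scoped access (see Identity Management).
+    pub team_members: Vec<String>,
+}
+
+pub struct ResourceQuotaConfig {
+    pub cpu_cores: u32,
+    pub memory_gib: u32,
+    pub storage_gib: u32,
+}
+
+pub struct NetworkIsolationPolicy {
+    /// Default-deny ingress from other namespaces.
+    pub deny_cross_namespace_ingress: bool,
+    /// Full egress to the public internet, as decided in this ADR.
+    pub allow_public_egress: bool,
+}
+
+pub struct StorageAccessConfig {
+    /// StorageClasses tenants may reference in their PersistentVolumeClaims.
+    pub allowed_storage_classes: Vec<String>,
+    pub max_persistent_volume_claims: u32,
+}
+
+pub struct RBACConfig {
+    /// Whitelisted resources the tenant role may manage inside its namespace
+    /// (e.g. Deployments, Services, Ingresses, Secrets, ConfigMaps).
+    pub allowed_resources: Vec<String>,
+}
+```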