adr: Add ADR on multi tenancy using namespace based customer isolation #41
| @ -1,6 +1,6 @@ | |||||||
| # Architecture Decision Record: \<Title\> | # Architecture Decision Record: \<Title\> | ||||||
| 
 | 
 | ||||||
| Name: \<Name\> | Initial Author: \<Name\> | ||||||
| 
 | 
 | ||||||
| Initial Date: \<Date\> | Initial Date: \<Date\> | ||||||
| 
 | 
 | ||||||
|  | |||||||
| @ -1,6 +1,6 @@ | |||||||
| # Architecture Decision Record: Helm and Kustomize Handling | # Architecture Decision Record: Helm and Kustomize Handling | ||||||
| 
 | 
 | ||||||
| Name: Taha Hawa | Initial Author: Taha Hawa | ||||||
| 
 | 
 | ||||||
| Initial Date: 2025-04-15 | Initial Date: 2025-04-15 | ||||||
| 
 | 
 | ||||||
|  | |||||||
| @ -1,7 +1,7 @@ | |||||||
| # Architecture Decision Record: Monitoring and Alerting | # Architecture Decision Record: Monitoring and Alerting | ||||||
| 
 | 
 | ||||||
| Proposed by: Willem Rolleman | Initial Author : Willem Rolleman | ||||||
| Date: April 28 2025 | Date : April 28 2025 | ||||||
| 
 | 
 | ||||||
| ## Status | ## Status | ||||||
| 
 | 
 | ||||||
|  | |||||||
							
								
								
									
adr/011-multi-tenant-cluster.md
							| @ -0,0 +1,160 @@ | |||||||
|  | # Architecture Decision Record: Multi-Tenancy Strategy for Harmony Managed Clusters | ||||||
|  | 
 | ||||||
|  | Initial Author: Jean-Gabriel Gill-Couture | ||||||
|  | 
 | ||||||
|  | Initial Date: 2025-05-26 | ||||||
|  | 
 | ||||||
|  | ## Status | ||||||
|  | 
 | ||||||
|  | Proposed | ||||||
|  | 
 | ||||||
|  | ## Context | ||||||
|  | 
 | ||||||
|  | Harmony manages production OKD/Kubernetes clusters that serve multiple clients with varying trust levels and operational requirements. We need a multi-tenancy strategy that provides: | ||||||
|  | 
 | ||||||
|  | 1. **Strong isolation** between client workloads while maintaining operational simplicity | ||||||
|  | 2. **Controlled API access** allowing clients self-service capabilities within defined boundaries | ||||||
|  | 3. **Security-first approach** protecting both the cluster infrastructure and tenant data | ||||||
|  | 4. **Harmony-native implementation** using our Score/Interpret pattern for automated tenant provisioning | ||||||
|  | 5. **Scalable management** supporting both small trusted clients and larger enterprise customers | ||||||
|  | 
 | ||||||
|  | The official Kubernetes multi-tenancy documentation identifies two primary models: namespace-based isolation and virtual control planes per tenant. Given Harmony's focus on operational simplicity, provider-agnostic abstractions (ADR-003), and hexagonal architecture (ADR-002), we must choose an approach that balances security, usability, and maintainability. | ||||||
|  | 
 | ||||||
|  | Our clients represent a hybrid tenancy model: | ||||||
|  | - **Customer multi-tenancy**: Each client operates independently with no cross-tenant trust | ||||||
|  | - **Team multi-tenancy**: Individual clients may have multiple team members requiring coordinated access | ||||||
|  | - **API access requirement**: Unlike pure SaaS scenarios, clients need controlled Kubernetes API access for self-service operations | ||||||
|  | 
 | ||||||
|  | The official Kubernetes documentation on multi-tenancy heavily inspired this ADR: https://kubernetes.io/docs/concepts/security/multi-tenancy/ | ||||||
|  | 
 | ||||||
|  | ## Decision | ||||||
|  | 
 | ||||||
|  | Implement **namespace-based multi-tenancy** with the following architecture: | ||||||
|  | 
 | ||||||
|  | ### 1. Network Security Model | ||||||
|  | - **Private cluster access**: Kubernetes API and OpenShift console accessible only via WireGuard VPN | ||||||
|  | - **No public exposure**: Control plane endpoints remain internal to prevent unauthorized access attempts | ||||||
|  | - **VPN-based authentication**: Initial access control through per-client WireGuard key pairs, with each peer identified by its public key | ||||||
|  | 
 | ||||||
|  | ### 2. Tenant Isolation Strategy | ||||||
|  | - **Dedicated namespace per tenant**: Each client receives an isolated namespace with access limited only to the required resources and operations | ||||||
|  | - **Complete network isolation**: NetworkPolicies prevent cross-namespace communication while allowing full egress to the public internet | ||||||
|  | - **Resource governance**: ResourceQuotas and LimitRanges enforce CPU, memory, and storage consumption limits | ||||||
|  | - **Storage access control**: Clients can create PersistentVolumeClaims but cannot directly manipulate PersistentVolumes or access other tenants' storage | ||||||
|  | 
 | ||||||
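The isolation rules above can be sketched as a small Rust helper that renders a NetworkPolicy manifest for one tenant namespace. This is a minimal sketch under stated assumptions, not Harmony's implementation: the function name `tenant_network_policy`, the policy name `tenant-isolation`, and the choice to pass the cluster-internal CIDRs to exclude from egress as a parameter are all hypothetical.

```rust
/// Renders a NetworkPolicy manifest (YAML) for one tenant namespace.
///
/// Assumption: `cluster_cidrs` lists the cluster-internal ranges (pod and
/// service networks) that must be excluded from the "public internet" egress
/// rule so that other tenants stay unreachable.
pub fn tenant_network_policy(namespace: &str, cluster_cidrs: &[&str]) -> String {
    let mut lines: Vec<String> = vec![
        "apiVersion: networking.k8s.io/v1".into(),
        "kind: NetworkPolicy".into(),
        "metadata:".into(),
        "  name: tenant-isolation".into(), // hypothetical policy name
        format!("  namespace: {namespace}"),
        "spec:".into(),
        // An empty podSelector selects every pod in the namespace.
        "  podSelector: {}".into(),
        "  policyTypes: [Ingress, Egress]".into(),
        "  ingress:".into(),
        // A podSelector without a namespaceSelector matches only pods
        // in the policy's own namespace.
        "    - from:".into(),
        "        - podSelector: {}".into(),
        "  egress:".into(),
        "    - to:".into(),
        "        - podSelector: {}".into(),
        // Everything else is allowed, minus the cluster-internal ranges.
        "    - to:".into(),
        "        - ipBlock:".into(),
        "            cidr: 0.0.0.0/0".into(),
    ];
    if !cluster_cidrs.is_empty() {
        lines.push("            except:".into());
        for cidr in cluster_cidrs {
            lines.push(format!("              - {cidr}"));
        }
    }
    lines.join("\n")
}
```

Called as `tenant_network_policy("tenant-a", &["10.128.0.0/14", "172.30.0.0/16"])`, where the two ranges are example pod and service networks that should be checked against the actual cluster configuration, the rendered policy allows traffic within the namespace and to external addresses only. A real policy would also need explicit rules for cluster DNS and for the ingress router, which are omitted here, and a production implementation would more likely build typed Kubernetes objects than strings.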
|  | ### 3. Access Control Framework | ||||||
|  | - **Principle of Least Privilege**: RBAC grants only necessary permissions within tenant namespace scope | ||||||
|  | - **Namespace-scoped**: Clients can create/modify/delete resources within their namespace | ||||||
|  | - **Cluster-level restrictions**: No access to cluster-wide resources, other namespaces, or sensitive cluster operations | ||||||
|  | - **Whitelisted operations**: Controlled self-service capabilities for ingress, secrets, configmaps, and workload management | ||||||
|  | 
 | ||||||
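The whitelist approach above can be modelled as a set of namespace-scoped rules plus a lookup. The sketch below mirrors the shape of a Kubernetes Role rule (API group, resources, verbs); the type `PolicyRule`, the functions `tenant_role_rules` and `is_allowed`, and the exact resource list are assumptions for illustration, since the ADR does not fix the final whitelist.

```rust
/// One namespace-scoped rule, mirroring the shape of a Kubernetes Role rule.
pub struct PolicyRule {
    pub api_group: &'static str,
    pub resources: &'static [&'static str],
    pub verbs: &'static [&'static str],
}

/// Hypothetical whitelist for a tenant Role. The resource list is an
/// assumption; cluster-scoped kinds such as PersistentVolumes are absent.
pub fn tenant_role_rules() -> Vec<PolicyRule> {
    const READ_WRITE: &[&str] =
        &["get", "list", "watch", "create", "update", "patch", "delete"];
    vec![
        PolicyRule {
            api_group: "", // core API group
            resources: &["pods", "services", "configmaps", "secrets", "persistentvolumeclaims"],
            verbs: READ_WRITE,
        },
        PolicyRule {
            api_group: "apps",
            resources: &["deployments", "statefulsets"],
            verbs: READ_WRITE,
        },
        PolicyRule {
            api_group: "networking.k8s.io",
            resources: &["ingresses"],
            verbs: READ_WRITE,
        },
    ]
}

/// Returns true only if some rule explicitly grants the verb on the resource.
/// Anything not whitelisted is denied, matching RBAC's additive model.
pub fn is_allowed(rules: &[PolicyRule], api_group: &str, resource: &str, verb: &str) -> bool {
    rules.iter().any(|rule| {
        rule.api_group == api_group
            && rule.resources.contains(&resource)
            && rule.verbs.contains(&verb)
    })
}
```

With these rules, `is_allowed(&rules, "", "persistentvolumeclaims", "create")` is true while `is_allowed(&rules, "", "persistentvolumes", "delete")` is false, which matches the storage access rule in the tenant isolation strategy.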
|  | ### 4. Identity Management Evolution | ||||||
|  | - **Phase 1**: Manual provisioning of VPN access and Kubernetes ServiceAccounts/Users | ||||||
|  | - **Phase 2**: Migration to Keycloak-based identity management (aligning with ADR-006) for centralized authentication and lifecycle management | ||||||
|  | 
 | ||||||
|  | ### 5. Harmony Integration | ||||||
|  | - **TenantScore implementation**: Declarative tenant provisioning using Harmony's Score/Interpret pattern | ||||||
|  | - **Topology abstraction**: Tenant configuration abstracted from underlying Kubernetes implementation details | ||||||
|  | - **Automated deployment**: Complete tenant setup automated through Harmony's orchestration capabilities | ||||||
|  | 
 | ||||||
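To make the declarative flow above more concrete, here is a minimal sketch of how a tenant declaration could expand into ordered provisioning steps that a backend applies. Everything in it is hypothetical: the `Score` and `Backend` traits are simplified stand-ins and not Harmony's actual traits, and `Step`, `TenantSketch`, `RecordingBackend`, and the `tenant-<name>` namespace convention are invented for illustration.

```rust
/// A single provisioning action (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    EnsureNamespace(String),
    ApplyResourceQuota { namespace: String },
    ApplyNetworkPolicy { namespace: String },
    BindRole { namespace: String, group: String },
}

/// Simplified stand-in for a declarative score: it only describes desired state.
pub trait Score {
    fn plan(&self) -> Vec<Step>;
}

/// Simplified stand-in for the topology side that knows how to apply a step.
pub trait Backend {
    fn apply(&mut self, step: &Step) -> Result<(), String>;
}

pub struct TenantSketch {
    pub tenant: String,
    pub admin_group: String,
}

impl Score for TenantSketch {
    fn plan(&self) -> Vec<Step> {
        // Assumed naming convention: one namespace per tenant.
        let namespace = format!("tenant-{}", self.tenant);
        vec![
            Step::EnsureNamespace(namespace.clone()),
            Step::ApplyResourceQuota { namespace: namespace.clone() },
            Step::ApplyNetworkPolicy { namespace: namespace.clone() },
            Step::BindRole { namespace, group: self.admin_group.clone() },
        ]
    }
}

/// Applies every planned step in order and stops at the first failure.
pub fn interpret(score: &dyn Score, backend: &mut dyn Backend) -> Result<usize, String> {
    let steps = score.plan();
    for step in &steps {
        backend.apply(step)?;
    }
    Ok(steps.len())
}

/// In-memory backend that records the steps instead of touching a cluster.
#[derive(Default)]
pub struct RecordingBackend {
    pub applied: Vec<Step>,
}

impl Backend for RecordingBackend {
    fn apply(&mut self, step: &Step) -> Result<(), String> {
        self.applied.push(step.clone());
        Ok(())
    }
}
```

Keeping the plan separate from the backend is one way to read the topology abstraction bullet: the same tenant declaration could be applied to a different Kubernetes distribution by swapping the `Backend` implementation.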
|  | ## Rationale | ||||||
|  | 
 | ||||||
|  | ### Network Security Through VPN Access | ||||||
|  | - **Defense in depth**: VPN requirement adds critical security layer preventing unauthorized cluster access | ||||||
|  | - **Simplified firewall rules**: No need for complex public endpoint protections or rate limiting | ||||||
|  | - **Audit capability**: VPN access provides clear audit trail of cluster connections | ||||||
|  | - **Aligns with enterprise practices**: Most enterprise customers already use VPN infrastructure | ||||||
|  | 
 | ||||||
|  | ### Namespace Isolation vs Virtual Control Planes | ||||||
|  | Following Kubernetes official guidance, namespace isolation provides: | ||||||
|  | - **Lower resource overhead**: Virtual control planes require dedicated etcd, API server, and controller manager per tenant | ||||||
|  | - **Operational simplicity**: Single control plane to maintain, upgrade, and monitor | ||||||
|  | - **Cross-tenant service integration**: Enables future controlled cross-tenant communication if required | ||||||
|  | - **Proven stability**: Namespace-based isolation is well-tested and widely deployed | ||||||
|  | - **Cost efficiency**: Significantly lower infrastructure costs compared to dedicated control planes | ||||||
|  | 
 | ||||||
|  | ### Hybrid Tenancy Model Suitability | ||||||
|  | Our approach addresses both customer and team multi-tenancy requirements: | ||||||
|  | - **Customer isolation**: Strong network and RBAC boundaries prevent cross-tenant interference | ||||||
|  | - **Team collaboration**: Multiple team members can share namespace access through group-based RBAC | ||||||
|  | - **Self-service balance**: Controlled API access enables client autonomy without compromising security | ||||||
|  | 
 | ||||||
|  | ### Harmony Architecture Alignment | ||||||
|  | - **Provider agnostic**: TenantScore abstracts multi-tenancy concepts, enabling future support for other Kubernetes distributions | ||||||
|  | - **Hexagonal architecture**: Tenant management becomes an infrastructure capability accessed through well-defined ports | ||||||
|  | - **Declarative automation**: Tenant lifecycle fully managed through Harmony's Score execution model | ||||||
|  | 
 | ||||||
|  | ## Consequences | ||||||
|  | 
 | ||||||
|  | ### Positive Consequences | ||||||
|  | - **Strong security posture**: VPN + namespace isolation provides robust tenant separation | ||||||
|  | - **Operational efficiency**: Single cluster management with automated tenant provisioning | ||||||
|  | - **Client autonomy**: Self-service capabilities reduce operational support burden | ||||||
|  | - **Scalable architecture**: Can support hundreds of tenants per cluster without architectural changes | ||||||
|  | - **Future flexibility**: Foundation supports evolution to more sophisticated multi-tenancy models | ||||||
|  | - **Cost optimization**: Shared infrastructure maximizes resource utilization | ||||||
|  | 
 | ||||||
|  | ### Negative Consequences | ||||||
|  | - **VPN operational overhead**: Requires VPN infrastructure management | ||||||
|  | - **Manual provisioning complexity**: Phase 1 manual user management creates administrative burden | ||||||
|  | - **Network policy dependency**: Requires CNI with NetworkPolicy support (OVN-Kubernetes provides this and is the OKD/OpenShift default) | ||||||
|  | - **Cluster-wide resource limitations**: Some advanced Kubernetes features require cluster-wide access | ||||||
|  | - **Single point of failure**: Cluster outage affects all tenants simultaneously | ||||||
|  | 
 | ||||||
|  | ### Migration Challenges | ||||||
|  | - **Legacy client integration**: Existing clients may need VPN client setup and credential migration | ||||||
|  | - **Monitoring complexity**: Per-tenant observability requires careful metric and log segmentation | ||||||
|  | - **Backup considerations**: Tenant data backup must respect isolation boundaries | ||||||
|  | 
 | ||||||
|  | ## Alternatives Considered | ||||||
|  | 
 | ||||||
|  | ### Alternative 1: Virtual Control Plane Per Tenant | ||||||
|  | **Pros**: Complete control plane isolation, full Kubernetes API access per tenant | ||||||
|  | **Cons**: 3-5x higher resource usage, complex cross-tenant networking, operational complexity scales linearly with tenants | ||||||
|  | 
 | ||||||
|  | **Rejected**: Resource overhead incompatible with cost-effective multi-tenancy goals | ||||||
|  | 
 | ||||||
|  | ### Alternative 2: Dedicated Clusters Per Tenant | ||||||
|  | **Pros**: Maximum isolation, independent upgrade cycles, simplified security model | ||||||
|  | **Cons**: Operational complexity multiplies with every additional cluster, prohibitive costs, resource waste | ||||||
|  | 
 | ||||||
|  | **Rejected**: Operational overhead makes this approach unsustainable for multiple clients | ||||||
|  | 
 | ||||||
|  | ### Alternative 3: Public API with Advanced Authentication | ||||||
|  | **Pros**: No VPN requirement, potentially simpler client access | ||||||
|  | **Cons**: Larger attack surface, complex rate limiting and DDoS protection, increased security monitoring requirements | ||||||
|  | 
 | ||||||
|  | **Rejected**: Risk/benefit analysis favors VPN-based access control | ||||||
|  | 
 | ||||||
|  | ### Alternative 4: Service Mesh Based Isolation | ||||||
|  | **Pros**: Fine-grained traffic control, encryption, advanced observability | ||||||
|  | **Cons**: Significant operational complexity, performance overhead, steep learning curve | ||||||
|  | 
 | ||||||
|  | **Rejected**: Complexity overhead outweighs benefits for current requirements; remains an option for future enhancement | ||||||
|  | 
 | ||||||
|  | ## Additional Notes | ||||||
|  | 
 | ||||||
|  | ### Implementation Roadmap | ||||||
|  | 1. **Phase 1**: Implement VPN access and manual tenant provisioning | ||||||
|  | 2. **Phase 2**: Deploy TenantScore automation for namespace, RBAC, and NetworkPolicy management | ||||||
|  | 3. **Phase 3**: Integrate Keycloak for centralized identity management | ||||||
|  | 4. **Phase 4**: Add advanced monitoring and per-tenant observability | ||||||
|  | 
 | ||||||
|  | ### TenantScore Structure Preview | ||||||
|  | ```rust | ||||||
|  | pub struct TenantScore { | ||||||
|  |     pub tenant_config: TenantConfig, | ||||||
|  |     pub resource_quotas: ResourceQuotaConfig, | ||||||
|  |     pub network_isolation: NetworkIsolationPolicy, | ||||||
|  |     pub storage_access: StorageAccessConfig, | ||||||
|  |     pub rbac_config: RBACConfig, | ||||||
|  | } | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
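The preview above names five configuration types without defining them. The sketch below shows one possible shape for those types so that the struct compiles on its own. All field names and the `validate` helper are assumptions for illustration; the follow-up ADR on the Tenant abstraction may define them differently.

```rust
/// Hypothetical: basic tenant identity.
#[derive(Debug, Clone)]
pub struct TenantConfig {
    pub name: String,
}

/// Hypothetical: values that would feed a ResourceQuota and LimitRange.
#[derive(Debug, Clone)]
pub struct ResourceQuotaConfig {
    pub cpu_request_millis: u64,
    pub cpu_limit_millis: u64,
    pub memory_request_mib: u64,
    pub memory_limit_mib: u64,
    pub storage_gib: u64,
}

/// Hypothetical: the only mode described in this ADR.
#[derive(Debug, Clone, PartialEq)]
pub enum NetworkIsolationPolicy {
    DenyCrossNamespace { allow_public_egress: bool },
}

/// Hypothetical: which storage classes tenant PVCs may reference.
#[derive(Debug, Clone)]
pub struct StorageAccessConfig {
    pub allowed_storage_classes: Vec<String>,
}

/// Hypothetical: groups bound to the tenant's namespace-scoped Role.
#[derive(Debug, Clone)]
pub struct RBACConfig {
    pub admin_groups: Vec<String>,
}

/// Same fields as the preview in this ADR.
pub struct TenantScore {
    pub tenant_config: TenantConfig,
    pub resource_quotas: ResourceQuotaConfig,
    pub network_isolation: NetworkIsolationPolicy,
    pub storage_access: StorageAccessConfig,
    pub rbac_config: RBACConfig,
}

impl ResourceQuotaConfig {
    /// Rejects quotas whose requests exceed their limits before anything
    /// is sent to the cluster.
    pub fn validate(&self) -> Result<(), String> {
        if self.cpu_request_millis > self.cpu_limit_millis {
            return Err("CPU requests exceed CPU limits".to_string());
        }
        if self.memory_request_mib > self.memory_limit_mib {
            return Err("memory requests exceed memory limits".to_string());
        }
        Ok(())
    }
}
```

Validating the quota at declaration time is a design choice made for this sketch: it surfaces configuration mistakes before provisioning starts rather than as API errors during it.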
|  | ### Future Enhancements | ||||||
|  | - **Cross-tenant service mesh**: For approved inter-tenant communication | ||||||
|  | - **Advanced monitoring**: Per-tenant Prometheus/Grafana instances | ||||||
|  | - **Backup automation**: Tenant-scoped backup policies | ||||||
|  | - **Cost allocation**: Detailed per-tenant resource usage tracking | ||||||
|  | 
 | ||||||
|  | This ADR establishes the foundation for secure, scalable multi-tenancy in Harmony-managed clusters while maintaining operational simplicity and cost effectiveness. A follow-up ADR will detail the Tenant abstraction and user management mechanisms within the Harmony framework. | ||||||