Merge pull request 'adr: Add ADR on multi tenancy using namespace based customer isolation' (#41) from adr/multi-tenancy into master
Reviewed-on: https://git.nationtech.io/NationTech/harmony/pulls/41
This commit is contained in:
commit
6e7148a945
@ -1,6 +1,6 @@
|
||||
# Architecture Decision Record: \<Title\>
|
||||
|
||||
Name: \<Name\>
|
||||
Initial Author: \<Name\>
|
||||
|
||||
Initial Date: \<Date\>
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
# Architecture Decision Record: Helm and Kustomize Handling
|
||||
|
||||
Name: Taha Hawa
|
||||
Initial Author: Taha Hawa
|
||||
|
||||
Initial Date: 2025-04-15
|
||||
|
||||
|
@ -1,7 +1,7 @@
|
||||
# Architecture Decision Record: Monitoring and Alerting
|
||||
|
||||
Proposed by: Willem Rolleman
|
||||
Date: April 28 2025
|
||||
Initial Author : Willem Rolleman
|
||||
Date : April 28 2025
|
||||
|
||||
## Status
|
||||
|
||||
|
160
adr/011-multi-tenant-cluster.md
Normal file
160
adr/011-multi-tenant-cluster.md
Normal file
@ -0,0 +1,160 @@
|
||||
# Architecture Decision Record: Multi-Tenancy Strategy for Harmony Managed Clusters
|
||||
|
||||
Initial Author: Jean-Gabriel Gill-Couture
|
||||
|
||||
Initial Date: 2025-05-26
|
||||
|
||||
## Status
|
||||
|
||||
Proposed
|
||||
|
||||
## Context
|
||||
|
||||
Harmony manages production OKD/Kubernetes clusters that serve multiple clients with varying trust levels and operational requirements. We need a multi-tenancy strategy that provides:
|
||||
|
||||
1. **Strong isolation** between client workloads while maintaining operational simplicity
|
||||
2. **Controlled API access** allowing clients self-service capabilities within defined boundaries
|
||||
3. **Security-first approach** protecting both the cluster infrastructure and tenant data
|
||||
4. **Harmony-native implementation** using our Score/Interpret pattern for automated tenant provisioning
|
||||
5. **Scalable management** supporting both small trusted clients and larger enterprise customers
|
||||
|
||||
The official Kubernetes multi-tenancy documentation identifies two primary models: namespace-based isolation and virtual control planes per tenant. Given Harmony's focus on operational simplicity, provider-agnostic abstractions (ADR-003), and hexagonal architecture (ADR-002), we must choose an approach that balances security, usability, and maintainability.
|
||||
|
||||
Our clients represent a hybrid tenancy model:
|
||||
- **Customer multi-tenancy**: Each client operates independently with no cross-tenant trust
|
||||
- **Team multi-tenancy**: Individual clients may have multiple team members requiring coordinated access
|
||||
- **API access requirement**: Unlike pure SaaS scenarios, clients need controlled Kubernetes API access for self-service operations
|
||||
|
||||
The official kubernetes documentation on multi tenancy heavily inspired this ADR : https://kubernetes.io/docs/concepts/security/multi-tenancy/
|
||||
|
||||
## Decision
|
||||
|
||||
Implement **namespace-based multi-tenancy** with the following architecture:
|
||||
|
||||
### 1. Network Security Model
|
||||
- **Private cluster access**: Kubernetes API and OpenShift console accessible only via WireGuard VPN
|
||||
- **No public exposure**: Control plane endpoints remain internal to prevent unauthorized access attempts
|
||||
- **VPN-based authentication**: Initial access control through WireGuard client certificates
|
||||
|
||||
### 2. Tenant Isolation Strategy
|
||||
- **Dedicated namespace per tenant**: Each client receives an isolated namespace with access limited only to the required resources and operations
|
||||
- **Complete network isolation**: NetworkPolicies prevent cross-namespace communication while allowing full egress to public internet
|
||||
- **Resource governance**: ResourceQuotas and LimitRanges enforce CPU, memory, and storage consumption limits
|
||||
- **Storage access control**: Clients can create PersistentVolumeClaims but cannot directly manipulate PersistentVolumes or access other tenants' storage
|
||||
|
||||
### 3. Access Control Framework
|
||||
- **Principle of Least Privilege**: RBAC grants only necessary permissions within tenant namespace scope
|
||||
- **Namespace-scoped**: Clients can create/modify/delete resources within their namespace
|
||||
- **Cluster-level restrictions**: No access to cluster-wide resources, other namespaces, or sensitive cluster operations
|
||||
- **Whitelisted operations**: Controlled self-service capabilities for ingress, secrets, configmaps, and workload management
|
||||
|
||||
### 4. Identity Management Evolution
|
||||
- **Phase 1**: Manual provisioning of VPN access and Kubernetes ServiceAccounts/Users
|
||||
- **Phase 2**: Migration to Keycloak-based identity management (aligning with ADR-006) for centralized authentication and lifecycle management
|
||||
|
||||
### 5. Harmony Integration
|
||||
- **TenantScore implementation**: Declarative tenant provisioning using Harmony's Score/Interpret pattern
|
||||
- **Topology abstraction**: Tenant configuration abstracted from underlying Kubernetes implementation details
|
||||
- **Automated deployment**: Complete tenant setup automated through Harmony's orchestration capabilities
|
||||
|
||||
## Rationale
|
||||
|
||||
### Network Security Through VPN Access
|
||||
- **Defense in depth**: VPN requirement adds critical security layer preventing unauthorized cluster access
|
||||
- **Simplified firewall rules**: No need for complex public endpoint protections or rate limiting
|
||||
- **Audit capability**: VPN access provides clear audit trail of cluster connections
|
||||
- **Aligns with enterprise practices**: Most enterprise customers already use VPN infrastructure
|
||||
|
||||
### Namespace Isolation vs Virtual Control Planes
|
||||
Following Kubernetes official guidance, namespace isolation provides:
|
||||
- **Lower resource overhead**: Virtual control planes require dedicated etcd, API server, and controller manager per tenant
|
||||
- **Operational simplicity**: Single control plane to maintain, upgrade, and monitor
|
||||
- **Cross-tenant service integration**: Enables future controlled cross-tenant communication if required
|
||||
- **Proven stability**: Namespace-based isolation is well-tested and widely deployed
|
||||
- **Cost efficiency**: Significantly lower infrastructure costs compared to dedicated control planes
|
||||
|
||||
### Hybrid Tenancy Model Suitability
|
||||
Our approach addresses both customer and team multi-tenancy requirements:
|
||||
- **Customer isolation**: Strong network and RBAC boundaries prevent cross-tenant interference
|
||||
- **Team collaboration**: Multiple team members can share namespace access through group-based RBAC
|
||||
- **Self-service balance**: Controlled API access enables client autonomy without compromising security
|
||||
|
||||
### Harmony Architecture Alignment
|
||||
- **Provider agnostic**: TenantScore abstracts multi-tenancy concepts, enabling future support for other Kubernetes distributions
|
||||
- **Hexagonal architecture**: Tenant management becomes an infrastructure capability accessed through well-defined ports
|
||||
- **Declarative automation**: Tenant lifecycle fully managed through Harmony's Score execution model
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive Consequences
|
||||
- **Strong security posture**: VPN + namespace isolation provides robust tenant separation
|
||||
- **Operational efficiency**: Single cluster management with automated tenant provisioning
|
||||
- **Client autonomy**: Self-service capabilities reduce operational support burden
|
||||
- **Scalable architecture**: Can support hundreds of tenants per cluster without architectural changes
|
||||
- **Future flexibility**: Foundation supports evolution to more sophisticated multi-tenancy models
|
||||
- **Cost optimization**: Shared infrastructure maximizes resource utilization
|
||||
|
||||
### Negative Consequences
|
||||
- **VPN operational overhead**: Requires VPN infrastructure management
|
||||
- **Manual provisioning complexity**: Phase 1 manual user management creates administrative burden
|
||||
- **Network policy dependency**: Requires CNI with NetworkPolicy support (OVN-Kubernetes provides this and is the OKD/Openshift default)
|
||||
- **Cluster-wide resource limitations**: Some advanced Kubernetes features require cluster-wide access
|
||||
- **Single point of failure**: Cluster outage affects all tenants simultaneously
|
||||
|
||||
### Migration Challenges
|
||||
- **Legacy client integration**: Existing clients may need VPN client setup and credential migration
|
||||
- **Monitoring complexity**: Per-tenant observability requires careful metric and log segmentation
|
||||
- **Backup considerations**: Tenant data backup must respect isolation boundaries
|
||||
|
||||
## Alternatives Considered
|
||||
|
||||
### Alternative 1: Virtual Control Plane Per Tenant
|
||||
**Pros**: Complete control plane isolation, full Kubernetes API access per tenant
|
||||
**Cons**: 3-5x higher resource usage, complex cross-tenant networking, operational complexity scales linearly with tenants
|
||||
|
||||
**Rejected**: Resource overhead incompatible with cost-effective multi-tenancy goals
|
||||
|
||||
### Alternative 2: Dedicated Clusters Per Tenant
|
||||
**Pros**: Maximum isolation, independent upgrade cycles, simplified security model
|
||||
**Cons**: Exponential operational complexity, prohibitive costs, resource waste
|
||||
|
||||
**Rejected**: Operational overhead makes this approach unsustainable for multiple clients
|
||||
|
||||
### Alternative 3: Public API with Advanced Authentication
|
||||
**Pros**: No VPN requirement, potentially simpler client access
|
||||
**Cons**: Larger attack surface, complex rate limiting and DDoS protection, increased security monitoring requirements
|
||||
|
||||
**Rejected**: Risk/benefit analysis favors VPN-based access control
|
||||
|
||||
### Alternative 4: Service Mesh Based Isolation
|
||||
**Pros**: Fine-grained traffic control, encryption, advanced observability
|
||||
**Cons**: Significant operational complexity, performance overhead, steep learning curve
|
||||
|
||||
**Rejected**: Complexity overhead outweighs benefits for current requirements; remains option for future enhancement
|
||||
|
||||
## Additional Notes
|
||||
|
||||
### Implementation Roadmap
|
||||
1. **Phase 1**: Implement VPN access and manual tenant provisioning
|
||||
2. **Phase 2**: Deploy TenantScore automation for namespace, RBAC, and NetworkPolicy management
|
||||
3. **Phase 3**: Integrate Keycloak for centralized identity management
|
||||
4. **Phase 4**: Add advanced monitoring and per-tenant observability
|
||||
|
||||
### TenantScore Structure Preview
|
||||
```rust
|
||||
pub struct TenantScore {
|
||||
pub tenant_config: TenantConfig,
|
||||
pub resource_quotas: ResourceQuotaConfig,
|
||||
pub network_isolation: NetworkIsolationPolicy,
|
||||
pub storage_access: StorageAccessConfig,
|
||||
pub rbac_config: RBACConfig,
|
||||
}
|
||||
```
|
||||
|
||||
### Future Enhancements
|
||||
- **Cross-tenant service mesh**: For approved inter-tenant communication
|
||||
- **Advanced monitoring**: Per-tenant Prometheus/Grafana instances
|
||||
- **Backup automation**: Tenant-scoped backup policies
|
||||
- **Cost allocation**: Detailed per-tenant resource usage tracking
|
||||
|
||||
This ADR establishes the foundation for secure, scalable multi-tenancy in Harmony-managed clusters while maintaining operational simplicity and cost effectiveness. A follow-up ADR will detail the Tenant abstraction and user management mechanisms within the Harmony framework.
|
Loading…
Reference in New Issue
Block a user