CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Build & Test Commands
# Full CI check (check + fmt + clippy + test)
./build/check.sh
# Individual commands
cargo check --all-targets --all-features --keep-going
cargo fmt --check # Check formatting
cargo clippy # Lint
cargo test # Run all tests
# Run a single test
cargo test -p <crate_name> <test_name>
# Run a specific example
cargo run -p <example_crate_name>
# Build the mdbook documentation
mdbook build
What Harmony Is
Harmony is the orchestration framework powering NationTech's vision of decentralized micro datacenters — small computing clusters deployed in homes, offices, and community spaces instead of hyperscaler facilities. The goal: make computing cleaner, more resilient, locally beneficial, and resistant to centralized points of failure (including geopolitical threats).
Harmony exists because existing IaC tools (Terraform, Ansible, Helm) are trapped in a YAML mud pit: static configuration files validated only at runtime, fragmented across tools, with errors surfacing at 3 AM instead of at compile time. Harmony replaces this entire class of tools with a single Rust codebase where the compiler catches infrastructure misconfigurations before anything is deployed.
This is not a wrapper around existing tools. It is a paradigm shift: infrastructure-as-real-code with compile-time safety guarantees that no YAML/HCL/DSL-based tool can provide.
The Score-Topology-Interpret Pattern
This is the core design pattern. Understand it before touching the codebase.
Score — declarative desired state. A Rust struct generic over T: Topology that describes what you want (e.g., "a PostgreSQL cluster", "DNS records for these hosts"). Scores are serializable, cloneable, idempotent.
Topology — infrastructure capabilities. Represents where things run and what the environment can do. Exposes capabilities as traits (DnsServer, K8sclient, HelmCommand, LoadBalancer, Firewall, etc.). Examples: K8sAnywhereTopology (local K3D or any K8s cluster), HAClusterTopology (bare-metal HA with redundant firewalls/switches).
Interpret — execution glue. Translates a Score into concrete operations against a Topology's capabilities. Returns an Outcome (SUCCESS, NOOP, FAILURE, RUNNING, QUEUED, BLOCKED).
The key insight — compile-time safety through trait bounds:
impl<T: Topology + DnsServer + DhcpServer> Score<T> for DnsScore { ... }
The compiler rejects any attempt to use DnsScore with a Topology that doesn't implement DnsServer and DhcpServer. Invalid infrastructure configurations become compilation errors, not runtime surprises.
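The trait-bound mechanism can be sketched in isolation. Everything below is a simplified stand-in written for illustration: the trait methods, GatewayTopology, BareTopology, and configure_dns are hypothetical and do not mirror Harmony's real signatures.

```rust
// Simplified stand-ins for Harmony's traits; the real definitions differ.
trait Topology {
    fn name(&self) -> String;
}
trait DnsServer {
    fn register_host(&mut self, host: &str, ip: &str);
}
trait DhcpServer {
    fn reserve_lease(&mut self, mac: &str, ip: &str);
}

// A Score can only be applied to a topology that satisfies its bounds.
trait Score<T: Topology> {
    fn apply(&self, topology: &mut T) -> String;
}

struct DnsScore {
    host: String,
    ip: String,
    mac: String,
}

impl<T: Topology + DnsServer + DhcpServer> Score<T> for DnsScore {
    fn apply(&self, topology: &mut T) -> String {
        topology.reserve_lease(&self.mac, &self.ip);
        topology.register_host(&self.host, &self.ip);
        format!("{} configured on {}", self.host, topology.name())
    }
}

// Hypothetical topology that provides both capabilities.
#[derive(Default)]
struct GatewayTopology {
    records: Vec<(String, String)>,
    leases: Vec<(String, String)>,
}
impl Topology for GatewayTopology {
    fn name(&self) -> String {
        "gateway".to_string()
    }
}
impl DnsServer for GatewayTopology {
    fn register_host(&mut self, host: &str, ip: &str) {
        self.records.push((host.to_string(), ip.to_string()));
    }
}
impl DhcpServer for GatewayTopology {
    fn reserve_lease(&mut self, mac: &str, ip: &str) {
        self.leases.push((mac.to_string(), ip.to_string()));
    }
}

// Hypothetical topology with no DNS or DHCP capability.
#[allow(dead_code)]
struct BareTopology;
impl Topology for BareTopology {
    fn name(&self) -> String {
        "bare".to_string()
    }
}

fn configure_dns() -> String {
    let score = DnsScore {
        host: "db01".into(),
        ip: "10.0.0.5".into(),
        mac: "52:54:00:aa:bb:cc".into(),
    };
    let mut gateway = GatewayTopology::default();
    // Compiles: GatewayTopology implements Topology + DnsServer + DhcpServer.
    score.apply(&mut gateway)
    // `score.apply(&mut BareTopology)` would be rejected at compile time,
    // because BareTopology implements neither DnsServer nor DhcpServer.
}
```

Calling configure_dns() succeeds because the bounds are met. Uncommenting the BareTopology call turns the misconfiguration into a build failure.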
Higher-order topologies compose transparently:
- FailoverTopology<T>: primary/replica orchestration
- DecentralizedTopology<T>: multi-site coordination
If T: PostgreSQL, then FailoverTopology<T>: PostgreSQL automatically via blanket impls. Zero boilerplate.
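A minimal sketch of how such a blanket impl might look. It assumes a one-method PostgreSQL trait and a two-field FailoverTopology; the method deploy_cluster, the SingleSite type, and both helper functions are hypothetical, and the real types are richer.

```rust
// Hypothetical, reduced version of the capability trait.
trait PostgreSQL {
    fn deploy_cluster(&self, name: &str) -> String;
}

// Hypothetical base topology that can host PostgreSQL on one site.
struct SingleSite {
    site: String,
}
impl PostgreSQL for SingleSite {
    fn deploy_cluster(&self, name: &str) -> String {
        format!("{}@{}", name, self.site)
    }
}

// Higher-order topology wrapping two inner topologies.
struct FailoverTopology<T> {
    primary: T,
    replica: T,
}

// Blanket impl: any FailoverTopology<T> is PostgreSQL-capable
// as soon as the wrapped T is.
impl<T: PostgreSQL> PostgreSQL for FailoverTopology<T> {
    fn deploy_cluster(&self, name: &str) -> String {
        format!(
            "primary={} replica={}",
            self.primary.deploy_cluster(name),
            self.replica.deploy_cluster(name)
        )
    }
}

// Code written against the capability works for both shapes.
fn deploy_orders_db<T: PostgreSQL>(topology: &T) -> String {
    topology.deploy_cluster("orders")
}

fn failover_demo() -> String {
    let topology = FailoverTopology {
        primary: SingleSite { site: "site-a".to_string() },
        replica: SingleSite { site: "site-b".to_string() },
    };
    deploy_orders_db(&topology)
}
```

The caller never names FailoverTopology in its bounds. It only asks for PostgreSQL, so wrapping a topology does not require touching existing Scores.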
Architecture (Hexagonal)
harmony/src/
├── domain/ # Core domain — the heart of the framework
│ ├── score.rs # Score trait (desired state)
│ ├── topology/ # Topology trait + implementations
│ ├── interpret/ # Interpret trait + InterpretName enum (25+ variants)
│ ├── inventory/ # Physical infrastructure metadata (hosts, switches, mgmt interfaces)
│ ├── executors/ # Executor trait definitions
│ └── maestro/ # Orchestration engine (registers scores, manages topology state, executes)
├── infra/ # Infrastructure adapters (driven ports)
│ ├── opnsense/ # OPNsense firewall adapter
│ ├── brocade.rs # Brocade switch adapter
│ ├── kube.rs # Kubernetes executor
│ └── sqlx.rs # Database executor
└── modules/ # Concrete deployment modules (23+)
    ├── k8s/ # Kubernetes (namespaces, deployments, ingress)
    ├── postgresql/ # CloudNativePG clusters + multi-site failover
    ├── okd/ # OpenShift bare-metal from scratch
    ├── helm/ # Helm chart inflation → vanilla K8s YAML
    ├── opnsense/ # OPNsense (DHCP, DNS, etc.)
    ├── monitoring/ # Prometheus, Alertmanager, Grafana
    ├── kvm/ # KVM virtual machine management
    ├── network/ # Network services (iPXE, TFTP, bonds)
    └── ...
Domain types to know: Inventory (read-only physical infra context), Maestro<T> (orchestrator — calls topology.ensure_ready() then executes scores), Outcome / InterpretError (execution results).
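The Maestro flow described above (prepare the topology, then execute the registered scores) might be sketched as follows. This is a synchronous toy under stated assumptions: the real Maestro is likely async and returns richer types, and LocalCluster, NamespaceScore, register, run, and run_maestro are hypothetical names.

```rust
// Outcome variants taken from the pattern description above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Success,
    Noop,
    Failure,
    Running,
    Queued,
    Blocked,
}

trait Topology {
    fn ensure_ready(&mut self) -> Result<(), String>;
}

trait Score<T: Topology> {
    fn name(&self) -> String;
    fn interpret(&self, topology: &mut T) -> Outcome;
}

struct Maestro<T: Topology> {
    topology: T,
    scores: Vec<Box<dyn Score<T>>>,
}

impl<T: Topology> Maestro<T> {
    fn new(topology: T) -> Self {
        Maestro { topology, scores: Vec::new() }
    }

    fn register(&mut self, score: Box<dyn Score<T>>) {
        self.scores.push(score);
    }

    // Prepare the topology once, then interpret every registered score.
    fn run(&mut self) -> Result<Vec<(String, Outcome)>, String> {
        self.topology.ensure_ready()?;
        let mut results = Vec::new();
        for score in &self.scores {
            results.push((score.name(), score.interpret(&mut self.topology)));
        }
        Ok(results)
    }
}

// Hypothetical topology and score used only to exercise the flow.
#[derive(Default)]
struct LocalCluster {
    ready: bool,
    namespaces: Vec<String>,
}
impl Topology for LocalCluster {
    fn ensure_ready(&mut self) -> Result<(), String> {
        self.ready = true;
        Ok(())
    }
}

struct NamespaceScore(String);
impl Score<LocalCluster> for NamespaceScore {
    fn name(&self) -> String {
        format!("namespace/{}", self.0)
    }
    fn interpret(&self, topology: &mut LocalCluster) -> Outcome {
        if !topology.ready {
            return Outcome::Blocked;
        }
        topology.namespaces.push(self.0.clone());
        Outcome::Success
    }
}

fn run_maestro() -> Vec<(String, Outcome)> {
    let mut maestro = Maestro::new(LocalCluster::default());
    maestro.register(Box::new(NamespaceScore("monitoring".to_string())));
    maestro.register(Box::new(NamespaceScore("database".to_string())));
    maestro.run().expect("topology should become ready")
}
```

Because ensure_ready runs before any score, neither NamespaceScore ever observes an unprepared cluster.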
Key Crates
| Crate | Purpose |
|---|---|
| harmony | Core framework: domain, infra adapters, deployment modules |
| harmony_cli | CLI + optional TUI (--features tui) |
| harmony_config | Unified config+secret management (env → SQLite → OpenBao → interactive prompt) |
| harmony_secret / harmony_secret_derive | Secret backends (LocalFile, OpenBao, Infisical) |
| harmony_execution | Execution engine |
| harmony_agent / harmony_inventory_agent | Persistent agent framework (NATS JetStream mesh), hardware discovery |
| harmony_assets | Asset management (URLs, local cache, S3) |
| harmony_composer | Infrastructure composition tool |
| harmony-k8s | Kubernetes utilities |
| k3d | Local K3D cluster management |
| brocade | Brocade network switch integration |
OPNsense Crates
The opnsense-codegen and opnsense-api crates exist because OPNsense's automation ecosystem is poor — no typed API client exists. These are support crates, not the core of Harmony.
- opnsense-codegen: XML model files → IR → Rust structs with serde helpers for OPNsense wire format quirks (opn_bool for "0"/"1" strings, opn_u16/opn_u32 for string-encoded numbers). Vendor sources are git submodules under opnsense-codegen/vendor/.
- opnsense-api: Hand-written OpnsenseClient + generated model types in src/generated/.
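The wire format quirks mentioned above come down to converting string-encoded values into typed ones. The sketch below shows only that conversion logic using the standard library. The real helpers are serde adapters whose exact behavior (for example, how empty strings are treated) is not described here, so parse_opn_bool and parse_opn_u16 are hypothetical names and semantics.

```rust
// Hypothetical conversion for the "0"/"1" string booleans on the wire.
fn parse_opn_bool(raw: &str) -> Result<bool, String> {
    match raw {
        "1" => Ok(true),
        "0" => Ok(false),
        other => Err(format!("expected \"0\" or \"1\", got {other:?}")),
    }
}

// Hypothetical conversion for string-encoded numbers such as ports.
// Trimming whitespace is an assumption, not documented behavior.
fn parse_opn_u16(raw: &str) -> Result<u16, std::num::ParseIntError> {
    raw.trim().parse::<u16>()
}
```

Returning a Result keeps malformed wire values visible as errors rather than silently defaulting them.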
Key Design Decisions (ADRs in docs/adr/)
- ADR-001: Rust chosen for type system, refactoring safety, and performance
- ADR-002: Hexagonal architecture — domain isolated from adapters
- ADR-003: Infrastructure abstractions at domain level, not provider level (no vendor lock-in)
- ADR-005: Custom Rust DSL over YAML/Score-spec — real language, Cargo deps, composable
- ADR-007: K3D as default runtime (K8s-certified, lightweight, cross-platform)
- ADR-009: Helm charts inflated to vanilla K8s YAML, then deployed via existing code paths
- ADR-015: Higher-order topologies via blanket trait impls (zero-cost composition)
- ADR-016: Agent-based architecture with NATS JetStream for real-time failover and distributed consensus
- ADR-020: Unified config+secret management — Rust struct is the schema, resolution chain: env → store → prompt
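The resolution chain from ADR-020 (env → store → prompt) can be pictured as an ordered list of sources where the first hit wins. This is a sketch under assumptions, not the harmony_config API: ConfigSource, EnvSource, StoreSource, PromptSource, resolve, and the key names are all hypothetical, and the prompt is replaced by a canned answer so the example stays non-interactive.

```rust
use std::collections::HashMap;

// Hypothetical abstraction over one link of the resolution chain.
trait ConfigSource {
    fn get(&self, key: &str) -> Option<String>;
}

// First link: process environment variables.
struct EnvSource;
impl ConfigSource for EnvSource {
    fn get(&self, key: &str) -> Option<String> {
        std::env::var(key).ok()
    }
}

// Second link: a persistent store, modeled here as an in-memory map.
struct StoreSource(HashMap<String, String>);
impl ConfigSource for StoreSource {
    fn get(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
}

// Last link: stand-in for the interactive prompt, returning a canned answer.
struct PromptSource(String);
impl ConfigSource for PromptSource {
    fn get(&self, _key: &str) -> Option<String> {
        Some(self.0.clone())
    }
}

// Walk the chain in order and return the first value found.
fn resolve(key: &str, chain: &[&dyn ConfigSource]) -> Option<String> {
    chain.iter().find_map(|source| source.get(key))
}

// Assumes neither key is set in the surrounding environment.
fn resolve_examples() -> (Option<String>, Option<String>) {
    let store = StoreSource(HashMap::from([(
        "HARMONY_SKETCH_DB_URL".to_string(),
        "postgres://store".to_string(),
    )]));
    let prompt = PromptSource("typed-by-operator".to_string());
    let chain: [&dyn ConfigSource; 3] = [&EnvSource, &store, &prompt];
    (
        resolve("HARMONY_SKETCH_DB_URL", &chain),
        resolve("HARMONY_SKETCH_TOKEN", &chain),
    )
}
```

With both variables unset, the first key is answered by the store and the second falls through to the prompt stand-in.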
Capability and Score Design Rules
Capabilities are industry concepts, not tools. A capability trait represents a standard infrastructure need (e.g., DnsServer, LoadBalancer, Router, CertificateManagement) that can be fulfilled by different products. OPNsense provides DnsServer today; CoreDNS or Route53 could provide it tomorrow. Scores must not break when the backend changes.
Exception: a capability may name a specific product when the developer fundamentally depends on that implementation. PostgreSQL is a capability (rather than a generic Database) because the developer writes PostgreSQL-specific SQL and replication configs. Swapping to MariaDB would break the application, not just the infrastructure.
Test: If you could swap the underlying tool without rewriting any Score that uses the capability, the boundary is correct.
Don't name capabilities after tools. SecretVault not OpenbaoStore. IdentityProvider not ZitadelAuth. Think: what is the core developer need that leads to using this tool?
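The swap test can be made concrete with one Score-like type exercised against two backends. The sketch assumes a reduced DnsServer trait; upsert_record, lookup, HostRecordScore, OpnsenseDns, CoreDns, and swap_test are hypothetical and exist only to illustrate the boundary.

```rust
use std::collections::HashMap;

// Hypothetical, reduced capability: an industry concept, not a product.
trait DnsServer {
    fn upsert_record(&mut self, name: &str, ip: &str);
    fn lookup(&self, name: &str) -> Option<String>;
}

// Written only against the capability; it names no backend.
struct HostRecordScore {
    name: String,
    ip: String,
}
impl HostRecordScore {
    fn apply<T: DnsServer>(&self, dns: &mut T) {
        dns.upsert_record(&self.name, &self.ip);
    }
}

// Hypothetical backend modeled on host overrides kept in a map.
#[derive(Default)]
struct OpnsenseDns {
    overrides: HashMap<String, String>,
}
impl DnsServer for OpnsenseDns {
    fn upsert_record(&mut self, name: &str, ip: &str) {
        self.overrides.insert(name.to_string(), ip.to_string());
    }
    fn lookup(&self, name: &str) -> Option<String> {
        self.overrides.get(name).cloned()
    }
}

// Hypothetical backend modeled on an ordered zone of records.
#[derive(Default)]
struct CoreDns {
    zone: Vec<(String, String)>,
}
impl DnsServer for CoreDns {
    fn upsert_record(&mut self, name: &str, ip: &str) {
        self.zone.retain(|(existing, _)| existing != name);
        self.zone.push((name.to_string(), ip.to_string()));
    }
    fn lookup(&self, name: &str) -> Option<String> {
        self.zone
            .iter()
            .find(|(existing, _)| existing == name)
            .map(|(_, ip)| ip.clone())
    }
}

// The same Score value is applied to both backends unchanged.
fn swap_test() -> (Option<String>, Option<String>) {
    let score = HostRecordScore { name: "git.lan".into(), ip: "10.0.0.10".into() };
    let mut opnsense = OpnsenseDns::default();
    let mut coredns = CoreDns::default();
    score.apply(&mut opnsense);
    score.apply(&mut coredns);
    (opnsense.lookup("git.lan"), coredns.lookup("git.lan"))
}
```

If HostRecordScore had to mention overrides or zone, the capability boundary would be leaking backend details and the swap test would fail.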
Scores encapsulate operational complexity. Move procedural knowledge (init sequences, retry logic, distribution-specific config) into Scores. A high-level example should be ~15 lines, not ~400 lines of imperative orchestration.
Scores must be idempotent. Running a Score twice must produce the same result as running it once. Use create-or-update, and handle "already exists" gracefully.
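A create-or-update step might look like the following sketch. Cluster, ensure_namespace, and the two-variant Outcome are hypothetical simplifications; the point is that the second run finds the desired state already present and reports a no-op instead of failing.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Outcome {
    Success,
    Noop,
}

// Hypothetical cluster state: namespace name mapped to its owner label.
#[derive(Default)]
struct Cluster {
    namespaces: HashMap<String, String>,
}

// Create the namespace, update its owner, or do nothing if already correct.
fn ensure_namespace(cluster: &mut Cluster, name: &str, owner: &str) -> Outcome {
    if cluster.namespaces.get(name).map(String::as_str) == Some(owner) {
        return Outcome::Noop; // desired state already holds
    }
    cluster.namespaces.insert(name.to_string(), owner.to_string());
    Outcome::Success
}

// Apply the same desired state twice and report both outcomes.
fn apply_twice() -> (Outcome, Outcome, usize) {
    let mut cluster = Cluster::default();
    let first = ensure_namespace(&mut cluster, "monitoring", "platform-team");
    let second = ensure_namespace(&mut cluster, "monitoring", "platform-team");
    (first, second, cluster.namespaces.len())
}
```

The function compares against desired state rather than issuing a blind create, so "already exists" is a normal path and not an error.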
Scores must not depend on execution order. Declare capability requirements via trait bounds; don't assume another Score ran first. If Score B needs what Score A provides, Score B should declare that capability as a trait bound.
See docs/guides/writing-a-score.md for the full guide.
Conventions
- Rust edition 2024, resolver v2
- Conventional commits: feat:, fix:, chore:, docs:, refactor:
- Small PRs: max ~200 lines (excluding generated code), single-purpose
- License: GNU AGPL v3
- Quality bar: This framework demands high-quality engineering. The type system is a feature, not a burden. Leverage it. Prefer compile-time guarantees over runtime checks. Abstractions should be domain-level, not provider-specific.