harmony/docs/coding-guide.md

# Harmony Coding Guide

Harmony is an infrastructure automation framework. It is **code-first and code-only**: operators write Rust programs to declare and drive infrastructure, rather than YAML files or DSL configs. Good code here means a good operator experience.

### Concrete context

We use here the context of the KVM module to explain the coding style. This will make it very easy to understand and should translate quite well to other modules/contexts managed by Harmony like OPNSense and Kubernetes.

## Core Philosophy

### High-level functions over raw primitives

Callers should not need to know about underlying protocols, XML schemas, or API quirks. A function that deploys a VM should accept meaningful parameters like CPU count, memory, and network name — not XML strings.

```rust
// Bad: caller constructs XML and passes it to a thin wrapper
let xml = format!(r#"<domain type='kvm'>...</domain>"#, name, memory_kb, ...);
executor.create_vm(&xml).await?;

// Good: caller describes intent, the module handles representation
executor.define_vm(&VmConfig::builder("my-vm")
    .cpu(4)
    .memory_gb(8)
    .disk(DiskConfig::new(50))
    .network(NetworkRef::named("mylan"))
    .boot_order([BootDevice::Network, BootDevice::Disk])
    .build())
    .await?;
```

The module owns the XML, the virsh invocations, the API calls — not the caller.

### Use the right abstraction layer

Prefer native library bindings over shelling out to CLI tools. The `virt` crate provides direct libvirt bindings and should be used instead of spawning `virsh` subprocesses.

- CLI subprocess calls are fragile: stdout/stderr parsing, exit codes, quoting, PATH differences
- Native bindings give typed errors, no temp files, no shell escaping
- `virt::connect::Connect` opens a connection; `virt::domain::Domain` manages VMs; `virt::network::Network` manages virtual networks

### Keep functions small and well-named

Each function should do one thing. If a function is doing two conceptually separate things, split it. Function names should read like plain English: `ensure_network_active`, `define_vm`, `vm_is_running`.

### Prefer short modules over large files

Group related types and functions by concept. A module that handles one resource (e.g., network, domain, storage) is better than a single file for everything.

---

## Error Handling

### Use `thiserror` for all error types

Define error types with `thiserror::Error`. This removes the boilerplate of implementing `Display` and `std::error::Error` by hand, keeps error messages close to their variants, and makes types easy to extend.

```rust
// Bad: hand-rolled Display + std::error::Error
#[derive(Debug)]
pub enum KVMError {
    ConnectionError(String),
    VMNotFound(String),
}

impl std::fmt::Display for KVMError { ... }
impl std::error::Error for KVMError {}

// Good: derive Display via thiserror
#[derive(thiserror::Error, Debug)]
pub enum KVMError {
    #[error("connection failed: {0}")]
    ConnectionFailed(String),
    #[error("VM not found: {name}")]
    VmNotFound { name: String },
}
```

### Make bubbling errors easy with `?` and `From`

`?` works on any error type for which there is a `From` impl. Add `From` conversions from lower-level errors into your module's error type so callers can use `?` without boilerplate.

With `thiserror`, wrapping a foreign error is one line:

```rust
#[derive(thiserror::Error, Debug)]
pub enum KVMError {
    #[error("libvirt error: {0}")]
    Libvirt(#[from] virt::error::Error),

    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
}
```

This means a call that returns `virt::error::Error` can be `?`-propagated into a `Result<_, KVMError>` without any `.map_err(...)`.

### Typed errors over stringly-typed errors

Avoid `Box<dyn Error>` or `String` as error return types in library code. Callers need to distinguish errors programmatically — `KVMError::VmAlreadyExists` is actionable, `"VM already exists: foo"` as a `String` is not.

At binary entry points (e.g., `main`) it is acceptable to convert to `String` or `anyhow::Error` for display.

---

## Logging

### Use the `log` crate macros

All log output must go through the `log` crate. Never use `println!`, `eprintln!`, or `dbg!` in library code. This makes output compatible with any logging backend (env_logger, tracing, structured logging, etc.).

```rust
// Bad
println!("Creating VM: {}", name);

// Good
use log::{info, debug, warn};
info!("Creating VM: {name}");
debug!("VM XML:\n{xml}");
warn!("Network already active, skipping creation");
```

Use the right level:

| Level   | When to use |
|---------|-------------|
| `error` | Unrecoverable failures (before returning Err) |
| `warn`  | Recoverable issues, skipped steps |
| `info`  | High-level progress events visible in normal operation |
| `debug` | Detailed operational info useful for debugging |
| `trace` | Very granular, per-iteration or per-call data |

Log before significant operations and after unexpected conditions. Do not log inside tight loops at `info` level.

---

## Types and Builders

### Derive `Serialize` on all public domain types

All public structs and enums that represent configuration or state should derive `serde::Serialize`. Add `Deserialize` when round-trip serialization is needed.

### Builder pattern for complex configs

When a type has more than three fields or optional fields, provide a builder. The builder pattern allows named, incremental construction without positional arguments.

```rust
let config = VmConfig::builder("bootstrap")
    .cpu(4)
    .memory_gb(8)
    .disk(DiskConfig::new(50).labeled("os"))
    .disk(DiskConfig::new(100).labeled("data"))
    .network(NetworkRef::named("harmonylan"))
    .boot_order([BootDevice::Network, BootDevice::Disk])
    .build();
```

### Avoid `pub` fields on config structs

Expose data through methods or the builder, not raw field access. This preserves the ability to validate, rename, or change representation without breaking callers.

---

## Async

### Use `tokio` for all async runtime needs

All async code runs on tokio. Use `tokio::spawn`, `tokio::time`, etc. Use `#[async_trait]` for traits with async methods.

### No blocking in async context

Never call blocking I/O (file I/O, network, process spawn) directly in an async function. Use `tokio::fs`, `tokio::process`, or `tokio::task::spawn_blocking` as appropriate.

---

## Module Structure

### Follow the `Score` / `Interpret` pattern

Modules that represent deployable infrastructure should implement `Score<T: Topology>` and `Interpret<T>`:

- `Score` is the serializable, clonable configuration declaring *what* to deploy
- `Interpret` does the actual work when `execute()` is called

```rust
pub struct KvmScore {
    network: NetworkConfig,
    vms: Vec<VmConfig>,
}

impl<T: Topology + KvmHost> Score<T> for KvmScore {
    fn create_interpret(&self) -> Box<dyn Interpret<T>> {
        Box::new(KvmInterpret::new(self.clone()))
    }
    fn name(&self) -> String { "KvmScore".to_string() }
}
```

### Flatten the public API in `mod.rs`

Internal submodules are implementation detail. Re-export what callers need at the module root:

```rust
// modules/kvm/mod.rs
mod connection;
mod domain;
mod network;
mod error;
mod xml;

pub use connection::KvmConnection;
pub use domain::{VmConfig, VmConfigBuilder, VmStatus, DiskConfig, BootDevice};
pub use error::KvmError;
pub use network::NetworkConfig;
```

---

## Commit Style

Follow [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/):

```
feat(kvm): add network isolation support
fix(kvm): correct memory unit conversion for libvirt
refactor(kvm): replace virsh subprocess calls with virt crate bindings
docs: add coding guide
```

Keep pull requests small and single-purpose (under ~200 lines excluding generated code). Do not mix refactoring, bug fixes, and new features in one PR.