# iobench

A command-line I/O benchmarking tool built on fio. It runs locally, over SSH, or on Kubernetes pods, and includes a Redpanda storage-characterization profile for validating whether Ceph RBD is suitable for Redpanda.

## Build

```bash
cargo build -p iobench --release
```

## Simple profile (default)

Runs standard fio benchmarks (sequential/random read/write, single/multi-job) against a local or remote target.

```bash
# Local
iobench

# Over SSH
iobench --target ssh/user@host

# On a Kubernetes pod
iobench --target k8s/namespace/pod-name

# Custom parameters
iobench --duration 60 --size 4G --block-size 128k --tests read,write,randwrite
```

Available tests: `read`, `write`, `randread`, `randwrite`, `multiread`, `multiwrite`, `multirandread`, `multirandwrite`.

Results are saved to `./iobench-{timestamp}/summary.csv`.

## Redpanda profile

Characterizes whether a Kubernetes storage backend (e.g. Ceph RBD) can sustain Redpanda's I/O patterns. It deploys a StatefulSet (3 pods by default; see `--replicas`) with one PVC per pod, then runs four fio workloads that model how Redpanda actually uses storage.

### Quick start

```bash
# Run the full profile (sequential baselines, then parallel contention test)
iobench redpanda --storage-class ceph-block

# Sequential baselines only (single-node, no cluster impact)
iobench redpanda --mode sequential --storage-class ceph-block

# Parallel contention test only
iobench redpanda --mode parallel --storage-class ceph-block

# Keep pods alive after the run (useful for re-running or debugging)
iobench redpanda --storage-class ceph-block --keep-deployment

# Deploy pods without running any workloads
iobench deploy --storage-class ceph-block

# Remove all iobench pods and PVCs
iobench undeploy
```

### Options

| Flag | Default | Description |
|------|---------|-------------|
| `--mode` | `both` | `sequential`, `parallel`, or `both` |
| `--storage-class` | `ceph-block` | Kubernetes StorageClass for the PVCs |
| `--pvc-size` | `50Gi` | Size of each PVC |
| `--replicas` | `3` | Number of pods (should match cluster node count) |
| `--namespace` | `default` | Kubernetes namespace |
| `--output-dir` | auto-timestamped | Directory for results |
| `--keep-deployment` | `false` | Don't delete pods/PVCs after the run |

### Workloads

The profile runs four workloads, each targeting a different aspect of Redpanda's I/O:

| Workload | What it models | Key params | Runtime |
|----------|---------------|------------|---------|
| `throughput` | Bulk segment writes (optimistic upper bound) | 16K seq write, no fsync, 4 jobs, iodepth=16 | 5 min |
| `fsync_hot_path` | Raft commit path with `acks=all` | 16K seq write, fdatasync per op, 4 jobs, iodepth=4 | 10 min |
| `selftest_512k` | `rpk cluster self-test` 512K phase | 512K seq write, fdatasync, 1 job, iodepth=4 | 2 min |
| `selftest_4k_qd1` | Worst-case single-stream commit | 4K seq write, fdatasync, 1 job, iodepth=1 | 2 min |

All workloads use `direct=1` (bypassing the page cache) and `ioengine=libaio`.
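The table maps fairly directly onto standard fio options. As a minimal sketch (not iobench's actual code), the Python below builds one fio argv per workload; the target `directory`, the `size` value, and the `Workload`/`fio_args` names are assumptions for illustration.

```python
"""Hypothetical sketch: fio command lines for the four Redpanda workloads."""
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    bs: str          # block size
    fdatasync: bool  # fdatasync after every write
    numjobs: int
    iodepth: int
    runtime_s: int


# Values copied from the workload table.
WORKLOADS = [
    Workload("throughput", "16k", False, 4, 16, 300),
    Workload("fsync_hot_path", "16k", True, 4, 4, 600),
    Workload("selftest_512k", "512k", True, 1, 4, 120),
    Workload("selftest_4k_qd1", "4k", True, 1, 1, 120),
]


def fio_args(w: Workload, directory: str = "/data", size: str = "10G") -> list[str]:
    """Return an argv list for one workload (directory and size are assumptions)."""
    args = [
        "fio",
        f"--name={w.name}",
        f"--directory={directory}",
        f"--size={size}",
        "--rw=write",          # sequential write
        f"--bs={w.bs}",
        f"--numjobs={w.numjobs}",
        f"--iodepth={w.iodepth}",
        "--direct=1",          # bypass the page cache
        "--ioengine=libaio",
        f"--runtime={w.runtime_s}",
        "--time_based",        # keep writing for the full runtime
        "--output-format=json",
    ]
    if w.fdatasync:
        args.append("--fdatasync=1")  # one fdatasync per write
    return args


if __name__ == "__main__":
    for w in WORKLOADS:
        print(shlex.join(fio_args(w)))
```

Running one of these by hand against a mounted PVC, for example with `subprocess.run(fio_args(WORKLOADS[1]), check=True)`, is one way to sanity-check a single workload outside iobench.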

### Execution modes

- **`sequential`** -- Runs all four workloads back-to-back on a single node. This yields clean per-pattern baselines that are comparable to Redpanda's published hardware requirements (16,000 IOPS minimum per broker).
- **`parallel`** -- Runs all four workloads on all nodes simultaneously, with each node writing to its own PVC. A wall-clock barrier synchronizes start times across pods so their contention windows overlap. This is the production-shaped test that exposes the Ceph OSD fan-in pattern.
- **`both`** (default) -- Sequential first, then parallel.
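One way to build such a barrier, sketched under assumptions and not necessarily how iobench does it: the orchestrator picks a single start time slightly in the future, passes the same epoch timestamp to every pod, and each pod sleeps until that instant before launching fio. The function names and the 10-second lead time are hypothetical, and the approach assumes pod clocks are NTP-synchronized.

```python
import time
from typing import Optional

# Hypothetical lead time: enough slack for every pod to receive the
# timestamp before the shared start instant arrives.
DEFAULT_LEAD_S = 10.0


def choose_start(lead_s: float = DEFAULT_LEAD_S, now: Optional[float] = None) -> float:
    """Orchestrator side: pick one shared start time (Unix epoch seconds)."""
    base = time.time() if now is None else now
    return base + lead_s


def wait_until(start_epoch: float) -> float:
    """Pod side: block until start_epoch; return seconds slept (0.0 if already past)."""
    delay = max(0.0, start_epoch - time.time())
    time.sleep(delay)
    return delay
```

Clock skew between nodes shifts each pod's start by the same amount, so a few milliseconds of skew is negligible next to workload runtimes of 2 to 10 minutes.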

### Interpreting results

The **headline metric** is the worst (highest) p99.9 fdatasync latency across all nodes for the `fsync_hot_path` workload during parallel execution:

- **<= 100 ms**: PASS. Storage can sustain Raft heartbeats under load.
- **> 100 ms**: FAIL. Raft heartbeats (150 ms interval, 1.5 s election timeout) will not survive load spikes, leading to election storms.
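The check can be reproduced from the raw fio JSON files. This is a minimal sketch, assuming fio reports fdatasync latency under `jobs[*]["sync"]["lat_ns"]["percentile"]` with the key `"99.900000"` (part of fio's default percentile list); verify that layout against your fio version's output. The helper names are hypothetical.

```python
import json
from pathlib import Path

THRESHOLD_MS = 100.0  # PASS/FAIL bound from the text


def p999_sync_ms(fio_json: dict) -> float:
    """Highest p99.9 fdatasync latency (ms) over the jobs in one fio result.

    Without group_reporting fio emits one entry per job; taking the max of the
    per-job p99.9 values is a conservative stand-in for a merged percentile.
    """
    worst_ns = 0.0
    for job in fio_json["jobs"]:
        percentiles = job["sync"]["lat_ns"]["percentile"]  # assumed fio JSON layout
        worst_ns = max(worst_ns, float(percentiles["99.900000"]))
    return worst_ns / 1e6


def verdict(per_node: dict[str, dict]) -> tuple[str, str, float]:
    """Return ("PASS" or "FAIL", worst node, worst p99.9 in ms) across all nodes."""
    node, worst_ms = max(
        ((name, p999_sync_ms(result)) for name, result in per_node.items()),
        key=lambda item: item[1],
    )
    return ("PASS" if worst_ms <= THRESHOLD_MS else "FAIL"), node, worst_ms


def load_parallel(results_dir: str) -> dict[str, dict]:
    """Map node name -> parsed fio JSON for the parallel fsync_hot_path runs."""
    out = {}
    for path in sorted(Path(results_dir, "parallel").glob("fsync_hot_path_node-*.json")):
        out[path.stem.removeprefix("fsync_hot_path_")] = json.loads(path.read_text())
    return out
```

For example, `verdict(load_parallel("iobench-redpanda-2026-05-03-143022"))` would return the verdict together with the node that set the worst value.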

Reference values for healthy NVMe, as reported by `rpk cluster self-test`:

- `selftest_512k`: ~1182 IOPS, ~591 MiB/s
- `selftest_4k_qd1`: ~406 IOPS
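The reference values are internally consistent, which makes a quick arithmetic check useful when reading your own numbers: bandwidth equals IOPS times block size, and with one job at queue depth 1 each write (plus its fdatasync, per the workload table) must finish before the next starts, so mean per-operation latency is roughly 1 / IOPS. The helper names are illustrative.

```python
def mib_per_s(iops: float, block_kib: float) -> float:
    """Bandwidth in MiB/s from IOPS and block size in KiB."""
    return iops * block_kib / 1024


def qd1_mean_latency_ms(iops: float) -> float:
    """Approximate mean per-op latency at queue depth 1 (one op in flight)."""
    return 1000 / iops


print(mib_per_s(1182, 512))                # 591.0, matching ~591 MiB/s
print(round(qd1_mean_latency_ms(406), 2))  # 2.46 ms per write + fdatasync
```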

### Output

Results are saved to `./iobench-redpanda-{timestamp}/`:

```
iobench-redpanda-2026-05-03-143022/
  redpanda_summary.csv          # Full results: all metrics, all nodes, all workloads
  iobench.csv                   # Dashboard-compatible format
  sequential/
    throughput_node-0.json       # Raw fio JSON per workload per node
    fsync_hot_path_node-0.json
    ...
    node-0/                      # Per-node time-series logs
      throughput_lat.1.log
      throughput_iops.1.log
      ...
  parallel/
    throughput_node-0.json
    throughput_node-1.json
    throughput_node-2.json
    ...
```

For each workload on each node, `redpanda_summary.csv` records IOPS, bandwidth, completion-latency percentiles (p50 to p99.99, plus max), and fdatasync-latency percentiles (p50 to p99.99, plus max).

The per-second time-series logs (`*_lat.*.log`, `*_iops.*.log`) capture the bimodal, spiky behavior Ceph can show under load, which summary statistics miss.
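A minimal sketch for scanning these logs for spikes, assuming fio's log format: each line begins with the time in milliseconds and the logged value, followed by columns (data direction, block size, and others) that vary by fio version and options. fio records latency-log values in nanoseconds. The function names are hypothetical, and the 100 ms default simply reuses the headline threshold.

```python
from typing import Iterable


def parse_log(lines: Iterable[str]) -> list[tuple[float, float]]:
    """Return (time_ms, value) pairs, ignoring blank lines and trailing columns."""
    samples = []
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) >= 2 and fields[0]:
            samples.append((float(fields[0]), float(fields[1])))
    return samples


def spikes(samples: list[tuple[float, float]],
           threshold_ms: float = 100.0) -> list[tuple[float, float]]:
    """Return (time_s, latency_ms) for latency-log samples above threshold_ms.

    Assumes a latency log, whose values fio records in nanoseconds.
    """
    return [(t / 1000, v / 1e6) for t, v in samples if v / 1e6 > threshold_ms]
```

For example, `spikes(parse_log(open("sequential/node-0/fsync_hot_path_lat.1.log")))` would list the seconds in which a sample exceeded 100 ms; that file name follows the output layout but is illustrative.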

### Warning

Parallel mode generates heavy fdatasync workloads across all nodes simultaneously. This **will impact other workloads on the same Ceph pool**. Run it during a maintenance window or off-peak hours.

## Dashboard

A Plotly Dash app for visualizing results. Supports both simple and Redpanda profile data.

```bash
cd iobench/dash
virtualenv venv
source venv/bin/activate
pip install -r requirements_freeze.txt

# Copy result CSVs into the dash directory, then:
python iobench-dash.py
```

The dashboard reads `iobench.csv` and/or `redpanda_summary.csv` from its working directory.