2026-05-03 09:47:21 -04:00


# iobench
A command-line I/O benchmarking tool using fio. Runs locally, over SSH, or on Kubernetes pods. Includes a Redpanda storage characterization profile for validating Ceph RBD suitability.
## Build
```bash
cargo build -p iobench --release
```
## Simple profile (default)
Runs standard fio benchmarks (sequential/random read/write, single/multi-job) against a local or remote target.
```bash
# Local
iobench
# Over SSH
iobench --target ssh/user@host
# On a Kubernetes pod
iobench --target k8s/namespace/pod-name
# Custom parameters
iobench --duration 60 --size 4G --block-size 128k --tests read,write,randwrite
```
Available tests: `read`, `write`, `randread`, `randwrite`, `multiread`, `multiwrite`, `multirandread`, `multirandwrite`.
Results are saved to `./iobench-{timestamp}/summary.csv`.
## Redpanda profile
Characterizes whether a Kubernetes storage backend (e.g. Ceph RBD) can sustain Redpanda's I/O patterns. It deploys a StatefulSet (3 pods by default) in which each pod gets its own PVC, then runs four fio workloads that model Redpanda's storage behavior.
### Quick start
```bash
# Run the full profile (sequential baselines, then parallel contention test)
iobench redpanda --storage-class ceph-block
# Sequential baselines only (single-node, no cluster impact)
iobench redpanda --mode sequential --storage-class ceph-block
# Parallel contention test only
iobench redpanda --mode parallel --storage-class ceph-block
# Keep pods alive after the run (useful for re-running or debugging)
iobench redpanda --storage-class ceph-block --keep-deployment
# Deploy pods without running any workloads
iobench deploy --storage-class ceph-block
# Remove all iobench pods and PVCs
iobench undeploy
```
### Options
| Flag | Default | Description |
|------|---------|-------------|
| `--mode` | `both` | `sequential`, `parallel`, or `both` |
| `--storage-class` | `ceph-block` | Kubernetes StorageClass for the PVCs |
| `--pvc-size` | `50Gi` | Size of each PVC |
| `--replicas` | `3` | Number of pods (should match cluster node count) |
| `--namespace` | `default` | Kubernetes namespace |
| `--output-dir` | auto-timestamped | Directory for results |
| `--keep-deployment` | `false` | Don't delete pods/PVCs after the run |
### Workloads
The profile runs four workloads, each targeting a different aspect of Redpanda's I/O:
| Workload | What it models | Key params | Runtime |
|----------|---------------|------------|---------|
| `throughput` | Bulk segment writes (optimistic upper bound) | 16K seq write, no fsync, 4 jobs, iodepth=16 | 5 min |
| `fsync_hot_path` | Raft commit path with `acks=all` | 16K seq write, fdatasync per op, 4 jobs, iodepth=4 | 10 min |
| `selftest_512k` | `rpk cluster self-test` 512K phase | 512K seq write, fdatasync, 1 job, iodepth=4 | 2 min |
| `selftest_4k_qd1` | Worst-case single-stream commit | 4K seq write, fdatasync, 1 job, iodepth=1 | 2 min |

All workloads use `direct=1` (bypass page cache) and `ioengine=libaio`.
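
The table rows map fairly directly onto fio options. The function below is a sketch that prints an approximate fio command line for each workload. It is not taken from iobench's source: the function name `redpanda_fio_cmd`, the `/data` default directory, and `--size=10G` are hypothetical placeholders.

```bash
# redpanda_fio_cmd WORKLOAD [DIR]: hypothetical helper that prints an
# approximate fio invocation for one row of the workloads table.
# --size=10G and the /data default are placeholders, not iobench's values.
redpanda_fio_cmd() {
  local workload=$1 dir=${2:-/data}
  local bs jobs qd fsync_every secs
  case "$workload" in
    throughput)      bs=16k;  jobs=4; qd=16; fsync_every=0; secs=300 ;;  # no fdatasync
    fsync_hot_path)  bs=16k;  jobs=4; qd=4;  fsync_every=1; secs=600 ;;
    selftest_512k)   bs=512k; jobs=1; qd=4;  fsync_every=1; secs=120 ;;
    selftest_4k_qd1) bs=4k;   jobs=1; qd=1;  fsync_every=1; secs=120 ;;
    *) echo "unknown workload: $workload" >&2; return 1 ;;
  esac
  # --fdatasync=N issues an fdatasync after every N writes (0 disables it).
  echo fio --name="$workload" --directory="$dir" \
    --rw=write --bs="$bs" --numjobs="$jobs" --iodepth="$qd" \
    --fdatasync="$fsync_every" --direct=1 --ioengine=libaio \
    --time_based --runtime="$secs" --size=10G --output-format=json
}
```

For example, `redpanda_fio_cmd selftest_4k_qd1 /mnt/test` prints the QD1 commit workload; piping the output to `sh` inside a pod that has fio installed would run it by hand.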
### Execution modes
- **`sequential`** -- Runs all four workloads back-to-back on a single node. This yields clean per-pattern baselines that are comparable to Redpanda's published hardware requirements (a minimum of 16,000 IOPS per broker).
- **`parallel`** -- Runs all four workloads concurrently across all nodes, with each node writing to its own PVC. A wall-clock barrier synchronizes start times across pods so that the contention windows overlap. This is the production-shaped test, and the one that exposes the Ceph OSD fan-in pattern.
- **`both`** (default) -- Sequential first, then parallel.
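
The wall-clock barrier used in `parallel` mode can be illustrated with a small shell sketch. It assumes the pods' clocks are NTP-synchronized and that a coordinator hands every pod the same absolute start time in Unix epoch seconds; `seconds_until` and `wait_for_barrier` are hypothetical names, not iobench functions.

```bash
# Hypothetical helpers, not part of iobench.
# seconds_until TARGET [NOW]: seconds to sleep until epoch TARGET (never negative).
seconds_until() {
  local target=$1 now=${2:-$(date +%s)}
  local delay=$(( target - now ))
  [ "$delay" -gt 0 ] || delay=0
  echo "$delay"
}

# wait_for_barrier TARGET: block until the shared start time is reached.
wait_for_barrier() {
  sleep "$(seconds_until "$1")"
}
```

A coordinator might compute `start=$(( $(date +%s) + 30 ))` and pass that same value to each pod, which calls `wait_for_barrier "$start"` before launching fio. The slack (30 s here, an arbitrary choice) has to cover the time it takes to reach every pod.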
### Interpreting results
The **headline metric** is the worst p99.9 fdatasync latency observed on any node for the `fsync_hot_path` workload during parallel execution:
- **<= 100 ms**: PASS. Storage can sustain Raft heartbeats under load.
- **> 100 ms**: FAIL. Raft heartbeats (150 ms interval, 1.5 s election timeout) will not survive load spikes, leading to election storms.
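
The pass/fail rule reduces to taking the maximum and comparing it with the threshold. The sketch below takes each node's `fsync_hot_path` p99.9 fdatasync latency (in ms) as arguments instead of parsing `redpanda_summary.csv`, because this README does not document that file's column names; `fsync_verdict` is a hypothetical helper.

```bash
# fsync_verdict MS [MS ...]: hypothetical helper applying the 100 ms rule to the
# per-node p99.9 fdatasync latencies (milliseconds) of fsync_hot_path.
fsync_verdict() {
  if [ "$#" -eq 0 ]; then
    echo "usage: fsync_verdict P999_MS [P999_MS ...]" >&2
    return 1
  fi
  printf '%s\n' "$@" | awk -v limit=100 '
    { v = $1 + 0; if (NR == 1 || v > worst) worst = v }   # keep the worst node
    END {
      status = (worst <= limit) ? "PASS" : "FAIL"
      printf "worst p99.9 fdatasync = %.1f ms -> %s\n", worst, status
    }'
}
```

For example, `fsync_verdict 42.1 87.5 133.0` prints `worst p99.9 fdatasync = 133.0 ms -> FAIL`: a single slow node is enough to fail the run.
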

Reference values measured with `rpk cluster self-test` on healthy NVMe:
- `selftest_512k`: ~1182 IOPS, ~591 MiB/s
- `selftest_4k_qd1`: ~406 IOPS
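
The two reference numbers can be sanity-checked by hand. At a 512 KiB block size, bandwidth is IOPS times block size. At queue depth 1 with a single job, only one write (plus its fdatasync) is in flight at a time, so IOPS is approximately the reciprocal of the mean per-operation latency:

```math
\begin{aligned}
\text{bandwidth} &= 1182\ \text{ops/s} \times 512\ \text{KiB} = 605{,}184\ \text{KiB/s} = 591\ \text{MiB/s} \\
\bar{t}_{\text{op}} &\approx \frac{1}{406\ \text{ops/s}} \approx 2.46\ \text{ms}
\end{aligned}
```

In other words, healthy NVMe completes a 4K write plus fdatasync in roughly 2.5 ms on average, far below the 100 ms p99.9 threshold.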
### Output
Results are saved to `./iobench-redpanda-{timestamp}/`:
```
iobench-redpanda-2026-05-03-143022/
redpanda_summary.csv # Full results: all metrics, all nodes, all workloads
iobench.csv # Dashboard-compatible format
sequential/
throughput_node-0.json # Raw fio JSON per workload per node
fsync_hot_path_node-0.json
...
node-0/ # Per-node time-series logs
throughput_lat.1.log
throughput_iops.1.log
...
parallel/
throughput_node-0.json
throughput_node-1.json
throughput_node-2.json
...
```
For each workload and node, `redpanda_summary.csv` records IOPS, bandwidth, completion-latency percentiles (p50 through p99.99, plus max), and fdatasync latency percentiles (p50 through p99.99, plus max).

The per-second time-series logs (`*_lat.*.log`, `*_iops.*.log`) capture the bimodal, spiky behavior of Ceph under load that summary statistics hide.
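
In fio's log files, the first two comma-separated fields of each line are the timestamp in milliseconds and the logged value. The awk sketch below summarizes one latency log. It assumes fio 3.x, where latency-log values are in nanoseconds, and one-second averaging windows (fio's `log_avg_msec=1000`), so a single sample can still hide shorter spikes. `lat_log_summary` is a hypothetical helper, and its 100 ms default simply mirrors the pass/fail threshold.

```bash
# lat_log_summary LOG [LIMIT_MS]: hypothetical helper. Reads a fio latency log
# (field 1 = time in ms, field 2 = value), assumes values are in nanoseconds
# (fio 3.x), and reports the worst sample, when it occurred, and how many
# samples exceed LIMIT_MS.
lat_log_summary() {
  awk -F', *' -v limit_ms="${2:-100}" '
    {
      ms = $2 / 1000000                  # ns -> ms
      if (n == 0 || ms > worst) { worst = ms; worst_t = $1 / 1000 }
      if (ms > limit_ms) over++
      n++
    }
    END {
      printf "samples=%d worst=%.1f ms at t=%.0f s over_%s_ms=%d\n",
             n, worst, worst_t, limit_ms, over
    }' "$1"
}
```

For example, assuming the fsync workload's log follows the same naming pattern as `throughput_lat.1.log`: `lat_log_summary sequential/node-0/fsync_hot_path_lat.1.log`.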
### Warning
Parallel mode generates heavy fdatasync load on all nodes at once, and this **will impact other workloads on the same Ceph pool**. Run it during a maintenance window or off-peak hours.
## Dashboard
A Plotly Dash app for visualizing results. Supports both simple and Redpanda profile data.
```bash
cd iobench/dash
virtualenv venv
source venv/bin/activate
pip install -r requirements_freeze.txt
# Copy result CSVs into the dash directory, then:
python iobench-dash.py
```
The dashboard reads `iobench.csv` and/or `redpanda_summary.csv` from its working directory.
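
When the results directory holds several Redpanda runs, the following sketch copies the CSVs from the newest one into the dashboard directory. It relies on the timestamped directory names (as in the example output above) sorting chronologically; `copy_latest_results` and its default paths are hypothetical.

```bash
# copy_latest_results [RESULTS_ROOT] [DASH_DIR]: hypothetical helper.
# Copies the two dashboard CSVs from the newest iobench-redpanda-* run.
copy_latest_results() {
  local results_root=${1:-.} dash_dir=${2:-iobench/dash}
  local latest="" d
  # Glob expansion is sorted, and the timestamp format sorts chronologically,
  # so the last matching directory is the most recent run.
  for d in "$results_root"/iobench-redpanda-*/; do
    [ -d "$d" ] && latest=$d
  done
  if [ -z "$latest" ]; then
    echo "no iobench-redpanda-* directory under $results_root" >&2
    return 1
  fi
  cp "${latest}redpanda_summary.csv" "${latest}iobench.csv" "$dash_dir"/ || return 1
  echo "copied results from $latest"
}
```

Run it from the directory that contains the results, for example `copy_latest_results . iobench/dash`, then start the dashboard as shown.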