iobench
A command-line I/O benchmarking tool using fio. Runs locally, over SSH, or on Kubernetes pods. Includes a Redpanda storage characterization profile for validating Ceph RBD suitability.
Build
cargo build -p iobench --release
Simple profile (default)
Runs standard fio benchmarks (sequential/random read/write, single/multi-job) against a local or remote target.
# Local
iobench
# Over SSH
iobench --target ssh/user@host
# On a Kubernetes pod
iobench --target k8s/namespace/pod-name
# Custom parameters
iobench --duration 60 --size 4G --block-size 128k --tests read,write,randwrite
Available tests: read, write, randread, randwrite, multiread, multiwrite, multirandread, multirandwrite.
Results are saved to ./iobench-{timestamp}/summary.csv.
Redpanda profile
Characterizes whether a Kubernetes storage backend (e.g. Ceph RBD) can sustain Redpanda's I/O patterns. Deploys a 3-pod StatefulSet with per-node PVCs and runs four fio workloads that model Redpanda's actual storage behavior.
Quick start
# Run the full profile (sequential baselines, then parallel contention test)
iobench redpanda --storage-class ceph-block
# Sequential baselines only (single-node, no cluster impact)
iobench redpanda --mode sequential --storage-class ceph-block
# Parallel contention test only
iobench redpanda --mode parallel --storage-class ceph-block
# Keep pods alive after the run (useful for re-running or debugging)
iobench redpanda --storage-class ceph-block --keep-deployment
# Deploy pods without running any workloads
iobench deploy --storage-class ceph-block
# Remove all iobench pods and PVCs
iobench undeploy
Options
| Flag | Default | Description |
|---|---|---|
| --mode | both | sequential, parallel, or both |
| --storage-class | ceph-block | Kubernetes StorageClass for the PVCs |
| --pvc-size | 50Gi | Size of each PVC |
| --replicas | 3 | Number of pods (should match cluster node count) |
| --namespace | default | Kubernetes namespace |
| --output-dir | auto-timestamped | Directory for results |
| --keep-deployment | false | Don't delete pods/PVCs after the run |
Workloads
The profile runs four workloads, each targeting a different aspect of Redpanda's I/O:
| Workload | What it models | Key params | Runtime |
|---|---|---|---|
| throughput | Bulk segment writes (optimistic upper bound) | 16K seq write, no fsync, 4 jobs, iodepth=16 | 5 min |
| fsync_hot_path | Raft commit path with acks=all | 16K seq write, fdatasync per op, 4 jobs, iodepth=4 | 10 min |
| selftest_512k | rpk cluster self-test 512K phase | 512K seq write, fdatasync, 1 job, iodepth=4 | 2 min |
| selftest_4k_qd1 | Worst-case single-stream commit | 4K seq write, fdatasync, 1 job, iodepth=1 | 2 min |
All workloads use direct=1 (bypass page cache) and ioengine=libaio.
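As a rough illustration, the table above can be expressed as fio command lines. The sketch below is not iobench's actual invocation, which this README does not show; the `Workload` class, the `fio_args` helper, the target file path, and the file size are all hypothetical. The fio options themselves (`--rw`, `--bs`, `--direct`, `--ioengine`, `--numjobs`, `--iodepth`, `--runtime`, `--time_based`, `--fdatasync`, `--output-format`) are standard fio options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """One row of the workload table (hypothetical representation)."""
    name: str
    block_size: str
    fdatasync: bool
    numjobs: int
    iodepth: int
    runtime_s: int


# Values copied from the workload table; runtimes converted to seconds.
WORKLOADS = [
    Workload("throughput", "16k", False, 4, 16, 300),
    Workload("fsync_hot_path", "16k", True, 4, 4, 600),
    Workload("selftest_512k", "512k", True, 1, 4, 120),
    Workload("selftest_4k_qd1", "4k", True, 1, 1, 120),
]


def fio_args(w, filename="/data/iobench.dat", size="4G"):
    """Build an fio argument list for one workload.

    filename and size are placeholders, not iobench defaults.
    """
    args = [
        "fio",
        f"--name={w.name}",
        f"--filename={filename}",
        f"--size={size}",
        "--rw=write",             # sequential write
        f"--bs={w.block_size}",
        "--direct=1",             # bypass the page cache
        "--ioengine=libaio",
        f"--numjobs={w.numjobs}",
        f"--iodepth={w.iodepth}",
        f"--runtime={w.runtime_s}",
        "--time_based",           # run for the full runtime
        "--output-format=json",
    ]
    if w.fdatasync:
        args.append("--fdatasync=1")  # fdatasync after every write
    return args
```

Printing `" ".join(fio_args(w))` for each entry in `WORKLOADS` gives a command that can be compared against a manual fio run on the same volume.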
Execution modes
- sequential: Runs all four workloads back-to-back on a single node. Clean per-pattern baselines, comparable to Redpanda's published hardware requirements (16,000 IOPS minimum per broker).
- parallel: Runs all four workloads concurrently across all nodes simultaneously. Each node writes to its own PVC. A wall-clock barrier synchronizes start times across pods so contention windows overlap. This is the production-shape test that exposes the Ceph OSD fan-in pattern.
- both (default): Sequential first, then parallel.
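One way to picture the wall-clock barrier used in parallel mode: a coordinator picks a start instant slightly in the future, hands the same timestamp to every pod, and each pod sleeps until that instant before launching fio. The sketch below only illustrates that idea; the function names and the 10-second lead are assumptions, iobench's real mechanism is not described here, and the approach relies on the pods' clocks being reasonably synchronized.

```python
import time


def pick_start_time(now_s, lead_s=10.0):
    """Coordinator side: choose a shared start instant (epoch seconds)."""
    return now_s + lead_s


def remaining_delay(start_s, now_s):
    """Pod side: seconds left until the shared start, never negative."""
    return max(0.0, start_s - now_s)


def wait_for_barrier(start_s):
    """Pod side: block until the shared start instant, then return."""
    time.sleep(remaining_delay(start_s, time.time()))
```

A pod that receives the timestamp late simply starts immediately, so the lead time should comfortably exceed the time needed to reach all pods.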
Interpreting results
The headline metric is the worst p99.9 fdatasync latency on the fsync_hot_path workload during parallel execution:
- <= 100 ms: PASS. Storage can sustain Raft heartbeats under load.
- > 100 ms: FAIL. Raft heartbeats (150 ms interval, 1.5 s election timeout) will not survive load spikes, leading to election storms.
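The pass/fail rule can be reproduced from the raw fio JSON files. The sketch below assumes the layout emitted by recent fio 3.x releases, where each job carries a `sync` section with `lat_ns.percentile` keyed by strings such as `"99.900000"`; verify this against your own output files. The function names are hypothetical and this is not iobench's own evaluation code.

```python
THRESHOLD_MS = 100.0  # pass/fail boundary stated above


def p999_fdatasync_ms(fio_result):
    """Worst p99.9 sync latency (ms) across the jobs of one fio JSON result."""
    worst_ns = 0
    for job in fio_result["jobs"]:
        percentiles = job["sync"]["lat_ns"]["percentile"]
        worst_ns = max(worst_ns, percentiles["99.900000"])
    return worst_ns / 1e6  # nanoseconds to milliseconds


def verdict(results):
    """Return ("PASS" or "FAIL", worst p99.9 in ms) over all nodes."""
    worst_ms = max(p999_fdatasync_ms(r) for r in results)
    status = "PASS" if worst_ms <= THRESHOLD_MS else "FAIL"
    return status, worst_ms
```

To apply it, load each parallel/fsync_hot_path_node-N.json file with `json.load` and pass the resulting list of dictionaries to `verdict`.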
Reference values from healthy NVMe (from rpk cluster self-test):
- selftest_512k: ~1182 IOPS, ~591 MiB/s
- selftest_4k_qd1: ~406 IOPS
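The IOPS and bandwidth figures are two views of the same measurement: bandwidth is IOPS multiplied by the block size. A quick check, using a hypothetical helper name:

```python
def mib_per_s(iops, block_kib):
    """Convert IOPS at a given block size (KiB) to MiB/s."""
    return iops * block_kib / 1024


print(mib_per_s(1182, 512))          # 591.0, matches the selftest_512k reference
print(round(mib_per_s(406, 4), 2))   # 1.59 MiB/s for selftest_4k_qd1
```

The same conversion is useful when comparing your own results against these references, since a run may report only one of the two figures.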
Output
Results are saved to ./iobench-redpanda-{timestamp}/:
iobench-redpanda-2026-05-03-143022/
  redpanda_summary.csv           # Full results: all metrics, all nodes, all workloads
  iobench.csv                    # Dashboard-compatible format
  sequential/
    throughput_node-0.json       # Raw fio JSON per workload per node
    fsync_hot_path_node-0.json
    ...
    node-0/                      # Per-node time-series logs
      throughput_lat.1.log
      throughput_iops.1.log
      ...
  parallel/
    throughput_node-0.json
    throughput_node-1.json
    throughput_node-2.json
    ...
For each workload on each node, the CSV reports IOPS, bandwidth, completion latency percentiles (p50 to p99.99, plus max), and fdatasync latency percentiles (p50 to p99.99, plus max).
Per-second time-series logs (*_lat.*.log, *_iops.*.log) capture the bimodal/spiky behavior of Ceph under load that summary statistics miss.
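To look for spikes in these logs, a small parser is enough. The sketch below assumes fio's standard log line format, `time_ms, value, direction, block_size, offset` (newer fio versions may append further fields), with latency values in nanoseconds as documented for fio 3.x. The function names and the sample data are made up for illustration.

```python
def parse_fio_log(text):
    """Parse fio log text into a list of (time_ms, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        # Only the first two fields are needed; the rest are ignored.
        samples.append((int(fields[0]), int(fields[1])))
    return samples


def spikes(samples, threshold):
    """Return the samples whose value exceeds the threshold."""
    return [(t, v) for t, v in samples if v > threshold]


# Synthetic example: one normal sample, one 180 ms latency spike.
example = "1000, 2500000, 1, 16384, 0\n2000, 180000000, 1, 16384, 0\n"
print(spikes(parse_fio_log(example), threshold=100_000_000))
```

Plotting the parsed samples over time, rather than only filtering them, makes the bimodal pattern easier to see.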
Warning
Parallel mode generates heavy fdatasync workloads across all nodes simultaneously. This will impact other workloads on the same Ceph pool. Run during a maintenance window or off-peak.
Dashboard
A Plotly Dash app for visualizing results. Supports both simple and Redpanda profile data.
cd iobench/dash
virtualenv venv
source venv/bin/activate
pip install -r requirements_freeze.txt
# Copy result CSVs into the dash directory, then:
python iobench-dash.py
The dashboard reads iobench.csv and/or redpanda_summary.csv from its working directory.