# iobench

A command-line I/O benchmarking tool using fio. Runs locally, over SSH, or on Kubernetes pods. Includes a Redpanda storage characterization profile for validating Ceph RBD suitability.

## Build

```bash
cargo build -p iobench --release
```

## Simple profile (default)

Runs standard fio benchmarks (sequential/random read/write, single/multi-job) against a local or remote target.

```bash
# Local
iobench

# Over SSH
iobench --target ssh/user@host

# On a Kubernetes pod
iobench --target k8s/namespace/pod-name

# Custom parameters
iobench --duration 60 --size 4G --block-size 128k --tests read,write,randwrite
```
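
The `--target` values above follow a `scheme/rest` shape. As an illustration only, the three forms could be decomposed as in the sketch below. iobench's own parser is not shown in this README, and the `Target` type and `parse_target` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    """Hypothetical in-memory form of a --target value."""
    kind: str                        # "local", "ssh", or "k8s"
    host: Optional[str] = None       # "user@host" for ssh targets
    namespace: Optional[str] = None  # Kubernetes namespace for k8s targets
    pod: Optional[str] = None        # pod name for k8s targets


def parse_target(spec: Optional[str]) -> Target:
    """Split a target string such as 'ssh/user@host' or 'k8s/ns/pod'."""
    if not spec:
        return Target(kind="local")  # no --target means a local run
    scheme, _, rest = spec.partition("/")
    if scheme == "ssh" and rest:
        return Target(kind="ssh", host=rest)
    if scheme == "k8s":
        namespace, _, pod = rest.partition("/")
        if namespace and pod:
            return Target(kind="k8s", namespace=namespace, pod=pod)
    raise ValueError(f"unrecognized target: {spec!r}")
```

Under these assumptions, `parse_target("k8s/namespace/pod-name")` yields a target with `namespace="namespace"` and `pod="pod-name"`, and an unknown scheme raises `ValueError`.
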

Available tests: `read`, `write`, `randread`, `randwrite`, `multiread`, `multiwrite`, `multirandread`, `multirandwrite`.

Results are saved to `./iobench-{timestamp}/summary.csv`.

## Redpanda profile

Characterizes whether a Kubernetes storage backend (e.g. Ceph RBD) can sustain Redpanda's I/O patterns. Deploys a 3-pod StatefulSet with per-node PVCs and runs four fio workloads that model Redpanda's actual storage behavior.

### Quick start

```bash
# Run the full profile (sequential baselines, then parallel contention test)
iobench redpanda --storage-class ceph-block

# Sequential baselines only (single-node, no cluster impact)
iobench redpanda --mode sequential --storage-class ceph-block

# Parallel contention test only
iobench redpanda --mode parallel --storage-class ceph-block

# Keep pods alive after the run (useful for re-running or debugging)
iobench redpanda --storage-class ceph-block --keep-deployment

# Deploy pods without running any workloads
iobench deploy --storage-class ceph-block

# Remove all iobench pods and PVCs
iobench undeploy
```

### Options

| Flag | Default | Description |
|------|---------|-------------|
| `--mode` | `both` | `sequential`, `parallel`, or `both` |
| `--storage-class` | `ceph-block` | Kubernetes StorageClass for the PVCs |
| `--pvc-size` | `50Gi` | Size of each PVC |
| `--replicas` | `3` | Number of pods (should match cluster node count) |
| `--namespace` | `default` | Kubernetes namespace |
| `--output-dir` | auto-timestamped | Directory for results |
| `--keep-deployment` | `false` | Don't delete pods/PVCs after the run |

### Workloads

The profile runs four workloads, each targeting a different aspect of Redpanda's I/O:

| Workload | What it models | Key params | Runtime |
|----------|---------------|------------|---------|
| `throughput` | Bulk segment writes (optimistic upper bound) | 16K seq write, no fsync, 4 jobs, iodepth=16 | 5 min |
| `fsync_hot_path` | Raft commit path with `acks=all` | 16K seq write, fdatasync per op, 4 jobs, iodepth=4 | 10 min |
| `selftest_512k` | `rpk cluster self-test` 512K phase | 512K seq write, fdatasync, 1 job, iodepth=4 | 2 min |
| `selftest_4k_qd1` | Worst-case single-stream commit | 4K seq write, fdatasync, 1 job, iodepth=1 | 2 min |

All workloads use `direct=1` (bypass page cache) and `ioengine=libaio`.
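
The table maps fairly directly onto fio command-line options. The sketch below builds one fio argument list per workload. It is an approximation under stated assumptions, not the invocation iobench actually uses (that is not shown in this README): the `WORKLOADS` dict and `fio_args` helper are hypothetical names, the file path and size are placeholders, and "fdatasync" is read as `--fdatasync=1` (one fdatasync after every write) for all three syncing workloads.

```python
# Parameters copied from the workload table; runtime is in seconds.
WORKLOADS = {
    "throughput":      {"bs": "16k",  "numjobs": 4, "iodepth": 16, "fdatasync": 0, "runtime": 300},
    "fsync_hot_path":  {"bs": "16k",  "numjobs": 4, "iodepth": 4,  "fdatasync": 1, "runtime": 600},
    "selftest_512k":   {"bs": "512k", "numjobs": 1, "iodepth": 4,  "fdatasync": 1, "runtime": 120},
    "selftest_4k_qd1": {"bs": "4k",   "numjobs": 1, "iodepth": 1,  "fdatasync": 1, "runtime": 120},
}


def fio_args(name, filename="/data/testfile", size="4G"):
    """Build an fio argv list for one workload.

    filename and size are placeholder assumptions, not iobench defaults.
    """
    w = WORKLOADS[name]
    args = [
        "fio",
        f"--name={name}",
        f"--filename={filename}",
        f"--size={size}",
        "--rw=write",                 # sequential write
        f"--bs={w['bs']}",
        f"--numjobs={w['numjobs']}",
        f"--iodepth={w['iodepth']}",
        "--direct=1",                 # bypass the page cache
        "--ioengine=libaio",
        f"--runtime={w['runtime']}",
        "--time_based",               # run for the full runtime
        "--output-format=json",
    ]
    if w["fdatasync"]:
        # fio issues fdatasync after every N writes; N=1 means per operation
        args.append(f"--fdatasync={w['fdatasync']}")
    return args
```

For example, `" ".join(fio_args("fsync_hot_path"))` produces a command line with `--bs=16k --numjobs=4 --iodepth=4` and `--fdatasync=1`, while `throughput` omits the fdatasync option entirely.
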

### Execution modes

- **`sequential`** -- Runs all four workloads back-to-back on a single node. Produces clean per-pattern baselines, comparable to Redpanda's published hardware requirements (16,000 IOPS minimum per broker).
- **`parallel`** -- Runs all four workloads on all nodes at the same time. Each node writes to its own PVC. A wall-clock barrier synchronizes start times across pods so that contention windows overlap. This is the production-shape test that exposes the Ceph OSD fan-in pattern.
- **`both`** (default) -- Sequential first, then parallel.
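
The wall-clock barrier used in parallel mode can be pictured as every pod agreeing on one future start instant and sleeping until the clock reaches it. The sketch below assumes pod clocks are synchronized (for example via NTP). How iobench implements its barrier is not shown here, and `seconds_until`, `wait_for_barrier`, and the 30-second lead time are hypothetical.

```python
import time


def seconds_until(start_epoch: float, now: float) -> float:
    """Time left before the shared start instant; never negative."""
    return max(0.0, start_epoch - now)


def wait_for_barrier(start_epoch: float) -> None:
    """Block until the wall clock reaches start_epoch (seconds since the Unix epoch)."""
    while True:
        remaining = seconds_until(start_epoch, time.time())
        if remaining == 0.0:
            return
        time.sleep(min(remaining, 1.0))  # re-check the clock at least once per second


# A coordinator would pick one instant and hand the same value to every pod:
#   start_epoch = time.time() + 30
#   wait_for_barrier(start_epoch)   # each pod launches fio right after this returns
```

Because every pod waits on the same absolute timestamp rather than a relative delay, differences in pod startup time do not shift the contention window.
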

### Interpreting results

The **headline metric** is the worst p99.9 fdatasync latency on the `fsync_hot_path` workload during parallel execution:

- **<= 100 ms**: PASS. Storage can sustain Raft heartbeats under load.
- **> 100 ms**: FAIL. Raft heartbeats (150 ms interval, 1.5 s election timeout) will not survive load spikes, leading to election storms.
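
The pass/fail rule amounts to taking the maximum p99.9 fdatasync latency across nodes and comparing it with 100 ms. The sketch below applies that rule to parsed fio JSON results. It assumes the fdatasync percentiles sit at `jobs[].sync.lat_ns.percentile["99.900000"]` in nanoseconds, which matches recent fio releases as far as I know but should be checked against your own files. The function names are hypothetical and this is not iobench's own evaluation code.

```python
THRESHOLD_MS = 100.0  # pass/fail limit stated in this README


def p999_fdatasync_ms(fio_result: dict) -> float:
    """Worst p99.9 fdatasync latency (ms) among the jobs of one fio JSON result."""
    worst_ns = max(
        job["sync"]["lat_ns"]["percentile"]["99.900000"]  # assumed fio JSON layout
        for job in fio_result["jobs"]
    )
    return worst_ns / 1e6  # nanoseconds -> milliseconds


def verdict(per_node_results: list) -> tuple:
    """Return ("PASS" or "FAIL", worst latency in ms) across all nodes."""
    worst_ms = max(p999_fdatasync_ms(r) for r in per_node_results)
    return ("PASS" if worst_ms <= THRESHOLD_MS else "FAIL", worst_ms)
```

The inputs would be the `fsync_hot_path` results from a parallel run, one per node, each loaded with `json.load`. A single slow node is enough to fail the run, since the rule looks at the worst value rather than an average.
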

Reference values measured on healthy NVMe with `rpk cluster self-test`:

- `selftest_512k`: ~1182 IOPS, ~591 MiB/s
- `selftest_4k_qd1`: ~406 IOPS
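
These reference numbers can be sanity-checked with two relations: bandwidth is IOPS times block size, and at queue depth 1 the mean time per operation is the reciprocal of IOPS. A quick check, with illustrative helper names:

```python
def bandwidth_mib_s(iops: float, block_size_kib: float) -> float:
    """Bandwidth in MiB/s for a given IOPS rate and block size in KiB."""
    return iops * block_size_kib / 1024


def mean_latency_ms_at_qd1(iops: float) -> float:
    """With one I/O in flight at a time, each operation takes 1/IOPS seconds on average."""
    return 1000.0 / iops


print(bandwidth_mib_s(1182, 512))             # 591.0
print(round(mean_latency_ms_at_qd1(406), 2))  # 2.46
```

So 1182 IOPS at 512 KiB matches the quoted ~591 MiB/s, and ~406 IOPS at queue depth 1 corresponds to roughly 2.5 ms per synced 4K write on the reference NVMe device.
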

### Output

Results are saved to `./iobench-redpanda-{timestamp}/`:

```
iobench-redpanda-2026-05-03-143022/
  redpanda_summary.csv          # Full results: all metrics, all nodes, all workloads
  iobench.csv                   # Dashboard-compatible format
  sequential/
    throughput_node-0.json      # Raw fio JSON per workload per node
    fsync_hot_path_node-0.json
    ...
    node-0/                     # Per-node time-series logs
      throughput_lat.1.log
      throughput_iops.1.log
      ...
  parallel/
    throughput_node-0.json
    throughput_node-1.json
    throughput_node-2.json
    ...
```

For each workload and node, the CSV includes IOPS, bandwidth, completion latency percentiles (p50 to p99.99, plus max), and fdatasync latency percentiles (p50 to p99.99, plus max).

Per-second time-series logs (`*_lat.*.log`, `*_iops.*.log`) capture the bimodal, spiky behavior of Ceph under load that summary statistics miss.
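
As a sketch of how the time-series logs might be inspected, the code below parses fio's comma-separated log format (time in milliseconds first, then the sampled value, then further columns that are ignored here) and flags samples that fall well below the median. The helper names, the 50% cutoff, and the sample data are made up for illustration and are not iobench output.

```python
import statistics


def parse_fio_log(text: str) -> list:
    """Return (time_ms, value) pairs from fio log text; extra columns are ignored."""
    samples = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) >= 2 and fields[0].isdigit():
            samples.append((int(fields[0]), float(fields[1])))
    return samples


def stalls(samples: list, fraction: float = 0.5) -> list:
    """Timestamps (ms) whose value is below `fraction` of the median value."""
    median = statistics.median(v for _, v in samples)
    return [t for t, v in samples if v < fraction * median]


# Synthetic IOPS samples, one per second, with a stall in the third second.
sample_log = """\
1000, 4100, 1, 16384, 0
2000, 4050, 1, 16384, 0
3000, 310, 1, 16384, 0
4000, 4120, 1, 16384, 0
"""
print(stalls(parse_fio_log(sample_log)))  # [3000]
```

The mean of this sample still looks moderately healthy, which is the kind of case where the per-second view shows a stall that a summary row would hide.
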

### Warning

Parallel mode generates heavy fdatasync workloads across all nodes simultaneously. This **will impact other workloads on the same Ceph pool**. Run during a maintenance window or off-peak.

## Dashboard

A Plotly Dash app for visualizing results. Supports both simple and Redpanda profile data.

```bash
cd iobench/dash
virtualenv venv
source venv/bin/activate
pip install -r requirements_freeze.txt

# Copy result CSVs into the dash directory, then:
python iobench-dash.py
```

The dashboard reads `iobench.csv` and/or `redpanda_summary.csv` from its working directory.