{"id":34723761,"url":"https://github.com/russfellows/sai3-bench","last_synced_at":"2026-04-28T23:04:20.843Z","repository":{"id":306502709,"uuid":"1026393410","full_name":"russfellows/sai3-bench","owner":"russfellows","description":"A multi-protocol storage performance testing tool, inspired by vdbench, fio and warp.  Part of the SAI3 project.  Leverages the s3dlio Rust library","archived":false,"fork":false,"pushed_at":"2026-04-23T21:54:49.000Z","size":4205,"stargazers_count":2,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-23T23:33:52.591Z","etag":null,"topics":["azure-blob","benchmarking","google-cloud-storage","object-storage","performance","rust-lang","s3","sai3","storage","testing","testing-tools"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/russfellows.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-25T20:07:51.000Z","updated_at":"2026-04-23T21:49:23.000Z","dependencies_parsed_at":"2025-11-08T01:06:51.837Z","dependency_job_id":null,"html_url":"https://github.com/russfellows/sai3-bench","commit_stats":null,"previous_names":["russfellows/warp-test","russfellows/s3-test","russfellows/sai3-bench"],"tags_count":65,"template":false,"template_full_name":null,"purl":"pkg:github/russfellows/sai3-bench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/russfellows%2Fsai3-bench"
,"tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/russfellows%2Fsai3-bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/russfellows%2Fsai3-bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/russfellows%2Fsai3-bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/russfellows","download_url":"https://codeload.github.com/russfellows/sai3-bench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/russfellows%2Fsai3-bench/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32402685,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T19:38:08.556Z","status":"ssl_error","status_checked_at":"2026-04-28T19:37:55.688Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure-blob","benchmarking","google-cloud-storage","object-storage","performance","rust-lang","s3","sai3","storage","testing","testing-tools"],"created_at":"2025-12-25T02:13:05.817Z","updated_at":"2026-04-28T23:04:20.836Z","avatar_url":"https://github.com/russfellows.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sai3-bench: Multi-Protocol I/O Benchmarking 
Suite\n\n[![Version](https://img.shields.io/badge/version-0.8.96-blue.svg)](https://github.com/russfellows/sai3-bench/releases)\n[![Build Status](https://img.shields.io/badge/build-passing-brightgreen.svg)](https://github.com/russfellows/sai3-bench)\n[![Tests](https://img.shields.io/badge/tests-713%20passing-success.svg)](https://github.com/russfellows/sai3-bench)\n[![License](https://img.shields.io/badge/license-GPL--3.0-blue.svg)](LICENSE)\n[![Rust](https://img.shields.io/badge/rust-1.90%2B-green.svg)](https://www.rust-lang.org/)\n\n**🚀 NEW (v0.8.96)**: **Multi-endpoint S3 routing fixed + `S3_ENDPOINT_URIS` env var support** — All operation types (GET, PUT, LIST, STAT, DELETE) now correctly route through `MultiEndpointStore` when a `multi_endpoint:` block is present; previously only PUT was routed, causing GETs and LISTs to fall back to a single endpoint. A new `S3_ENDPOINT_URIS` environment variable (comma-separated URIs) lets you enable multi-endpoint load balancing at runtime without editing YAML — YAML config always takes precedence. Validation output clearly shows which endpoint source is active. Updated to s3dlio v0.9.96. +1 test (713 total). See [tests/configs/MULTI_ENDPOINT_README.md](tests/configs/MULTI_ENDPOINT_README.md) for usage.\n\n**🚀 NEW (v0.8.94)**: **jemalloc global allocator + s3dlio v0.9.92** — Replaced the default glibc allocator with [tikv-jemallocator](https://crates.io/crates/tikv-jemallocator) v0.6 (`#[global_allocator]`), eliminating glibc arena contention and fragmentation. Profiling showed `malloc_consolidate` at ~3% and the allocator frame at ~52% of CPU cycles under load; jemalloc removes both bottlenecks. Measured improvement: **+3.6% throughput at t=32** (32,634 → 33,812 ops/s). 
Updated s3dlio dependency from v0.9.90 → **v0.9.92** (pinned tag).\n\n**🚀 NEW (v0.8.92)**: **Credential forwarding + HTTP/2 + pre-flight improvements** — Controller now forwards cloud credentials (`AWS_*`, `GOOGLE_APPLICATION_CREDENTIALS`, `AZURE_STORAGE_*`) to agents over gRPC via `--env-file \u003cpath\u003e` or from its own environment (disable with `--no-forward-env`). Agents apply credentials before pre-flight, eliminating the manual step of copying secrets to each host. See [docs/CREDENTIAL_FORWARDING.md](docs/CREDENTIAL_FORWARDING.md). s3dlio v0.9.90 adds **HTTP/2 (h2c) support** for S3-protocol `http://` endpoints: auto-probes h2c on first connection and falls back to HTTP/1.1 if refused; `https://` endpoints negotiate via TLS ALPN transparently. Enable via `S3DLIO_H2C=1` or `s3dlio_optimization.h2c: true` in YAML. Pre-flight fixes: per-agent endpoint filtering (fixes 64-error agent-2 regression), bucket-grouped output, actionable `[PERM]`/`[AUTH]`/`[CONF]`/`[NET]` error classification, agent version check table, and two new config validation warnings (redundant multi_endpoint, missing credentials hint). +57 tests (712 total).\n\n**🚀 NEW (v0.8.90)**: **`populate_ledger.tsv` + dgen-data zero-copy fills** — Every prepare phase now writes a lightweight `populate_ledger.tsv` (always-on, independent of the KV cache) recording object counts, bytes, and throughput — usable even at trillion-object scale where listing is infeasible. Data generation rewritten using [`dgen-data`](https://crates.io/crates/dgen-data) v0.2.3: a rolling-pointer pool generates one 1 MB buffer and vends zero-copy `Bytes::slice()` windows per PUT, eliminating per-object allocation. PUT latency now split into **setup** vs. **I/O** histograms for better profiling.\n\n**🚀 NEW (v0.8.89)**: **`enable_metadata_cache` config option** — disables the internal Fjall KV metadata cache for very large or simple workloads (\u003e ~1 B objects/batch). Default `true` (fully backward-compatible). 
Set `false` to eliminate ~3.4 GB disk usage per 50 M objects and the ~15 s resume scan, at the cost of crash-resume capability. Both standalone and distributed dry-runs print a clear banner showing the current setting. Reference config: [`tests/configs/test_prepare_no_kvcache.yaml`](tests/configs/test_prepare_no_kvcache.yaml).\n\n**🚀 NEW (v0.8.88)**: **KV cache compact encoding + coverage observability** — KV cache entries now use [postcard](https://crates.io/crates/postcard) binary encoding (56% smaller, 2× faster scans). At startup, a one-line cache summary reports object count and total logical storage (`📊 Cache summary: N objects | X.XX GiB`). Progressive WARN messages fire if a coverage scan exceeds 10 s. Preflight now queries the cache per-spec and logs coverage. Safe write-probe cycle validates writable endpoints before any benchmark I/O. Agent port changed to **7167** (was 7761) with automatic port-conflict detection on startup. +17 new tests.\n\n**🚀 NEW (v0.8.86)**: **GCS RAPID storage fully working** (s3dlio v0.9.86) — `BidiWriteObject` PUTs and `BidiReadObject` GETs verified against Hyperdisk ML RAPID buckets. RAPID mode is auto-detected per bucket or forced via `gcs_rapid_mode: true`. Worker drain deadline bug fixed (execute stage now runs its full configured duration). Timer observability logs added.\n\n**🚀 NEW (v0.8.70)**: **GCS RAPID gRPC support** (s3dlio v0.9.70) — per-trial channel count, range-download control, and write-chunk-size control sent over RPC to agents. **Autotune redesign** — all tuning parameters are YAML-only; new `channels_per_thread` parameter scales gRPC subchannels with thread count; `--dry-run` prints the full sweep plan (computed sizes, loop order, total cases, I/O estimate) before executing.\n\n**🚀 NEW (v0.8.63)**: **Multi-endpoint checkpoint race condition fix** - Eliminates fatal workload aborts at 99% completion for shared storage. 
**s3dlio optimization support** - +76% GET throughput for large objects (≥64MB).\n\n**🚀 NEW (v0.8.62)**: **Streaming prepare + dry-run memory sampling + stage-aligned perf-log timing** for safer large-scale runs.\n\n**🚀 NEW (v0.8.61)**: **Explicit distributed stages + numeric barrier indices** for consistent orchestration across multi-agent runs. Use the new `convert` command to upgrade legacy YAML files.\n\n**🚀 NEW (v0.8.60)**: **KV cache checkpoint restoration** - Complete resume capability! Checkpoints now automatically restored on startup, enabling agents to resume long-running prepare operations after crashes/restarts. Works for both standalone and distributed modes.\n\n**🚀 (v0.8.53)**: **Critical multi-endpoint + directory tree fix** - GET/PUT/STAT/DELETE operations now correctly route to round-robin endpoints, fixing 0-ops workload failures. Enhanced dry-run shows ALL endpoints with full URIs.\n\n**🚀 (v0.8.52)**: **Deferred retry for prepare failures** eliminates \"missing object\" errors during execution. Failed creates are automatically retried after the main loop with aggressive exponential backoff (10 attempts, up to 30s delay), ensuring completeness without impacting fast path performance. **Thousand separator display** in dry-run (64,032,768 files) and optional YAML input support (\"64,032,768\"). **Human-readable time units** in YAML: use \"5m\", \"2h\", \"30s\" instead of seconds (300, 7200, 30).\n\n**🚀 NEW (v0.8.51)**: **Critical blocking I/O fixes** for large-scale deployments (\u003e100K files). Configurable `agent_ready_timeout` (default 120s), non-blocking glob operations, and periodic yielding in prepare loops prevent executor starvation. 
[See docs/CHANGELOG.md](docs/CHANGELOG.md) for details.\n\n**🚀 NEW (v0.8.50)**: **YAML-driven stage orchestration** with 6 stage types, **barrier synchronization** for coordinated multi-host testing, and **comprehensive timeout configuration** (global/stage/barrier hierarchy).\n\n**🚀 NEW (v0.8.23)**: Pre-flight distributed configuration validation prevents common misconfigurations (base_uri with isolated mode) before execution.\n\n**🚀 NEW (v0.8.22)**: Multi-endpoint load balancing with per-agent static endpoint mapping for shared storage systems with multiple endpoints (NFS, S3, or object storage).\n\nA comprehensive storage performance testing tool supporting multiple backends through a unified interface. Built on the [s3dlio Rust library](https://github.com/russfellows/s3dlio) for multi-protocol support.\n\n## 🚀 What Makes sai3-bench Unique?\n\n1. **Universal Storage Testing**: Unified interface across 5 storage protocols (file://, direct://, s3://, az://, gs://)\n2. **Directory Tree Workloads**: Configurable hierarchical structures for realistic shared filesystem testing\n3. **Filesystem Operations**: Full support for nested paths and directory operations across all backends\n4. **Pre-flight Validation**: Detect configuration errors before execution (filesystem access, distributed config mismatches)\n5. **Workload Replay**: Capture production traffic and replay with microsecond fidelity (1→1, 1→N, N→1 remapping)\n6. **Op-Log Management**: Sort, validate, and merge operation logs for analysis and replay\n7. **Robust Distributed Execution**: Bidirectional streaming with sub-millisecond agent synchronization (v0.8.5+)\n8. **Production-Grade Metrics**: HDR histograms with size-bucketed analysis and aggregate summaries\n9. **Realistic Data Patterns**: Lognormal size distributions, configurable deduplication and compression\n10. **Machine-Readable Output**: TSV export with per-bucket and aggregate rows for automated analysis\n11. 
**Performance Logging**: Time-series perf-log with 31 columns including mean/p50/p90/p99 latencies, CPU metrics, and warmup filtering (v0.8.17+)\n12. **Results Analysis Tool**: Excel spreadsheet generation consolidating multiple test results (sai3-analyze, v0.8.17+)\n13. **Automatic Credential Distribution**: Controller forwards cloud credentials to agents over gRPC so each host needs no manual secret setup — with an allow-list, local-wins policy, and audit logging (v0.8.92+)\n\n## 🎯 Supported Storage Backends\n\nAll operations work identically across protocols - just change the URI scheme:\n\n- **File System** (`file://`) - Local filesystem with standard POSIX operations\n- **Direct I/O** (`direct://`) - High-performance direct I/O bypassing page cache (optimized chunked reads)\n- **Amazon S3** (`s3://`) - S3 and S3-compatible storage\n- **Azure Blob** (`az://`) - Microsoft Azure Blob Storage\n- **Google Cloud Storage** (`gs://` or `gcs://`) - Google Cloud Storage with native GCS API, including RAPID (Hyperdisk ML) storage (v0.8.86+)\n\nSee [Cloud Storage Setup](docs/CLOUD_STORAGE_SETUP.md) for authentication guides.\n\n## 🚀 Quick Start\n\n### One-Time Setup\n\n**1. Install Rust** (if not already installed):\n```bash\ncurl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y\nsource $HOME/.cargo/env\n```\n\n**2. Clone and Build sai3-bench**:\n```bash\ngit clone https://github.com/russfellows/sai3-bench.git\ncd sai3-bench\ncargo build --release\n```\n\nThe build creates 4 executables in `target/release/`:\n- `sai3-bench` - Single-node testing CLI\n- `sai3bench-agent` - Distributed agent (runs on each test host)\n- `sai3bench-ctl` - Distributed controller (coordinates agents)\n- `sai3-analyze` - Results analysis tool (Excel export)\n\n**3. 
Install Executables** (optional):\n\nChoose one of the following installation methods:\n\n**Option A: User-local install** (recommended, no sudo required):\n```bash\ncargo install --path .\n```\nInstalls to `~/.cargo/bin/` (already in your PATH from Rust installation).\n\n**Option B: System-wide install**:\n```bash\nsudo install -m 755 target/release/{sai3-bench,sai3bench-ctl,sai3bench-agent,sai3-analyze} /usr/local/bin/\n```\nInstalls to `/usr/local/bin/` for all users.\n\n**Option C: Run from build directory**:\n```bash\n# No installation needed - use full path:\n./target/release/sai3-bench --version\n./target/release/sai3bench-ctl --version\n```\n\n### Testing Modes\n\nsai3-bench supports two testing modes: **Single-Node** and **Distributed**.\n\n```\n┌───────────────────────────────────────────────────────────────────┐\n│                        SINGLE-NODE MODE                           │\n├───────────────────────────────────────────────────────────────────┤\n│                                                                   │\n│  ┌──────────────┐                                                 │\n│  │              │        I/O Operations                           │\n│  │  sai3-bench  │  ─────────────────────►  Storage System         │\n│  │              │                          (S3/NFS/Azure/etc)     │\n│  └──────────────┘                                                 │\n│                                                                   │\n│  • Simple: One command to run workloads                           │\n│  • Use for: Single host testing, development, quick validation    │\n│  • Command: ./sai3-bench run --config workload.yaml               │\n│                                                                   │\n└───────────────────────────────────────────────────────────────────┘\n\n┌──────────────────────────────────────────────────────────────────┐\n│                       DISTRIBUTED MODE                           
│\n├──────────────────────────────────────────────────────────────────┤\n│                                                                  │\n│  ┌──────────────────┐                                            │\n│  │                  │  gRPC: Config, Start/Stop, Stats           │\n│  │  sai3bench-ctl   │────────┬──────────┬──────────┐             │\n│  │  (Controller)    │        │          │          │             │\n│  └──────────────────┘        ▼          ▼          ▼             │\n│                                                                  │\n│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐        │\n│  │sai3bench-    │    │sai3bench-    │    │sai3bench-    │        │\n│  │agent (Host 1)│    │agent (Host 2)│    │agent (Host N)│        │\n│  └──────┬───────┘    └──────┬───────┘    └──────┬───────┘        │\n│         │ I/O               │ I/O               │ I/O            │\n│         ▼                    ▼                    ▼              │\n│    ┌────────────────────────────────────────────────────┐        │\n│    │         Storage System (NFS/S3/Azure/etc)          │        │\n│    │  • Multiple endpoints for load balancing           │        │\n│    │  • Unified namespace across all endpoints          │        │\n│    └────────────────────────────────────────────────────┘        │\n│                                                                  │\n│  • Scalable: Generate load from multiple hosts                   │\n│  • Use for: Large-scale testing, multi-endpoint storage          │\n│  • Command: ./sai3bench-ctl run --config distributed.yaml        │\n│                                                                  │\n└──────────────────────────────────────────────────────────────────┘\n```\n\n### Running Your First Workload\n\n**Single-Node Mode** - Test local filesystem:\n```bash\n# Create a simple config file\ncat \u003e my-test.yaml \u003c\u003cEOF\ntarget: \"file:///shared/benchmark/\"\nduration: \"60s\"\nconcurrency: 
16\n\ndistributed:\n  shared_filesystem: true\n  tree_creation_mode: coordinator\n  path_selection: random\n  agents:\n    - address: \"host1:7167\"\n      id: \"agent-1\"\n    - address: \"host2:7167\"\n      id: \"agent-2\"\n\nprepare:\n  ensure_objects:\n    - base_uri: \"data/\"\n      count: 1000\n      min_size: 1048576\n      max_size: 1048576\n      fill: random\n\nworkload:\n  - op: get\n    path: \"data/*\"\n    weight: 70\n  - op: put\n    path: \"data/\"\n    size_spec: 1048576\n    weight: 30\nEOF\n\n# Validate config (dry-run)\n./target/release/sai3-bench run --config my-test.yaml --dry-run\n\n# Run the workload\n./target/release/sai3-bench run --config my-test.yaml\n```\n\n**Distributed Mode** - Multi-host testing:\n```bash\n# On each test host (Host 1, Host 2, etc.), start an agent:\n./target/release/sai3bench-agent --listen 0.0.0.0:7167\n\n# On the controller host, create a distributed config:\ncat \u003e distributed-test.yaml \u003c\u003cEOF\ntarget: \"file:///shared/benchmark/\"\nduration: \"120s\"\nconcurrency: 32\n\nperf_log:\n  enabled: true\n  interval: 1s\n\ndistributed:\n  shared_filesystem: true\n  tree_creation_mode: coordinator\n  path_selection: random\n  agents:\n    - address: \"host1:7167\"\n      id: \"agent-1\"\n    - address: \"host2:7167\"\n      id: \"agent-2\"\n\nprepare:\n  ensure_objects:\n    - base_uri: \"data/\"\n      count: 5000\n      min_size: 524288\n      max_size: 10485760\n      fill: random\n\nworkload:\n  - op: get\n    path: \"data/*\"\n    weight: 60\n  - op: put\n    path: \"data/\"\n    size_spec: 2097152\n    weight: 30\n  - op: list\n    path: \"data/\"\n    weight: 10\nEOF\n\n# Run distributed workload (controller coordinates agents)\n./target/release/sai3bench-ctl run --config distributed-test.yaml\n```\n\n**Common Operations**:\n```bash\n# Test storage connectivity\n./target/release/sai3-bench util health --uri \"s3://my-bucket/\"\n\n# Distributed autotune with YAML matrix (--dry-run to preview sweep 
plan)\n./target/release/sai3bench-ctl autotune --config examples/distributed-autotune-minimal.yaml --dry-run\n./target/release/sai3bench-ctl autotune --config examples/distributed-autotune-minimal.yaml\n\n# Capture workload for replay\n./target/release/sai3-bench --op-log trace.tsv.zst run --config my-test.yaml\n\n# Replay captured workload\n./target/release/sai3-bench replay --op-log trace.tsv.zst --target \"s3://test-bucket/\"\n\n# Analyze results (generate Excel report)\n./target/release/sai3-analyze --pattern \"sai3-*\" --output results.xlsx\n```\n\nMinimal autotune YAML example: `examples/distributed-autotune-minimal.yaml`\n\nSee [Usage Guide](docs/USAGE.md) for detailed examples and [Distributed Testing Guide](docs/DISTRIBUTED_TESTING_GUIDE.md) for multi-host patterns.\n\n## 🌳 Directory Tree Workloads\n\nTest realistic shared filesystem scenarios with configurable directory hierarchies:\n\n```yaml\nprepare:\n  directory_structure:\n    width: 3              # Subdirectories per level\n    depth: 2              # Tree depth (2 = 3 + 9 directories)\n    files_per_dir: 10     # Files per directory\n    distribution: bottom  # \"bottom\" (leaf only) or \"all\" (every level)\n    dir_mask: \"d%d_w%d.dir\"  # Directory naming pattern\n  \n  ensure_objects:\n    - base_uri: \"file:///tmp/tree-test/\"\n      count: 0            # Files created by directory_structure\n      size_spec: \n        type: uniform\n        min: 4096         # 4 KiB\n        max: 16384        # 16 KiB\n      fill: random        # Cryptographic random data (recommended)\n      dedup_factor: 1     # 1 = unique, 2+ = duplicate blocks\n      compress_factor: 1  # 1 = incompressible, 2+ = compressible\n```\n\n**Fill Pattern Options:**\n- **`random`** (default, recommended): Cryptographic random data - realistic, incompressible\n- ***⚠️ `zero`: DO NOT USE for benchmarks - triggers dedup/compression, produces unrealistic results***\n\n**Key Features:**\n- **Enhanced `--dry-run`**: Shows 
directory/file counts and total data size before execution\n- **Multi-level distributions**: Place files only in leaves (`bottom`) or at all levels (`all`)\n- **Cloud storage compatible**: Works seamlessly with S3, Azure Blob, GCS (implicit directories)\n- **Distributed coordination**: TreeManifest ensures collision-free file numbering across agents\n- **Realistic data**: Random fill default provides compression-resistant patterns\n\n**Example:**\n```bash\n# Validate configuration with enhanced dry-run\n./sai3-bench run --config tree-test.yaml --dry-run\n# Output: Total Directories: 12, Total Files: 60, Total Data: 600 KiB\n\n# Run workload on Azure Blob Storage\n./sai3-bench run --config tree-test.yaml\n```\n\nSee [Directory Tree Test Configs](tests/configs/directory-tree/README.md) for examples.\n\n## 📦 Architecture \u0026 Binaries\n\n- **`sai3-bench`** - Single-node CLI with subcommands: `run`, `replay`, `util`\n- **`sai3bench-agent`** - Distributed gRPC agent for multi-node load generation  \n- **`sai3bench-ctl`** - Controller for coordinating distributed agents\n- **`sai3-analyze`** - Results consolidation tool (Excel spreadsheet generation) ✨ NEW in v0.8.17\n\n## 📖 Documentation\n\n- **[Usage Guide](docs/USAGE.md)** - Getting started and common workflows\n- **[Config Syntax](docs/CONFIG_SYNTAX.md)** - Complete YAML configuration reference\n- **[Config Examples](tests/configs/README.md)** - Annotated test configurations\n- **[Distributed Testing Guide](docs/DISTRIBUTED_TESTING_GUIDE.md)** - Multi-host load generation\n- **[Cloud Storage Setup](docs/CLOUD_STORAGE_SETUP.md)** - S3, Azure, and GCS authentication\n- **[s3dlio Performance Tuning](docs/S3DLIO_PERFORMANCE_TUNING.md)** - Range downloads \u0026 multipart upload optimization ✨ NEW\n- **[Data Generation Guide](docs/DATA_GENERATION.md)** - Deduplication and compression testing\n- **[Results Analysis Tool](docs/ANALYZE_TOOL.md)** - Consolidating multiple results into Excel ✨ NEW\n- 
**[Changelog](docs/CHANGELOG.md)** - Complete version history\n\n## 🔬 Workload Replay\n\nCapture production I/O traces with s3dlio and replay them with microsecond-accurate timing.\n\n### Capturing Workloads with s3dlio\n\nThe [s3dlio library](https://github.com/russfellows/s3dlio) provides op-log capture for applications using storage APIs. This is the recommended way to capture real production workloads:\n\n**Python Application:**\n```python\nimport s3dlio\n\n# Initialize op-log capture at application startup\ns3dlio.init_op_log(\"/tmp/production_trace.tsv.zst\")\n\n# Your application's normal storage operations - all are logged\ndata = s3dlio.get(\"s3://bucket/model.bin\")\ns3dlio.put(\"s3://bucket/results/output.json\", result_bytes)\nfiles = s3dlio.list(\"s3://bucket/data/\")\n\n# Finalize when done (flushes and closes the log)\ns3dlio.finalize_op_log()\n```\n\n**Rust Application:**\n```rust\nuse std::sync::Arc;\nuse s3dlio::{init_op_logger, store_for_uri, LoggedObjectStore, global_logger};\n\n// Initialize op-log capture\ninit_op_logger(\"production_trace.tsv.zst\")?;\n\n// Wrap your ObjectStore with logging\nlet store = store_for_uri(\"s3://bucket/\")?;\nlet logged_store = LoggedObjectStore::new(Arc::from(store), global_logger().unwrap());\n\n// All operations now captured to op-log\nlet data = logged_store.get(\"s3://bucket/file.bin\").await?;\n```\n\n### Replaying Captured Traces\n\nOnce you have an op-log from production, replay it with sai3-bench:\n\n```bash\n# Replay against test environment with original timing\nsai3-bench replay --op-log /tmp/production_trace.tsv.zst --target \"az://test-storage/\"\n\n# Replay at 5x speed for accelerated load testing\nsai3-bench replay --op-log /tmp/production_trace.tsv.zst --speed 5.0\n```\n\n\u003e **Note**: sai3-bench can also capture op-logs during benchmark runs with `--op-log`, but this is primarily for analyzing benchmark I/O patterns rather than capturing production workloads.\n\n### Backpressure Handling (v0.8.9+)\nWhen 
target storage can't sustain the recorded I/O rate:\n\n```yaml\n# replay_config.yaml - controls replay behavior\nlag_threshold: 5s        # Switch to best-effort when lag exceeds this\nrecovery_threshold: 1s   # Switch back when lag drops below this\nmax_flaps_per_minute: 3  # Exit gracefully if oscillating too much\nmax_concurrent: 1000     # Maximum in-flight operations\ndrain_timeout: 10s       # Timeout for draining on exit\n```\n\n```bash\nsai3-bench replay --op-log trace.tsv.zst --config replay_config.yaml --target \"s3://bucket/\"\n```\n\n### URI Remapping\nTransform source URIs during replay for migration testing:\n\n```yaml\n# remap.yaml - 1:1 bucket rename (simple migration)\nrules:\n  - match:\n      bucket: \"prod-bucket\"\n    map_to:\n      bucket: \"staging-bucket\"\n      prefix: \"migrated/\"\n```\n\n```bash\n# Apply remapping during replay\nsai3-bench replay --op-log trace.tsv.zst --remap remap.yaml --target \"s3://staging-bucket/\"\n```\n\n**Advanced Remapping Strategies:**\n- **1→1**: Simple bucket/prefix rename (migration validation)\n- **1→N**: Fanout to replicas (`round_robin` or `sticky_key` distribution)\n- **N→1**: Consolidate multiple sources to single target\n- **N→M**: Regex-based transformations (e.g., `s3://` → `gs://` for cross-cloud)\n\nSee [remap_examples.yaml](tests/configs/remap_examples.yaml) for complete examples.\n\n**Use Cases**: Pre-migration validation, performance regression testing, capacity planning, cross-cloud comparison.\n\n## 💾 Storage Efficiency Testing\n\nTest deduplication and compression with controlled data patterns.\n\n**Important**: `dedup_factor` and `compress_factor` are **optional** - if omitted, both default to `1` (no dedup, no compression).\n\n### Example 1: Default Behavior (No Dedup/Compression)\n```yaml\nprepare:\n  ensure_objects:\n    - base_uri: \"s3://bucket/unique-media/\"\n      count: 500\n      size_spec: 10485760  # 10 MB fixed size\n      fill: random\n      # dedup_factor: 1 (default - 
omitted, all blocks unique)\n      # compress_factor: 1 (default - omitted, incompressible)\n```\n\n### Example 2: Testing Storage Deduplication (3:1 Ratio)\n```yaml\nprepare:\n  ensure_objects:\n    - base_uri: \"s3://bucket/vm-snapshots/\"\n      count: 100\n      size_spec: 52428800  # 50 MB\n      fill: random\n      dedup_factor: 3      # 3:1 dedup (1/3 blocks unique, 2/3 duplicates)\n      compress_factor: 1   # No compression (incompressible data)\n```\n**Result**: 100 files × 50 MB = 5 GB logical, but only ~1.67 GB unique data (3:1 dedup).\n\n### Example 3: Combined Dedup + Compression (5:1 and 2:1)\n```yaml\nprepare:\n  ensure_objects:\n    - base_uri: \"s3://bucket/log-archives/\"\n      count: 200\n      size_spec:\n        type: uniform\n        min: 5242880       # 5 MB\n        max: 52428800      # 50 MB\n      fill: random\n      dedup_factor: 5      # 5:1 dedup (1/5 blocks unique)\n      compress_factor: 2   # 2:1 compression (50% zeros)\n```\n**Result**: Avg 27.5 MB × 200 files = 5.5 GB logical → ~1.1 GB unique (5:1) → ~550 MB after compression (2:1).\n\n### Dedup/Compression Ratios Explained\n\n| Setting | Value | Meaning | Storage Impact |\n|---------|-------|---------|----------------|\n| `dedup_factor: 1` | 1:1 (default) | All blocks unique | No dedup savings |\n| `dedup_factor: 3` | 3:1 | 1/3 unique, 2/3 duplicate | 67% space savings |\n| `dedup_factor: 5` | 5:1 | 1/5 unique, 4/5 duplicate | 80% space savings |\n| `compress_factor: 1` | 1:1 (default) | Incompressible | No compression savings |\n| `compress_factor: 2` | 2:1 | 50% zeros | 50% compression savings |\n| `compress_factor: 4` | 4:1 | 75% zeros | 75% compression savings |\n\n**Fill Pattern Guidelines:**\n| Pattern | Speed | Use Case |\n|---------|-------|----------|\n| `random` | Standard | Production benchmarks, realistic workloads (RECOMMENDED) |\n| ***⚠️ `zero`*** | Fastest | ***DO NOT USE - triggers dedup/compression, unrealistic results*** |\n\n**Use Cases**: Validate vendor 
dedup/compression claims, predict migration space requirements, model hot vs. cold data.\n\nSee [Data Generation Guide](docs/DATA_GENERATION.md) for detailed patterns.\n\n## 📐 Realistic Size Distributions\n\nModel real-world object storage patterns with statistical distributions:\n\n```yaml\nworkload:\n  - op: put\n    path: \"data/\"\n    weight: 100\n    size_spec:\n      type: lognormal    # Many small files, few large files\n      mean: 1048576      # Mean: 1 MB\n      std_dev: 524288    # Std dev: 512 KB\n      min: 1024          # Floor: 1 KB\n      max: 10485760      # Ceiling: 10 MB\n    fill: random         # Cryptographic random (recommended)\n```\n\n**Why lognormal?** Research shows object storage naturally follows lognormal distributions (many small configs/thumbnails, few large videos/backups).\n\n**Distribution Types:**\n- `lognormal`: Realistic - many small, few large (requires `mean`, `std_dev`)\n- `uniform`: Even spread between `min` and `max`\n- Fixed size: Just use `size_spec: 1048576` (integer value)\n\nSee [Config Syntax](docs/CONFIG_SYNTAX.md) for complete options.\n\n## 🌐 Distributed Testing\n\nGenerate large-scale coordinated load across multiple nodes with automated deployment:\n\n### Automated SSH Deployment\n```bash\n# One-time setup: Configure passwordless SSH\nsai3bench-ctl ssh-setup --hosts ubuntu@vm1,ubuntu@vm2,ubuntu@vm3\n\n# Run distributed test: Agents deploy automatically\nsai3bench-ctl run --config distributed-workload.yaml\n```\n\n### Configuration-Driven Agents\nDefine all agents in YAML with per-agent customization:\n```yaml\ndistributed:\n  agents:\n    - address: \"vm1.example.com\"\n      id: \"us-west-agent\"\n      target_override: \"s3://us-west-bucket/\"\n      concurrency_override: 128\n      env: { AWS_PROFILE: \"benchmark\" }\n    \n    - address: \"vm2.example.com\"\n      id: \"us-east-agent\"\n      target_override: \"s3://us-east-bucket/\"\n  \n  ssh:\n    enabled: true\n    key_path: 
\"~/.ssh/sai3bench_id_rsa\"\n  \n  deployment:\n    container_runtime: \"docker\"  # or \"podman\"\n    image: \"sai3bench:latest\"\n    network_mode: \"host\"\n```\n\n### Flexible Scaling Strategies\n\n**Scale-Out** (Multiple VMs): Maximum network bandwidth, fault tolerance\n```yaml\n# 8 VMs, 1 container each = 8× network interfaces\nagents:\n  - { address: \"vm1:7167\", id: \"agent-1\" }\n  - { address: \"vm2:7167\", id: \"agent-2\" }\n  # ... vm3-vm8\n```\n\n**Scale-Up** (Single VM): Cost optimization, lower latency\n```yaml\n# 1 large VM, 8 containers on different ports\nagents:\n  - { address: \"big-vm:7167\", id: \"c1\", listen_port: 7167 }\n  - { address: \"big-vm:7168\", id: \"c2\", listen_port: 7168 }\n  # ... c3-c8\n```\n\n### Cloud Automation\nPre-built scripts for rapid deployment:\n- **GCP**: `scripts/gcp_distributed_test.sh` - Complete VM lifecycle automation\n- **AWS/Azure**: `scripts/cloud_test_template.sh` - Customizable templates\n- **Local**: `scripts/local_docker_test.sh` - Test distributed mode without cloud\n\n### Key Features\n- **Automated lifecycle**: SSH, container deployment, health checks, cleanup\n- **Per-agent overrides**: Target storage, concurrency, environment variables, volumes\n- **Graceful shutdown**: Ctrl+C handling with automatic container cleanup\n- **Result aggregation**: Proper HDR histogram merging for accurate percentiles\n- **Container flexibility**: Docker or Podman via YAML (no recompilation)\n\n**Learn More**:\n- [Distributed Testing Guide](docs/DISTRIBUTED_TESTING_GUIDE.md) - Complete workflows and patterns\n- [SSH Setup Guide](docs/SSH_SETUP_GUIDE.md) - One-command SSH automation\n- [Scale-Out vs Scale-Up](docs/SCALE_OUT_VS_SCALE_UP.md) - Performance and cost comparison\n- [Cloud Scripts Guide](scripts/README.md) - GCP automation and templates\n\n## ⚙️ Key Features\n\n### I/O Rate Control (v0.7.1)\nThrottle operation start rate with realistic arrival patterns:\n```yaml\nio_rate:\n  iops: 1000              # Target 
operations per second\n  distribution: exponential  # Poisson arrivals (realistic)\n                             # or \"uniform\" (fixed intervals)\n                             # or \"deterministic\" (precise timing)\n```\n- **Inspired by rdf-bench**: Similar to `iorate=` parameter with enhanced distributions\n- **Three distribution types**: Exponential (Poisson), Uniform (fixed), Deterministic (precise)\n- **Drift compensation**: tokio::time::Interval for uniform distribution accuracy\n- **Zero overhead when disabled**: Optional wrapper for maximum performance\n- **Per-worker division**: Target IOPS automatically split across concurrent workers\n\nSee [I/O Rate Control Guide](docs/IO_RATE_CONTROL_GUIDE.md) for detailed usage and examples.\n\n### TSV Export with Aggregate Rows\nMachine-readable output with per-bucket and aggregate summary rows:\n- **Per-bucket rows**: Statistics for each size bucket (zero, 1B-8KiB, 8KiB-64KiB, etc.)\n- **Aggregate rows**: \"ALL\" rows combining all size buckets per operation type (GET/PUT/META)\n- **Accurate latency merging**: HDR histogram merging for statistically correct percentiles\n- **Distributed support**: Per-agent TSVs and consolidated TSV with overall aggregates\n\n### Per-Operation Concurrency\nFine-grained worker pool control:\n```yaml\nconcurrency: 32  # Global default\nworkload:\n  - op: get\n    path: \"data/*\"\n    weight: 70\n    concurrency: 64  # More GET workers\n  - op: put\n    path: \"uploads/\"\n    weight: 30\n    concurrency: 8   # Fewer PUT workers\n```\n\n### Config Validation\nVerify YAML before execution:\n```bash\nsai3-bench run --config my-workload.yaml --dry-run\n```\n\nSee [Config Syntax](docs/CONFIG_SYNTAX.md) for complete reference.\n\n## 🛠️ Development\n\n### Requirements\n- Rust stable toolchain (2024 edition)\n- `protoc` compiler for gRPC (distributed mode)\n- Storage credentials for cloud backends\n\n### Building \u0026 Testing\n```bash\n# Build\ncargo build --release\n\n# Run tests\ncargo 
test\n\n# Streaming replay tests (must run sequentially)\ncargo test --test streaming_replay_tests -- --test-threads=1\n```\n\nSee [Cloud Storage Setup](docs/CLOUD_STORAGE_SETUP.md) for authentication details.\n\n## 🔗 Related Projects\n\n- **[s3dlio](https://github.com/russfellows/s3dlio)** - The underlying multi-protocol storage library powering sai3-bench\n- **[polarWarp](https://github.com/russfellows/polarWarp)** - Op-log analysis tool for parsing and visualizing s3dlio operation logs\n\n## 📄 License\n\nGPL-3.0 License - See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frussfellows%2Fsai3-bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frussfellows%2Fsai3-bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frussfellows%2Fsai3-bench/lists"}