{"id":51321830,"url":"https://github.com/eric-zhou-tz/concurrent-kv-store","last_synced_at":"2026-07-01T14:30:28.576Z","repository":{"id":352512379,"uuid":"1213086775","full_name":"eric-zhou-tz/concurrent-kv-store","owner":"eric-zhou-tz","description":"a persistent memory and execution layer for AI agents in C++17, supporting stateful workflows, step logging, and replayable execution","archived":false,"fork":false,"pushed_at":"2026-05-25T08:55:32.000Z","size":234,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-25T09:22:36.376Z","etag":null,"topics":["agentic-ai","agents","backend","cplusplus","persistence","systems-programming"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eric-zhou-tz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"docs/Roadmap.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-17T03:05:29.000Z","updated_at":"2026-05-25T08:55:35.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/eric-zhou-tz/concurrent-kv-store","commit_stats":null,"previous_names":["eric-zhou-tz/kv_store","eric-zhou-tz/context-kv","eric-zhou-tz/agentkv"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/eric-zhou-tz/concurrent-kv-store","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric-zhou-tz%2Fconcurrent-kv-store","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric-zhou-tz%2Fconcurrent-kv-store/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric-zhou-tz%2Fconcurrent-kv-store/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric-zhou-tz%2Fconcurrent-kv-store/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eric-zhou-tz","download_url":"https://codeload.github.com/eric-zhou-tz/concurrent-kv-store/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric-zhou-tz%2Fconcurrent-kv-store/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35011254,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-01T02:00:05.325Z","response_time":130,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-ai","agents","backend","cplusplus","persistence","systems-programming"],"created_at":"2026-07-01T14:30:26.618Z","updated_at":"2026-07-01T14:30:28.572Z","avatar_url":"https://github.com/eric-zhou-tz.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Concurrent KV Store in Modern C++\n\nA key-value store built from first principles in modern C++, with WAL\npersistence, snapshot recovery, correctness-first concurrency, benchmarking,\nand an eventual storage-engine architecture.\n\nThe current implementation is intentionally small and inspectable: a\nsingle-process CLI routes parsed commands into an in-memory `KVStore`, while\nthe persistence layer records durable mutations and restores state from\nsnapshots plus WAL tail replay.\n\nCurrent release status: `v0.5.0` adds coarse-grained reader/writer\nsynchronization with `std::shared_mutex`. Concurrent readers can proceed\ntogether, while writes, snapshots, compaction, recovery, clear, and persistence\nreset are exclusive. The CLI reports this through `INFO`, `VERSION`, or\n`STATUS`.\n\n## Table of Contents\n\n- [Performance Highlights](#performance-highlights)\n- [Current Status](#current-status)\n- [Architecture](#architecture)\n- [Features](#features)\n- [Quick Start](#quick-start)\n- [Repository Tour](#repository-tour)\n- [Benchmarks](#benchmarks)\n- [Engineering Notes](#engineering-notes)\n\n## Performance Highlights\n\nLatest focused EC2 Release run: AWS `c7i-flex.large`, Ubuntu Linux, Intel Xeon\nPlatinum 8488C, GCC/G++ 15.2.0, CMake Release with `-O3 -DNDEBUG`, five Google\nBenchmark repetitions. This KV-store run was not CPU-pinned, and the numbers\nbelow should not be compared directly to matching-engine results because the\nworkloads are different.\n\n| Workload | What It Measures | Count / Size | Throughput / Latency |\n| --- | --- | ---: | ---: |\n| Mixed 70/30 read-write | Direct in-memory `KVStore` flow | 1,000-key working set | `55.09M ops/sec` |\n| Get | Successful in-memory lookup after preload | 1,000-key preload | `48.05M ops/sec`, `20.8 ns/op` |\n| Delete | In-memory erase after deterministic preload | 1,000 deletes/batch | `29.66M ops/sec` |\n| Set | In-memory insert/overwrite | 1,000 writes/batch | `17.11M ops/sec` |\n| Durable Set | WAL-backed `Set` with flush behavior | 1,000 writes/batch | `1.66M ops/sec` |\n| WAL replay | Checksum-framed WAL recovery path | 10,000 records | `3.75 ms`, `2.67M records/sec` |\n| Snapshot load | Full snapshot restore into memory | 10,000 entries | `1.21 ms`, `8.28M entries/sec` |\n| Snapshot + WAL-tail recovery | Snapshot-assisted recovery path | 10,000 base entries + 10% tail | `1.72 ms`, `6.38M entries/sec` |\n\nSee [docs/Benchmarks.md](docs/Benchmarks.md) for methodology, caveats, raw\nartifact paths, and benchmark history.\n\n## Current Status\n\n| Area | Current Behavior |\n| --- | --- |\n| Version | `v0.5.0` |\n| Concurrency | Coarse `std::shared_mutex`: concurrent reads, serialized writes and durability operations |\n| Durability | WAL append happens before in-memory mutation; snapshots and compaction are exclusive |\n| Recovery | Startup loads snapshot first, then replays the checksum-verified WAL tail |\n| CLI | `INFO`, `VERSION`, and `STATUS` print version, entry count, concurrency model, and durability notes |\n| Validation | Latest local validation: Release build plus `85/85` CTest cases passing |\n| Benchmarks | Published EC2 numbers are the pre-concurrency baseline; no official contention rows yet |\n\n## Architecture\n\n```text\nCommand text -\u003e CliParser -\u003e CliServer -\u003e KVStore -\u003e WAL + Snapshot\n```\n\n`KVStore` owns the live map and exposes the core `Set`, `Get`, and `Delete`\nAPI. `WriteAheadLog` stores ordered mutation records before in-memory mutation.\n`Snapshot` stores full point-in-time materialized state and records the WAL byte\noffset covered by the checkpoint.\n\nAdditional docs:\n\n- [Architecture](docs/Architecture.md)\n- [Benchmarks](docs/Benchmarks.md)\n- [Benchmark History](docs/Benchmark_History.md)\n- [Changelog](docs/CHANGELOG.md)\n- [Roadmap](docs/Roadmap.md)\n\n## Features\n\n- Modern C++20 build through CMake\n- In-memory `SET`, `GET`, and `DELETE`\n- Overwrite and missing-key semantics\n- Binary append-only WAL with CRC32 payload checksums\n- Corruption-aware WAL replay with safe truncation to the last valid record\n- Verified full-state snapshot checkpoints with WAL rotation compaction\n- Startup recovery from snapshot plus checksum-verified WAL tail\n- Coarse-grained reader/writer synchronization for concurrent readers and\n  serialized writes\n- Interactive CLI with `INFO`/`VERSION`/`STATUS`\n- GoogleTest coverage for storage and persistence behavior\n- Google Benchmark hot-path benchmarks\n\n## Quick Start\n\n### Prerequisites\n\n- CMake 3.20 or newer\n- C++20 compiler such as GCC, Clang, or Apple Clang\n- Network access during first configure so CMake can fetch GoogleTest and\n  Google Benchmark\n\n### Native Release Build\n\n```bash\ncmake -S . -B build -DCMAKE_BUILD_TYPE=Release\ncmake --build build --config Release\n```\n\nRun the CLI:\n\n```bash\n./build/kv_store\n```\n\nExample session:\n\n```text\nConcurrent KV Store v0.5.0\nConcurrency: coarse shared_mutex: concurrent reads, serialized writes and durability\nLoading snapshot...\nLoaded 0 snapshot entrie(s)\nReplaying WAL...\nRecovered 0 operation(s)\nkv-store\u003e INFO\nConcurrent KV Store v0.5.0\nentries: 0\nconcurrency: coarse shared_mutex: concurrent reads, serialized writes and durability\ndurability: WAL appends are serialized before memory mutation; snapshot, compaction, and recovery are exclusive\nkv-store\u003e SET language cpp\nOK\nkv-store\u003e GET language\ncpp\nkv-store\u003e DELETE language\n1\nkv-store\u003e GET language\n(nil)\nkv-store\u003e EXIT\nBye\n```\n\n### Tests\n\n```bash\nctest --test-dir build --output-on-failure -C Release\n```\n\nFocused targets:\n\n```bash\ncmake --build build --target kv_store_tests\n./build/kv_store_tests\n\ncmake --build build --target kv_store_stress_tests\n./build/kv_store_stress_tests\n```\n\nThreadSanitizer validation is opt-in and intended for Linux Clang/GCC Debug\nbuilds:\n\n```bash\ncmake -S . -B build-tsan \\\n  -DCMAKE_BUILD_TYPE=Debug \\\n  -DCONCURRENT_KV_STORE_ENABLE_TSAN=ON \\\n  -DCONCURRENT_KV_STORE_BUILD_TESTS=ON \\\n  -DCONCURRENT_KV_STORE_BUILD_BENCHMARKS=OFF\ncmake --build build-tsan\nctest --test-dir build-tsan --output-on-failure\n```\n\n### Benchmarks\n\n```bash\ncmake --build build --target kv_store_benchmark\n./build/kv_store_benchmark\n./build/kv_store_benchmark --benchmark_filter=BM_Get\n```\n\nSee [docs/Benchmarks.md](docs/Benchmarks.md) for methodology and result\ntables.\n\n### Benchmarking on EC2\n\nPublication benchmark runs should happen on the target EC2 instance, not on a\nlocal development machine. Use the existing EC2 host at public IPv4\n`3.20.238.237`:\n\n```bash\nssh ubuntu@3.20.238.237\ncd ~/concurrent-kv-store\ngit pull\nchmod +x scripts/run_ec2_benchmarks.sh\n./scripts/run_ec2_benchmarks.sh\n```\n\nThe script writes raw text, JSON, and metadata files under\n`benchmark_results/`. Summarize those results manually in\n[docs/Benchmarks.md](docs/Benchmarks.md) and\n[docs/Benchmark_History.md](docs/Benchmark_History.md).\n\n## Repository Tour\n\n```text\ninclude/        Public headers for store, persistence, parser, and CLI server\nsrc/            Implementation files\ntests/          GoogleTest unit, integration, and stress suites\nbenchmarks/     Google Benchmark hot-path benchmarks\ndocs/           Architecture, benchmark, changelog, and roadmap notes\nscripts/        CMake convenience scripts\n```\n\n## Benchmarks\n\nThe benchmark suite currently covers:\n\n| Benchmark | What It Measures |\n| --- | --- |\n| `BM_Put` | In-memory insert/overwrite path |\n| `BM_Get` | Successful in-memory lookup path |\n| `BM_Delete` | Delete path after deterministic preload |\n| `BM_MixedReadWrite70_30` | Deterministic 70% read / 30% write flow |\n| `BM_DurableSetWithWalFlush` | WAL-backed Set path |\n| `BM_SnapshotSave` | Full snapshot write and verification |\n| `BM_SnapshotLoad` | Snapshot load into memory |\n| `BM_WalReplay` | Replay checksum-framed WAL records |\n| `BM_RecoveryFromSnapshotAndWalTail` | Snapshot load plus WAL tail replay |\n| `BM_RecoveryFromCompactedSnapshotAndWalTail` | Snapshot plus rotated WAL recovery |\n| `BM_SnapshotCompaction` | Snapshot verification and WAL rotation |\n\nLatest EC2 results, methodology, caveats, and raw artifact paths are documented\nin [docs/Benchmarks.md](docs/Benchmarks.md). Benchmark results are\nmachine-specific; record compiler, build type, CPU, OS, commit, and command line\nwith every published run. The current published EC2 baseline predates the\n`v0.5.0` lock insertion, so it should be treated as a pre-concurrency baseline\nuntil a clean refresh is published.\n\n## Engineering Notes\n\n- The storage core uses a correctness-first `std::shared_mutex` model:\n  concurrent readers can proceed together, while writes, snapshots,\n  compaction, recovery, clear, and persistence reset are exclusive.\n- WAL appends remain serialized through `KVStore`, preserving the existing\n  write-ahead durability contract under concurrent callers.\n- Sharing one WAL or Snapshot object across multiple stores or using it\n  directly from another thread is outside the synchronization contract.\n- WAL records are length-framed, checksum-protected, and bounded to avoid\n  unbounded allocation while recovering corrupted files.\n- WAL replay applies only complete checksum-verified records. It stops at the\n  first untrusted frame and can truncate a corrupted crash tail to the last\n  known-good byte offset.\n- `COMPACT`/`SNAPSHOT` writes and verifies a full snapshot before rotating the\n  WAL to an empty log. If snapshot writing or verification fails, WAL history\n  remains untouched.\n- Snapshots duplicate full state by design. Incremental checkpoints are future\n  storage-engine work.\n- Current concurrency is coarse-grained, not sharded or lock-free. Future\n  performance work may add sharded maps, per-shard locks, a single WAL writer,\n  batched flush/group commit, segmented WAL, or an LSM/memtable-style engine.\n- The CLI is an integration boundary, not the storage API. Tests and benchmarks\n  exercise `KVStore` and persistence components directly where possible.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feric-zhou-tz%2Fconcurrent-kv-store","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feric-zhou-tz%2Fconcurrent-kv-store","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feric-zhou-tz%2Fconcurrent-kv-store/lists"}