{"id":50023562,"url":"https://github.com/eric-zhou-tz/low-latency-matching-engine","last_synced_at":"2026-05-31T23:00:53.969Z","repository":{"id":357748317,"uuid":"1238310600","full_name":"eric-zhou-tz/low-latency-matching-engine","owner":"eric-zhou-tz","description":"C++20 low-latency matching engine implementing exchange-style order books, FIFO price-time priority, deterministic event flow, and Linux benchmark infrastructure.","archived":false,"fork":false,"pushed_at":"2026-05-20T09:44:49.000Z","size":6724,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-20T11:50:11.130Z","etag":null,"topics":["benchmarking","cmake","cplusplus","cpp20","docker","exchange","high-performance","linux","low-latency","matching-engine","order-book","quantitative-finance","systems-programming"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eric-zhou-tz.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-14T02:25:26.000Z","updated_at":"2026-05-20T09:44:24.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/eric-zhou-tz/low-latency-matching-engine","commit_stats":null,"previous_names":["eric-zhou-tz/low-latency-matching-engine"],"tags_count":28,"template":false,"template_full_name":null,"purl":"pkg:github/eric-zhou-tz/low-latency-matching-engine","repository_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric-zhou-tz%2Flow-latency-matching-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric-zhou-tz%2Flow-latency-matching-engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric-zhou-tz%2Flow-latency-matching-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric-zhou-tz%2Flow-latency-matching-engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eric-zhou-tz","download_url":"https://codeload.github.com/eric-zhou-tz/low-latency-matching-engine/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eric-zhou-tz%2Flow-latency-matching-engine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33752286,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmarking","cmake","cplusplus","cpp20","docker","exchange","high-performance","linux","low-latency","matching-engine","order-book","quantitative-finance","systems-programming"],"created_at":"2026-05-20T10:00:47.704Z","updated_at":"2026-05-31T23:00:53.954Z","avatar_url":"https://github.com/eric-zhou-tz.png","language":"C++","funding_links":[],"categories":[],"
sub_categories":[],"readme":"# Matching Engine\n\n[![CI](https://github.com/eric-zhou-tz/low-latency-matching-engine/actions/workflows/ci.yml/badge.svg)](https://github.com/eric-zhou-tz/low-latency-matching-engine/actions/workflows/ci.yml)\n\nA C++20 exchange-style matching engine built around deterministic price-time\npriority, low-latency data structures, and reproducible Linux benchmarking.\n\nThe engine routes parsed order commands through an exchange layer, dispatches to\nsymbol-level order books, matches against resting liquidity with FIFO semantics,\nand emits structured events for accepted orders, trades, cancels, modifies, and\nrejects.\n\n## Table of Contents\n\n- [Architecture](#architecture)\n- [Hot Path Optimizations](#hot-path-optimizations)\n- [Quick Start](#quick-start)\n- [Repository Tour](#repository-tour)\n- [Next Steps](#next-steps)\n\n## Performance Highlights\n\nLatest full-suite EC2 Release run: AWS `c7i-flex.large`, Ubuntu Linux, Intel Xeon\nPlatinum 8488C, pinned to one CPU with `taskset -c 0`, GCC/G++ 15.2.0,\n`-O3 -DNDEBUG -march=native`.\n\n| Workload | What It Measures | Count | Throughput |\n| --- | --- | ---: | ---: |\n| Random cancel | Live order-id lookup, FIFO unlink, pool release | 100,000 orders | `26.10M ops/sec` |\n| One-level crossing match | Aggressive orders consuming one resting level | 100,000 orders | `35.86M ops/sec` |\n| True mixed OrderBook flow | Direct matching-core submits, cancels, modifies, taker flow | 100,000 operations | `22.48M ops/sec` |\n| End-to-end true mixed CLI-style flow | Parser -\u003e exchange -\u003e book -\u003e event formatting | 100,000 commands | `2.10M commands/sec` |\n\nSingle-order latency measures one precomputed public `Exchange::process(action)`\ncall per sample:\n\n| Operation | p50 | p95 | p99 | p99.9 | Max Observed |\n| --- | ---: | ---: | ---: | ---: | ---: |\n| Passive insert | `313 ns` | `450 ns` | `557 ns` | `1,890 ns` | `14,982,433 ns` |\n| Aggressive match | `459 ns` | `633 ns` | 
`739 ns` | `990 ns` | `72,508 ns` |\n| Known cancel | `327 ns` | `500 ns` | `603 ns` | `794 ns` | `455,591 ns` |\n| Modify if present | `320 ns` | `464 ns` | `540 ns` | `718 ns` | `72,542 ns` |\n| Market order | `476 ns` | `665 ns` | `779 ns` | `1,077 ns` | `307,948 ns` |\n\nOptimized `OrderBook` versus the simple std-container toy baseline on the same\n10,000-operation direct-book workloads:\n\n| Workload | Count | Optimized | Std Toy Baseline | Speedup |\n| --- | ---: | ---: | ---: | ---: |\n| Passive insert | 10,000 operations | `42.24M ops/sec` | `342.28k ops/sec` | `123.40x` |\n| Random cancel | 10,000 operations | `75.29M ops/sec` | `231.33k ops/sec` | `325.44x` |\n| Modify if present | 10,000 operations | `80.90M ops/sec` | `440.15k ops/sec` | `183.81x` |\n| True mixed OrderBook flow | 10,000 operations | `27.40M ops/sec` | `4.67M ops/sec` | `5.87x` |\n\nHot-path rows measure typed `OrderBook` work directly. End-to-end rows include\nparsing, exchange routing, matching, and event formatting, so they are expected\nto be lower.\n\nSee [BENCHMARKS.md](BENCHMARKS.md) for methodology, hardware, commit\nprovenance, and historical results.\n\n## Architecture\n\n![Matching engine CLI benchmark comparison mode](docs/cli-benchmark-mode.gif)\n\nCore flow:\n\n```text\nCommand text -\u003e Parser -\u003e Exchange -\u003e Symbol OrderBook -\u003e Events/Formatting\n```\n\n`Exchange` owns symbol books and routes commands. `OrderBook` owns the\nlatency-sensitive matching state. 
The parser and output formatter sit outside\nthe hot path so the matching core can be benchmarked and tested directly.\n\nAdditional docs:\n\n- [Architecture](docs/ARCHITECTURE.md)\n- [Contributing](CONTRIBUTING.md)\n- [Hot Path Analysis](docs/HOTPATH.md)\n- [Benchmarks](BENCHMARKS.md)\n- [Benchmark History](docs/benchmark_history.md)\n- [Changelog](docs/CHANGELOG.md)\n\n## Features\n\n- Price-time priority matching\n- GTC, IOC, and FOK limit orders\n- Market orders\n- Cancel and modify support\n- Multi-symbol exchange routing\n- Integer price and quantity types\n- Deterministic replay fixtures\n- Structured hot-path events\n- GoogleTest and Google Benchmark coverage\n- Docker validation and EC2 benchmark workflow\n\n## Why Matching Engines Are Hard\n\n- Fairness semantics must be explicit: better price first, FIFO within a price.\n- Cancels and modifies can target any live resting order, not just the top.\n- Mixed flow combines passive orders, taker orders, cancels, modifies, and rejects.\n- Allocator churn and cache misses can dominate simple-looking operations.\n- Replay output must stay deterministic while internals evolve.\n- Parser, routing, matching, and formatting costs must be measured separately.\n\n## Design Tradeoffs\n\n| Choice | Tradeoff |\n| --- | --- |\n| `std::map` price levels | Deterministic ordered prices and direct best-level access, with `O(log n)` tree costs. |\n| Intrusive FIFO price queues | Preserves time priority and enables O(1) unlink after lookup, at the cost of manual links. |\n| Pooled order storage | Stable `Order*` handles and slot reuse reduce allocator pressure during churn. |\n| Dense order-id maps | Cache-friendlier cancel/modify lookup than node-based maps, while order lifetime stays in the pool. |\n| Structured events | Matching logic avoids string formatting; presentation work happens at the boundary. 
|\n\nEarly local structure experiments found the current `std::map` price-level\nlayout performing better than the tested B-tree and price-ladder variants on\nthe measured workloads. Those results are still treated as directional only:\nthe methodology and workload coverage need more investigation before publishing\nthe comparison as a formal benchmark claim.\n\n## Hot Path Optimizations\n\n- Intrusive queues avoid same-price scans during cancels and fills.\n- `OrderPool` reuses canceled or filled order slots.\n- Dense hash maps make order-id lookup cache-conscious.\n- Cancel routing maps live order ids directly to owning symbol books.\n- Parser and formatter work are separated from direct `OrderBook` benchmarks.\n- Caller-owned event buffers avoid fresh vectors for multi-fill submissions.\n\n## Profiling Snapshots\n\nThese flamegraphs come from pinned EC2 Release profiles using `perf` CPU-clock\nsampling with `-O3 -DNDEBUG -march=native -fno-omit-frame-pointer -g`.\n\n### Random Cancel Hot Path\n\n\u003cimg src=\"docs/perf-random-cancel.svg\" alt=\"Random cancel flamegraph\" width=\"900\"\u003e\n\nRandom cancel is dominated by `OrderBook::remove_resting_order`, with visible\ntime in dense-map erase, queue unlinking, and cancel event construction. That is\nthe expected pressure point for arbitrary-id cancellation: the engine must find\nthe live order, unlink it from its FIFO level, update aggregate level state, and\nremove the id from lookup structures without scanning the book. 
The profile\nshows limited allocator impact, consistent with pooled order storage and\nreusable event buffers; the remaining cost is mostly cache-sensitive hash-table\nand price-level metadata updates.\n\n### End-to-End True Mixed Flow\n\n\u003cimg src=\"docs/perf-end-to-end-true-mixed.svg\" alt=\"End-to-end true mixed flow flamegraph\" width=\"900\"\u003e\n\nThe end-to-end profile spreads work across workload generation, parser input\nextraction, exchange routing, matching, and event formatting. `Parser::parse_line`,\n`std::operator\u003e\u003e`, `format_event`, `std::to_string`, and small `malloc` samples\nshow the boundary cost that hot-path-only benchmarks intentionally exclude.\nInside the matching core, modifies, cancels, submits, and buy/sell matching paths\nstill appear as branch-heavy control flow because each command can accept, fill,\nrest, cancel, reject, or touch multiple price levels.\n\n## Quick Start\n\n### Prerequisites\n\n- CMake 3.20 or newer\n- A C++20 compiler such as GCC, Clang, or Apple Clang\n- Docker for the recommended onboarding and validation path\n- Ninja (optional) for faster native builds\n- SQLite (optional) for inspecting benchmark history locally\n- Linux `perf` tools (optional) for low-level benchmark counter analysis\n\nCMake fetches GoogleTest, Google Benchmark, and `unordered_dense`\nautomatically during configure.\n\n### Docker Quick Start\n\nDocker is the recommended first path because it builds and validates the project\nin a clean Ubuntu environment.\n\n```bash\ngit clone https://github.com/eric-zhou-tz/low-latency-matching-engine.git\ncd low-latency-matching-engine\ndocker build --target validation -t matching-engine-test .\ndocker run --rm matching-engine-test ctest --test-dir build --output-on-failure -C Release\ndocker run --rm -i matching-engine-test /bin/bash -lc \\\n  './build/matching_engine --model=fast \u003c tests/replay_cli.txt'\n```\n\nFor the full containerized smoke
suite:\n\n```bash\n./scripts/docker_validate.sh\n```\n\n### Linux Release Binary\n\nDownload the Linux x86_64 tarball and checksum from the GitHub release assets.\nDo not commit these generated archives to the repository.\n\n```bash\nVERSION=v1.0.0\nASSET=matching-engine-v1.0.0-linux-x86_64.tar.gz\n\ncurl -LO \"https://github.com/eric-zhou-tz/low-latency-matching-engine/releases/download/$VERSION/$ASSET\"\ncurl -LO \"https://github.com/eric-zhou-tz/low-latency-matching-engine/releases/download/$VERSION/$ASSET.sha256\"\n\nsha256sum -c \"$ASSET.sha256\"\ntar -xzf \"$ASSET\"\ncd \"${ASSET%.tar.gz}\"\n```\n\nRun the packaged replay demo:\n\n```bash\n./matching_engine --model=fast \u003c tests/replay_cli.txt\n```\n\nRun the interactive CLI:\n\n```bash\n./matching_engine\n```\n\n### Native CMake Build\n\n```bash\ngit clone https://github.com/eric-zhou-tz/low-latency-matching-engine.git\ncd low-latency-matching-engine\ncmake -S . -B build -DCMAKE_BUILD_TYPE=Release\ncmake --build build --config Release\n```\n\nRun the demo:\n\n```bash\n./build/matching_engine --model=fast \u003c tests/replay_cli.txt\n```\n\nRun the interactive CLI:\n\n```bash\n./build/matching_engine\n```\n\n## CLI Flags\n\n| Flag | Description |\n| --- | --- |\n| `--model=fast` | Optimized matching engine. This is the default. |\n| `--model=toy-std` | Simple std-container baseline for comparison and regression checks. 
|\n\nExample:\n\n```bash\n./build/matching_engine --model=toy-std \u003c tests/replay_cli.txt\n```\n\n## Command Protocol\n\nOne command is accepted per line:\n\n```text\nSUBMIT \u003cid\u003e \u003csymbol\u003e \u003cBUY|SELL\u003e \u003cprice\u003e \u003cquantity\u003e [GTC|IOC|FOK]\nMARKET \u003cid\u003e \u003csymbol\u003e \u003cBUY|SELL\u003e \u003cquantity\u003e\nCANCEL \u003cid\u003e\nMODIFY \u003cid\u003e \u003cnew_price\u003e \u003cnew_quantity\u003e\nPRINT\n```\n\nBehavior summary:\n\n- `SUBMIT` defaults to `GTC`; `IOC` cancels any unfilled remainder; `FOK` rejects\n  unless the full quantity is immediately available.\n- `MARKET` consumes available opposite-side liquidity and never rests.\n- `CANCEL` removes an existing resting order by id.\n- `MODIFY` sets a new price and quantity on an existing resting order by id.\n- `PRINT` emits a readable snapshot of known books.\n\n## Testing\n\n```bash\nctest --test-dir build --output-on-failure -C Release\n```\n\n## Benchmarking\n\nRelease benchmark build (uses Ninja; omit `-G Ninja` to fall back to CMake's\ndefault generator):\n\n```bash\ncmake -S . -B build -G Ninja \\\n  -DCMAKE_BUILD_TYPE=Release \\\n  -DCMAKE_CXX_FLAGS_RELEASE=\"-O3 -DNDEBUG -march=native\"\n\ncmake --build build --config Release\n```\n\nFinal benchmark validation is run on Ubuntu Linux/EC2 with CPU pinning. Docker is\nused for Linux compatibility checks, not headline performance numbers.\n\n### EC2 Benchmark Run\n\nStart an Ubuntu EC2 instance and make sure its security group allows SSH from\nyour IP address. 
The current benchmark host is `3.20.238.237`; replace\n`EC2_HOST` if the instance changes.\n\nFrom your local machine, set the SSH target and connect:\n\n```bash\nexport EC2_HOST=3.20.238.237\nexport EC2_USER=ubuntu\nexport EC2_KEY=\"$HOME/.ssh/matching-engine-key.pem\"\n\nchmod 600 \"$EC2_KEY\"\nssh -i \"$EC2_KEY\" \"$EC2_USER@$EC2_HOST\"\n```\n\nOn the EC2 host, install the build tools:\n\n```bash\nsudo apt-get update\nsudo apt-get install -y \\\n  build-essential \\\n  cmake \\\n  curl \\\n  git \\\n  ninja-build \\\n  python3 \\\n  sqlite3\n```\n\nClone a clean copy and run the full pinned Release benchmark workflow:\n\n```bash\nrm -rf low-latency-matching-engine\ngit clone https://github.com/eric-zhou-tz/low-latency-matching-engine.git\ncd low-latency-matching-engine\n\nCMAKE_CXX_FLAGS_RELEASE=\"-O3 -DNDEBUG -march=native\" \\\nPIN_CPU=0 \\\nBENCHMARK_TARGETS=all \\\nbenchmarks/run_ec2_benchmarks.sh\n```\n\nFor a focused pass, set `BENCHMARK_TARGETS` to one comma-separated subset:\n\n```bash\nBENCHMARK_TARGETS=core_hot_path,realistic_flow,std_toy_comparison \\\nCMAKE_CXX_FLAGS_RELEASE=\"-O3 -DNDEBUG -march=native\" \\\nPIN_CPU=0 \\\nbenchmarks/run_ec2_benchmarks.sh\n```\n\nAvailable benchmark target names are `core_hot_path`, `realistic_flow`,\n`std_toy_comparison`, `stress`, `replay`, `batch_latency`,\n`single_order_latency`, and `reserve_sweep` for explicit experimental runs.\nThe single-order latency target records p50/p95/p99/p999/max around one\nprecomputed `Exchange::process(action)` call per sample.\n\nIf you need to benchmark local uncommitted changes, sync the tree without build\ndirectories or macOS sidecar files:\n\n```bash\nrsync -az --delete \\\n  --exclude '.git/' \\\n  --exclude 'build*/' \\\n  --exclude 'release-artifacts/' \\\n  --exclude '.DS_Store' \\\n  --exclude '._*' \\\n  -e \"ssh -i $EC2_KEY\" \\\n  ./ \"$EC2_USER@$EC2_HOST:~/matching-engine-work/\"\n\nssh -i \"$EC2_KEY\" \"$EC2_USER@$EC2_HOST\"\ncd 
~/matching-engine-work\nCMAKE_CXX_FLAGS_RELEASE=\"-O3 -DNDEBUG -march=native\" \\\nPIN_CPU=0 \\\nBENCHMARK_TARGETS=all \\\nbenchmarks/run_ec2_benchmarks.sh\n```\n\nCopy benchmark artifacts back to your local checkout:\n\n```bash\nmkdir -p benchmarks/results\nscp -i \"$EC2_KEY\" \\\n  \"$EC2_USER@$EC2_HOST:~/low-latency-matching-engine/benchmarks/results/*\" \\\n  benchmarks/results/\n```\n\nIf you used the rsync workflow, copy from\n`~/matching-engine-work/benchmarks/results/*` instead.\n\nAfter updating published results, update [BENCHMARKS.md](BENCHMARKS.md),\n`benchmarks/benchmark_history.db`, and `benchmarks/benchmark_history.sql`\ntogether.\n\n## Docker Validation\n\n```bash\ndocker build --target validation -t matching-engine-test .\ndocker run --rm -i matching-engine-test /bin/bash -lc \\\n  './build/matching_engine --model=fast \u003c tests/replay_cli.txt'\n./scripts/docker_validate.sh\n```\n\nThe validation script builds a Release image, runs CTest, checks parser/replay\nand CLI binaries, exercises advertised CLI flows, and launches short benchmark\nsanity checks.\n\n## Repository Tour\n\n1. Start with this README for the project goals, quick start, command protocol,\n   and validation paths.\n2. Read [Contributing](CONTRIBUTING.md) before changing code, fixtures, or\n   benchmarks.\n3. Read [Architecture](docs/ARCHITECTURE.md) for the parser, exchange, and\n   order book design.\n4. Read [Benchmarks](BENCHMARKS.md) for the latest measured results,\n   environment, build flags, and methodology.\n5. Read [Hot Path Analysis](docs/HOTPATH.md) for the latency-sensitive matching\n   path and data-structure notes.\n6. 
Read [Benchmark History](docs/benchmark_history.md) for a lightweight guide\n   to the SQLite-backed benchmark history.\n\n## Repository Structure\n\n```text\ninclude/        Public headers\nsrc/            Engine implementation\ntests/          Unit, replay, and CLI tests\ntoy/            Simple std-container baseline\nbenchmarks/     Benchmark sources, EC2 runners, and history artifacts\ndocs/           Architecture, benchmark, and hot-path notes\nCONTRIBUTING.md Contributor workflow and validation expectations\n```\n\n## License\n\nThis project is licensed under the [MIT License](LICENSE).\n\n## Next Steps\n\n- Implement a dense price ladder backend for bounded tick ranges and compare it\n  against the tree-based book.\n- Add full differential testing against a simple `std::map` + `std::deque`\n  reference engine.\n- Replace text command ingestion with a compact binary protocol to reduce parse\n  and allocation overhead.\n- Add persistent replay logs for deterministic audit trails and crash/debug\n  replay.\n- Add a dashboard for viewing orders.\n\n## Future Scaling Directions\n\nThe current engine is intentionally single-process and deterministic. 
The next\nscaling steps would preserve that matching model while reducing ingress,\nnetworking, and replay overhead:\n\n- Partition symbols across matching shards so independent books can run on\n  separate cores.\n- Add lock-free ingress queues with explicit sequencing before commands enter a\n  symbol book.\n- Explore NUMA-aware placement for order pools, queues, and symbol ownership.\n- Replace text command ingestion with a compact binary protocol for lower parse\n  and allocation overhead.\n- Add persistent replay logs for crash recovery and deterministic audit trails.\n- Evaluate kernel-bypass networking only after the single-core matching path and\n  replay semantics are fully stable.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feric-zhou-tz%2Flow-latency-matching-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feric-zhou-tz%2Flow-latency-matching-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feric-zhou-tz%2Flow-latency-matching-engine/lists"}