{"id":35915518,"url":"https://github.com/PIYUSH-KUMAR1809/order-matching-engine","last_synced_at":"2026-01-16T14:00:35.292Z","repository":{"id":328021004,"uuid":"1112053481","full_name":"PIYUSH-KUMAR1809/order-matching-engine","owner":"PIYUSH-KUMAR1809","description":"High performance order matching engine","archived":false,"fork":false,"pushed_at":"2026-01-10T14:03:59.000Z","size":101,"stargazers_count":94,"open_issues_count":0,"forks_count":14,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-01-11T01:08:31.978Z","etag":null,"topics":["cplusplus","cplusplus-20","cpp","cpp20","finance","hft","high-performance","lock-free","low-latency","matching-engine","order-matching-engine"],"latest_commit_sha":null,"homepage":"https://medium.com/@kpiyush8826/how-i-optimized-a-c-matching-engine-from-100k-to-150-million-orders-per-second-35b2065fa4c0?postPublishedType=initial","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PIYUSH-KUMAR1809.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-08T04:49:33.000Z","updated_at":"2026-01-10T14:04:02.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/PIYUSH-KUMAR1809/order-matching-engine","commit_stats":null,"previous_names":["piyush-kumar1809/order-matching-engine"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/PIYUSH-KUMAR1809/order-matching-engine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PIYUSH-KUMAR1809%2Forder-matching-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PIYUSH-KUMAR1809%2Forder-matching-engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PIYUSH-KUMAR1809%2Forder-matching-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PIYUSH-KUMAR1809%2Forder-matching-engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PIYUSH-KUMAR1809","download_url":"https://codeload.github.com/PIYUSH-KUMAR1809/order-matching-engine/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PIYUSH-KUMAR1809%2Forder-matching-engine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28479073,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T11:59:17.896Z","status":"ssl_error","status_checked_at":"2026-01-16T11:55:55.838Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cplusplus","cplusplus-20","cpp","cpp20","finance","hft","high-performance","lock-free","low-latency","matching-engine","order-matching-engine"],"created_at":"2026-01-10T05:00:29.344Z","updated_at":"2026-01-16T14:00:35.283Z","avatar_url":"https://github.com/PIYUSH-KUMAR1809.png","language":"C++","funding_links":[],"categories":["CPP"],"sub_categories":["Data Visualization"],"readme":"# High-Performance Order Matching Engine\n\nA production-grade, high-frequency trading (HFT) Limit Order Book (LOB) and Matching Engine written in C++20. Designed for extreme throughput, deterministic latency, and cache efficiency using modern lock-free techniques.\n\n![C++](https://img.shields.io/badge/C++-20-blue.svg?style=flat\u0026logo=c%2B%2B) ![License](https://img.shields.io/badge/License-MIT-green.svg) [![CMake Build](https://github.com/PIYUSH-KUMAR1809/order-matching-engine/actions/workflows/cmake.yml/badge.svg)](https://github.com/PIYUSH-KUMAR1809/order-matching-engine/actions/workflows/cmake.yml) [![Awesome Quant](https://awesome.re/badge.svg)](https://github.com/wilsonfreitas/awesome-quant)\n\n\u003e **Performance Benchmark**: **~160,000,000 orders/second** (Average) | **~171,000,000** (Peak) on Apple M1 Pro.\n\n### ⚡️ Key Takeaways\n*   **Architecture**: Sharded \"Share-by-Communicating\" design avoids global locks.\n*   **Memory**: `std::pmr` monotonic buffers on the stack = 0 heap allocations on hot path.\n*   **Optimization**: `alignas(128)` (vs 64) reduced M1/M2 false sharing by ~5%.\n*   **Latency**: Sub-microsecond matching latency within the engine core.\n\n---\n\n## 🚀 Key Features\n\n*   **Ultra-High Throughput**: Capable of processing **\u003e160 million** distinct order operations per second on a single machine.\n*   **Zero-Allocation Hot Path**:\n    *   **PMR (Polymorphic Memory Resources)**: Uses `std::pmr::monotonic_buffer_resource` with a pre-allocated 512MB stack buffer for nanosecond-level allocations.\n    *   **Soft Limits**: Gracefully falls back to heap allocation if the static buffer is exhausted (no crashes).\n*   **Lock-Free Architecture**:\n    *   **SPSC Ring Buffer**: Custom cache-line aligned (`alignas(128)`) ring buffer for thread-safe, lock-free communication between producer and consumer.\n    *   **Shard-per-Core**: \"Share by Communicating\" design. Each CPU core owns a dedicated shard, eliminating mutex contention entirely.\n    *   **Thread Pinning**: Experimentally verified `pthread_setaffinity_np` / `thread_policy_set` guarantees exclusive core usage for worker threads.\n*   **Cache Optimizations**:\n    *   **Flat OrderBook**: Replaces node-based maps with `std::vector` for linear memory access.\n    *   **Bitset Scanning**: Uses CPU intrinsics (`__builtin_ctzll`) to \"teleport\" to the next active price level, skipping empty levels instantly. Optimized to avoid \"ghost level\" checks.\n    *   **Compact Storage**: Order objects are PODs (Plain Old Data) optimized for `memcpy`.\n*   **Smart Batching**:\n    *   **Producer-Side**: Thread-local accumulation of orders reduces atomic contention on the RingBuffer tail.\n    *   **Consumer-Side**: Workers pop commands in batches (up to 256) to amortize cache line invalidations.\n*   **Verification \u0026 Safety**:\n    *   **Deterministic**: `--verify` mode runs a mathematically verifiable sequence (Matches == Min(Buys, Sells)).\n    *   **Instrumentation**: `--latency` mode enables wall-clock end-to-end latency tracking.\n\n---\n\n## 🏗 Architecture\n\nThe system moves away from the traditional \"Central Limit Order Book with Global Lock\" to a **Partitioned/Sharded Model**.\n\n```mermaid\ngraph LR\n    %% Styles for high visibility and contrast\n    classDef client fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#0d47a1\n    classDef infra fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#4a148c\n    classDef core fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#1b5e20\n    classDef memory fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#e65100\n\n    User([User / Network]) --\u003e|TCP Input| Gateway{Symbol Mapping}\n    class User client\n    class Gateway infra\n\n    %% Shard 0 Pipeline\n    subgraph Core0 [Core 0: Pinned]\n        direction TB\n        RB0[Ring Buffer\u003cbr/\u003eSPSC Lock-Free]:::infra\n        Matcher0[Matching Engine\u003cbr/\u003eShard 0]:::core\n        Mem0[PMR Arena\u003cbr/\u003eStack Buffer]:::memory\n        \n        RB0 --\u003e Matcher0\n        Matcher0 --\u003e Mem0\n    end\n\n    %% Shard 1 Pipeline\n    subgraph Core1 [Core 1: Pinned]\n        direction TB\n        RB1[Ring Buffer\u003cbr/\u003eSPSC Lock-Free]:::infra\n        Matcher1[Matching Engine\u003cbr/\u003eShard 1]:::core\n        Mem1[PMR Arena\u003cbr/\u003eStack Buffer]:::memory\n        \n        RB1 --\u003e Matcher1\n        Matcher1 --\u003e Mem1\n    end\n\n    %% Routing (No crossing)\n    Gateway --\u003e|Shard 0: GOOG, MSFT| RB0\n    Gateway --\u003e|Shard 1: AAPL, TSLA| RB1\n```\n\n1.  **Ingestion (Exchange)**:\n    *   Orders are received and hashed by `SymbolID`.\n    *   \"Smart Gateway\" logic routes the order to the specific Shard owning that symbol.\n2.  **Transport (Ring Buffer)**:\n    *   Orders are pushed into a lock-free Single-Producer Single-Consumer (SPSC) ring buffer.\n    *   **Union-Based Commands**: Uses a `union` structure to overlay `Add` and `Cancel` commands, saving memory and fitting more commands per cache line.\n3.  **Matching (Core)**:\n    *   **Flat OrderBook**: Bids and Asks are simple `std::pmr::vector`s indexed directly by price (O(1) lookup).\n    *   **Matcher**: Iterates linearly over the vector for maximum hardware prefetching efficiency. Active orders are tracked via a `Bitset`.\n4.  **Memory Management**:\n    *   A Monotonic Buffer (Arena) provides memory for new orders. It resets instantly (`release()`) between benchmark runs, preventing fragmentation.\n\n---\n\n## 🛠 Build \u0026 Run\n\n### Prerequisites\n*   C++20 Compiler (GCC 10+ / Clang 12+)\n*   CMake 3.14+\n\n### Compiling\n```bash\nmkdir -p build \u0026\u0026 cd build\ncmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=\"-march=native\"\nmake -j$(nproc)\n```\n\n### Running Benchmarks\nTo replicate the \u003e160M ops/s performance:\n```bash\n./build/src/benchmark\n```\n\nTo measure **End-to-End Latency** (approximate P50/P99):\n```bash\n./build/src/benchmark --latency\n```\n*Note: Latency mode adds instrumentation overhead and runs usually at ~15-20M ops/sec on M1 Pro due to timestamp calls.*\n\n### Running Verification\nTo ensure the engine is actually matching correctly and not just dropping frames:\n```bash\n./build/src/benchmark --verify\n```\n\n### Real-World Market Replay\nTest the engine against **live Binance L3 data** (Trade + Depth updates) to verify handling of realistic price clustering and bursty order flow.\n\n1. **Record Data** (Requires Python 3 + `websocket-client`):\n   ```bash\n   # Record 60 seconds of live BTCUSDT data\n   python3 scripts/record_l3_data.py 60\n   ```\n\n2. **Run Replay**:\n   ```bash\n   ./build/src/benchmark --replay data/market_data.csv\n   ```\n   \u003e **Result**: ~132,000,000 orders/sec (M1 Pro) on real-world data.\n\n### Running the Server\n\nStart the engine networking layer (listens on port 8080):\n```bash\n./build/src/OrderMatchingEngine\n```\n\n**Populate the Book (Optional):**\nTo quickly seed the book with Bids, Asks, and Trades for testing/visualization:\n```bash\npython3 scripts/seed_orders.py\n```\n\n---\n\n## 📊 Client Usage (TCP)\n\nConnect using `netcat` or any TCP client.\n\n**Submit Order:**\n```text\nBUY AAPL 100 15000\n\u003e ORDER_ACCEPTED_ASYNC 1\n```\n*(Format: SIDE SYMBOL QTY PRICE_INT)*\n\n**Subscribe to Market Data:**\n```text\nSUBSCRIBE AAPL\n\u003e SUBSCRIBED AAPL\n\u003e TRADE AAPL 15000 50\n```\n\n---\n\n## 🧪 Testing\n\nThe project includes a comprehensive GoogleTest suite covering matching logic, cancellation scenarios, and partial fills.\n\n```bash\n./build/tests/unit_tests\n```\n\n## 📈 Roadmap (Future)\n*   **Kernel Bypass**: Integration with DPDK/Solarflare for sub-microsecond wire latency.\n*   **Market Data Distribution**: Multicast UDP feed for quote dissemination.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPIYUSH-KUMAR1809%2Forder-matching-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FPIYUSH-KUMAR1809%2Forder-matching-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPIYUSH-KUMAR1809%2Forder-matching-engine/lists"}