https://github.com/Lattice9AI/NexusFix
A zero-alloc, compile-time hardened FIX engine built for sub-100ns execution.
https://github.com/Lattice9AI/NexusFix
cpp23 fix-protocol header-only hft high-frequency-trading low-latency quantitative-finance simd trading
Last synced: about 22 hours ago
JSON representation
A zero-alloc, compile-time hardened FIX engine built for sub-100ns execution.
- Host: GitHub
- URL: https://github.com/Lattice9AI/NexusFix
- Owner: Lattice9AI
- License: mit
- Created: 2026-01-22T00:51:10.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-31T23:42:31.000Z (9 days ago)
- Last Synced: 2026-04-01T01:53:15.591Z (9 days ago)
- Topics: cpp23, fix-protocol, header-only, hft, high-frequency-trading, low-latency, quantitative-finance, simd, trading
- Language: C++
- Size: 8.78 MB
- Stars: 42
- Watchers: 4
- Forks: 7
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
Awesome Lists containing this project
- awesome-quant - NexusFix - `CPP` - C++23 FIX protocol engine with zero-copy parsing and SIMD acceleration, 3x faster than QuickFIX. (Trading & Backtesting)
- awesome-quant - NexusFix - C++23 高性能 FIX 协议引擎,零拷贝解析 + SIMD 加速,解析 ExecutionReport ~246ns(QuickFIX ~730ns) (编程 / C++)
README
NexusFIX
Ultra-Low Latency FIX Protocol Engine for High-Frequency Trading
Modern C++23 | Zero-Copy | SIMD-Accelerated | 3x Faster than QuickFIX
Performance |
Architecture |
Features |
Quick Start |
Docs |
Contact
---
## Why NexusFIX?
**NexusFIX** is a high-performance **FIX protocol** (Financial Information eXchange) engine built for **ultra-low latency quantitative trading**, **sub-microsecond algorithmic execution**, and **high-frequency trading (HFT)** systems. It solves the **performance bottlenecks** of traditional FIX engines by utilizing **hardware-aware C++ programming**.
NexusFIX serves as a modern, faster alternative to QuickFIX with **zero heap allocations** on the critical path.
> *"If you're building a low-latency trading system and QuickFIX is your bottleneck, NexusFIX is your solution."*
---
## Performance
### NexusFIX vs QuickFIX Benchmark
Tested on Linux with GCC 13.3, 100,000 iterations:
| Metric | QuickFIX | NexusFIX | Improvement |
|--------|----------|----------|-------------|
| **ExecutionReport Parse** | 730 ns | 246 ns | **3.0x faster** |
| **NewOrderSingle Parse** | 661 ns | 229 ns | **2.9x faster** |
| **Field Lookup** (O(1) post-parse, 4 fields) | 31 ns | 11 ns | **2.9x faster** |
| **Parse Throughput** | 1.19M msg/sec | 4.17M msg/sec | **3.5x higher** |
| **P99 Parse Latency** | 784 ns | 258 ns | **3.0x lower** |
### Why is NexusFIX Faster?
| Technique | QuickFIX | NexusFIX |
|-----------|----------|----------|
| Memory | Heap allocation per message | Zero-copy `std::span` views |
| Field Lookup | O(log n) `std::map` | O(1) direct array indexing |
| Parsing | Byte-by-byte scanning | AVX2 SIMD vectorized |
| Field Offsets | Runtime calculation | `consteval` compile-time |
| Enum/Type Conversion | Runtime switch chains (~300 branches) | 22 compile-time lookup tables (55-97% faster) |
| Error Handling | Exceptions | `std::expected` (no throw) |
### Zero Allocation Proof
Parsing a **NewOrderSingle** message on the hot path:
| Operation | QuickFIX | NexusFIX |
|-----------|----------|----------|
| **Heap Allocations** | ~12 (`std::string`, `std::map` nodes) | **0** |
| **Field Storage** | `std::map` copies | `std::span` views into original buffer |
| **Parsing Logic** | Runtime map insertion | Compile-time offset table |
| **Memory Footprint** | Dynamic, unpredictable | Static, pre-allocated PMR pool |
| **Destructor Overhead** | ~12 `std::string` destructors | **0** (no owned memory) |
*Verified via custom allocator instrumentation. See [Optimization Diary](docs/optimization_diary.md).*
*For kernel bypass (DPDK/AF_XDP) and FPGA acceleration, see [Roadmap](docs/design/TICKET_204_AERON_HIGH_THROUGHPUT_MESSAGING.md).*
---
## Architecture Influences
NexusFIX stands on the shoulders of giants. We systematically studied **11 industry-leading Modern C++ libraries** and applied their techniques to ultra-low latency FIX processing. Below is our learning journey: what we learned, what we built, and what improvement we measured.
### Learning → Implementation → Verification
| Source Library | Engineering Evaluation | What We Changed | Benchmark Result |
|----------------|------------------------|-----------------|------------------|
| [hffix](https://github.com/jamesdbrock/hffix) | O(n) iterator-based field lookup is suboptimal for dense FIX packets; lacks compile-time optimization and type safety | `[Optimized]` `consteval` field offsets + `std::span` zero-copy views + O(1) direct indexing | **14ns** field access vs ~50ns iterator scan |
| [Abseil](https://github.com/abseil/abseil-cpp) | Swiss Tables offer SIMD-accelerated probing with 7-bit H2 fingerprints; superior cache locality for session maps | `[Adopted]` `absl::flat_hash_map` for session store | **[31% faster](docs/compare/ABSEIL_FLAT_HASH_MAP_BENCHMARK.md)** (20ns → 15ns) |
| [Quill](https://github.com/odygrd/quill) | Lock-free SPSC queue with deferred formatting; only viable approach for hot-path logging without blocking | `[Adopted]` Quill as logging backend | **8ns** median latency; zero blocking |
| [NanoLog](https://github.com/PlatformLab/NanoLog) | Binary encoding + background thread achieves 7ns; compile-time format validation essential for type safety | `[Synthesized]` `DeferredProcessor` with static type-safe binary serialization | **[84% reduction](docs/compare/DEFERRED_PROCESSOR_BENCHMARK.md)** (75ns → 12ns) |
| [liburing](https://github.com/axboe/liburing) | `DEFER_TASKRUN` defers completion to userspace, eliminating kernel task wakeups; registered buffers avoid per-op mapping | `[Adopted]` io_uring + DEFER_TASKRUN + registered buffers + multishot | **[7-27% faster](docs/compare/DEFER_TASKRUN_BENCHMARK.md)**; ~30% fewer syscalls |
| [Highway](https://github.com/google/highway) | Portable SIMD abstraction across AVX2/AVX-512/NEON/SVE; slight overhead vs direct intrinsics | `[Evaluated]` Retained hand-tuned intrinsics for FIX-specific patterns | **13x throughput**; Highway deferred for ARM |
| [Seastar](https://github.com/scylladb/seastar) | Share-nothing reactor optimal for high-concurrency I/O; high abstraction overhead for single-threaded tick-to-trade paths | `[Influenced]` Extracted core-pinning + lock-free pipelining without framework | **[8% P99 improvement](docs/compare/CPU_AFFINITY_BENCHMARK.md)** (18.8ns → 17.3ns) |
| [Folly](https://github.com/facebook/folly) | Advanced memory fencing patterns and lock-free primitives; `folly::Function` overhead acceptable for cold path only | `[Influenced]` Native SPSC queue + bit-masking for tag validation | Comparable performance; zero dependency |
| [Rigtorp](https://github.com/rigtorp/SPSCQueue) | Cache-line padding (`alignas(64)`) eliminates false sharing; simplest correct SPSC implementation | `[Synthesized]` Native `SPSCQueue` with identical techniques | **88M ops/sec**; 11ns median |
| [xsimd](https://github.com/xtensor-stack/xsimd) | Generic SIMD wrappers useful for math, but FIX parsing requires byte-level shuffle control | `[Evaluated]` Direct Intel intrinsics for SOH/delimiter scanning | **2x faster** than generic wrappers |
| [Boost.PMR](https://www.boost.org/doc/libs/release/libs/container/doc/html/container/polymorphic_memory_resources.html) | Standard allocators induce non-deterministic jitter; monotonic buffer enables arena allocation per message | `[Adopted]` `std::pmr::monotonic_buffer_resource` | **Zero heap allocation** on hot path |
### What We Built
| Component | Inspired By | Implementation |
|-----------|-------------|----------------|
| `TagOffsetMap` | hffix | Compile-time generated O(1) field lookup table |
| `DeferredProcessor` | NanoLog | SPSC queue + background thread for async processing |
| `ThreadLocalPool` | NanoLog, Folly | Per-thread object pool, zero lock contention |
| `SPSCQueue` | Rigtorp, Folly | Cache-line aligned lock-free queue |
| `simd_scanner` | xsimd (concept) | Hand-tuned AVX2/AVX-512 SOH and delimiter scanning |
| `IoUringTransport` | liburing | DEFER_TASKRUN + registered buffers + multishot recv |
| `CpuAffinity` | Seastar | Thread-to-core pinning utility |
### Cumulative Impact
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| ExecutionReport Parse | 730 ns | 246 ns | **3.0x faster** |
| Hot Path Latency | 361 ns | 213 ns | **41% reduction** |
| SIMD SOH Scan | ~150 ns | 11.8 ns | **~13x faster** |
| Hash Map Lookup | 20 ns | 15 ns | **31% faster** |
| P99 Tail Latency | 784 ns | 258 ns | **3.0x lower** |
*Detailed benchmarks: [Optimization Summary](docs/compare/OPTIMIZATION_SUMMARY_BEFORE_AFTER.md)*
### Attribution
NexusFIX is MIT licensed. We gratefully acknowledge these open source projects:
| Dependency | License | Usage |
|------------|---------|-------|
| [Abseil](https://github.com/abseil/abseil-cpp) | Apache 2.0 | `flat_hash_map` for session lookups |
| [Quill](https://github.com/odygrd/quill) | MIT | Async logging infrastructure |
| [liburing](https://github.com/axboe/liburing) | MIT/LGPL | io_uring C wrapper |
---
## Features
### Core Capabilities
- **Zero-Copy Parsing** - `std::span` views into original buffer, no `memcpy`
- **Message Encoding** - Builder pattern with `constexpr` serializer for constructing FIX messages
- **SIMD Acceleration** - AVX2/AVX-512 instructions for delimiter scanning
- **Compile-Time Optimization** - `consteval` field offsets, 22 lookup tables for enum/type conversion, ~300 runtime branches eliminated
- **O(1) Field Lookup** - Pre-indexed lookup table by FIX tag number (post-parse)
- **Zero Heap Allocation** - PMR pools and stack allocation on hot path
- **Session Management** - Full session lifecycle: Logon, Logout, Heartbeat, sequence number tracking, reconnect logic
- **Type-Safe API** - Strong types for Price, Quantity, Side, OrdType
### Modern C++23
- `std::expected` for error handling (no exceptions on hot path)
- `std::span` for zero-copy data views
- Concepts for compile-time interface validation
- `consteval` for compile-time computation
- `[[likely]]` / `[[unlikely]]` branch hints
### Supported FIX Versions
| Version | Status | Notes |
|---------|--------|-------|
| FIX 4.4 | Full Support | Most common in production |
| FIX 5.0 + FIXT 1.1 | Full Support | Only 2% overhead vs 4.4 |
### Supported Message Types
| MsgType | Name | Category |
|---------|------|----------|
| A | Logon | Session |
| 5 | Logout | Session |
| 0 | Heartbeat | Session |
| D | NewOrderSingle | Order Entry |
| F | OrderCancelRequest | Order Entry |
| 8 | ExecutionReport | Order Entry |
| V | MarketDataRequest | Market Data |
| W | MarketDataSnapshotFullRefresh | Market Data |
| X | MarketDataIncrementalRefresh | Market Data |
### Optimization Guide
How we achieved sub-300ns latency with Modern C++23:
- [Optimization Diary](docs/optimization_diary.md) - Step-by-step journey from 730ns to 246ns
- [Modern C++ Quant Techniques](docs/modernc_quant.md) - Cache-line alignment, SIMD, PMR strategies, branch hints
---
## Quick Start
### Installation
```bash
git clone https://github.com/Lattice9AI/NexusFIX.git
cd NexusFIX
./start.sh build
```
### Requirements
- **C++23 compiler**: GCC 13+ or Clang 17+
- **CMake**: 3.20+
- **OS**: Linux (io_uring optional), macOS, Windows
### Basic Usage
```cpp
#include
using namespace nfx;
using namespace nfx::fix44;
// Connect to broker
TcpTransport transport;
transport.connect("fix.broker.com", 9876);
// Configure session
SessionConfig config{
.sender_comp_id = "MY_CLIENT",
.target_comp_id = "BROKER",
.heartbeat_interval = 30
};
SessionManager session{transport, config};
session.initiate_logon();
// Send order (zero allocation)
MessageAssembler asm_;
NewOrderSingle::Builder order;
auto msg = order
.cl_ord_id("ORD001")
.symbol("AAPL")
.side(Side::Buy)
.order_qty(Qty::from_int(100))
.ord_type(OrdType::Limit)
.price(FixedPrice::from_double(150.00))
.build(asm_);
transport.send(msg);
```
### Parse Incoming Messages
```cpp
// Zero-copy parsing
FixParser parser;
auto result = parser.parse(raw_buffer);
if (result) {
auto& msg = *result;
auto order_id = msg.get_string(Tag::OrderID); // O(1) lookup
auto exec_type = msg.get_char(Tag::ExecType); // No allocation
auto fill_qty = msg.get_qty(Tag::LastQty); // Type-safe
}
```
---
## Build Options
| CMake Option | Default | Description |
|--------------|---------|-------------|
| `NFX_ENABLE_SIMD` | ON | AVX2/AVX-512 SIMD acceleration |
| `NFX_ENABLE_IO_URING` | OFF | Linux io_uring transport |
| `NFX_BUILD_BENCHMARKS` | ON | Build benchmark suite |
| `NFX_BUILD_TESTS` | ON | Build unit tests |
| `NFX_BUILD_EXAMPLES` | ON | Build examples |
```bash
# Build with all optimizations
cmake -B build -DCMAKE_BUILD_TYPE=Release -DNFX_ENABLE_SIMD=ON
cmake --build build -j
# Run benchmarks
./start.sh bench 100000
# Compare with QuickFIX
./start.sh compare 100000
```
---
## Benchmarking
Verify performance claims by running benchmarks yourself.
### Quick Start
```bash
# Run parser and session benchmarks
./start.sh bench 100000
# Example output:
# [BENCHMARK] ExecutionReport Parse
# Iterations: 100000
# Mean: 246 ns
# P50: 245 ns
# P99: 258 ns
```
### QuickFIX Comparison
Compare NexusFIX against QuickFIX (requires QuickFIX installed):
```bash
# Install QuickFIX first
# Ubuntu: sudo apt install libquickfix-dev
# Or build from source: https://github.com/quickfix/quickfix
# Run comparison
./start.sh compare 100000
```
### Full Reproduction Guide
For detailed instructions on reproducing benchmark results, including:
- Environment setup (CPU governor, pinning, priority)
- Build configuration options
- Interpreting results
- Troubleshooting
See [BENCHMARK_REPRODUCTION.md](BENCHMARK_REPRODUCTION.md)
---
## Documentation
- [API Reference](docs/API_REFERENCE.md) - Complete API documentation
- [Implementation Guide](docs/design/IMPLEMENTATION_GUIDE.md) - Architecture overview
- [Benchmark Report](docs/compare/BENCHMARK_COMPARISON_REPORT.md) - Detailed performance analysis
- [Modern C++ Techniques](docs/modernc_quant.md) - Optimization techniques used
---
## Project Structure
```
nexusfix/
├── include/nexusfix/
│ ├── parser/ # Zero-copy FIX parser (SIMD)
│ ├── session/ # Session state machine
│ ├── transport/ # TCP / io_uring / Winsock transport
│ ├── platform/ # Cross-platform abstraction
│ ├── types/ # Strong types (Price, Qty, Side)
│ ├── memory/ # PMR buffer pools
│ ├── store/ # Message store (PMR-optimized)
│ ├── serializer/ # Message serialization
│ ├── util/ # Utilities (diagnostics, formatting)
│ ├── messages/fix44/ # FIX 4.4 message builders
│ └── interfaces/ # Concepts and interfaces
├── benchmarks/ # Performance benchmarks
├── tests/ # Unit tests
├── examples/ # Example programs
└── docs/ # Documentation
```
---
## FAQ
### How does NexusFIX achieve zero-copy parsing?
NexusFIX uses `std::span` to create views into the original network buffer. Field values are never copied - the parser returns spans pointing to the exact byte range in the source buffer. This eliminates all `memcpy` and heap allocation overhead.
### Is NexusFIX compatible with QuickFIX?
NexusFIX implements the same FIX 4.4/5.0 protocol standards but with a different API optimized for performance. It is wire-compatible with any FIX counterparty, including systems using QuickFIX.
### What latency can I expect in production?
In our benchmarks: **~250 nanoseconds** for ExecutionReport parsing. Actual production latency depends on network, kernel configuration, and hardware. NexusFIX is designed to minimize the application-layer overhead.
### Does NexusFIX support FIX Repeating Groups?
Yes. Repeating groups are parsed with the same zero-copy approach. Group iteration is O(1) per entry.
---
## Use Cases
NexusFIX is designed for:
- **High-Frequency Trading (HFT)** - Sub-microsecond message processing
- **Algorithmic Trading Systems** - Low-latency order routing
- **Market Making** - High-throughput quote updates
- **Smart Order Routing (SOR)** - Multi-venue connectivity
- **Trading Infrastructure** - FIX gateways and bridges
---
## Contact
For questions or collaboration: nonagonal.portal@gmail.com
---
## Development
Built with **Modern C++23**. Optimized via hardware-aware high-performance patterns including cache-line alignment, SIMD vectorization, and zero-copy memory design. Verified through rigorous benchmarking and AI-assisted static analysis.
For technical deep-dives on our optimization journey, see [Optimization Diary](docs/optimization_diary.md).
---
## Contributing
This project is maintained by **Lattice9AI**.
- **Issues & Discussions**: Welcome for bug reports, performance questions, and feature discussions
- **Pull Requests**: Bug fixes and performance optimizations welcome (see [CONTRIBUTING.md](CONTRIBUTING.md))
- Feature PRs require prior discussion in Issues
- Performance PRs must include benchmark data (before/after)
All contributions must follow:
- C++23 standards
- Zero allocation on hot paths
- Include benchmarks for performance changes
---
## License
MIT License - See [LICENSE](LICENSE) file.
---
Built with Modern C++23 for ultra-low latency quantitative trading