https://github.com/Lattice9AI/NexusFix

A zero-alloc, compile-time hardened FIX engine built for sub-100ns execution.
cpp23 fix-protocol header-only hft high-frequency-trading low-latency quantitative-finance simd trading
Host: GitHub
URL: https://github.com/Lattice9AI/NexusFix
Owner: Lattice9AI
License: mit
Created: 2026-01-22T00:51:10.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-03-31T23:42:31.000Z (9 days ago)
Last Synced: 2026-04-01T01:53:15.591Z (9 days ago)
Topics: cpp23, fix-protocol, header-only, hft, high-frequency-trading, low-latency, quantitative-finance, simd, trading
Language: C++
Size: 8.78 MB
Stars: 42
Watchers: 4
Forks: 7
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
Awesome Lists containing this project

awesome-quant - NexusFix - `CPP` - C++23 FIX protocol engine with zero-copy parsing and SIMD acceleration, 3x faster than QuickFIX. (Trading & Backtesting)
awesome-quant - NexusFix - High-performance C++23 FIX protocol engine with zero-copy parsing and SIMD acceleration; parses an ExecutionReport in ~246 ns (QuickFIX: ~730 ns). (Programming / C++)
README

          


  
# NexusFIX

Ultra-Low Latency FIX Protocol Engine for High-Frequency Trading

Modern C++23 | Zero-Copy | SIMD-Accelerated | 3x Faster than QuickFIX

Performance | Architecture | Features | Quick Start | Docs | Contact



---

## Why NexusFIX?

**NexusFIX** is a high-performance **FIX protocol** (Financial Information eXchange) engine built for **ultra-low latency quantitative trading**, **sub-microsecond algorithmic execution**, and **high-frequency trading (HFT)** systems. It targets the **performance bottlenecks** of traditional FIX engines (per-message heap allocation, tree-based field lookup, byte-by-byte parsing) with **hardware-aware C++ programming**.

NexusFIX serves as a modern, faster alternative to QuickFIX with **zero heap allocations** on the critical path.

> *"If you're building a low-latency trading system and QuickFIX is your bottleneck, NexusFIX is your solution."*

---

## Performance

### NexusFIX vs QuickFIX Benchmark

Tested on Linux with GCC 13.3, 100,000 iterations:

| Metric | QuickFIX | NexusFIX | Improvement |
|--------|----------|----------|-------------|
| **ExecutionReport Parse** | 730 ns | 246 ns | **3.0x faster** |
| **NewOrderSingle Parse** | 661 ns | 229 ns | **2.9x faster** |
| **Field Lookup** (O(1) post-parse, 4 fields) | 31 ns | 11 ns | **2.9x faster** |
| **Parse Throughput** | 1.19M msg/sec | 4.17M msg/sec | **3.5x higher** |
| **P99 Parse Latency** | 784 ns | 258 ns | **3.0x lower** |

### Why is NexusFIX Faster?

| Technique | QuickFIX | NexusFIX |
|-----------|----------|----------|
| Memory | Heap allocation per message | Zero-copy `std::span` views |
| Field Lookup | O(log n) `std::map` | O(1) direct array indexing |
| Parsing | Byte-by-byte scanning | AVX2 SIMD vectorized |
| Field Offsets | Runtime calculation | `consteval` compile-time |
| Enum/Type Conversion | Runtime switch chains (~300 branches) | 22 compile-time lookup tables (55-97% faster) |
| Error Handling | Exceptions | `std::expected` (no throw) |
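
The "Parsing" row contrasts byte-by-byte scanning with vectorized scanning. The sketch below illustrates the idea with a portable stand-in: it is not the NexusFIX AVX2 scanner, but a SWAR (SIMD within a register) loop that tests 8 bytes per step using plain 64-bit arithmetic. The function name `find_soh` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Portable stand-in for a vectorized delimiter scan: test 8 bytes per step
// instead of 1. An AVX2 scanner applies the same idea to 32 bytes per step.
constexpr char kSoh = '\x01';  // FIX field delimiter (SOH)

// Returns the index of the first SOH in `buf`, or buf.size() if none is found.
inline std::size_t find_soh(std::string_view buf) {
    constexpr std::uint64_t kOnes  = 0x0101010101010101ULL;
    constexpr std::uint64_t kHighs = 0x8080808080808080ULL;
    std::size_t i = 0;
    for (; i + 8 <= buf.size(); i += 8) {
        std::uint64_t word;
        std::memcpy(&word, buf.data() + i, sizeof word);  // unaligned-safe load
        // XOR with 0x01 in every lane turns each SOH byte into 0x00.
        const std::uint64_t x = word ^ kOnes;
        // Classic zero-byte test: nonzero exactly when some byte of x is 0x00.
        if (((x - kOnes) & ~x & kHighs) != 0) break;  // a match is in this word
    }
    // Byte-wise scan of the matching word, or of a tail shorter than 8 bytes.
    for (; i < buf.size(); ++i) {
        if (buf[i] == kSoh) return i;
    }
    return buf.size();
}
```

Most of a FIX message is non-delimiter bytes, so the wide test skips the bulk of the buffer and the byte-wise loop runs only over the few bytes around each hit.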

### Zero Allocation Proof

Parsing a **NewOrderSingle** message on the hot path:

| Operation | QuickFIX | NexusFIX |
|-----------|----------|----------|
| **Heap Allocations** | ~12 (`std::string`, `std::map` nodes) | **0** |
| **Field Storage** | `std::map` copies | `std::span` views into original buffer |
| **Parsing Logic** | Runtime map insertion | Compile-time offset table |
| **Memory Footprint** | Dynamic, unpredictable | Static, pre-allocated PMR pool |
| **Destructor Overhead** | ~12 `std::string` destructors | **0** (no owned memory) |

*Verified via custom allocator instrumentation. See [Optimization Diary](docs/optimization_diary.md).*
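
One way to approximate this kind of instrumentation is to place a counting memory resource upstream of a stack-backed PMR arena and check that it is never reached. The sketch below is an assumption about the general technique, not the project's actual test harness; `CountingResource` and `heap_fallbacks` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical instrumentation helper: counts every request that reaches
// the heap, then forwards it to the default new/delete resource.
class CountingResource final : public std::pmr::memory_resource {
public:
    std::size_t allocations() const noexcept { return count_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++count_;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
    std::size_t count_ = 0;
};

// Fills a pmr::vector from a 1 KiB stack arena and reports how many times
// the arena had to fall back to the counted upstream heap resource.
inline std::size_t heap_fallbacks(std::size_t element_count) {
    CountingResource heap;
    std::array<std::byte, 1024> arena;  // stack storage for the hot path
    std::pmr::monotonic_buffer_resource pool(arena.data(), arena.size(), &heap);
    std::pmr::vector<int> fields(&pool);
    fields.reserve(element_count);
    for (std::size_t i = 0; i < element_count; ++i) {
        fields.push_back(static_cast<int>(i));
    }
    return heap.allocations();
}
```

A workload that fits in the arena reports zero fallbacks, while one that overflows it reports at least one, which makes the zero-allocation property something a unit test can assert.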

*For kernel bypass (DPDK/AF_XDP) and FPGA acceleration, see [Roadmap](docs/design/TICKET_204_AERON_HIGH_THROUGHPUT_MESSAGING.md).*

---

## Architecture Influences

NexusFIX stands on the shoulders of giants. We systematically studied **11 industry-leading Modern C++ libraries** and applied their techniques to ultra-low latency FIX processing. Below is our learning journey: what we learned, what we built, and what improvement we measured.

### Learning → Implementation → Verification

| Source Library | Engineering Evaluation | What We Changed | Benchmark Result |
|----------------|------------------------|-----------------|------------------|
| [hffix](https://github.com/jamesdbrock/hffix) | O(n) iterator-based field lookup is suboptimal for dense FIX packets; lacks compile-time optimization and type safety | `[Optimized]` `consteval` field offsets + `std::span` zero-copy views + O(1) direct indexing | **14ns** field access vs ~50ns iterator scan |
| [Abseil](https://github.com/abseil/abseil-cpp) | Swiss Tables offer SIMD-accelerated probing with 7-bit H2 fingerprints; superior cache locality for session maps | `[Adopted]` `absl::flat_hash_map` for session store | **[31% faster](docs/compare/ABSEIL_FLAT_HASH_MAP_BENCHMARK.md)** (20ns → 15ns) |
| [Quill](https://github.com/odygrd/quill) | Lock-free SPSC queue with deferred formatting; only viable approach for hot-path logging without blocking | `[Adopted]` Quill as logging backend | **8ns** median latency; zero blocking |
| [NanoLog](https://github.com/PlatformLab/NanoLog) | Binary encoding + background thread achieves 7ns; compile-time format validation essential for type safety | `[Synthesized]` `DeferredProcessor` with static type-safe binary serialization | **[84% reduction](docs/compare/DEFERRED_PROCESSOR_BENCHMARK.md)** (75ns → 12ns) |
| [liburing](https://github.com/axboe/liburing) | `DEFER_TASKRUN` defers completion to userspace, eliminating kernel task wakeups; registered buffers avoid per-op mapping | `[Adopted]` io_uring + DEFER_TASKRUN + registered buffers + multishot | **[7-27% faster](docs/compare/DEFER_TASKRUN_BENCHMARK.md)**; ~30% fewer syscalls |
| [Highway](https://github.com/google/highway) | Portable SIMD abstraction across AVX2/AVX-512/NEON/SVE; slight overhead vs direct intrinsics | `[Evaluated]` Retained hand-tuned intrinsics for FIX-specific patterns | **13x throughput**; Highway deferred for ARM |
| [Seastar](https://github.com/scylladb/seastar) | Share-nothing reactor optimal for high-concurrency I/O; high abstraction overhead for single-threaded tick-to-trade paths | `[Influenced]` Extracted core-pinning + lock-free pipelining without framework | **[8% P99 improvement](docs/compare/CPU_AFFINITY_BENCHMARK.md)** (18.8ns → 17.3ns) |
| [Folly](https://github.com/facebook/folly) | Advanced memory fencing patterns and lock-free primitives; `folly::Function` overhead acceptable for cold path only | `[Influenced]` Native SPSC queue + bit-masking for tag validation | Comparable performance; zero dependency |
| [Rigtorp](https://github.com/rigtorp/SPSCQueue) | Cache-line padding (`alignas(64)`) eliminates false sharing; simplest correct SPSC implementation | `[Synthesized]` Native `SPSCQueue` with identical techniques | **88M ops/sec**; 11ns median |
| [xsimd](https://github.com/xtensor-stack/xsimd) | Generic SIMD wrappers useful for math, but FIX parsing requires byte-level shuffle control | `[Evaluated]` Direct Intel intrinsics for SOH/delimiter scanning | **2x faster** than generic wrappers |
| [Boost.PMR](https://www.boost.org/doc/libs/release/libs/container/doc/html/container/polymorphic_memory_resources.html) | Standard allocators induce non-deterministic jitter; monotonic buffer enables arena allocation per message | `[Adopted]` `std::pmr::monotonic_buffer_resource` | **Zero heap allocation** on hot path |

### What We Built

| Component | Inspired By | Implementation |
|-----------|-------------|----------------|
| `TagOffsetMap` | hffix | Compile-time generated O(1) field lookup table |
| `DeferredProcessor` | NanoLog | SPSC queue + background thread for async processing |
| `ThreadLocalPool` | NanoLog, Folly | Per-thread object pool, zero lock contention |
| `SPSCQueue` | Rigtorp, Folly | Cache-line aligned lock-free queue |
| `simd_scanner` | xsimd (concept) | Hand-tuned AVX2/AVX-512 SOH and delimiter scanning |
| `IoUringTransport` | liburing | DEFER_TASKRUN + registered buffers + multishot recv |
| `CpuAffinity` | Seastar | Thread-to-core pinning utility |
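
To make the "cache-line aligned lock-free queue" entry concrete, here is a minimal single-producer/single-consumer ring buffer in the same spirit. It is a sketch under stated assumptions: the class name `SpscRing` and its interface are hypothetical and are not the NexusFIX `SPSCQueue` API, the cache line is assumed to be 64 bytes, and `T` is assumed to be default-constructible and copyable.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Illustrative SPSC ring buffer (hypothetical name and interface).
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;
        T value = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return value;
    }

private:
    // Each index lives on its own cache line so the producer and consumer
    // do not invalidate each other's line on every operation.
    alignas(64) std::atomic<std::size_t> head_{0};  // written by the producer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by the consumer
    alignas(64) std::array<T, Capacity> slots_{};
};
```

The indices grow monotonically and are masked on access, so "full" and "empty" are distinguished without sacrificing a slot. The release store on one side pairs with the acquire load on the other, which is the only synchronization the two threads need.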

### Cumulative Impact

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| ExecutionReport Parse | 730 ns | 246 ns | **3.0x faster** |
| Hot Path Latency | 361 ns | 213 ns | **41% reduction** |
| SIMD SOH Scan | ~150 ns | 11.8 ns | **~13x faster** |
| Hash Map Lookup | 20 ns | 15 ns | **31% faster** |
| P99 Tail Latency | 784 ns | 258 ns | **3.0x lower** |

*Detailed benchmarks: [Optimization Summary](docs/compare/OPTIMIZATION_SUMMARY_BEFORE_AFTER.md)*

### Attribution

NexusFIX is MIT licensed. We gratefully acknowledge these open source projects:

| Dependency | License | Usage |
|------------|---------|-------|
| [Abseil](https://github.com/abseil/abseil-cpp) | Apache 2.0 | `flat_hash_map` for session lookups |
| [Quill](https://github.com/odygrd/quill) | MIT | Async logging infrastructure |
| [liburing](https://github.com/axboe/liburing) | MIT/LGPL | io_uring C wrapper |

---

## Features

### Core Capabilities

- **Zero-Copy Parsing** - `std::span` views into original buffer, no `memcpy`

- **Message Encoding** - Builder pattern with `constexpr` serializer for constructing FIX messages

- **SIMD Acceleration** - AVX2/AVX-512 instructions for delimiter scanning

- **Compile-Time Optimization** - `consteval` field offsets, 22 lookup tables for enum/type conversion, ~300 runtime branches eliminated

- **O(1) Field Lookup** - Pre-indexed lookup table by FIX tag number (post-parse)

- **Zero Heap Allocation** - PMR pools and stack allocation on hot path

- **Session Management** - Full session lifecycle: Logon, Logout, Heartbeat, sequence number tracking, reconnect logic

- **Type-Safe API** - Strong types for Price, Quantity, Side, OrdType
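
As an illustration of what strong types buy in this setting, the sketch below wraps price and quantity in distinct structs so they cannot be swapped or mixed with raw integers by accident. The names echo the list above, but the layout (a 64-bit integer with 8 implied decimal places) and the helper `from_parts` are assumptions for this example, not the library's actual definitions.

```cpp
#include <cstdint>

// Hypothetical fixed-point price: integer value scaled by 10^8.
struct Price {
    std::int64_t raw;  // value multiplied by kScale
    static constexpr std::int64_t kScale = 100'000'000;

    // Builds a price from a whole part and a fraction expressed in 1e-8 units.
    static constexpr Price from_parts(std::int64_t whole, std::int64_t frac_e8) {
        return Price{whole * kScale + frac_e8};
    }
    friend constexpr bool operator==(Price a, Price b) { return a.raw == b.raw; }
    friend constexpr bool operator<(Price a, Price b) { return a.raw < b.raw; }
    friend constexpr Price operator+(Price a, Price b) { return Price{a.raw + b.raw}; }
};

struct Quantity {
    std::int64_t value;
    friend constexpr bool operator==(Quantity a, Quantity b) { return a.value == b.value; }
};

enum class Side : char { Buy = '1', Sell = '2' };  // FIX tag 54 wire values

// Notional in scaled (1e-8) units. Passing the arguments in the wrong order
// is a compile error because Price and Quantity do not convert to each other.
constexpr std::int64_t notional_e8(Price p, Quantity q) { return p.raw * q.value; }

static_assert(Price::from_parts(150, 0) + Price::from_parts(0, 25'000'000)
              == Price::from_parts(150, 25'000'000));
static_assert(notional_e8(Price::from_parts(150, 0), Quantity{100}) == 1'500'000'000'000);
```

Integer fixed-point arithmetic also avoids floating-point rounding when comparing or summing prices, and every operation here is usable in constant expressions.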

### Modern C++23

- `std::expected` for error handling (no exceptions on hot path)

- `std::span` for zero-copy data views

- Concepts for compile-time interface validation

- `consteval` for compile-time computation

- `[[likely]]` / `[[unlikely]]` branch hints

### Supported FIX Versions

| Version | Status | Notes |
|---------|--------|-------|
| FIX 4.4 | Full Support | Most common in production |
| FIX 5.0 + FIXT 1.1 | Full Support | Only 2% overhead vs 4.4 |

### Supported Message Types

| MsgType | Name | Category |
|---------|------|----------|
| A | Logon | Session |
| 5 | Logout | Session |
| 0 | Heartbeat | Session |
| D | NewOrderSingle | Order Entry |
| F | OrderCancelRequest | Order Entry |
| 8 | ExecutionReport | Order Entry |
| V | MarketDataRequest | Market Data |
| W | MarketDataSnapshotFullRefresh | Market Data |
| X | MarketDataIncrementalRefresh | Market Data |

### Optimization Guide

How we achieved sub-300ns latency with Modern C++23:

- [Optimization Diary](docs/optimization_diary.md) - Step-by-step journey from 730ns to 246ns

- [Modern C++ Quant Techniques](docs/modernc_quant.md) - Cache-line alignment, SIMD, PMR strategies, branch hints

---

## Quick Start

### Installation

```bash
git clone https://github.com/Lattice9AI/NexusFIX.git
cd NexusFIX
./start.sh build
```

### Requirements

- **C++23 compiler**: GCC 13+ or Clang 17+

- **CMake**: 3.20+

- **OS**: Linux (io_uring optional), macOS, Windows

### Basic Usage

```cpp
#include 

using namespace nfx;
using namespace nfx::fix44;

// Connect to broker
TcpTransport transport;
transport.connect("fix.broker.com", 9876);

// Configure session
SessionConfig config{
    .sender_comp_id = "MY_CLIENT",
    .target_comp_id = "BROKER",
    .heartbeat_interval = 30
};

SessionManager session{transport, config};
session.initiate_logon();

// Send order (zero allocation)
MessageAssembler asm_;
NewOrderSingle::Builder order;

auto msg = order
    .cl_ord_id("ORD001")
    .symbol("AAPL")
    .side(Side::Buy)
    .order_qty(Qty::from_int(100))
    .ord_type(OrdType::Limit)
    .price(FixedPrice::from_double(150.00))
    .build(asm_);

transport.send(msg);
```

### Parse Incoming Messages

```cpp
// Zero-copy parsing
FixParser parser;
auto result = parser.parse(raw_buffer);

if (result) {
    auto& msg = *result;
    auto order_id = msg.get_string(Tag::OrderID);    // O(1) lookup
    auto exec_type = msg.get_char(Tag::ExecType);    // No allocation
    auto fill_qty = msg.get_qty(Tag::LastQty);       // Type-safe
}
```

---

## Build Options

| CMake Option | Default | Description |
|--------------|---------|-------------|
| `NFX_ENABLE_SIMD` | ON | AVX2/AVX-512 SIMD acceleration |
| `NFX_ENABLE_IO_URING` | OFF | Linux io_uring transport |
| `NFX_BUILD_BENCHMARKS` | ON | Build benchmark suite |
| `NFX_BUILD_TESTS` | ON | Build unit tests |
| `NFX_BUILD_EXAMPLES` | ON | Build examples |

```bash
# Build with all optimizations
cmake -B build -DCMAKE_BUILD_TYPE=Release -DNFX_ENABLE_SIMD=ON
cmake --build build -j

# Run benchmarks
./start.sh bench 100000

# Compare with QuickFIX
./start.sh compare 100000
```

---

## Benchmarking

Verify performance claims by running benchmarks yourself.

### Quick Start

```bash
# Run parser and session benchmarks
./start.sh bench 100000

# Example output:
# [BENCHMARK] ExecutionReport Parse
#   Iterations: 100000
#   Mean: 246 ns
#   P50:  245 ns
#   P99:  258 ns
```
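
For readers who want to sanity-check figures like P50 and P99 on their own workload, the sketch below shows one common way to collect per-call samples and reduce them with a nearest-rank percentile. It is an assumption about the general approach, not a description of how `start.sh bench` computes its output; `percentile_ns` and `measure_ns` are hypothetical helpers.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile over latency samples in nanoseconds.
// `pct` is expected in (0, 100]. The vector is taken by value, so the
// caller's sample order is left untouched. Returns 0 for an empty input.
inline std::int64_t percentile_ns(std::vector<std::int64_t> samples, double pct) {
    if (samples.empty()) return 0;
    std::sort(samples.begin(), samples.end());
    const double exact = pct * static_cast<double>(samples.size()) / 100.0;
    auto rank = static_cast<std::size_t>(exact);
    if (static_cast<double>(rank) < exact) ++rank;              // round up
    rank = std::min(std::max<std::size_t>(rank, 1), samples.size());
    return samples[rank - 1];
}

// Times `iterations` calls of `fn` with the steady clock and returns one
// sample per call. Clock overhead is included in every sample.
template <typename Fn>
std::vector<std::int64_t> measure_ns(Fn&& fn, std::size_t iterations) {
    using Clock = std::chrono::steady_clock;
    std::vector<std::int64_t> samples;
    samples.reserve(iterations);  // allocate once, outside the timed region
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = Clock::now();
        fn();
        const auto stop = Clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
    }
    return samples;
}
```

At sub-microsecond scales the clock read itself costs tens of nanoseconds, so production harnesses often time batches of calls or use a cycle counter; this sketch favors clarity over that precision.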

### QuickFIX Comparison

Compare NexusFIX against QuickFIX (requires QuickFIX installed):

```bash
# Install QuickFIX first
# Ubuntu: sudo apt install libquickfix-dev
# Or build from source: https://github.com/quickfix/quickfix

# Run comparison
./start.sh compare 100000
```

### Full Reproduction Guide

[BENCHMARK_REPRODUCTION.md](BENCHMARK_REPRODUCTION.md) gives detailed instructions for reproducing the benchmark results, including:

- Environment setup (CPU governor, pinning, priority)

- Build configuration options

- Interpreting results

- Troubleshooting

---

## Documentation

- [API Reference](docs/API_REFERENCE.md) - Complete API documentation

- [Implementation Guide](docs/design/IMPLEMENTATION_GUIDE.md) - Architecture overview

- [Benchmark Report](docs/compare/BENCHMARK_COMPARISON_REPORT.md) - Detailed performance analysis

- [Modern C++ Techniques](docs/modernc_quant.md) - Optimization techniques used

---

## Project Structure

```
nexusfix/
├── include/nexusfix/
│   ├── parser/           # Zero-copy FIX parser (SIMD)
│   ├── session/          # Session state machine
│   ├── transport/        # TCP / io_uring / Winsock transport
│   ├── platform/         # Cross-platform abstraction
│   ├── types/            # Strong types (Price, Qty, Side)
│   ├── memory/           # PMR buffer pools
│   ├── store/            # Message store (PMR-optimized)
│   ├── serializer/       # Message serialization
│   ├── util/             # Utilities (diagnostics, formatting)
│   ├── messages/fix44/   # FIX 4.4 message builders
│   └── interfaces/       # Concepts and interfaces
├── benchmarks/           # Performance benchmarks
├── tests/                # Unit tests
├── examples/             # Example programs
└── docs/                 # Documentation
```

---

## FAQ

### How does NexusFIX achieve zero-copy parsing?

NexusFIX uses `std::span` to create views into the original network buffer. Field values are never copied: the parser returns spans pointing to the exact byte range in the source buffer, so reading a field involves no `memcpy` and no heap allocation. As with any view, the spans remain valid only while the underlying buffer is alive and unmodified.
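
The idea can be sketched in a few lines. The example below is illustrative only: `MessageView` is a hypothetical type rather than the NexusFIX parser, it uses `std::string_view` instead of `std::span` so that it also builds as C++17, and the tag bound of 1024 is an arbitrary assumption.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Illustrative zero-copy message view (hypothetical type). Values are
// string_views into the caller's buffer, stored in an array that is
// indexed directly by tag number.
class MessageView {
public:
    static constexpr std::size_t kMaxTag = 1024;  // assumed bound for this sketch

    // Splits `buf` on SOH (0x01) and records each "tag=value" field.
    // Returns false on a malformed field. No bytes are copied.
    bool parse(std::string_view buf) {
        fields_.fill({});
        while (!buf.empty()) {
            const std::size_t soh = buf.find('\x01');
            if (soh == std::string_view::npos) return false;  // unterminated field
            const std::string_view field = buf.substr(0, soh);
            const std::size_t eq = field.find('=');
            if (eq == std::string_view::npos || eq == 0) return false;
            std::size_t tag = 0;
            for (char c : field.substr(0, eq)) {
                if (c < '0' || c > '9') return false;
                tag = tag * 10 + static_cast<std::size_t>(c - '0');
                if (tag >= kMaxTag) return false;
            }
            fields_[tag] = field.substr(eq + 1);  // view, not a copy
            buf.remove_prefix(soh + 1);
        }
        return true;
    }

    // O(1): a single array index. An empty view means absent (or empty).
    std::string_view get(std::size_t tag) const {
        return tag < kMaxTag ? fields_[tag] : std::string_view{};
    }

private:
    std::array<std::string_view, kMaxTag> fields_{};
};
```

Each stored view is just a pointer and a length into the receive buffer, which is why the pointer returned by `get` lands inside the original bytes rather than in a separate allocation.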

### Is NexusFIX compatible with QuickFIX?

NexusFIX implements the same FIX 4.4/5.0 protocol standards but with a different API optimized for performance. It is wire-compatible with any FIX counterparty, including systems using QuickFIX.
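
Wire compatibility rests on framing rules that every FIX engine shares, such as the CheckSum trailer (tag 10): the sum of all preceding bytes modulo 256, written as three digits. The sketch below shows that rule in isolation; the function names are hypothetical and are not taken from the NexusFIX API.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// FIX CheckSum (tag 10): sum of every byte before the "10=" field,
// including the SOH that precedes it, modulo 256.
constexpr unsigned fix_checksum(std::string_view bytes) {
    unsigned sum = 0;
    for (char c : bytes) sum += static_cast<unsigned char>(c);
    return sum % 256;
}

// Formats the checksum as the three zero-padded digits used on the wire.
constexpr std::array<char, 3> checksum_digits(unsigned checksum) {
    return {static_cast<char>('0' + checksum / 100),
            static_cast<char>('0' + checksum / 10 % 10),
            static_cast<char>('0' + checksum % 10)};
}

// Verifies the trailer of a complete message ending in "10=XXX<SOH>".
constexpr bool has_valid_checksum(std::string_view msg) {
    constexpr std::size_t kTrailer = 7;  // "10=" + 3 digits + SOH
    if (msg.size() <= kTrailer) return false;
    const std::string_view body = msg.substr(0, msg.size() - kTrailer);
    const std::string_view trailer = msg.substr(msg.size() - kTrailer);
    if (body.back() != '\x01' || trailer.substr(0, 3) != "10=" || trailer[6] != '\x01') {
        return false;
    }
    const auto digits = checksum_digits(fix_checksum(body));
    return trailer[3] == digits[0] && trailer[4] == digits[1] && trailer[5] == digits[2];
}

// A minimal Heartbeat (35=0) with BodyLength 5: the bytes sum to 931, and 931 % 256 == 163.
static_assert(fix_checksum("8=FIX.4.4\x01" "9=5\x01" "35=0\x01") == 163);
```

Any two engines that agree on these framing rules and on the message dictionary can exchange messages, regardless of how differently their APIs are shaped.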

### What latency can I expect in production?

In our benchmarks: **~250 nanoseconds** for ExecutionReport parsing. Actual production latency depends on network, kernel configuration, and hardware. NexusFIX is designed to minimize the application-layer overhead.

### Does NexusFIX support FIX Repeating Groups?

Yes. Repeating groups are parsed with the same zero-copy approach. Group iteration is O(1) per entry.

---

## Use Cases

NexusFIX is designed for:

- **High-Frequency Trading (HFT)** - Sub-microsecond message processing

- **Algorithmic Trading Systems** - Low-latency order routing

- **Market Making** - High-throughput quote updates

- **Smart Order Routing (SOR)** - Multi-venue connectivity

- **Trading Infrastructure** - FIX gateways and bridges

---

## Contact

For questions or collaboration: nonagonal.portal@gmail.com

---

## Development

Built with **Modern C++23**. Optimized via hardware-aware high-performance patterns including cache-line alignment, SIMD vectorization, and zero-copy memory design. Verified through rigorous benchmarking and AI-assisted static analysis.

For technical deep-dives on our optimization journey, see [Optimization Diary](docs/optimization_diary.md).

---

## Contributing

This project is maintained by **Lattice9AI**.

- **Issues & Discussions**: Welcome for bug reports, performance questions, and feature discussions

- **Pull Requests**: Bug fixes and performance optimizations welcome (see [CONTRIBUTING.md](CONTRIBUTING.md))

  - Feature PRs require prior discussion in Issues

  - Performance PRs must include benchmark data (before/after)

All contributions must:

- Follow C++23 standards

- Keep hot paths allocation-free

- Include benchmarks for performance changes

---

## License

MIT License - See [LICENSE](LICENSE) file.

---



*Built with Modern C++23 for ultra-low latency quantitative trading*