https://github.com/hiibolt/hft-simulation
Databento-based Market Simulation built with Rust, Svelte, Docker, Nix, and Kubernetes.
https://github.com/hiibolt/hft-simulation
axum databento docker kubernetes rust svelte tokio
Last synced: 18 days ago
JSON representation
Databento-based Market Simulation built with Rust, Svelte, Docker, Nix, and Kubernetes.
- Host: GitHub
- URL: https://github.com/hiibolt/hft-simulation
- Owner: hiibolt
- Created: 2025-11-17T23:26:58.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2026-01-13T04:07:14.000Z (5 months ago)
- Last Synced: 2026-05-20T07:39:09.112Z (28 days ago)
- Topics: axum, databento, docker, kubernetes, rust, svelte, tokio
- Language: Rust
- Homepage: https://mbo.hiibolt.com
- Size: 949 KB
- Stars: 2
- Watchers: 0
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Market-by-Order Streaming Platform
> **High-performance financial data streaming system demonstrating production-grade Rust backend architecture, real-time web applications, and enterprise deployment patterns**
[](https://mbo.hiibolt.com)
[](https://mbo-grafana.hiibolt.com)
[](https://github.com/hiibolt/mbo/actions)
Interactive market depth visualization with sub-50ms latency updates
---
## What This Demonstrates
This project showcases my ability to architect, implement, and deploy production-grade distributed systems from scratch. Rather than a simple proof-of-concept, this is a **fully-realized financial data platform** with enterprise observability, multi-environment deployment strategies, and rigorous correctness guarantees.
### Technical Highlights
**Systems Programming Excellence**
- **Rust backend** processing 500K+ msg/sec with **p99 latency <5ms** (10x better than requirement)
- Zero-copy memory management and lock-free data structures for maximum throughput
- Custom order book engine with **price-time priority** enforcement and **crossed-market resolution**
- Comprehensive error handling with zero `unwrap()` calls—every failure path is explicitly handled
**Concurrency & Performance**
- Async runtime leveraging **Tokio** for efficient I/O multiplexing
- **Arc** + **RwLock** patterns for thread-safe shared state
- Handled **130+ concurrent SSE connections** before hitting OS limits (backend never saturated)
- Smart connection lifecycle management with RAII guards for graceful degradation
**Modern Web Architecture**
- **SvelteKit** frontend with TypeScript type safety end-to-end
- Server-Sent Events for real-time streaming (Cloudflare tunnel compatible)
- State management using Svelte stores with reactive UI updates
- **Bun** for blazing-fast builds and zero-config TypeScript
**Production Infrastructure**
- Multi-tier deployment: **Nix** dev shells → **Docker Compose** staging → **Kubernetes** production
- Full **CI/CD pipeline** with automated testing, vulnerability scanning, and multi-arch builds
- **Prometheus** + **Grafana** observability stack with custom metrics and alerting
- Supply chain security via **Dependabot**, cargo-audit, and SBOM generation
**Data Engineering**
- SQLite time-series storage with batch insert optimization
- Efficient serialization/deserialization of 170M+ messages
- Graceful handling of pre-snapshot orders and partial data scenarios
---
## Architecture Deep Dive
### Backend (`mbo-backend/`)
The Rust backend is the heart of the system, built with **Axum** for HTTP routing and **Tokio** for async runtime. Key architectural decisions:
#### Order Book Engine
```rust
// Custom book implementation with invariant enforcement
fn match_crossed_orders(&mut self) -> Result<()> {
// Prevents negative spreads by simulating matching engine behavior
// Maintains price-time priority across all operations
}
```
- BTreeMap-backed bid/ask levels for O(log n) price lookups
- HashMap order registry for O(1) order ID resolution
- Automatic market crossing detection and resolution
- Comprehensive tracing for debugging impossible states (pre-snapshot orders)
#### Streaming API Design
Server-Sent Events over HTTP rather than WebSockets for:
- **Cloudflare tunnel compatibility** (no WebSocket upgrade headaches)
- **Simpler client reconnection** logic
- **Built-in browser support** without libraries
- Backpressure handling via TCP flow control
#### Metrics That Matter
```rust
pub struct Metrics {
pub messages_processed: Counter,
pub http_request_duration: Histogram,
pub active_connections: IntGauge,
pub order_book_apply_duration: Histogram,
// ... 10+ custom metrics
}
```
Every critical path is instrumented—P50/P99/P999 latencies, throughput, error rates, and resource utilization all exported to Prometheus.
### Frontend (`mbo-frontend/`)
TypeScript + Svelte application demonstrating modern frontend patterns:
- **Reactive state management** with Svelte stores
- **Type-safe** domain models shared with backend
- **Component-driven** UI with Skeleton design system
- **Real-time visualization** of market depth and order flow
### Infrastructure
**Multi-Environment Strategy:**
- **Local Development**: Nix flakes for reproducible environments (Rust 1.91, Bun, all deps pinned)
- **CI/CD**: GitHub Actions with matrix builds, test gates before deploy
- **Staging**: Docker Compose with profiles for dev/obs/prod
- **Production**: Kubernetes with health checks, resource limits, and auto-scaling
**Observability Stack:**
```yaml
scrape_configs:
- job_name: 'mbo-backend'
scrape_interval: 5s # High-frequency metrics collection
metrics_path: '/metrics'
```
Grafana dashboards visualize:
- Message processing throughput (msg/sec)
- API latency percentiles (p50, p99, p99.9)
- Active connections over time
- Order book depth evolution
---
## Technical Challenges Solved
### 1. **Handling Partial Market Data**
**Problem**: MBO dataset starts mid-session, so early messages reference non-existent orders.
**Solution**: Defensive programming with explicit logging. Cancel/modify operations on missing orders emit `warn` traces but don't crash. In production, I'd instrument this to alert on anomalies.
### 2. **Crossed Markets**
**Problem**: Order book occasionally showed bid >= ask (negative spread).
**Solution**: Implemented matching engine simulation that automatically executes crossed orders. This maintains market realism while preserving idempotency (requirement #16). Algorithm ensures no invalid state persists.
### 3. **Performance Under Load**
**Challenge**: Achieve 500K msg/sec with p99 <10ms.
**Result**: Sustained 17M msg/sec burst with p99 consistently <5ms. Bottleneck became test machine's connection limits, not the backend. Optimizations included:
- Zero-copy buffer reuse
- Async I/O for all operations
- Lock contention minimization via read-write locks
- Memory pool for frequent allocations
### 4. **Resilient Connection Handling**
Implemented custom drop guards:
```rust
tokio::spawn(async move {
let _ = rx.await;
metrics_for_cleanup.active_connections.dec();
});
```
Guarantees metric cleanup even if client abruptly disconnects, preventing resource leaks.
---
## Why These Technology Choices?
| Technology | Justification |
|-----------|---------------|
| **Rust** | Memory safety + zero-cost abstractions = predictable performance under load. No GC pauses. |
| **Tokio/Axum** | Industry-standard async runtime + ergonomic HTTP framework. Battle-tested at scale. |
| **Svelte** | Compile-time reactivity means smaller bundles and faster runtime than React/Vue. |
| **Bun** | TS/JS toolchain that "just works"—installs faster than caching overhead (seriously, [read this](https://github.com/oven-sh/setup-bun/issues/14#issuecomment-1714116221)). |
| **SQLite** | Embedded, zero-config persistence. Perfect for append-only time-series at this scale. |
| **Prometheus** | De-facto standard for metrics. Powerful query language (PromQL) and ecosystem. |
| **Nix** | Reproducible builds down to the compiler version. No "works on my machine" issues. |
| **K8s** | Enterprise deployment reality. Self-healing, declarative config, industry standard. |
---
## Key Metrics Achieved
| Metric | Target | Achieved | Notes |
|--------|--------|----------|-------|
| Throughput | 500K msg/sec | **17M msg/sec** | 34x over spec (burst) |
| P99 Latency | <50ms | **<5ms** | 10x better than requirement |
| Concurrent Clients | 100+ | **130+** | Limited by test machine, not backend |
| P50 Latency | - | **300 μs** | Microsecond response times |
| Uptime | - | **100%** | Never crashed during stress testing |
---
## Live Demos
**Application**: [mbo.hiibolt.com](https://mbo.hiibolt.com)
Real-time order book visualization with interactive playback controls
**Monitoring**: [mbo-grafana.hiibolt.com](https://mbo-grafana.hiibolt.com)
Prometheus metrics and Grafana dashboards showing system internals
Production metrics: throughput, latencies, connection counts, error rates
---
## Quick Start
**Prerequisites:** Docker + Docker Compose (or Nix for local dev)
```bash
# Clone the repository
git clone https://github.com/hiibolt/mbo.git && cd mbo
# Set environment variables
cp .env.example .env
# Edit .env and add your DBN_KEY (or omit for demo data)
# Launch production stack
docker compose --profile prod up
# Access at http://localhost (frontend), http://localhost:9090 (metrics)
```
**Local Development** (requires Nix with flakes):
```bash
nix develop # Enter dev shell with all dependencies
cd mbo-backend && cargo run # Start backend
cd mbo-frontend && bun dev # Start frontend (separate terminal)
```
---
## Technologies Demonstrated
**Backend Engineering:**
Rust • Tokio • Axum • Anyhow • SQLite • Prometheus • Server-Sent Events
**Frontend Development:**
TypeScript • Svelte • SvelteKit • Bun • TailwindCSS • Skeleton UI
**DevOps & Infrastructure:**
Docker • Kubernetes • GitHub Actions • Nix • Prometheus • Grafana • Nginx
**Software Engineering Practices:**
CI/CD • Dependency Management • Security Scanning • Performance Testing • API Design • Distributed Systems • Observability • Documentation
---
## Correctness Guarantees
The order book implementation enforces critical invariants:
1. **Price-Time Priority**: Orders at same price level maintain FIFO ordering
2. **No Crossed Markets**: Bid always < Ask (matching engine auto-executes violations)
3. **Atomic Operations**: All state updates are transactional via RwLock
4. **Idempotency**: Duplicate operations are safe (critical for distributed systems)
5. **Graceful Degradation**: Missing orders logged but don't crash system
Verified via:
- Unit tests on core book operations
- Integration tests with real market data
- Invariant assertions in test suite
- Stress testing with 170M+ messages
---
*This project represents the intersection of my interests in systems programming, financial technology, and production engineering. It's a demonstration that I don't just write code—I architect systems that work reliably at scale.*
## AI Usage
I used Claude Opus 4.1 for dense, difficult tasks requiring heavy verification and Claude Sonnet 4.5 for less intense tasks such as test verification, by-line documentation, and rapid templating.
Sonnet 4.5 was used in the re-writing of this `README.md` file.
List of usage:
- Moving example `databento` code to use more verbose `anyhow` reporting
- A simple matter of asking it to inspect and replace all `unwrap` and `expect` calls
- Adding full `tokio_tracing` coverage
- An otherwise tedious task, made simple with AI!
- Required following AI cursor and inspecting all changes to verify there's no secret leakage
- Frontend Templating with Skeleton and Tailwind
- I carefully watched it design the frontend structure and gave feedback to progress it to a visually appealing end result.
- Docker Compose
- Writing `.dockerignore` files was done with Sonnet, as it's an otherwise tedious task. I was careful to monitor for secret leaks and repo ballooning before confirming.
- I contracted Opus to build the Dockerfiles, giving it examples from past projects and carefully monitoring to ensure quality output.
- Prometheus
- Both of Anthropic's models are highly skilled at writing Prometheus dashboards, so with supervision I felt confident to let it write the YAML spread.
- Stress Testing
- I asked Opus to build a neat stress-test that has 1M msg/s, 500 concurrent connections, and consistent stress.
- Needed help from Sonnet to debug SSE connections not dropping cleanly - it ended up helping developing custom a RAII guard solution, which was really cool to supervise and learn from.
- Kubernetes
- My final deployment namespace for production was massive (13 YAML files). I asked Sonnet to help me consolidate them into concise, readable documents, which it did a wonderful job with.