https://github.com/hiibolt/hft-simulation

Databento-based Market Simulation built with Rust, Svelte, Docker, Nix, and Kubernetes.
axum databento docker kubernetes rust svelte tokio

# Market-by-Order Streaming Platform

> **High-performance financial data streaming system demonstrating production-grade Rust backend architecture, real-time web applications, and enterprise deployment patterns**

[![Live Demo](https://img.shields.io/badge/demo-mbo.hiibolt.com-blue?style=for-the-badge)](https://mbo.hiibolt.com)
[![Monitoring](https://img.shields.io/badge/grafana-metrics-orange?style=for-the-badge)](https://mbo-grafana.hiibolt.com)
[![GitHub Actions](https://img.shields.io/github/actions/workflow/status/hiibolt/mbo/build.yml?style=for-the-badge)](https://github.com/hiibolt/mbo/actions)


*Figure: Real-time order book visualization. Interactive market depth visualization with sub-50ms latency updates.*


---

## What This Demonstrates

This project showcases my ability to architect, implement, and deploy production-grade distributed systems from scratch. Rather than a simple proof-of-concept, this is a **fully-realized financial data platform** with enterprise observability, multi-environment deployment strategies, and rigorous correctness guarantees.

### Technical Highlights

**Systems Programming Excellence**
- **Rust backend** processing 500K+ msg/sec with **p99 latency <5ms** (10x better than requirement)
- Zero-copy memory management and lock-free data structures for maximum throughput
- Custom order book engine with **price-time priority** enforcement and **crossed-market resolution**
- Comprehensive error handling with zero `unwrap()` calls—every failure path is explicitly handled

**Concurrency & Performance**
- Async runtime leveraging **Tokio** for efficient I/O multiplexing
- **Arc** + **RwLock** patterns for thread-safe shared state
- Handled **130+ concurrent SSE connections** before hitting OS limits (backend never saturated)
- Smart connection lifecycle management with RAII guards for graceful degradation

**Modern Web Architecture**
- **SvelteKit** frontend with TypeScript type safety end-to-end
- Server-Sent Events for real-time streaming (Cloudflare tunnel compatible)
- State management using Svelte stores with reactive UI updates
- **Bun** for blazing-fast builds and zero-config TypeScript

**Production Infrastructure**
- Multi-tier deployment: **Nix** dev shells → **Docker Compose** staging → **Kubernetes** production
- Full **CI/CD pipeline** with automated testing, vulnerability scanning, and multi-arch builds
- **Prometheus** + **Grafana** observability stack with custom metrics and alerting
- Supply chain security via **Dependabot**, cargo-audit, and SBOM generation

**Data Engineering**
- SQLite time-series storage with batch insert optimization
- Efficient serialization/deserialization of 170M+ messages
- Graceful handling of pre-snapshot orders and partial data scenarios

---

## Architecture Deep Dive

### Backend (`mbo-backend/`)

The Rust backend is the heart of the system, built with **Axum** for HTTP routing and **Tokio** for async runtime. Key architectural decisions:

#### Order Book Engine
```rust
// Custom book implementation with invariant enforcement
fn match_crossed_orders(&mut self) -> Result<()> {
    // Prevents negative spreads by simulating matching engine behavior
    // Maintains price-time priority across all operations
    // ... (body omitted in this excerpt)
}
```
- BTreeMap-backed bid/ask levels for O(log n) price lookups
- HashMap order registry for O(1) order ID resolution
- Automatic market crossing detection and resolution
- Comprehensive tracing for debugging impossible states (pre-snapshot orders)
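The layout described in the bullets above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual engine: the `Book`, `Order`, and `Side` names, the integer-tick prices, and the method names are all assumptions made for the example.

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Side {
    Bid,
    Ask,
}

/// Hypothetical order record; prices are integer ticks so they are `Ord`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Order {
    pub id: u64,
    pub side: Side,
    pub price: i64,
    pub size: u32,
}

#[derive(Default)]
pub struct Book {
    // price -> order IDs in arrival order (FIFO gives time priority)
    bids: BTreeMap<i64, VecDeque<u64>>,
    asks: BTreeMap<i64, VecDeque<u64>>,
    // order ID -> order, for O(1) average-case lookup
    orders: HashMap<u64, Order>,
}

impl Book {
    pub fn add(&mut self, order: Order) {
        let levels = match order.side {
            Side::Bid => &mut self.bids,
            Side::Ask => &mut self.asks,
        };
        levels.entry(order.price).or_default().push_back(order.id);
        self.orders.insert(order.id, order);
    }

    /// Removes an order; returns `None` when the ID is unknown.
    pub fn cancel(&mut self, id: u64) -> Option<Order> {
        let order = self.orders.remove(&id)?;
        let levels = match order.side {
            Side::Bid => &mut self.bids,
            Side::Ask => &mut self.asks,
        };
        let emptied = match levels.get_mut(&order.price) {
            Some(queue) => {
                queue.retain(|&queued| queued != id);
                queue.is_empty()
            }
            None => false,
        };
        if emptied {
            // Drop empty levels so best bid/ask stay accurate
            levels.remove(&order.price);
        }
        Some(order)
    }

    /// Highest bid price, found in O(log n).
    pub fn best_bid(&self) -> Option<i64> {
        self.bids.keys().next_back().copied()
    }

    /// Lowest ask price, found in O(log n).
    pub fn best_ask(&self) -> Option<i64> {
        self.asks.keys().next().copied()
    }

    /// ID of the oldest order resting at the best bid.
    pub fn front_of_best_bid(&self) -> Option<u64> {
        self.bids.values().next_back()?.front().copied()
    }
}
```

Keeping only order IDs in the price levels means the registry stays the single source of truth for order details, at the cost of one extra lookup when a level is walked.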

#### Streaming API Design
Server-Sent Events over HTTP rather than WebSockets for:
- **Cloudflare tunnel compatibility** (no WebSocket upgrade headaches)
- **Simpler client reconnection** logic
- **Built-in browser support** without libraries
- Backpressure handling via TCP flow control
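For readers unfamiliar with the wire format, an SSE stream is plain text: optional `event:` and `id:` fields, one `data:` line per payload line, and a blank line that ends the frame. The helper below is a hypothetical sketch of that framing (the function name and parameters are assumptions, and the backend presumably relies on its HTTP framework for this rather than hand-rolled strings).

```rust
/// Formats one Server-Sent Events frame.
/// `event` and `id` are optional fields; `data` may span several lines.
pub fn sse_frame(event: Option<&str>, id: Option<u64>, data: &str) -> String {
    let mut frame = String::new();
    if let Some(name) = event {
        frame.push_str("event: ");
        frame.push_str(name);
        frame.push('\n');
    }
    if let Some(id) = id {
        // Browsers echo the last seen id in `Last-Event-ID` when reconnecting
        frame.push_str(&format!("id: {id}\n"));
    }
    // Each payload line needs its own `data:` field
    for line in data.split('\n') {
        frame.push_str("data: ");
        frame.push_str(line);
        frame.push('\n');
    }
    // A blank line terminates the frame
    frame.push('\n');
    frame
}
```

The `id` field is what makes client reconnection simple: the browser's `EventSource` resends the last seen ID, so a server can resume from that point if it chooses to.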

#### Metrics That Matter
```rust
pub struct Metrics {
    pub messages_processed: Counter,
    pub http_request_duration: Histogram,
    pub active_connections: IntGauge,
    pub order_book_apply_duration: Histogram,
    // ... 10+ custom metrics
}
```
Every critical path is instrumented—P50/P99/P999 latencies, throughput, error rates, and resource utilization all exported to Prometheus.
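To make the percentile figures concrete, here is a small sketch of a nearest-rank percentile over raw latency samples. It is an illustration only: Prometheus histograms store bucket counts and estimate quantiles at query time, so this is not how the exported metrics are computed, and the `percentile` function and its per-mille parameter are assumptions for the example.

```rust
use std::time::Duration;

/// Nearest-rank percentile over raw samples.
/// `per_mille` is the percentile in tenths of a percent: 500 = p50, 990 = p99, 999 = p99.9.
/// Integer arithmetic avoids floating-point rounding at the rank boundary.
pub fn percentile(samples: &mut [Duration], per_mille: usize) -> Option<Duration> {
    if samples.is_empty() || per_mille > 1000 {
        return None;
    }
    samples.sort_unstable();
    // rank = ceil(per_mille * n / 1000), clamped to at least 1
    let rank = (per_mille * samples.len() + 999) / 1000;
    Some(samples[rank.max(1) - 1])
}
```

With 1,000 samples of 1 to 1,000 microseconds, `percentile(&mut samples, 990)` returns the 990th smallest sample.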

### Frontend (`mbo-frontend/`)

TypeScript + Svelte application demonstrating modern frontend patterns:

- **Reactive state management** with Svelte stores
- **Type-safe** domain models shared with backend
- **Component-driven** UI with Skeleton design system
- **Real-time visualization** of market depth and order flow

### Infrastructure

**Multi-Environment Strategy:**
- **Local Development**: Nix flakes for reproducible environments (Rust 1.91, Bun, all deps pinned)
- **CI/CD**: GitHub Actions with matrix builds, test gates before deploy
- **Staging**: Docker Compose with profiles for dev/obs/prod
- **Production**: Kubernetes with health checks, resource limits, and auto-scaling

**Observability Stack:**
```yaml
scrape_configs:
  - job_name: 'mbo-backend'
    scrape_interval: 5s  # High-frequency metrics collection
    metrics_path: '/metrics'
```
Grafana dashboards visualize:
- Message processing throughput (msg/sec)
- API latency percentiles (p50, p99, p99.9)
- Active connections over time
- Order book depth evolution

---

## Technical Challenges Solved

### 1. **Handling Partial Market Data**
**Problem**: The MBO dataset starts mid-session, so early messages reference orders the book has never seen.

**Solution**: Defensive programming with explicit logging. Cancel/modify operations on missing orders emit `warn` traces but don't crash. In production, I'd instrument this to alert on anomalies.
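A minimal sketch of that defensive pattern, assuming a simplified registry that maps order IDs to remaining size. The `CancelOutcome` type and `apply_cancel` function are hypothetical, and a plain `Vec<String>` stands in for the `warn`-level trace so the example needs no external crates.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum CancelOutcome {
    /// The order was known and has been removed.
    Removed { size: u32 },
    /// The order was never seen, most likely because it predates the snapshot.
    UnknownOrder,
}

/// Applies a cancel without panicking on unknown IDs.
/// `warnings` stands in for a `warn`-level trace in this sketch.
pub fn apply_cancel(
    orders: &mut HashMap<u64, u32>,
    id: u64,
    warnings: &mut Vec<String>,
) -> CancelOutcome {
    match orders.remove(&id) {
        Some(size) => CancelOutcome::Removed { size },
        None => {
            warnings.push(format!("cancel for unknown order {id}; ignoring"));
            CancelOutcome::UnknownOrder
        }
    }
}
```

Returning an explicit outcome rather than an error keeps the replay loop running while still letting callers count how often the pre-snapshot case occurs.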

### 2. **Crossed Markets**
**Problem**: The order book occasionally showed bid >= ask (a zero or negative spread).

**Solution**: Implemented a matching engine simulation that automatically executes crossed orders. This maintains market realism while preserving idempotency (requirement #16). The algorithm ensures no invalid state persists.
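One way such a resolution pass could look is sketched below. It is not the project's `match_crossed_orders`: the `Levels` alias, the `Fill` type, and the `uncross` function are assumptions for illustration, and fill prices are omitted because choosing one requires knowing which order was resting.

```rust
use std::collections::{BTreeMap, VecDeque};

/// price (integer ticks) -> FIFO queue of (order_id, remaining_size)
pub type Levels = BTreeMap<i64, VecDeque<(u64, u32)>>;

#[derive(Debug, PartialEq, Eq)]
pub struct Fill {
    pub bid_id: u64,
    pub ask_id: u64,
    pub size: u32,
}

/// Matches orders until best bid < best ask, oldest orders first at each price.
pub fn uncross(bids: &mut Levels, asks: &mut Levels) -> Vec<Fill> {
    let mut fills = Vec::new();
    loop {
        // Best bid is the highest key, best ask is the lowest key
        let (bid_px, ask_px) = match (bids.keys().next_back(), asks.keys().next()) {
            (Some(&b), Some(&a)) if b >= a => (b, a),
            _ => break, // one side is empty or the book is not crossed
        };
        let (Some(bid_q), Some(ask_q)) = (bids.get_mut(&bid_px), asks.get_mut(&ask_px)) else {
            break;
        };
        if let (Some(bid), Some(ask)) = (bid_q.front_mut(), ask_q.front_mut()) {
            let size = bid.1.min(ask.1);
            if size > 0 {
                fills.push(Fill { bid_id: bid.0, ask_id: ask.0, size });
                bid.1 -= size;
                ask.1 -= size;
            }
        }
        // Remove fully filled orders, then any level left empty
        if bid_q.front().map_or(false, |order| order.1 == 0) {
            bid_q.pop_front();
        }
        if ask_q.front().map_or(false, |order| order.1 == 0) {
            ask_q.pop_front();
        }
        if bid_q.is_empty() {
            bids.remove(&bid_px);
        }
        if ask_q.is_empty() {
            asks.remove(&ask_px);
        }
    }
    fills
}
```

Running the pass a second time on an already uncrossed book produces no fills, which is the idempotency property the solution relies on.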

### 3. **Performance Under Load**
**Challenge**: Achieve 500K msg/sec with p99 <50ms.

**Result**: Peaked at 17M msg/sec in bursts with p99 consistently <5ms. The bottleneck became the test machine's connection limits, not the backend. Optimizations included:
- Zero-copy buffer reuse
- Async I/O for all operations
- Lock contention minimization via read-write locks
- Memory pool for frequent allocations
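As one illustration of the buffer-reuse idea, the sketch below encodes each update into a single `String` that is cleared rather than reallocated. The `Encoder` type and its JSON shape are assumptions for the example, not the backend's actual serialization path.

```rust
use std::fmt::Write;

/// Encodes updates into one reusable buffer instead of allocating per message.
pub struct Encoder {
    buf: String,
}

impl Encoder {
    pub fn with_capacity(capacity: usize) -> Self {
        Self { buf: String::with_capacity(capacity) }
    }

    /// Overwrites the buffer with the new payload and returns a borrowed view.
    pub fn encode(&mut self, price: i64, size: u32) -> &str {
        // `clear` drops the contents but keeps the allocation
        self.buf.clear();
        // Writing into a `String` cannot fail, so the result is ignored
        let _ = write!(self.buf, "{{\"price\":{price},\"size\":{size}}}");
        &self.buf
    }

    pub fn capacity(&self) -> usize {
        self.buf.capacity()
    }
}
```

Because `encode` returns a `&str` tied to the encoder, the borrow checker prevents the buffer from being overwritten while a previous payload is still in use.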

### 4. **Resilient Connection Handling**
Implemented custom drop guards:
```rust
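// Cleanup task: runs once the connection's channel resolves
tokio::spawn(async move {
    // Assuming a oneshot-style receiver: this completes when the sender
    // fires or is dropped, so an abrupt disconnect also ends the wait
    let _ = rx.await;
    metrics_for_cleanup.active_connections.dec();
});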
```
Guarantees metric cleanup even if client abruptly disconnects, preventing resource leaks.
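The same guarantee can also be expressed as a plain RAII type whose `Drop` implementation does the decrement. The sketch below is an alternative formulation, not the project's code: `ConnectionGuard` is a hypothetical name and an `AtomicI64` stands in for the Prometheus gauge.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Increments the gauge on creation and decrements it when dropped,
/// whether the handler returns normally, errors out, or is cancelled.
pub struct ConnectionGuard {
    active: Arc<AtomicI64>,
}

impl ConnectionGuard {
    pub fn new(active: Arc<AtomicI64>) -> Self {
        active.fetch_add(1, Ordering::Relaxed);
        Self { active }
    }
}

impl Drop for ConnectionGuard {
    fn drop(&mut self) {
        self.active.fetch_sub(1, Ordering::Relaxed);
    }
}
```

Holding the guard inside the stream's state ties the decrement to the stream's lifetime, so no separate cleanup task is needed.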

---

## Why These Technology Choices?

| Technology | Justification |
|-----------|---------------|
| **Rust** | Memory safety + zero-cost abstractions = predictable performance under load. No GC pauses. |
| **Tokio/Axum** | Industry-standard async runtime + ergonomic HTTP framework. Battle-tested at scale. |
| **Svelte** | Compile-time reactivity means smaller bundles and faster runtime than React/Vue. |
| **Bun** | TS/JS toolchain that "just works"; installs finish faster than restoring a cache would (seriously, [read this](https://github.com/oven-sh/setup-bun/issues/14#issuecomment-1714116221)). |
| **SQLite** | Embedded, zero-config persistence. Perfect for append-only time-series at this scale. |
| **Prometheus** | De-facto standard for metrics. Powerful query language (PromQL) and ecosystem. |
| **Nix** | Reproducible builds down to the compiler version. No "works on my machine" issues. |
| **K8s** | Enterprise deployment reality. Self-healing, declarative config, industry standard. |

---

## Key Metrics Achieved

| Metric | Target | Achieved | Notes |
|--------|--------|----------|-------|
| Throughput | 500K msg/sec | **17M msg/sec** | 34x over spec (burst) |
| P99 Latency | <50ms | **<5ms** | 10x better than requirement |
| Concurrent Clients | 100+ | **130+** | Limited by test machine, not backend |
| P50 Latency | - | **300 μs** | Microsecond response times |
| Uptime | - | **100%** | Never crashed during stress testing |

---

## Live Demos

**Application**: [mbo.hiibolt.com](https://mbo.hiibolt.com)
Real-time order book visualization with interactive playback controls

**Monitoring**: [mbo-grafana.hiibolt.com](https://mbo-grafana.hiibolt.com)
Prometheus metrics and Grafana dashboards showing system internals


*Figure: Grafana observability dashboard. Production metrics: throughput, latencies, connection counts, error rates.*


---

## Quick Start

**Prerequisites:** Docker + Docker Compose (or Nix for local dev)

```bash
# Clone the repository
git clone https://github.com/hiibolt/mbo.git && cd mbo

# Set environment variables
cp .env.example .env
# Edit .env and add your DBN_KEY (or omit for demo data)

# Launch production stack
docker compose --profile prod up

# Access at http://localhost (frontend), http://localhost:9090 (metrics)
```

**Local Development** (requires Nix with flakes):
```bash
nix develop # Enter dev shell with all dependencies
cd mbo-backend && cargo run # Start backend
cd mbo-frontend && bun dev # Start frontend (separate terminal)
```

---

## Technologies Demonstrated

**Backend Engineering:**
Rust • Tokio • Axum • Anyhow • SQLite • Prometheus • Server-Sent Events

**Frontend Development:**
TypeScript • Svelte • SvelteKit • Bun • TailwindCSS • Skeleton UI

**DevOps & Infrastructure:**
Docker • Kubernetes • GitHub Actions • Nix • Prometheus • Grafana • Nginx

**Software Engineering Practices:**
CI/CD • Dependency Management • Security Scanning • Performance Testing • API Design • Distributed Systems • Observability • Documentation

---

## Correctness Guarantees

The order book implementation enforces critical invariants:

1. **Price-Time Priority**: Orders at same price level maintain FIFO ordering
2. **No Crossed Markets**: Bid always < Ask (matching engine auto-executes violations)
3. **Atomic Operations**: State updates are applied under an `RwLock` write lock, so readers never observe a partially applied update
4. **Idempotency**: Duplicate operations are safe (critical for distributed systems)
5. **Graceful Degradation**: Missing orders logged but don't crash system

Verified via:
- Unit tests on core book operations
- Integration tests with real market data
- Invariant assertions in test suite
- Stress testing with 170M+ messages
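Invariants 2 and 3 can be illustrated together with a small shared-state sketch. It assumes a reduced model in which only the top of the book is shared; `TopOfBook` and `SharedTop` are hypothetical names, and unlike the matching engine described above, this sketch simply rejects a crossing update rather than executing it.

```rust
use std::sync::{Arc, RwLock};

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct TopOfBook {
    pub bid: i64,
    pub ask: i64,
}

/// Cheaply cloneable handle to the shared top of book.
#[derive(Clone, Default)]
pub struct SharedTop(Arc<RwLock<TopOfBook>>);

impl SharedTop {
    /// Replaces both sides in one write-locked step.
    /// Returns `false` and leaves the state untouched if the update would cross.
    pub fn update(&self, bid: i64, ask: i64) -> bool {
        if bid >= ask {
            return false;
        }
        // A poisoned lock only means another thread panicked; recover the guard
        let mut top = self.0.write().unwrap_or_else(|e| e.into_inner());
        *top = TopOfBook { bid, ask };
        true
    }

    /// Copies out a consistent snapshot under the read lock.
    pub fn snapshot(&self) -> TopOfBook {
        *self.0.read().unwrap_or_else(|e| e.into_inner())
    }
}
```

Because both fields change under the same write lock, a concurrent reader sees either the old pair or the new pair, never a mix of the two.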

---

*This project represents the intersection of my interests in systems programming, financial technology, and production engineering. It's a demonstration that I don't just write code—I architect systems that work reliably at scale.*

## AI Usage
I used Claude Opus 4.1 for dense, difficult tasks requiring heavy verification and Claude Sonnet 4.5 for less intense tasks such as test verification, by-line documentation, and rapid templating.

Sonnet 4.5 was used in the re-writing of this `README.md` file.

List of usage:
- Moving example `databento` code to use more verbose `anyhow` reporting
  - A simple matter of asking it to inspect and replace all `unwrap` and `expect` calls
- Adding full `tokio_tracing` coverage
  - An otherwise tedious task, made simple with AI!
  - Required following the AI cursor and inspecting all changes to verify there was no secret leakage
- Frontend Templating with Skeleton and Tailwind
  - I carefully watched it design the frontend structure and gave feedback to steer it toward a visually appealing end result.
- Docker Compose
  - Writing `.dockerignore` files was done with Sonnet, as it's an otherwise tedious task. I was careful to monitor for secret leaks and repo ballooning before confirming.
  - I contracted Opus to build the Dockerfiles, giving it examples from past projects and carefully monitoring to ensure quality output.
- Prometheus
  - Both of Anthropic's models are highly skilled at writing Prometheus dashboards, so with supervision I felt confident letting them write the YAML spread.
- Stress Testing
  - I asked Opus to build a neat stress test with 1M msg/s, 500 concurrent connections, and consistent stress.
  - Needed help from Sonnet to debug SSE connections not dropping cleanly. It ended up helping develop a custom RAII guard solution, which was really cool to supervise and learn from.
- Kubernetes
  - My final deployment namespace for production was massive (13 YAML files). I asked Sonnet to help me consolidate them into concise, readable documents, which it did a wonderful job with.