{"id":30687595,"url":"https://github.com/james-ralph8555/1brc","last_synced_at":"2026-04-18T11:03:26.630Z","repository":{"id":310788666,"uuid":"1041217816","full_name":"james-ralph8555/1brc","owner":"james-ralph8555","description":"1brc https://www.morling.dev/blog/one-billion-row-challenge/","archived":false,"fork":false,"pushed_at":"2025-08-21T06:40:58.000Z","size":769,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-02T00:07:45.773Z","etag":null,"topics":["cpp","datafusion","duckdb","rust","sql"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/james-ralph8555.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-20T06:50:14.000Z","updated_at":"2025-08-21T06:41:01.000Z","dependencies_parsed_at":"2025-08-20T09:18:40.830Z","dependency_job_id":null,"html_url":"https://github.com/james-ralph8555/1brc","commit_stats":null,"previous_names":["james-ralph8555/1brc"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/james-ralph8555/1brc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/james-ralph8555%2F1brc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/james-ralph8555%2F1brc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/james-ralph8555%2F1brc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/james-ralph8555%2F1brc/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/james-ralph8555","download_url":"https://codeload.github.com/james-ralph8555/1brc/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/james-ralph8555%2F1brc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279015709,"owners_count":26085748,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","datafusion","duckdb","rust","sql"],"created_at":"2025-09-02T00:03:25.767Z","updated_at":"2025-10-13T14:15:03.195Z","avatar_url":"https://github.com/james-ralph8555.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# One Billion Row Challenge Implementations\n\n![Flamegraph](flamegraph.jpg)\n\nThis repository contains high-performance implementations of the [One Billion Row Challenge](https://github.com/gunnarmorling/1brc) using different technologies and optimization strategies.\n\n## 🏆 Performance Results\n\nBoth implementations achieve exceptional sub-10-second performance on the full 1-billion-row dataset using Float64/Double data types.\n\n| Implementation         | Mean Time (s) | Std Dev (s) | Min Time (s) | Max Time (s) | Binary Size |\n| ---------------------- | ------------- | ----------- | ------------ | 
------------ | ----------- |\n| **1brc-datafusion-rs** | 4.902         | 0.097       | 4.794        | 5.055        | 31M         |\n| **1brc-duckdb-cpp**    | 8.83          | 0.06        | 8.70         | 8.92         | 29K         |\n\n*Benchmarks performed on AMD Ryzen 9 5900X (12 cores, 24 threads) @ 4.95 GHz, 32GB RAM*\n\n## Implementations\n\n### [1brc-datafusion-rs](./1brc-datafusion-rs/) - Rust Implementation\n- **Technology**: Apache DataFusion query engine\n- **Strategy**: Systematic optimization with explicit schema, compiler tuning, and Profile-Guided Optimization\n- **Key Features**: High-level API constraint, production-ready safety\n\n### [1brc-duckdb-cpp](./1brc-duckdb-cpp/) - C++ Implementation  \n- **Technology**: DuckDB analytical database\n- **Strategy**: Single SQL query with aggressive compiler optimizations\n- **Key Features**: Minimal code complexity, maximum CPU utilization\n\n## Generating Test Data\n\nThe official 1BRC data generator is included in this repository. Use it to create the measurements dataset:\n\n```bash\n# Compile the data generator (requires Java and javac)\njavac dev/morling/onebrc/CreateMeasurements.java\n\n# Generate the full 1 billion row dataset (takes 5-10 minutes)\ntime java -cp . dev.morling.onebrc.CreateMeasurements 1000000000\n\n# Or generate a smaller test dataset for development\njava -cp . 
dev.morling.onebrc.CreateMeasurements 100000\n```\n\nThe generator creates realistic weather station data with:\n- 413 authentic weather stations from around the world\n- Gaussian-distributed temperatures around each station's mean\n- Temperature range: -99.9°C to +99.9°C\n- Output file: `measurements.txt` (approximately 12-16 GB for 1 billion rows)\n\n## Quick Start\n\n### Prerequisites\n- **Java**: For data generation\n- **C++**: GCC/Clang with CMake for C++ implementation\n- **Rust**: Latest stable version for Rust implementation\n- **hyperfine**: `cargo install hyperfine` (recommended for benchmarking)\n\n### Build and Run\n```bash\n# 1. Generate test data\njavac dev/morling/onebrc/CreateMeasurements.java\njava -cp . dev.morling.onebrc.CreateMeasurements 1000000000\n\n# 2a. Run Rust implementation (fastest)\ncd 1brc-datafusion-rs\n./build.sh\n./benchmark.sh                                              # Comprehensive benchmark\n# OR manually:\ntime ./target/release/onebrc-datafusion ../test_data/measurements_1b.txt results.csv\n\n# 2b. 
Run C++ implementation\ncd ../1brc-duckdb-cpp\n./build.sh\n./benchmark.sh                                              # Comprehensive benchmark  \n# OR manually:\ntime ./build/1brc_duckdb ../test_data/measurements_1b.txt results.csv\n```\n\n## Architecture Strategy\n\nBoth implementations use a **query engine approach** rather than custom data processing:\n\n- **Core Philosophy**: Leverage purpose-built analytical engines instead of implementing custom parsing/aggregation\n- **Single Query Strategy**: Delegate entire data pipeline (reading, parsing, grouping, sorting) to declarative SQL\n- **Engine Optimization**: Configure database engines for maximum CPU/memory utilization\n- **Minimal Host Code**: Focus only on configuration and result formatting\n\nThis approach avoids the complexity of building custom high-performance parsers, instead relying on heavily optimized, parallel, and vectorized analytical database engines.\n\n## Implementation Standardization\n\nBoth implementations follow key standardizations for consistency:\n\n- **No A Priori Station List**: Implementations dynamically discover weather stations from the data rather than using a predefined list of 413 stations\n- **CSV Output Format**: All implementations output results as CSV files instead of the original challenge's print format, enabling better data processing and analysis\n\n### Output Format\nBoth implementations process input format `\u003cstation_name\u003e;\u003ctemperature\u003e` and produce a CSV file:\n```csv\nstation_name,min_measurement,mean_measurement,max_measurement\nBulawayo,8.9,9.0,9.1\nHamburg,12.0,12.6,13.2\nPalembang,38.8,38.9,39.0\n```\nResults are sorted alphabetically by station name with temperatures rounded to one decimal place.\n\n## Repository Structure\n\n```\n1brc/\n├── dev/morling/onebrc/          # Java data generator (shared)\n├── measurements.txt             # Generated dataset (shared)\n├── test_data/                   # Test datasets\n│   ├── 
measurements_1k.txt     # 1,000 row test dataset\n│   └── measurements_1b.txt     # 1 billion row dataset\n├── 1brc-duckdb-cpp/            # C++ implementation using DuckDB\n│   ├── README.md               # Detailed C++ implementation guide\n│   ├── CLAUDE.md               # Claude Code guidance\n│   ├── src/main.cpp            # Float64/Double implementation\n│   ├── test.sh                 # Test script (uses measurements_1k.txt)\n│   ├── benchmark.sh            # Comprehensive benchmark script\n│   └── build/                  # Compiled binaries\n├── 1brc-datafusion-rs/         # Rust implementation using DataFusion  \n│   ├── README.md               # Detailed Rust implementation guide\n│   ├── CLAUDE.md               # Claude Code guidance\n│   ├── src/main.rs             # Float64 implementation\n│   ├── test.sh                 # Test script (uses measurements_1k.txt)\n│   ├── benchmark.sh            # Comprehensive benchmark script\n│   └── target/release/         # Compiled binaries\n└── README.md                   # This overview file\n```\n\n### Key Insights\n- **DataFusion (Rust)**: Achieves highly consistent performance through systematic optimization\n- **DuckDB (C++)**: Achieves excellent parallelization\n- Both approaches prioritize **declarative simplicity** over custom implementation complexity\n\n## References\n\n- [One Billion Row Challenge](https://github.com/gunnarmorling/1brc) by Gunnar Morling\n- [Apache DataFusion](https://github.com/apache/arrow-datafusion) - Rust query engine\n- [DuckDB](https://duckdb.org/) - In-process analytical database, used here from C++\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjames-ralph8555%2F1brc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjames-ralph8555%2F1brc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjames-ralph8555%2F1brc/lists"}