{"id":31274158,"url":"https://github.com/stefanoamorelli/crabrl","last_synced_at":"2025-09-23T22:39:22.194Z","repository":{"id":310321845,"uuid":"1037559968","full_name":"stefanoamorelli/crabrl","owner":"stefanoamorelli","description":"Rust XBRL parser that's 50-150x faster than traditional parsers. Built for speed and accuracy when processing SEC EDGAR filings.","archived":false,"fork":false,"pushed_at":"2025-08-17T11:37:43.000Z","size":1264,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-22T18:44:01.749Z","etag":null,"topics":["accounting","cli","edgar","finance","financial-data","financial-reporting","high-performance","parser","regulatory-reporting","rust","rust-lang","sec","sec-edgar","xbrl","xbrl-parser","xml-parser"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/crabrl","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stefanoamorelli.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-13T18:59:04.000Z","updated_at":"2025-09-05T04:05:26.000Z","dependencies_parsed_at":"2025-08-17T10:56:57.034Z","dependency_job_id":null,"html_url":"https://github.com/stefanoamorelli/crabrl","commit_stats":null,"previous_names":["stefanoamorelli/crabrl"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/stefanoamorelli/crabrl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stefanoamorelli%2Fcrabrl","tags_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/stefanoamorelli%2Fcrabrl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stefanoamorelli%2Fcrabrl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stefanoamorelli%2Fcrabrl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stefanoamorelli","download_url":"https://codeload.github.com/stefanoamorelli/crabrl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stefanoamorelli%2Fcrabrl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276662397,"owners_count":25682029,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-23T02:00:09.130Z","response_time":73,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accounting","cli","edgar","finance","financial-data","financial-reporting","high-performance","parser","regulatory-reporting","rust","rust-lang","sec","sec-edgar","xbrl","xbrl-parser","xml-parser"],"created_at":"2025-09-23T22:39:17.602Z","updated_at":"2025-09-23T22:39:22.188Z","avatar_url":"https://github.com/stefanoamorelli.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# crabrl 🦀\n\n[![Crates.io](https://img.shields.io/crates/v/crabrl.svg)](https://crates.io/crates/crabrl)\n[![CI 
Status](https://github.com/stefanoamorelli/crabrl/workflows/CI/badge.svg)](https://github.com/stefanoamorelli/crabrl/actions)\n[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)\n[![Rust Version](https://img.shields.io/badge/rust-1.75%2B-orange.svg)](https://www.rust-lang.org)\n[![Downloads](https://img.shields.io/crates/d/crabrl.svg)](https://crates.io/crates/crabrl)\n[![docs.rs](https://docs.rs/crabrl/badge.svg)](https://docs.rs/crabrl)\n\n![crabrl Performance](benchmarks/header.png)\n\nLightning-fast XBRL parser that's **50-150x faster** than traditional parsers, built for speed and accuracy when processing [SEC EDGAR](https://www.sec.gov/edgar) filings.\n\n## Performance\n\n![Performance Benchmarks](benchmarks/performance_charts.png)\n\n### Speed Comparison\n\n![Speed Comparison](benchmarks/speed_comparison_clean.png)\n\n**Key Performance Metrics:**\n- **50-150x faster** than traditional XBRL parsers\n- **140,000+ facts/second** throughput\n- **\u003c 50MB memory** for 100K facts\n- **Linear scaling** with file size\n\n## Technical Architecture\n\ncrabrl is built on Rust's zero-cost abstractions and modern parsing techniques. 
While established parsers like [Arelle](https://arelle.org/) provide comprehensive XBRL specification support and extensive validation capabilities, crabrl focuses on high-performance parsing for scenarios where speed is critical.\n\n### Implementation Details\n\n| Optimization | Impact | Technology |\n|-------------|---------|------------|\n| **Zero-copy parsing** | -90% memory allocs | [`quick-xml`](https://github.com/tafia/quick-xml) with string slicing |\n| **No garbage collection** | Predictable latency | Rust's ownership model |\n| **Faster hashmaps** | 2x lookup speed | [`ahash`](https://github.com/tkaitchuck/aHash) instead of default hasher |\n| **Compact strings** | -50% memory for small strings | [`compact_str`](https://github.com/ParkMyCar/compact_str) |\n| **Parallelization** | 4-8x on multicore | [`rayon`](https://github.com/rayon-rs/rayon) work-stealing |\n| **Memory mapping** | Zero-copy file I/O | [`memmap2`](https://github.com/RazrFalcon/memmap2-rs) |\n| **Better allocator** | -25% allocation time | [`mimalloc`](https://github.com/microsoft/mimalloc) |\n\n**Benchmark results:** 100,000 XBRL facts parsed in 57 ms (crabrl) vs 2,672 ms (Arelle) on identical hardware.\n\n## XBRL Support Status\n\n| Feature | Description | Status |\n|---------|-------------|---------|\n| **XBRL 2.1 Instance** | Parse facts, contexts, units from `.xml` files | ✅ Stable |\n| **SEC Validation** | EDGAR-specific rules and checks | ✅ Stable |\n| **Calculation Linkbase** | Validate arithmetic relationships | ✅ Stable |\n| **Presentation Linkbase** | Extract display hierarchy | 🚧 Beta |\n| **Label Linkbase** | Human-readable concept names | 🚧 Beta |\n| **Definition Linkbase** | Dimensional relationships | 📋 Planned |\n| **Formula Linkbase** | Business rules validation | 📋 Planned |\n| **Inline XBRL (iXBRL)** | HTML-embedded XBRL | 📋 Planned |\n\n## Installation\n\n### From crates.io\n```bash\ncargo install crabrl\n```\n\n### From Source\n```bash\ngit clone 
https://github.com/stefanoamorelli/crabrl\ncd crabrl\ncargo build --release --features cli\n```\n\n### As Library Dependency\n```toml\n[dependencies]\ncrabrl = \"0.1.0\"\n```\n\n## Usage\n\n### CLI\n\n```bash\n# Parse and display summary\ncrabrl parse filing.xml\n\n# Parse with statistics (timing and throughput)\ncrabrl parse filing.xml --stats\n\n# Validate with generic rules\ncrabrl validate filing.xml\n\n# Validate with SEC EDGAR rules\ncrabrl validate filing.xml --profile sec-edgar\n\n# Validate with strict mode (warnings as errors)\ncrabrl validate filing.xml --strict\n\n# Benchmark performance\ncrabrl bench filing.xml --iterations 100\n```\n\n### Library\n\n#### Basic Usage\n\n```rust\nuse crabrl::Parser;\n\n// Parse XBRL document\nlet parser = Parser::new();\nlet doc = parser.parse_file(\"filing.xml\")?;\n\n// Access parsed data\nprintln!(\"Facts: {}\", doc.facts.len());\nprintln!(\"Contexts: {}\", doc.contexts.len());\nprintln!(\"Units: {}\", doc.units.len());\n```\n\n#### Parse from Different Sources\n\n```rust\n// From file path\nlet doc = parser.parse_file(\"filing.xml\")?;\n\n// From bytes\nlet xml_bytes = std::fs::read(\"filing.xml\")?;\nlet doc = parser.parse_bytes(\u0026xml_bytes)?;\n```\n\n#### Validation\n\n```rust\nuse crabrl::{Parser, Validator};\n\nlet parser = Parser::new();\nlet doc = parser.parse_file(\"filing.xml\")?;\n\n// Generic validation\nlet validator = Validator::new();\nlet result = validator.validate(\u0026doc)?;\n\nif result.is_valid {\n    println!(\"Document is valid!\");\n} else {\n    for error in \u0026result.errors {\n        eprintln!(\"Error: {}\", error);\n    }\n}\n\n// SEC EDGAR validation (stricter rules)\nlet sec_validator = Validator::sec_edgar();\nlet sec_result = sec_validator.validate(\u0026doc)?;\n```\n\n## Performance Measurements\n\nPerformance comparison with [Arelle](https://arelle.org/) v2.17.4 (Python-based XBRL processor with full specification support):\n\n### Synthetic Dataset Benchmarks\n\n| File Size | 
Facts | crabrl | Arelle | Ratio |\n|-----------|------:|-------:|-------:|------:|\n| Tiny      | 10    | 1.1 ms | 164 ms | 150x |\n| Small     | 100   | 1.4 ms | 168 ms | 119x |\n| Medium    | 1K    | 1.7 ms | 184 ms | 108x |\n| Large     | 10K   | 6.1 ms | 351 ms | 58x  |\n| Huge      | 100K  | 57 ms  | 2,672 ms | 47x |\n\n### SEC Filing Parse Times\n\n| Company | Filing Type | File Size | Facts | Parse Time | Throughput |\n|---------|-------------|-----------|-------|------------|------------|\n| Apple | [10-K 2023](https://www.sec.gov/Archives/edgar/data/320193/000032019323000106/aapl-20230930_htm.xml) | 1.4 MB | 1,075 | 2.1 ms | 516K facts/sec |\n| Microsoft | [10-Q 2023](https://www.sec.gov/Archives/edgar/data/789019/000095017023064280/msft-20230930_htm.xml) | 2.8 MB | 2,341 | 4.3 ms | 544K facts/sec |\n| Tesla | [10-K 2023](https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231_htm.xml) | 3.1 MB | 3,122 | 5.8 ms | 538K facts/sec |\n\n### Run Your Own Benchmarks\n\n```bash\n# Quick benchmark with Criterion\ncargo bench\n\n# Compare against Arelle\ncd benchmarks \u0026\u0026 python compare_performance.py\n\n# Test on real SEC filings\npython scripts/download_fixtures.py  # Download Apple, MSFT, Tesla, etc.\ncargo run --release --bin crabrl -- bench fixtures/apple/aapl-20230930_htm.xml\n```\n\n## Resources \u0026 Links\n\n### XBRL Standards\n- [XBRL International](https://www.xbrl.org/) - Official XBRL specifications\n- [XBRL 2.1 Specification](https://www.xbrl.org/Specification/XBRL-2.1/REC-2003-12-31/XBRL-2.1-REC-2003-12-31+corrected-errata-2013-02-20.html) - Core standard we implement\n- [SEC EDGAR](https://www.sec.gov/edgar/searchedgar/companysearch) - Search real company filings\n- [EDGAR Filer Manual](https://www.sec.gov/info/edgar/forms/edgform.pdf) - SEC filing requirements\n\n### Dependencies We Use\n\n| Crate | Purpose | Why We Chose It |\n|-------|---------|-----------------|\n| 
[`quick-xml`](https://github.com/tafia/quick-xml) | XML parsing | Zero-copy, fastest XML parser in Rust |\n| [`ahash`](https://github.com/tkaitchuck/aHash) | HashMap hashing | 2x faster than default hasher |\n| [`compact_str`](https://github.com/ParkMyCar/compact_str) | String storage | Small string optimization |\n| [`rayon`](https://github.com/rayon-rs/rayon) | Parallelization | Work-stealing for automatic load balancing |\n| [`mimalloc`](https://github.com/microsoft/mimalloc) | Memory allocator | Microsoft's high-performance allocator |\n| [`criterion`](https://github.com/bheisler/criterion.rs) | Benchmarking | Statistical benchmarking with graphs |\n\n### Alternative XBRL Parsers\n- [Arelle](https://arelle.org/) - Complete XBRL processor with validation, formulas, and rendering (Python)\n- [python-xbrl](https://github.com/manusimidt/py-xbrl) - Lightweight Python parser\n- [xbrl-parser](https://www.npmjs.com/package/xbrl-parser) - JavaScript/Node.js\n- [XBRL4j](https://github.com/br-data/xbrl-parser) - Java implementation\n\n## License ⚖️\n\nThis open-source project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). This means:\n\n- You can use, modify, and distribute this software\n- If you modify and distribute it, you must release your changes under AGPL-3.0\n- If you run a modified version on a server, you must provide the source code to users\n- See the [LICENSE](LICENSE) file for full details\n\nFor commercial licensing options or other licensing inquiries, please contact stefano@amorelli.tech.\n\n© 2025 Stefano Amorelli – Released under the GNU Affero General Public License v3.0. Enjoy! 🎉","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstefanoamorelli%2Fcrabrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstefanoamorelli%2Fcrabrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstefanoamorelli%2Fcrabrl/lists"}