{"id":49927787,"url":"https://github.com/pbower/minarrow","last_synced_at":"2026-05-17T01:05:02.478Z","repository":{"id":309445707,"uuid":"1035712879","full_name":"pbower/minarrow","owner":"pbower","description":"Apache Arrow and Polars compatible, Rust-first columnar data library for real-time and systems workloads","archived":false,"fork":false,"pushed_at":"2026-05-16T23:26:51.000Z","size":1499,"stargazers_count":70,"open_issues_count":2,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-16T23:42:23.966Z","etag":null,"topics":["arrow","data-science","dataengineering","polars","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pbower.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":"CLA.md"}},"created_at":"2025-08-11T01:18:01.000Z","updated_at":"2026-05-16T23:24:43.000Z","dependencies_parsed_at":"2025-08-12T00:26:01.532Z","dependency_job_id":"07c62602-1db7-4442-b9c2-3643e277d47a","html_url":"https://github.com/pbower/minarrow","commit_stats":null,"previous_names":["pbower/minarrow"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/pbower/minarrow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbower%2Fminarrow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbower%2Fminarrow/tags","releases_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/pbower%2Fminarrow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbower%2Fminarrow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pbower","download_url":"https://codeload.github.com/pbower/minarrow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbower%2Fminarrow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33124143,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-16T18:38:32.183Z","status":"ssl_error","status_checked_at":"2026-05-16T18:38:29.903Z","response_time":115,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","data-science","dataengineering","polars","rust"],"created_at":"2026-05-17T01:05:01.033Z","updated_at":"2026-05-17T01:05:02.471Z","avatar_url":"https://github.com/pbower.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Minarrow\n\n**A fast, minimal columnar data library for Rust with Arrow compatibility.**\n\nMinarrow gives you typed columnar arrays that compile in ~1.5 seconds, run with SIMD alignment, and convert to Arrow when you need interop. It keeps the common path concrete and lightweight, so iteration stays fast.\n\n## Why Minarrow?\n\n**The gap:** Arrow-rs is powerful but heavy. 
Build times can stretch to minutes, which matters when it becomes a base dependency for other systems. Working with arrays often means handling them through `dyn Array`, where the concrete backing vector is hidden behind the trait object. You downcast to recover the type before the data becomes ergonomic again. For real-time systems, embedded devices, or rapid iteration, that friction adds up.\n\n**The solution:** Minarrow keeps concrete types throughout. An `IntegerArray\u003ci64\u003e` stays fully typed through composable abstractions. You get direct access, ergonomics, IDE autocomplete, and fast compilation. When you need to talk to Arrow, Polars, or PyArrow, zero-copy conversion is one method call away.\n\n## Quick Start\n\n```rust\nuse minarrow::{arr_i32, arr_f64, arr_str32, arr_bool};\n\n// Create arrays with macros\nlet ids = arr_i32![1, 2, 3, 4];\nlet prices = arr_f64![10.5, 20.0, 15.75];\nlet names = arr_str32![\"alice\", \"bob\", \"charlie\"];\nlet flags = arr_bool![true, false, true];\n\n// Direct typed access - no downcasting\nassert_eq!(ids.len(), 4);\nassert_eq!(prices.get(0), Some(10.5));\n```\n\n```rust\nuse minarrow::{FieldArray, Table, arr_i32, arr_str32, Print};\n\n// Build tables from columns\nlet table = Table::new(\n    \"users\".into(),\n    vec![\n        FieldArray::from_arr(\"id\", arr_i32![1, 2, 3]),\n        FieldArray::from_arr(\"name\", arr_str32![\"alice\", \"bob\", \"charlie\"]),\n    ].into(),\n);\ntable.print();\n```\n\n## Core Features\n\n### Typed Arrays\n\nSix array types cover standard workloads:\n\n| Type | Description |\n|------|-------------|\n| `IntegerArray\u003cT\u003e` | i8 through u64 |\n| `FloatArray\u003cT\u003e` | f32, f64 |\n| `StringArray\u003cT\u003e` | UTF-8 with u32 or u64 offsets |\n| `BooleanArray` | Bit-packed with validity mask |\n| `CategoricalArray\u003cT\u003e` | Dictionary-encoded |\n| `DatetimeArray\u003cT\u003e` | Timestamps, dates, durations |\n\nSemantic groupings (`NumericArray`, `TextArray`, 
`TemporalArray`) let you write generic functions while keeping static dispatch.\n\n`Array` and `Table` complete the story, with chunked `Super` versions for streaming.\n\n### Fast Compilation\n\n| Metric | Time |\n|--------|------|\n| Clean build | \u003c 1.5s |\n| Incremental rebuild | \u003c 0.15s |\n\nAchieved through minimal dependencies: primarily `num-traits`, with optional `rayon` for parallelism.\n\n### SIMD Alignment\n\nAll buffers use 64-byte alignment via `Vec64`. No reallocation step is needed to fix alignment: data is ready for vectorised operations from the moment it's created.\n\n### Zero-Copy Views\n\nSelect columns and rows without copying data:\n\n```rust\nuse minarrow::*;\n\nlet table = create_table();\n\n// Pandas-style selection\nlet view = table.c(\u0026[\"name\", \"value\"]);  // columns\nlet view = table.r(10..20);               // rows\nlet view = table.c(\u0026[\"A\", \"B\"]).r(0..100); // both\n\n// Materialise only when needed\nlet owned = view.to_table();\n```\n\n### Streaming with SuperArrays\n\nFor streaming workloads, `SuperArray` and `SuperTable` hold multiple chunks with consistent schema:\n\n```rust\n// Append chunks as they arrive\nlet mut super_table = SuperTable::new();\nsuper_table.push_table(batch1);\nsuper_table.push_table(batch2);\n\n// Consolidate to single table when ready\nlet table = super_table.consolidate();\n```\n\n### Arrow Interop\n\nConvert at the boundary, stay native internally:\n\n```rust\n// To Arrow (feature: cast_arrow)\nlet arrow_array = minarrow_array.to_apache_arrow();\n\n// To Polars (feature: cast_polars)\nlet series = minarrow_array.to_polars();\n\n// FFI via Arrow C Data Interface\nlet (array_ptr, schema_ptr) = minarrow_array.export_to_c();\n```\n\n## Architecture\n\nMinarrow uses enums for type dispatch instead of trait objects:\n\n```rust\n// Static dispatch, full inlining\nmatch array {\n    Array::NumericArray(num) =\u003e match num {\n        NumericArray::Int64(arr) =\u003e process(arr),\n        
NumericArray::Float64(arr) =\u003e process(arr),\n        // ...\n    },\n    // ...\n}\n```\n\nThis gives you:\n- **Performance** - Compiler inlines through the dispatch\n- **Type safety** - No `Any`, no runtime downcasts\n- **Ergonomics** - Direct accessors like `array.num().i64()`\n\n## Benchmarks\n\nSum of 1,000 integers, averaged over 1,000 runs (Intel Ultra 7 155H):\n\n| Implementation | Time |\n|----------------|------|\n| Raw `Vec\u003ci64\u003e` | 85 ns |\n| Minarrow `IntegerArray` (direct) | 88 ns |\n| Minarrow `IntegerArray` (via enum) | 124 ns |\n| Arrow-rs `Int64Array` (struct) | 147 ns |\n| Arrow-rs `Int64Array` (dyn) | 181 ns |\n\nMinarrow's direct access matches raw Vec performance. Even through enum dispatch, it outperforms arrow-rs.\n\nWith SIMD + Rayon, summing 1 billion integers takes ~114ms.\n\n## Feature Flags\n\nEnable what you need:\n\n| Feature | Description |\n|---------|-------------|\n| `views` | Zero-copy windowed access (default) |\n| `chunked` | SuperArray/SuperTable for streaming (default) |\n| `datetime` | Temporal types |\n| `cast_arrow` | Arrow-rs conversion |\n| `cast_polars` | Polars conversion |\n| `parallel_proc` | Rayon parallel iterators |\n| `select` | Pandas-style `.c()` / `.r()` selection |\n| `broadcast` | Arithmetic broadcasting |\n\n## Ecosystem\n\n| Crate | Purpose |\n|-------|---------|\n| `minarrow-pyo3` | Zero-copy Python interop via PyArrow. See [pyo3/README.md](pyo3/README.md) |\n| `lightstream` | Zero-copy Arrow streaming over Tokio, TCP, QUIC, WebSocket, Unix sockets, and Stdio |\n| `simd-kernels` | 60+ SIMD kernels including statistical distributions |\n| `vec64` | 64-byte aligned Vec for optimal SIMD |\n\n## Limitations\n\nMinarrow focuses on flat columnar data and 80/20. Nested types (List, Struct) are not currently supported. If you need deeply nested schemas, arrow-rs is the better choice.\n\n## Contributing\n\nContributions are welcome, particularly in the following areas:\n\n1. 
**Connectors** – Data source and sink integrations\n2. **Optimisations** – Performance improvements\n3. **Nested types** – List and Struct support\n4. **Bug fixes**\n\nAll contributions are subject to the Contributor Licence Agreement (CLA).\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for details.\n\n## License\n\nCopyright © 2025–2026 Peter Garfield Bower.\n\nReleased under the Apache 2.0 License. See [LICENSE](LICENSE) for details.\n\n## Acknowledgements\n\nMinarrow is a from-scratch implementation of the Apache Arrow memory layout inspired by the standards pioneered by Apache Arrow, Arrow2, and Polars.\n\nMinarrow is not affiliated with Apache Arrow.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpbower%2Fminarrow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpbower%2Fminarrow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpbower%2Fminarrow/lists"}