{"id":35867881,"url":"https://github.com/tonbo-io/typed-arrow","last_synced_at":"2026-02-06T06:01:29.198Z","repository":{"id":310027283,"uuid":"1038403100","full_name":"tonbo-io/typed-arrow","owner":"tonbo-io","description":"First-class compile‑time Arrow schemas for Rust.","archived":false,"fork":false,"pushed_at":"2026-02-05T09:13:39.000Z","size":500,"stargazers_count":198,"open_issues_count":12,"forks_count":12,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-02-05T15:56:52.129Z","etag":null,"topics":["arrow","data","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tonbo-io.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-08-15T06:13:12.000Z","updated_at":"2026-02-05T09:13:44.000Z","dependencies_parsed_at":"2026-02-06T06:00:46.267Z","dependency_job_id":null,"html_url":"https://github.com/tonbo-io/typed-arrow","commit_stats":null,"previous_names":["ethe/arrow-native","tonbo-io/typed-arrow"],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/tonbo-io/typed-arrow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tonbo-io%2Ftyped-arrow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tonbo-io%2Ftyped-arrow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tonbo-io%2Ftyped-arrow/releases","manifests_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/tonbo-io%2Ftyped-arrow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tonbo-io","download_url":"https://codeload.github.com/tonbo-io/typed-arrow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tonbo-io%2Ftyped-arrow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29153135,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-06T02:39:25.012Z","status":"ssl_error","status_checked_at":"2026-02-06T02:37:22.784Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","data","rust"],"created_at":"2026-01-08T14:14:24.954Z","updated_at":"2026-02-06T06:01:29.190Z","avatar_url":"https://github.com/tonbo-io.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# typed-arrow\n\n\u003cp align=\"left\"\u003e\n  \u003ca href=\"https://crates.io/crates/typed-arrow/\"\u003e\u003cimg src=\"https://img.shields.io/crates/v/typed-arrow.svg\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://docs.rs/typed-arrow\"\u003e\u003cimg src=\"https://img.shields.io/docsrs/typed-arrow\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/tonbo-io/tonbo/blob/main/LICENSE\"\u003e\u003cimg 
src=\"https://img.shields.io/crates/l/tonbo\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://discord.gg/j27XVFVmJM\"\u003e\u003cimg src=\"https://img.shields.io/discord/1270294987355197460?logo=discord\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\ntyped-arrow provides a strongly typed, fully compile-time way to declare Arrow schemas in Rust. It maps Rust types directly to arrow-rs typed builders/arrays and `arrow_schema::DataType` — without any runtime `DataType` switching — enabling zero runtime cost, monomorphized column construction and ergonomic ORM-like APIs.\n\n📖 **[Read the full documentation on docs.rs](https://docs.rs/typed-arrow)**\n\n## Why compile-time Arrow?\n\n- Performance: monomorphized builders/arrays with zero dynamic dispatch; avoids runtime `DataType` matching.\n- Safety: column types, names, and nullability live in the type system; mismatches fail at compile time.\n- Interop: uses `arrow-array`/`arrow-schema` types directly; no bespoke runtime layer to learn.\n\n## Quick Start\n\n```rust\nuse typed_arrow::{prelude::*, schema::SchemaMeta};\nuse typed_arrow::{Dictionary, TimestampTz, Millisecond, Utc, List};\n\n#[derive(Record)]\nstruct Address { city: String, zip: Option\u003ci32\u003e }\n\n#[derive(Record)]\nstruct Person {\n    id: i64,\n    address: Option\u003cAddress\u003e,\n    tags: Option\u003cList\u003cOption\u003ci32\u003e\u003e\u003e,          // List column with nullable items\n    code: Option\u003cDictionary\u003ci32, String\u003e\u003e,    // Dictionary\u003ci32, Utf8\u003e\n    joined: TimestampTz\u003cMillisecond, Utc\u003e,    // Timestamp(ms) with timezone (UTC)\n}\n\nfn main() {\n    // Build from owned rows\n    let rows = vec![\n        Person {\n            id: 1,\n            address: Some(Address { city: \"NYC\".into(), zip: None }),\n            tags: Some(List::new(vec![Some(1), None, Some(3)])),\n            code: Some(Dictionary::new(\"gold\".into())),\n            joined: TimestampTz::\u003cMillisecond, 
Utc\u003e::new(1_700_000_000_000),\n        },\n        Person {\n            id: 2,\n            address: None,\n            tags: None,\n            code: None,\n            joined: TimestampTz::\u003cMillisecond, Utc\u003e::new(1_700_000_100_000),\n        },\n    ];\n\n    let mut b = \u003cPerson as BuildRows\u003e::new_builders(rows.len());\n    b.append_rows(rows);\n    let arrays = b.finish();\n\n    // Compile-time schema + RecordBatch\n    let batch = arrays.into_record_batch();\n    assert_eq!(batch.schema().fields().len(), \u003cPerson as Record\u003e::LEN);\n    println!(\"rows={}, field0={}\", batch.num_rows(), batch.schema().field(0).name());\n}\n```\n\nAdd to your `Cargo.toml` (derives enabled by default):\n\n```toml\n[dependencies]\ntyped-arrow = { version = \"0.x\" }\n\n# Enable zero-copy views for reading RecordBatch data\ntyped-arrow = { version = \"0.x\", features = [\"views\"] }\n\n# Choose Arrow major version (default is arrow-57)\ntyped-arrow = { version = \"0.x\", default-features = false, features = [\"arrow-56\", \"derive\", \"views\"] }\n```\n\nWhen working in this repository/workspace:\n\n```toml\n[dependencies]\ntyped-arrow = { path = \".\" }\n\n# With views feature\ntyped-arrow = { path = \".\", features = [\"views\"] }\n```\n\n## Examples\n\nRun the included examples to see end-to-end usage:\n\n- `01_primitives` — derive `Record`, inspect `DataType`, build primitives\n- `02_lists` — `List\u003cT\u003e` and `List\u003cOption\u003cT\u003e\u003e`\n- `03_dictionary` — `Dictionary\u003cK, String\u003e`\n- `04_timestamps` — `Timestamp\u003cU\u003e` units\n- `04b_timestamps_tz` — `TimestampTz\u003cU, Z\u003e` with `Utc` and custom markers\n- `05_structs` — nested structs → `StructArray`\n- `06_rows_flat` — row-based building for flat records\n- `07_rows_nested` — row-based building with nested struct fields\n- `08_record_batch` — compile-time schema + `RecordBatch`\n- `09_duration_interval` — Duration and Interval types\n- `10_union` — 
Dense Union as a Record column (with attributes)\n- `11_map` — Map (incl. `Option\u003cV\u003e` values) + as a Record column\n- `12_ext_hooks` — Extend `#[derive(Record)]` with visitor injection and macro callbacks\n- `13_record_batch_views` — Zero-copy views over `RecordBatch` rows (requires `views` feature)\n\nRun:\n\n```bash\ncargo run --example 08_record_batch\n```\n\n## Development\n\n- Enable the repo hook: `git config core.hooksPath .githooks`\n- Pre-commit runs the Arrow version test matrix:\n  - `cargo test -q`\n  - `cargo test -q -p typed-arrow --no-default-features --features arrow-56,derive,views`\n  - `cargo test -q -p typed-arrow-dyn --no-default-features --features arrow-55`\n\n## Core Concepts\n\n- `Record`: implemented by the derive macro for structs with named fields.\n- `ColAt\u003cI\u003e`: per-column associated items `Rust`, `ColumnBuilder`, `ColumnArray`, `NULLABLE`, `NAME`, and `data_type()`.\n- `ArrowBinding`: compile-time mapping from a Rust value type to its Arrow builder, array, and `DataType`.\n- `BuildRows`: derive generates `\u003cType\u003eBuilders` and `\u003cType\u003eArrays` with `append_row(s)` and `finish`.\n- `SchemaMeta`: derive provides `fields()` and `schema()`; arrays structs provide `into_record_batch()`.\n- `AppendStruct` and `StructMeta`: enable nested struct fields and `StructArray` building.\n\n## Reading Data (Views Feature)\n\nWhen the `views` feature is enabled, typed-arrow automatically generates zero-copy view types for reading `RecordBatch` data without cloning or allocation. 
For each `#[derive(Record)]` struct, the macro generates:\n\n- `{Name}View\u003c'a\u003e` — A struct with borrowed references to row data\n- `{Name}Views\u003c'a\u003e` — An iterator yielding `Result\u003c{Name}View\u003c'a\u003e, ViewAccessError\u003e`\n- `impl TryFrom\u003c{Name}View\u003c'_\u003e\u003e for {Name}` for each record type with `Error = ViewAccessError`, making conversion composable and allowing proper error propagation when accessing nested structures.\n\n### Zero-Copy Reading\n\n```rust\nuse typed_arrow::prelude::*;\n\n#[derive(Record)]\nstruct Product {\n    id: i64,\n    name: String,\n    price: f64,\n}\n\n// Build a RecordBatch\nlet rows = vec![\n    Product { id: 1, name: \"Widget\".into(), price: 9.99 },\n    Product { id: 2, name: \"Gadget\".into(), price: 19.99 },\n];\nlet mut b = \u003cProduct as BuildRows\u003e::new_builders(rows.len());\nb.append_rows(rows);\nlet batch = b.finish().into_record_batch();\n\n// Read with zero-copy views\nlet views = batch.iter_views::\u003cProduct\u003e()?;\nfor view in views.try_flatten()? {\n    // view.name is \u0026str, view.id and view.price are copied primitives\n    println!(\"{}: ${}\", view.name, view.price);\n}\n```\n\n\n### Converting Views to Owned Records\n\nViews provide zero-copy access to RecordBatch data, but sometimes you need to store data beyond the batch's lifetime. Use `.try_into()` to convert views into owned records:\n\n```rust\nlet views = batch.iter_views::\u003cProduct\u003e()?;\nlet mut owned_products = Vec::new();\n\nfor view in views.try_flatten()? 
{\n    // view.name is \u0026str (borrowed)\n    // view.id and view.price are i64/f64 (copied)\n\n    if view.price \u003e 100.0 {\n        // Convert to owned using .try_into()?\n        let owned: Product = view.try_into()?;\n        owned_products.push(owned);  // Can store beyond batch lifetime\n    }\n}\n```\n\n### Metadata (Compile-time)\n\n- Schema-level: annotate with `#[schema_metadata(k = \"owner\", v = \"data\")]`.\n- Field-level: annotate with `#[metadata(k = \"pii\", v = \"email\")]`.\n- You can repeat attributes to add multiple pairs; later duplicates win.\n\n### Field Name Override\n\nOverride the Arrow field name while keeping a different Rust field name:\n\n```rust\n#[derive(Record)]\nstruct Event {\n    #[record(name = \"eventType\")]\n    event_type: String,      // Arrow field name: \"eventType\"\n    #[record(name = \"userID\")]\n    user_id: i64,            // Arrow field name: \"userID\"\n    timestamp: i64,          // Arrow field name: \"timestamp\" (unchanged)\n}\n```\n\nThis is useful for:\n- Matching external schema conventions (e.g., camelCase, PascalCase)\n- Interoperability with other systems that expect specific field names\n- Using Rust naming conventions internally while exposing different names in Arrow\n\n### Nested Type Wrappers\n\n- Struct fields: struct-typed fields map to Arrow `Struct` columns by default. Make the parent field nullable with `Option\u003cNested\u003e`; child nullability is independent.\n- Lists: `List\u003cT\u003e` (items non-null) and `List\u003cOption\u003cT\u003e\u003e` (items nullable). Use `Option\u003cList\u003c_\u003e\u003e` for list-level nulls.\n- LargeList: `LargeList\u003cT\u003e` and `LargeList\u003cOption\u003cT\u003e\u003e` for 64-bit offsets; wrap with `Option\u003c_\u003e` for column nulls.\n- FixedSizeList: `FixedSizeList\u003cT, N\u003e` (items non-null) and `FixedSizeListNullable\u003cT, N\u003e` (items nullable). 
Wrap with `Option\u003c_\u003e` for list-level nulls.\n- Map: `Map\u003cK, V, const SORTED: bool = false\u003e` where keys are non-null; use `Map\u003cK, Option\u003cV\u003e\u003e` to allow nullable values. Column nullability via `Option\u003cMap\u003c...\u003e\u003e`. `SORTED` sets `keys_sorted` in the Arrow `DataType`.\n- OrderedMap: `OrderedMap\u003cK, V\u003e` uses `BTreeMap\u003cK, V\u003e` and declares `keys_sorted = true`.\n- Dictionary: `Dictionary\u003cK, V\u003e` with integral keys `K ∈ { i8, i16, i32, i64, u8, u16, u32, u64 }` and values:\n  - `String`/`LargeUtf8` (Utf8/LargeUtf8)\n  - `Vec\u003cu8\u003e`/`LargeBinary` (Binary/LargeBinary)\n  - `[u8; N]` (FixedSizeBinary)\n  - primitives `i*`, `u*`, `f32`, `f64`\n  Column nullability via `Option\u003cDictionary\u003c..\u003e\u003e`.\n- Timestamps: `Timestamp\u003cU\u003e` (unit-only) and `TimestampTz\u003cU, Z\u003e` (unit + timezone). Units: `Second`, `Millisecond`, `Microsecond`, `Nanosecond`. Use `Utc` or define your own `Z: TimeZoneSpec`.\n- Decimals: `Decimal128\u003cP, S\u003e` and `Decimal256\u003cP, S\u003e` (precision `P`, scale `S` as const generics).\n- Unions: `#[derive(Union)]` for enums with `#[union(mode = \"dense\"|\"sparse\")]`, per-variant `#[union(tag = N)]`, `#[union(field = \"name\")]`, and optional null carrier `#[union(null)]` or container-level `null_variant = \"Var\"`.\n\n## Arrow DataType Coverage\n\nSupported (arrow-rs v55/v56/v57 via `arrow-55`/`arrow-56`/`arrow-57` features):\n\n- Primitives: Int8/16/32/64, UInt8/16/32/64, Float16/32/64, Boolean\n- Strings/Binary: Utf8, LargeUtf8, Binary, LargeBinary, FixedSizeBinary (via `[u8; N]`)\n- Temporal: Timestamp (with/without TZ; s/ms/us/ns), Date32/64, Time32(s/ms), Time64(us/ns), Duration(s/ms/us/ns), Interval(YearMonth/DayTime/MonthDayNano)\n- Decimal: Decimal128, Decimal256 (const generic precision/scale)\n- Nested:\n  - List (including nullable items), LargeList, FixedSizeList (nullable/non-null items)\n  - Struct,\n  - Map 
(Vec\u003c(K,V)\u003e; use `Option\u003cV\u003e` for nullable values), OrderedMap (BTreeMap\u003cK,V\u003e) with `keys_sorted = true`\n  - Union: Dense and Sparse (via `#[derive(Union)]` on enums)\n  - Dictionary: keys = all integral types; values = Utf8 (String), LargeUtf8, Binary (Vec\u003cu8\u003e), LargeBinary, FixedSizeBinary (`[u8; N]`), primitives (i*, u*, f32, f64)\n\nMissing:\n\n- BinaryView, Utf8View\n- ListView, LargeListView\n- RunEndEncoded\n\n## Extensibility\n\n- Derive extension hooks allow user-level customization without changing the core derive:\n  - Inject compile-time visitors: `#[record(visit(MyVisitor))]`\n  - Call your macros per field/record: `#[record(field_macro = my_ext::per_field, record_macro = my_ext::per_record)]`\n  - Tag fields/records with free-form markers: `#[record(ext(key))]`\n- See `docs/extensibility.md` and the runnable example `examples/12_ext_hooks.rs`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftonbo-io%2Ftyped-arrow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftonbo-io%2Ftyped-arrow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftonbo-io%2Ftyped-arrow/lists"}