https://github.com/duneanalytics/arrow_struct


Host: GitHub
URL: https://github.com/duneanalytics/arrow_struct
Owner: duneanalytics
Created: 2024-09-04T16:39:42.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-12-14T20:48:47.000Z (over 1 year ago)
Last Synced: 2025-06-10T10:53:12.522Z (about 1 year ago)
Language: Rust
Size: 37.1 KB
Stars: 2
Watchers: 8
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# TODO

* Benchmark against:
    * serde_arrow
    * arrow2-construct
* Configurable column name casing via attributes
* Pick a better name
* Add a more convenient interface for converting record batches

# Usage

## RecordBatch vs. StructArray

## Option vs non-Option

Unless you have a lot of trust in your data, prefer `Option` for all struct fields except nested structs (e.g., `struct Struct { field: Option<i32> }` over `struct Struct { field: i32 }`). Arrow does not enforce not-null constraints in RecordBatches: the schema can declare a field non-nullable while the column still contains nulls.

We panic if we encounter a null value in a column that maps to a non-`Option` field.
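
The sketch below illustrates the difference using a hypothetical, heavily simplified column type (`Int32Column`, with one `validity` flag per row standing in for Arrow's validity bitmap). None of these names are part of arrow_struct's API. An `Option` field absorbs an unexpected null as `None`, while a non-`Option` field has no way to represent it and must panic.

```rust
// Hypothetical stand-in for an Arrow Int32 column: values plus a validity
// flag per row (false = null). This is NOT the arrow_struct API.
struct Int32Column {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl Int32Column {
    fn get(&self, row: usize) -> Option<i32> {
        if self.validity[row] { Some(self.values[row]) } else { None }
    }
}

// Option field: a null simply becomes `None`.
#[derive(Debug, PartialEq)]
struct Lenient {
    field: Option<i32>,
}

// Non-Option field: a null has no representation here.
#[derive(Debug, PartialEq)]
struct Strict {
    field: i32,
}

fn read_lenient(col: &Int32Column, row: usize) -> Lenient {
    Lenient { field: col.get(row) }
}

fn read_strict(col: &Int32Column, row: usize) -> Strict {
    Strict {
        field: col
            .get(row)
            .unwrap_or_else(|| panic!("null value in non-Option column at row {row}")),
    }
}

fn main() {
    // The schema may call this column non-nullable, but row 1 is null anyway.
    let col = Int32Column { values: vec![7, 0, 9], validity: vec![true, false, true] };
    for row in 0..3 {
        println!("{:?}", read_lenient(&col, row));
    }
    println!("{:?}", read_strict(&col, 0));
    // Calling read_strict(&col, 1) would panic.
}
```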

# Performance tips for deserialization

## Zero-copy

If you can, prefer references for non-primitive types (e.g., `&str` instead of `String`, `&[u8]` instead of `Bytes`).

Borrowed fields point directly into the Arrow buffers, which avoids cloning every value.
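
A minimal sketch of why borrowing helps, assuming a simplified model of Arrow's string layout: one contiguous UTF-8 buffer plus offsets (real Arrow uses `i32` or `i64` offsets; `usize` is used here for brevity). `StringColumn`, `OwnedRow`, and `BorrowedRow` are hypothetical names for illustration, not arrow_struct types.

```rust
// Simplified model of Arrow's variable-size string layout:
// row i is data[offsets[i]..offsets[i + 1]]. Hypothetical, not arrow_struct.
struct StringColumn {
    offsets: Vec<usize>,
    data: String,
}

impl StringColumn {
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn value(&self, row: usize) -> &str {
        &self.data[self.offsets[row]..self.offsets[row + 1]]
    }
}

// Owned row: every value is copied into a fresh heap allocation.
struct OwnedRow {
    name: String,
}

// Borrowed row: `name` points straight into the column's buffer.
struct BorrowedRow<'a> {
    name: &'a str,
}

fn owned_rows(col: &StringColumn) -> Vec<OwnedRow> {
    (0..col.len()).map(|i| OwnedRow { name: col.value(i).to_string() }).collect()
}

fn borrowed_rows(col: &StringColumn) -> Vec<BorrowedRow<'_>> {
    (0..col.len()).map(|i| BorrowedRow { name: col.value(i) }).collect()
}

fn main() {
    let col = StringColumn { offsets: vec![0, 5, 10], data: "alicebobby".to_string() };
    let owned = owned_rows(&col);
    let borrowed = borrowed_rows(&col);
    for (o, b) in owned.iter().zip(&borrowed) {
        println!("owned: {}, borrowed: {}", o.name, b.name);
    }
}
```

The trade-off is the lifetime parameter: a `BorrowedRow<'a>` cannot outlive the column it was read from.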

## Avoid Arrow lists

If you can, avoid Arrow lists.

Even though we deserialize lists carefully, we still create a vector for every row that contains a non-null list.
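
To make the cost concrete, here is a sketch using a hypothetical, simplified list column (`ListColumn`: flat child values plus offsets, as in Arrow's list layout, with a per-row validity flag). `to_vecs` builds one `Vec` per non-null row, which mirrors the per-row allocation described above; `slices` is shown only for contrast and is not something arrow_struct is claimed to offer.

```rust
// Simplified model of an Arrow list<int32> column: row i is
// values[offsets[i]..offsets[i + 1]] unless validity[i] is false (null).
// Hypothetical type for illustration; not the arrow_struct API.
struct ListColumn {
    offsets: Vec<usize>,
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl ListColumn {
    fn len(&self) -> usize {
        self.validity.len()
    }

    // Builds a new Vec for every non-null row
    // (heap-allocating whenever the list is non-empty).
    fn to_vecs(&self) -> Vec<Option<Vec<i32>>> {
        (0..self.len())
            .map(|i| {
                if self.validity[i] {
                    Some(self.values[self.offsets[i]..self.offsets[i + 1]].to_vec())
                } else {
                    None
                }
            })
            .collect()
    }

    // For contrast: borrowed views into the flat child buffer, no per-row Vec.
    fn slices(&self) -> Vec<Option<&[i32]>> {
        let mut out = Vec::with_capacity(self.len());
        for i in 0..self.len() {
            if self.validity[i] {
                out.push(Some(&self.values[self.offsets[i]..self.offsets[i + 1]]));
            } else {
                out.push(None);
            }
        }
        out
    }
}

fn main() {
    // Rows: [1, 2], null, [3, 4, 5]
    let col = ListColumn {
        offsets: vec![0, 2, 2, 5],
        values: vec![1, 2, 3, 4, 5],
        validity: vec![true, false, true],
    };
    let owned = col.to_vecs();
    let non_null = owned.iter().filter(|row| row.is_some()).count();
    println!("{non_null} per-row Vecs built: {owned:?}");
    println!("borrowed views: {:?}", col.slices());
}
```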
