https://github.com/cjermain/capnp2arrow
Cap'N Proto to Arrow data transfer
arrow arrow2 capnp capnproto polars rust
- Host: GitHub
- URL: https://github.com/cjermain/capnp2arrow
- Owner: cjermain
- License: MIT
- Created: 2023-05-29T00:35:00.000Z
- Default Branch: main
- Last Pushed: 2024-04-15T19:21:30.000Z
- Last Synced: 2025-06-01T08:33:42.738Z
- Topics: arrow, arrow2, capnp, capnproto, polars, rust
- Language: Rust
- Size: 37.1 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# capnp2arrow
This is a work-in-progress demonstration of reading a stream of Cap'N Proto messages into Arrow. Each message is traversed with the `capnp` crate's dynamic value reader (`capnp::dynamic_value::Reader`), so the conversion logic is not tied to any particular schema and the library stays schema-agnostic.
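At its core, a schema-agnostic converter maps each Cap'N Proto type onto an Arrow type and recurses into lists and structs. The std-only sketch below shows that idea. `CapnpType` and `ArrowType` are hypothetical, simplified stand-ins for `capnp::introspect::TypeVariant` and arrow2's `DataType`, and specific choices such as mapping `Text` and enums to `Utf8` are assumptions, not necessarily capnp2arrow's actual rules.

```rust
// Hypothetical, simplified mirror of Cap'N Proto type variants
// (the real enum is capnp::introspect::TypeVariant).
#[derive(Debug, Clone, PartialEq)]
enum CapnpType {
    Void,
    Bool,
    Int64,
    Float64,
    Text,
    Data,
    Enum,
    List(Box<CapnpType>),
    Struct(Vec<(String, CapnpType)>),
}

// Hypothetical, simplified mirror of Arrow logical types
// (arrow2 uses arrow2::datatypes::DataType).
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Null,
    Boolean,
    Int64,
    Float64,
    Utf8,
    Binary,
    List(Box<ArrowType>),
    Struct(Vec<(String, ArrowType)>),
}

/// Recursively maps a Cap'N Proto type to an Arrow type.
/// Mapping enums to their enumerant names (Utf8) is an assumption of this sketch.
fn to_arrow(t: &CapnpType) -> ArrowType {
    match t {
        CapnpType::Void => ArrowType::Null,
        CapnpType::Bool => ArrowType::Boolean,
        CapnpType::Int64 => ArrowType::Int64,
        CapnpType::Float64 => ArrowType::Float64,
        CapnpType::Text | CapnpType::Enum => ArrowType::Utf8,
        CapnpType::Data => ArrowType::Binary,
        CapnpType::List(inner) => ArrowType::List(Box::new(to_arrow(inner))),
        CapnpType::Struct(fields) => ArrowType::Struct(
            fields
                .iter()
                .map(|(name, ty)| (name.clone(), to_arrow(ty)))
                .collect(),
        ),
    }
}

fn main() {
    // The demo's `Points` message: a list of structs with two float fields
    // (Float64 is assumed here for x and y).
    let point = CapnpType::Struct(vec![
        ("x".to_string(), CapnpType::Float64),
        ("y".to_string(), CapnpType::Float64),
    ]);
    let values = CapnpType::List(Box::new(point));
    println!("{:?}", to_arrow(&values));
}
```

Recursing on the type rather than on hand-written per-schema code is what lets one function cover the demo's `Points` list as well as arbitrarily nested schemas.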
## Setup
```
rustup override set nightly  # use the nightly toolchain in this directory
sudo apt install capnproto   # install the capnp schema compiler
```
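Each schema file must declare a unique 64-bit file ID (a line such as `@0xdbb9ad1f14bf0b36;`). Generate a new one with `capnp id`.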
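A file ID is written as `@0x`, the hex value, and a semicolon, and the Cap'N Proto compiler rejects IDs whose most significant bit is clear (IDs produced by `capnp id` always have it set). The std-only sketch below checks an ID line; `parse_file_id` is a hypothetical helper, not part of any Cap'N Proto tooling.

```rust
/// Parses a file ID line such as `@0xdbb9ad1f14bf0b36;` and returns the ID
/// only if its most significant bit is set, as Cap'N Proto requires.
fn parse_file_id(line: &str) -> Option<u64> {
    let hex = line.trim().strip_prefix("@0x")?.strip_suffix(';')?;
    let id = u64::from_str_radix(hex, 16).ok()?;
    // Reject IDs whose high bit is clear.
    if id & (1u64 << 63) != 0 {
        Some(id)
    } else {
        None
    }
}

fn main() {
    for line in ["@0xdbb9ad1f14bf0b36;", "@0x1234;", "not an id"].iter() {
        println!("{:?} -> {:?}", line, parse_file_id(line));
    }
}
```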
## Demo
Create a JSON Lines file in which each line is one `Points` message holding a list of `x`/`y` points:
```
cat << EOF > points.jsonl
{"values": [{"x": 0, "y": 1}, {"x": -1, "y": 2}]}
{"values": [{"x": 0, "y": 0}]}
{"values": [{"x": -2, "y": 3}]}
EOF
```
Convert the JSONL into binary Cap'N Proto messages, parsing each line as the `Points` struct from the schema:
```
cat points.jsonl | capnp convert json:binary ./src/schema/point.capnp Points > points.bin
```
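Assuming `capnp convert json:binary` emits one unpacked message per input line, `points.bin` is a concatenation of self-delimiting messages in Cap'N Proto's standard stream framing: a little-endian `u32` holding the segment count minus one, one `u32` size (in 8-byte words) per segment, 4 bytes of padding when the segment count is even, then the segment data. The std-only sketch below splits such a stream; `split_messages` is a hypothetical helper (in Rust, `capnp::serialize::read_message` handles this framing while reading), and it does not handle the packed encoding.

```rust
use std::convert::TryInto;

/// Reads a little-endian u32 at `offset`, or returns None if out of bounds.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let chunk = bytes.get(offset..offset + 4)?;
    Some(u32::from_le_bytes(chunk.try_into().ok()?))
}

/// Splits concatenated, unpacked Cap'N Proto messages into one slice per message.
fn split_messages(mut bytes: &[u8]) -> Result<Vec<&[u8]>, String> {
    let mut messages = Vec::new();
    while !bytes.is_empty() {
        let segment_count = read_u32_le(bytes, 0).ok_or("truncated segment count")? as usize + 1;
        let mut header_len = 4 + 4 * segment_count;
        if segment_count % 2 == 0 {
            header_len += 4; // pad the header to a multiple of 8 bytes
        }
        // Sum the segment sizes from the segment table (in 8-byte words).
        let mut body_words = 0usize;
        for i in 0..segment_count {
            let words = read_u32_le(bytes, 4 + 4 * i).ok_or("truncated segment table")?;
            body_words += words as usize;
        }
        let total = header_len + 8 * body_words;
        if bytes.len() < total {
            return Err(format!("message needs {} bytes, {} left", total, bytes.len()));
        }
        let (message, rest) = bytes.split_at(total);
        messages.push(message);
        bytes = rest;
    }
    Ok(messages)
}

fn main() {
    // Two hand-built single-segment messages, each with a one-word segment.
    let mut stream = Vec::new();
    for fill in [0xAAu8, 0xBB].iter().copied() {
        stream.extend_from_slice(&0u32.to_le_bytes()); // segment count - 1
        stream.extend_from_slice(&1u32.to_le_bytes()); // segment 0 size: 1 word
        stream.extend_from_slice(&[fill; 8]); // segment data
    }
    let messages = split_messages(&stream).expect("valid framing");
    println!("{} messages", messages.len());
}
```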
Pipe the binary messages into the demo, which reads them from stdin and prints them as a Polars DataFrame:
```
$ cat points.bin | cargo run
shape: (3, 1)
┌─────────────────────────┐
│ values                  │
│ ---                     │
│ list[struct[2]]         │
╞═════════════════════════╡
│ [{0.0,1.0}, {-1.0,2.0}] │
│ [{0.0,0.0}]             │
│ [{-2.0,3.0}]            │
└─────────────────────────┘
```
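The `list[struct[2]]` column follows Arrow's columnar layout: the list array stores an offsets buffer with one more entry than there are rows (here `[0, 2, 3, 4]`) pointing into a single child struct array, and the struct array stores one child column per field. The std-only sketch below builds that layout by hand for the demo data. `ListOfPoints` and `to_columnar` are hypothetical names, validity bitmaps are omitted, and `f64` is assumed for `x` and `y`; this is an illustration of the format, not capnp2arrow's implementation.

```rust
/// Hypothetical minimal columnar form of an Arrow `list<struct<x, y>>` column.
#[derive(Debug, PartialEq)]
struct ListOfPoints {
    offsets: Vec<i32>, // row i spans child indices offsets[i]..offsets[i + 1]
    x: Vec<f64>,       // child column for struct field `x`
    y: Vec<f64>,       // child column for struct field `y`
}

fn to_columnar(rows: &[Vec<(f64, f64)>]) -> ListOfPoints {
    let mut out = ListOfPoints { offsets: vec![0], x: Vec::new(), y: Vec::new() };
    for row in rows {
        for &(x, y) in row {
            out.x.push(x);
            out.y.push(y);
        }
        // Each list ends where the child columns currently end.
        out.offsets.push(out.x.len() as i32);
    }
    out
}

fn main() {
    // The three messages from points.jsonl.
    let rows: Vec<Vec<(f64, f64)>> = vec![
        vec![(0.0, 1.0), (-1.0, 2.0)],
        vec![(0.0, 0.0)],
        vec![(-2.0, 3.0)],
    ];
    println!("{:?}", to_columnar(&rows));
}
```

Storing every point in flat child columns and locating rows through offsets is what lets Arrow (and Polars on top of it) scan all `x` values contiguously regardless of how many points each message holds.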
## Tests
```
cargo test
```
The test schema comes from the capnproto-rust repository; download it into `tests/` with:
```
wget -qO- https://raw.githubusercontent.com/capnproto/capnproto-rust/master/capnpc/test/test.capnp > tests/test.capnp
```
## Debug
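To stop at a panic, build the debug binary, set a breakpoint on `rust_panic`, run it with the sample messages on stdin, and print the backtrace once the breakpoint is hit:
```
cargo build
rust-gdb -q target/debug/capnp2arrow
(gdb) b rust_panic
(gdb) r < points.bin
(gdb) bt
```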
## References
1. Reflection-based `Debug` implementation (capnproto-rust `stringify.rs`): https://github.com/capnproto/capnproto-rust/blob/f7c86befe11b27f33c2a45957d402abff2b9e347/capnp/src/stringify.rs
2. Reflection-based example (capnproto-rust `fill_random_values`): https://github.com/capnproto/capnproto-rust/blob/master/example/fill_random_values/src/lib.rs
3. Cap'N Proto `TypeVariant`: https://docs.rs/capnp/latest/capnp/introspect/enum.TypeVariant.html
4. Arrow2 `DataType`: https://docs.rs/arrow2/latest/arrow2/datatypes/enum.DataType.html
5. Cap'N Proto Language Reference: https://capnproto.org/language.html
6. Cap'N Proto test schema: https://github.com/capnproto/capnproto/blob/master/c%2B%2B/src/capnp/test.capnp
7. Cap'N Proto test JSON: https://github.com/capnproto/capnproto/blob/master/c%2B%2B/src/capnp/testdata/pretty.json