https://github.com/itamarst/numba-arrow-research
Research and maybe experiments in adding Arrow support to Numba
- Host: GitHub
- URL: https://github.com/itamarst/numba-arrow-research
- Owner: itamarst
- License: apache-2.0
- Created: 2025-02-18T14:57:18.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-11T15:56:30.000Z (over 1 year ago)
- Last Synced: 2025-03-11T16:40:35.256Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 23.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Research and maybe experiments in adding Arrow support to Numba
## Why?
* Numba only supports NumPy arrays out of the box.
* Lots of projects now use Arrow (Polars, Pandas, PyArrow when used directly, and no doubt others).
* Numba is a nice way to write _fast_ extensions without switching languages.
* Current Numba usage involves converting to NumPy arrays and then back, which is a problem because it loses information about missing data.
## Existing attempts/conversations
[Apparently](https://numba.discourse.group/t/feature-request-about-supporting-arrow-in-numba/1668/2) the Awkward Array library uses the same data representation as Arrow for its columns, and can therefore convert to/from Arrow with zero-copy.
Awkward Array also provides a Numba integration.
So this may just be a matter of documentation rather than coding.
Next steps, then:
1. Validate that Awkward Array is indeed zero-copy from Arrow.
2. Play around with the Numba integration and see if it works.
3. In particular, do a proof-of-concept with Polars.
Docs: https://awkward-array.org/doc/main/user-guide/how-to-use-in-numba-features.html
## What I've learned so far
* Pandas is a required dependency for the Awkward Array Arrow integration to work.
Annoying but not the end of the world.
* Not necessarily much in the way of APIs to access?