An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with apache-arrow

A curated list of projects in awesome lists tagged with apache-arrow.

https://github.com/aws/aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

amazon-athena amazon-sagemaker-notebook apache-arrow apache-parquet athena aws aws-glue aws-lambda data-engineering data-science emr etl glue-catalog lambda modin mysql pandas python ray redshift

Last synced: 12 May 2025

https://github.com/polarsignals/frostdb

❄️ Coolest database around 🧊 Embeddable column database written in Go.

apache-arrow apache-parquet columnar-storage database golang

Last synced: 18 Jun 2025

https://github.com/geopolars/geopolars

Geospatial extensions for Polars

apache-arrow geospatial polars pyo3 python rust

Last synced: 11 May 2025

https://github.com/unum-cloud/ustore

Multi-modal database replacing MongoDB, Neo4J, and Elastic with one faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️

acid apache-arrow arrow big-data bigdata database dataloader document-database graph-database iouring json key-value-store knn-search networkx nosql pandas python search spdk vector-search

Last synced: 11 Apr 2025

https://github.com/kylebarron/parquet-wasm

Rust-based WebAssembly bindings to read and write Apache Parquet data

apache-arrow apache-parquet arrow javascript parquet rust wasm webassembly

Last synced: 14 May 2025

https://github.com/geoarrow/geoarrow

Specification for storing geospatial data in Apache Arrow

apache-arrow geoarrow geospatial

Last synced: 28 Jan 2026

https://github.com/geoarrow/geoarrow-rs

GeoArrow in Rust, Python, and JavaScript (WebAssembly) with vectorized geometry operations

apache-arrow geoarrow geoparquet geospatial javascript pyo3 python rust typescript wasm-bindgen webassembly

Last synced: 15 May 2025

https://github.com/apache/arrow-julia

Official Julia implementation of Apache Arrow

apache-arrow julia

Last synced: 17 Jan 2026

https://github.com/cldellow/sqlite-parquet-vtable

A SQLite vtable extension to read Parquet files

apache-arrow apache-parquet parquet sqlite sqlite3

Last synced: 09 Apr 2025

https://github.com/abs-tudelft/fletcher

Fletcher: A framework to integrate FPGA accelerators with Apache Arrow

accelerators apache-arrow arrow fletcher fpga

Last synced: 30 Dec 2025

https://github.com/scikit-hep/awkward-0.x

Manipulate arrays of complex data structures as easily as NumPy.

analysis apache-arrow arrow big-data columnar columnar-storage hdf5 numpy parquet python python3 root root-cern scikit-hep

Last synced: 02 Oct 2025

https://github.com/g-research/parquetsharp

ParquetSharp is a .NET library for reading and writing Apache Parquet files.

apache-arrow apache-parquet big-data columnar-storage csharp dotnet parquet

Last synced: 14 May 2025

https://github.com/apache/arrow-go

Official Go implementation of Apache Arrow

apache-arrow go

Last synced: 13 Apr 2025

https://github.com/nanoporetech/pod5-file-format

Pod5: a high performance file format for nanopore reads.

apache-arrow file-format nanopore

Last synced: 15 May 2025

https://github.com/mattf96s/quackdb

Open-source in-browser DuckDB SQL editor

apache-arrow comlink duckdb duckdb-wasm parquet remix remix-run shadcn sql sst

Last synced: 16 Apr 2025

https://github.com/man-group/sparrow

C++20 idiomatic APIs for the Apache Arrow Columnar Format

apache apache-arrow cpp20

Last synced: 23 Jan 2026

https://github.com/kylebarron/arrow-js-ffi

Zero-copy reading of Arrow data from WebAssembly

apache-arrow javascript typescript wasm webassembly

Last synced: 07 Apr 2025

https://github.com/mongodb-labs/mongo-arrow

MongoDB integrations for Apache Arrow. Export MongoDB documents to NumPy arrays, Parquet files, and pandas DataFrames in one line of code.

apache-arrow arrow mongodb numpy-arrays pandas-dataframe parquet-files python

Last synced: 16 May 2025

https://github.com/igor-suhorukov/openstreetmap_h3

High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap world or regional PBF dump into a PostGIS pgsnapshot (lossless) OSM schema representation partitioned by H3 region, and/or into Arrow IPC/Parquet dumps.

apach-sedona apache-arrow apache-spark arrow citusdb column-store converter duckdb geometry-processing geospatial java openstreetmap parquet parquet-files pbf pbf-format postgis postgresql world

Last synced: 05 Oct 2025

https://github.com/columnar-tech/dbc

dbc is a command-line tool for installing and managing ADBC drivers

adbc apache apache-arrow cli database-connector open-source

Last synced: 10 Feb 2026

https://github.com/abdenlab/oxbow

Oxbow makes genomic data accessible for high-performance analytics.

apache-arrow bioinformatics data-science dataframe fair-data genomics multiomics ngs pandas polars python r rust-lang

Last synced: 19 Jun 2025

https://github.com/red-data-tools/red_amber

A dataframe library for Rubyists.

apache-arrow dataframe dataframe-library dataframes ruby

Last synced: 11 Oct 2025

https://github.com/duo-rs/duo

A lightweight Logging and Tracing observability solution for Rust, built with Apache Arrow, Apache Parquet and Apache DataFusion.

apache-arrow apache-parquet datafusion logging observability rust tracing

Last synced: 29 Mar 2025

https://github.com/cldellow/csv2parquet

Convert a CSV file to a Parquet file.

apache-arrow apache-parquet csv parquet

Last synced: 21 Aug 2025

https://github.com/apache/arrow-java

Official Java implementation of Apache Arrow

apache-arrow java

Last synced: 15 May 2025

https://github.com/tradewelltech/beavers

Python stream processing for analytics

analytics apache-arrow data kafka pandas python realtime stream-processing

Last synced: 14 Jan 2026

https://github.com/influxdata/flightsql-dbapi

DB API 2 interface for Flight SQL with SQLAlchemy extras.

apache-arrow dbapi2 flight-sql python sqlalchemy

Last synced: 22 Apr 2025

https://github.com/elixir-explorer/adbc

Apache Arrow ADBC bindings for Elixir

apache-arrow database elixir postgresql snowflake sqlite

Last synced: 11 Apr 2025

https://github.com/columnar-tech/adbc-quickstarts

Simple examples showing how to use ADBC with various databases, query engines, and data platforms

apache-arrow apache-arrow-adbc

Last synced: 28 Jan 2026

https://github.com/tradewelltech/protarrow

Convert from protobuf to arrow and back

apache-arrow data protobuf python

Last synced: 16 Jan 2026

https://github.com/geoarrow/geoarrow-js

TypeScript implementation of GeoArrow

apache-arrow geoarrow typescript

Last synced: 12 May 2025

https://github.com/extendr/arrow-extendr

Integration between arrow-rs and extendr

apache-arrow ffi rstats rust

Last synced: 05 Aug 2025

https://github.com/datafusion-contrib/datafusion-c

C language bindings for DataFusion

apache-arrow c datafusion glib sql

Last synced: 25 Jan 2026

https://github.com/josiahparry/arrow-extendr

Integration between arrow-rs and extendr

apache-arrow ffi rstats rust

Last synced: 05 Jul 2025

https://github.com/kylebarron/arrow-wasm

Building block library for using Apache Arrow in Rust WebAssembly modules.

apache-arrow javascript rust wasm-bindgen webassembly

Last synced: 25 Oct 2025

https://github.com/mbrobbel/narrow

An experimental (work-in-progress) statically typed implementation of Apache Arrow

apache-arrow

Last synced: 07 Apr 2025

https://github.com/graphext/lector

A fast reader for messy CSV files with optional type inference.

apache-arrow csv data-types parser python type-inference

Last synced: 16 Jan 2026

https://github.com/rpy2/rpy2-arrow

Share Apache Arrow datasets between Python and R.

apache-arrow arrow python r rpy2

Last synced: 23 Apr 2025

https://github.com/kszucs/firebolt

Arrow implementation in Mojo

apache-arrow mojo-lang

Last synced: 17 Mar 2025

https://github.com/amoeba/qlarrow

WIP QuickLook plugin for Apache Arrow and Parquet

apache-arrow golang macos parquet quicklook

Last synced: 13 May 2025

https://github.com/cldellow/parquet-metadata

Dump metadata about a Parquet file.

apache-arrow apache-parquet parquet

Last synced: 07 May 2025

https://github.com/kylebarron/arro3

A minimal Python library for Apache Arrow, connecting to the Rust arrow crate

apache-arrow pyo3 python rust

Last synced: 06 Sep 2025

https://github.com/arkady-emelyanov/pyarrow-flight

Apache Arrow Flight example

apache-arrow arrow-flight pandas

Last synced: 06 Jul 2025

https://github.com/unum-cloud/udsb

Unlimited Data-Science Benchmarks for Numeric, Tabular and Graph Workloads

apache-arrow arrow cublas cudf cugraph dask modin networkx numpy pandas sqlite

Last synced: 26 Jun 2025

https://github.com/ljishen/bitar

Simplify accessing hardware compression/decompression accelerators

apache-arrow compression cpp dpdk hardware-acceleration

Last synced: 01 Feb 2026

https://github.com/cpg314/polarhouse

Interoperability between Polars and Clickhouse

apache-arrow clickhouse polars rust

Last synced: 27 Sep 2025

https://github.com/poopoothegorilla/fastframe

DataFrame project that utilizes Apache Arrow

apache-arrow data-science dataframe golang

Last synced: 12 Jun 2025

https://github.com/rupurt/zodbc

A blazing fast ODBC Zig client

apache-arrow odbc performance zig

Last synced: 06 May 2025

https://github.com/apache/arrow-swift

Official Swift implementation of Apache Arrow

apache-arrow swift

Last synced: 20 Jun 2025

https://github.com/amoeba/arrow-python-js-ipc-example

Example showing how to send Arrow RecordBatches from a Python backend to a web browser.

apache-arrow javascript python

Last synced: 05 Sep 2025

https://github.com/lykmapipo/nyc-tlc-trip-data

Python scripts to download, process, and analyze the New York City Taxi and Limousine Commission (TLC) Trip Record Data dataset

apache-arrow apache-spark data data-engineering data-extraction data-transformation etl fsspec geopandas joblib jupyterlab lykmapipo metadata nyc nyc-taxi-dataset pandas pyarrow python s3

Last synced: 17 Sep 2025

https://github.com/spaghettifunk/norman

Real-time distributed OLAP datastore written in Go, designed to answer OLAP queries with low latency. In active development.

apache-arrow apache-parquet golang olap realtime streaming

Last synced: 14 Apr 2025

https://github.com/pachadotdev/tradestatistics-database-postgresql

Tidy trade data from UN COMTRADE, plus country, commodity, RTA, and tariff tables. Uses RDS and Apache Arrow, then uploads to PostgreSQL.

apache-arrow comtrade postgresql r trade

Last synced: 20 Mar 2025

https://github.com/roeap/flight-sql-client-node

A Flight SQL client for Node.js

apache-arrow arrow-flight nodejs

Last synced: 12 Apr 2025

https://github.com/marwan116/aws-parquet

A toolkit that provides an object-oriented interface for working with Parquet datasets on AWS

amazon-athena apache-arrow apache-parquet athena aws aws-glue data-engineering data-science etl glue-catalog pandas python

Last synced: 12 Jun 2025

https://github.com/apache/arrow-dotnet

Official .NET implementation of Apache Arrow

apache-arrow csharp dotnet

Last synced: 15 Oct 2025

https://github.com/pachadotdev/tradestatistics-plumber-api

tradestatistics.io API; reads from PostgreSQL and serves tidy CSV and Apache Arrow data

apache-arrow api csv plumber-api postgresql r trade

Last synced: 20 Mar 2025

https://github.com/ashvardanian/stringtape

Apache Arrow-compatible, space-efficient string tape class in pure Rust, to be used with StringZilla

apache-arrow arrow pyarrow string-manipulation tape

Last synced: 14 Sep 2025

https://github.com/wilhelmagren/falkorflight

Apache Arrow Flight server for OpenCypher queries to FalkorDB.

apache-arrow apache-arrow-flight arrow falkordb graph-database python

Last synced: 20 Jul 2025

https://github.com/matsadler/bishop

Query MongoDB via Apache Arrow and DataFusion

apache-arrow datafusion mongodb

Last synced: 04 Mar 2025

https://github.com/droher/diachronic

Get daily historical snapshots of every article on any Wiki, formatted as Parquet files

apache-arrow google-cloud terraform wikimedia wikipedia

Last synced: 06 Jul 2025

https://github.com/tiwater/rerun-query

Query and extract entity data from Rerun data files.

apache-arrow pyo3 rerun

Last synced: 11 Apr 2025

https://github.com/roeap/adx-arrow

Kusto client library optimized for data science workloads

apache-arrow arrow azure azure-data-explorer kusto python rust

Last synced: 23 Mar 2025

https://github.com/amoeba/arrow-opentelemetry-example

Example of using OpenTelemetry and Apache Arrow

apache-arrow cpp open-telemetry

Last synced: 08 Feb 2026

https://github.com/joewood/react-iceberg

React Components to visualize Apache Iceberg tables

apache-arrow apache-iceberg apache-spark avro devcontainer docker-compose minio reactjs s3

Last synced: 31 Dec 2025

https://github.com/neo4j-field/dataflow-flex-pyarrow-to-gds

Google Dataflow Flex Templates (in Python) for large scale Graph Loading with GDS and Apache Arrow

apache-arrow apache-beam bigquery dataflow neo4j python

Last synced: 09 Apr 2025

https://github.com/amoeba/arrow-flight-playground

Various examples related to Apache Arrow Flight.

apache-arrow arrow-flight cpp

Last synced: 21 Feb 2025

https://github.com/voutilad/redpanda-flight-rs

An Apache Arrow Flight proxy for Redpanda

apache-arrow kafka redpanda

Last synced: 01 Dec 2025

https://github.com/amoeba/arrow-cpp-conan-example

Example using Conan to package and use libarrow

apache-arrow conan conan-io cpp

Last synced: 28 Jan 2026

https://github.com/amoeba/arrow-gcs-test

Short example showing how to use GCS with Arrow C++

apache-arrow cplusplus google-cloud-storage

Last synced: 21 Feb 2025

https://github.com/ippras/metadata

Metadata for Apache Arrow IPC format

apache-arrow ipc metadata polars

Last synced: 03 Nov 2025

https://github.com/isedwardtang/rheem_arrow

Using Apache Arrow with Rheem

apache-arrow rheem

Last synced: 14 Apr 2025

https://github.com/adbc-drivers/adbc-drivers.org

ADBC Driver Foundry Website

apache-arrow apache-arrow-adbc

Last synced: 20 Jan 2026

https://github.com/amoeba/arrow-cmake-fetchcontent

Minimal example of including Arrow in a C++ project using CMake and FetchContent

apache-arrow cmake cpp

Last synced: 08 Sep 2025

https://github.com/amoeba/arrow-cpp-wasm

Playing around with Arrow C++ and WASM; see the website for a demo

apache-arrow cplusplus emscripten wasm

Last synced: 21 Feb 2025

https://github.com/amoeba/arrow-pybind11-example

Minimal example of passing Arrow objects from Python to a C++ extension

apache-arrow

Last synced: 01 Mar 2025

https://github.com/amoeba/pyarrow-ipc-example

An example showing how to send compressed RecordBatches over HTTP with PyArrow.

apache-arrow pyarrow

Last synced: 21 Feb 2025

https://github.com/yoa95/mini-data-cloud

☁️ Build a containerized distributed data processing system to explore cloud database concepts through hands-on implementation and robust SQL support.

apache-arrow apache-calcite cloud cloud-database cloud-security cloud-storage-services columnar-storage containerization database-architecture distributed-systems docker grpc learning-platform online-storage parquet query-engine spring-boot sql-engine

Last synced: 07 Nov 2025

https://github.com/makcymal/arrow-view

CLI preview of Apache Arrow files

apache-arrow cli

Last synced: 01 Apr 2025

https://github.com/mridang/athena-mongodb

MongoDB connector for AWS Athena Federation

apache-arrow athena aws lambda mongodb trino

Last synced: 15 Oct 2025

https://github.com/imperial-genomics-facility/limsmetadataparsing

A PySpark-based codebase for fetching and formatting metadata from a LIMS database for IGF

apache-arrow apache-spark pandas pyodbc python-3-6 sparksql

Last synced: 29 Oct 2025