Projects in Awesome Lists tagged with apache-arrow
A curated list of projects in awesome lists tagged with apache-arrow.
https://github.com/pixie-io/pixie
Instant Kubernetes-Native Application Observability
aks apache-arrow cloud-native cncf distributed-systems ebpf eks gke golang kubernetes machine-learning metrics minikube monitoring observability pandas pixie px px-run vega
Last synced: 14 May 2025
https://github.com/aws/aws-sdk-pandas
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
amazon-athena amazon-sagemaker-notebook apache-arrow apache-parquet athena aws aws-glue aws-lambda data-engineering data-science emr etl glue-catalog lambda modin mysql pandas python ray redshift
Last synced: 12 May 2025
https://github.com/polarsignals/frostdb
❄️ Coolest database around 🧊 Embeddable column database written in Go.
apache-arrow apache-parquet columnar-storage database golang
Last synced: 18 Jun 2025
https://github.com/scikit-hep/awkward
Manipulate JSON-like data with NumPy-like idioms.
apache-arrow cern-root columnar-format data-analysis jagged-array json numba numpy pandas python ragged-array rdataframe scikit-hep
Last synced: 14 May 2025
https://github.com/visgl/loaders.gl
Loaders for big data visualization.
3d-tiles apache-arrow apache-parquet basis csv draco geoarrow geoparquet gltf i3s javascript las loaders nodejs obj ply streaming wkb wkt workers
Last synced: 14 Jul 2025
https://github.com/developmentseed/lonboard
A Python library for fast, interactive geospatial vector data visualization in Jupyter.
anywidget apache-arrow apache-parquet data-visualization deck-gl geoarrow geopandas geoparquet geospatial geospatial-analysis jupyter jupyter-widget lonboard map-visualization maps parquet python visualization webgl
Last synced: 14 May 2025
https://github.com/geopolars/geopolars
Geospatial extensions for Polars
apache-arrow geospatial polars pyo3 python rust
Last synced: 11 May 2025
https://github.com/unum-cloud/ustore
Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️
acid apache-arrow arrow big-data bigdata database dataloader document-database graph-database iouring json key-value-store knn-search networkx nosql pandas python search spdk vector-search
Last synced: 11 Apr 2025
https://github.com/kylebarron/parquet-wasm
Rust-based WebAssembly bindings to read and write Apache Parquet data
apache-arrow apache-parquet arrow javascript parquet rust wasm webassembly
Last synced: 14 May 2025
https://github.com/geoarrow/geoarrow
Specification for storing geospatial data in Apache Arrow
apache-arrow geoarrow geospatial
Last synced: 28 Jan 2026
https://github.com/geoarrow/geoarrow-rs
GeoArrow in Rust, Python, and JavaScript (WebAssembly) with vectorized geometry operations
apache-arrow geoarrow geoparquet geospatial javascript pyo3 python rust typescript wasm-bindgen webassembly
Last synced: 15 May 2025
https://github.com/apache/arrow-julia
Official Julia implementation of Apache Arrow
Last synced: 17 Jan 2026
https://github.com/gizmodata/gizmosql
🚀 GizmoSQL — High-Performance SQL Server
adbc apache-arrow apache-arrow-flight apache-arrow-flight-sql database databases duckdb gizmodata gizmosql ibis jdbc jwt-authentication pyarrow sql sqlalchemy sqlite sqlite3 tls
Last synced: 23 Jan 2026
https://github.com/cldellow/sqlite-parquet-vtable
A SQLite vtable extension to read Parquet files
apache-arrow apache-parquet parquet sqlite sqlite3
Last synced: 09 Apr 2025
https://github.com/abs-tudelft/fletcher
Fletcher: A framework to integrate FPGA accelerators with Apache Arrow
accelerators apache-arrow arrow fletcher fpga
Last synced: 30 Dec 2025
https://github.com/scikit-hep/awkward-0.x
Manipulate arrays of complex data structures as easily as Numpy.
analysis apache-arrow arrow big-data columnar columnar-storage hdf5 numpy parquet python python3 root root-cern scikit-hep
Last synced: 02 Oct 2025
https://github.com/g-research/parquetsharp
ParquetSharp is a .NET library for reading and writing Apache Parquet files.
apache-arrow apache-parquet big-data columnar-storage csharp dotnet parquet
Last synced: 14 May 2025
https://github.com/google/space
Unified storage framework for the entire machine learning lifecycle
apache-arrow apache-parquet data-warehouse dataops dataset dml lakehouse machine-learning mlops multimodal multimodal-data olap ray tensorflow tensorflow-dataset
Last synced: 14 Jan 2026
https://github.com/apache/arrow-go
Official Go implementation of Apache Arrow
Last synced: 13 Apr 2025
https://github.com/nanoporetech/pod5-file-format
Pod5: a high performance file format for nanopore reads.
apache-arrow file-format nanopore
Last synced: 15 May 2025
https://github.com/mattf96s/quackdb
Open-source in-browser DuckDB SQL editor
apache-arrow comlink duckdb duckdb-wasm parquet remix remix-run shadcn sql sst
Last synced: 16 Apr 2025
https://github.com/man-group/sparrow
C++20 idiomatic APIs for the Apache Arrow Columnar Format
Last synced: 23 Jan 2026
https://github.com/geoarrow/deck.gl-layers
deck.gl layers for rendering GeoArrow data
apache-arrow data-visualization deck-gl geoarrow geospatial map-visualization
Last synced: 12 May 2025
https://github.com/kylebarron/arrow-js-ffi
Zero-copy reading of Arrow data from WebAssembly
apache-arrow javascript typescript wasm webassembly
Last synced: 07 Apr 2025
https://github.com/mongodb-labs/mongo-arrow
MongoDB integrations for Apache Arrow. Export MongoDB documents to numpy array, parquet files, and pandas dataframes in one line of code.
apache-arrow arrow mongodb numpy-arrays pandas-dataframe parquet-files python
Last synced: 16 May 2025
https://github.com/igor-suhorukov/openstreetmap_h3
High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap World/Region PBF dump into a PostGIS pgsnapshot (lossless) OSM schema representation partitioned by H3 regions, and/or into Arrow IPC/Parquet dumps.
apach-sedona apache-arrow apache-spark arrow citusdb column-store converter duckdb geometry-processing geospatial java openstreetmap parquet parquet-files pbf pbf-format postgis postgresql world
Last synced: 05 Oct 2025
https://github.com/columnar-tech/dbc
dbc is a command-line tool for installing and managing ADBC drivers
adbc apache apache-arrow cli database-connector open-source
Last synced: 10 Feb 2026
https://github.com/abdenlab/oxbow
Oxbow makes genomic data accessible for high-performance analytics.
apache-arrow bioinformatics data-science dataframe fair-data genomics multiomics ngs pandas polars python r rust-lang
Last synced: 19 Jun 2025
https://github.com/red-data-tools/red_amber
A dataframe library for Rubyists.
apache-arrow dataframe dataframe-library dataframes ruby
Last synced: 11 Oct 2025
https://github.com/duo-rs/duo
A lightweight Logging and Tracing observability solution for Rust, built with Apache Arrow, Apache Parquet and Apache DataFusion.
apache-arrow apache-parquet datafusion logging observability rust tracing
Last synced: 29 Mar 2025
https://github.com/cldellow/csv2parquet
Convert a CSV to a parquet file.
apache-arrow apache-parquet csv parquet
Last synced: 21 Aug 2025
https://github.com/apache/arrow-java
Official Java implementation of Apache Arrow
Last synced: 15 May 2025
https://github.com/tradewelltech/beavers
Python stream processing for analytics
analytics apache-arrow data kafka pandas python realtime stream-processing
Last synced: 14 Jan 2026
https://github.com/influxdata/flightsql-dbapi
DB API 2 interface for Flight SQL with SQLAlchemy extras.
apache-arrow dbapi2 flight-sql python sqlalchemy
Last synced: 22 Apr 2025
https://github.com/elixir-explorer/adbc
Apache Arrow ADBC bindings for Elixir
apache-arrow database elixir postgresql snowflake sqlite
Last synced: 11 Apr 2025
https://github.com/columnar-tech/adbc-quickstarts
Simple examples showing how to use ADBC with various databases, query engines, and data platforms
apache-arrow apache-arrow-adbc
Last synced: 28 Jan 2026
https://github.com/tradewelltech/protarrow
Convert from protobuf to arrow and back
apache-arrow data protobuf python
Last synced: 16 Jan 2026
https://github.com/geoarrow/geoarrow-js
TypeScript implementation of GeoArrow
apache-arrow geoarrow typescript
Last synced: 12 May 2025
https://github.com/extendr/arrow-extendr
Integration between arrow-rs and extendr
Last synced: 05 Aug 2025
https://github.com/webysther/aws-glue-docker
🐋 Docker image for AWS Glue Spark/Python
apache-arrow aws aws-cli aws-glue aws-glue-docker cdk data-engineering development docker docker-image dockerfile etl glue-catalog glue-pyspark pandas pytest python python-poetry sam spark
Last synced: 05 May 2025
https://github.com/grouzen/zio-apache-arrow
Scala ZIO-powered Apache Arrow library
apache-arrow arrow arrow-datafusion big-data bigdata datafusion scala zio zio-streams zio2
Last synced: 24 Dec 2025
https://github.com/datafusion-contrib/datafusion-c
C language bindings for DataFusion
apache-arrow c datafusion glib sql
Last synced: 25 Jan 2026
https://github.com/josiahparry/arrow-extendr
Integration between arrow-rs and extendr
Last synced: 05 Jul 2025
https://github.com/kylebarron/arrow-wasm
Building block library for using Apache Arrow in Rust WebAssembly modules.
apache-arrow javascript rust wasm-bindgen webassembly
Last synced: 25 Oct 2025
https://github.com/mbrobbel/narrow
An experimental (work-in-progress) statically typed implementation of Apache Arrow
Last synced: 07 Apr 2025
https://github.com/graphext/lector
A fast reader for messy CSV files with optional type inference.
apache-arrow csv data-types parser python type-inference
Last synced: 16 Jan 2026
https://github.com/rpy2/rpy2-arrow
Share Apache Arrow datasets between Python and R.
apache-arrow arrow python r rpy2
Last synced: 23 Apr 2025
https://github.com/amoeba/qlarrow
WIP QuickLook plugin for Apache Arrow and Parquet
apache-arrow golang macos parquet quicklook
Last synced: 13 May 2025
https://github.com/cldellow/parquet-metadata
Dump metadata about a Parquet file.
apache-arrow apache-parquet parquet
Last synced: 07 May 2025
https://github.com/desdaemon/polars_dart
Dart bindings for the polars library
apache-arrow dart data-science ffi flutter flutter-rust-bridge polars rust
Last synced: 19 Apr 2025
https://github.com/kylebarron/arro3
A minimal Python library for Apache Arrow, connecting to the Rust arrow crate
Last synced: 06 Sep 2025
https://github.com/arkady-emelyanov/pyarrow-flight
Apache Arrow Flight example
apache-arrow arrow-flight pandas
Last synced: 06 Jul 2025
https://github.com/ljishen/bitar
Simplify accessing hardware compression/decompression accelerators
apache-arrow compression cpp dpdk hardware-acceleration
Last synced: 01 Feb 2026
https://github.com/cpg314/polarhouse
Interoperability between Polars and ClickHouse
apache-arrow clickhouse polars rust
Last synced: 27 Sep 2025
https://github.com/poopoothegorilla/fastframe
DataFrame project that utilizes Apache Arrow
apache-arrow data-science dataframe golang
Last synced: 12 Jun 2025
https://github.com/rupurt/zodbc
A blazing fast ODBC Zig client
apache-arrow odbc performance zig
Last synced: 06 May 2025
https://github.com/lykmapipo/python-spark-log-analysis
Python scripts to process and analyze log files using PySpark.
apache-arrow apache-spark apache-spark-sql data-analysis data-extraction data-processing data-transformation log-analysis log-analyzer log-monitor lykmapipo pandas pyarrow pyspark python seaborn spark-ml spark-nlp sparkml-pipelines sql
Last synced: 22 Jun 2025
https://github.com/apache/arrow-swift
Official Swift implementation of Apache Arrow
Last synced: 20 Jun 2025
https://github.com/amoeba/arrow-python-js-ipc-example
Example showing how to send Arrow RecordBatches from a Python backend to a web browser.
apache-arrow javascript python
Last synced: 05 Sep 2025
https://github.com/lykmapipo/nyc-tlc-trip-data
Python scripts to download, process, and analyze the New York City Taxi and Limousine Commission (TLC) Trip Record Data dataset
apache-arrow apache-spark data data-engineering data-extraction data-transformation etl fsspec geopandas joblib jupyterlab lykmapipo metadata nyc nyc-taxi-dataset pandas pyarrow python s3
Last synced: 17 Sep 2025
https://github.com/spaghettifunk/norman
Realtime distributed OLAP datastore written in Go, designed to answer OLAP queries with low latency. In active development.
apache-arrow apache-parquet golang olap realtime streaming
Last synced: 14 Apr 2025
https://github.com/pachadotdev/tradestatistics-database-postgresql
Tidy trade data from UN COMTRADE and also countries, commodities, RTAs and tariffs tables. Uses RDS and Apache Arrow, then uploads to PostgreSQL.
apache-arrow comtrade postgresql r trade
Last synced: 20 Mar 2025
https://github.com/roeap/flight-sql-client-node
A Flight SQL client for Node.js
apache-arrow arrow-flight nodejs
Last synced: 12 Apr 2025
https://github.com/marwan116/aws-parquet
A toolkit that provides an object-oriented interface for working with Parquet datasets on AWS
amazon-athena apache-arrow apache-parquet athena aws aws-glue data-engineering data-science etl glue-catalog pandas python
Last synced: 12 Jun 2025
https://github.com/apache/arrow-dotnet
Official .NET implementation of Apache Arrow
Last synced: 15 Oct 2025
https://github.com/pachadotdev/tradestatistics-plumber-api
tradestatistics.io API, reads from PostgreSQL and provides tidy CSV and Apache Arrow data
apache-arrow api csv plumber-api postgresql r trade
Last synced: 20 Mar 2025
https://github.com/ashvardanian/stringtape
Apache Arrow-compatible space-efficient strings tape class in pure Rust to be used with StringZilla
apache-arrow arrow pyarrow string-manipulation tape
Last synced: 14 Sep 2025
https://github.com/wilhelmagren/falkorflight
Apache Arrow Flight server for OpenCypher queries to FalkorDB.
apache-arrow apache-arrow-flight arrow falkordb graph-database python
Last synced: 20 Jul 2025
https://github.com/matsadler/bishop
Query MongoDB via Apache Arrow and DataFusion
apache-arrow datafusion mongodb
Last synced: 04 Mar 2025
https://github.com/droher/diachronic
Get daily historical snapshots of every article on any Wiki, formatted as Parquet files
apache-arrow google-cloud terraform wikimedia wikipedia
Last synced: 06 Jul 2025
https://github.com/tiwater/rerun-query
Query and extract entity data from Rerun data files.
Last synced: 11 Apr 2025
https://github.com/roeap/adx-arrow
Kusto client library optimized for data science workloads
apache-arrow arrow azure azure-data-explorer kusto python rust
Last synced: 23 Mar 2025
https://github.com/amoeba/arrow-opentelemetry-example
Example of using OpenTelemetry and Apache Arrow
apache-arrow cpp open-telemetry
Last synced: 08 Feb 2026
https://github.com/joewood/react-iceberg
React Components to visualize Apache Iceberg tables
apache-arrow apache-iceberg apache-spark avro devcontainer docker-compose minio reactjs s3
Last synced: 31 Dec 2025
https://github.com/neo4j-field/dataflow-flex-pyarrow-to-gds
Google Dataflow Flex Templates (in Python) for large scale Graph Loading with GDS and Apache Arrow
apache-arrow apache-beam bigquery dataflow neo4j python
Last synced: 09 Apr 2025
https://github.com/amoeba/arrow-flight-playground
Various examples related to Apache Arrow Flight.
Last synced: 21 Feb 2025
https://github.com/voutilad/redpanda-flight-rs
An Apache Arrow Flight proxy for Redpanda
Last synced: 01 Dec 2025
https://github.com/amoeba/arrow-cpp-conan-example
Example using conan to package and use libarrow
apache-arrow conan conan-io cpp
Last synced: 28 Jan 2026
https://github.com/amoeba/arrow-gcs-test
Short example showing how to use GCS with Arrow C++
apache-arrow cplusplus google-cloud-storage
Last synced: 21 Feb 2025
https://github.com/iljavaleev/arrow_examples
Apache Arrow C++ examples
apache-arrow cpp20 pandas polars pyarrow python3
Last synced: 22 Mar 2025
https://github.com/ippras/metadata
Metadata for Apache Arrow IPC format
apache-arrow ipc metadata polars
Last synced: 03 Nov 2025
https://github.com/chabane/spark-custom-datasource
apache-arrow apache-hadoop apache-spark inputformat pyspark
Last synced: 05 Mar 2025
https://github.com/adbc-drivers/adbc-drivers.org
ADBC Driver Foundry Website
apache-arrow apache-arrow-adbc
Last synced: 20 Jan 2026
https://github.com/amoeba/arrow-cmake-fetchcontent
Minimal example of including Arrow in a C++ project using CMake and FetchContent
Last synced: 08 Sep 2025
https://github.com/krokozyab/ofarrow
Arrow Flight SQL Server for Oracle Fusion
airflow analytics apache-arrow business-intelligence dagster data-extraction etl flight-sql grpc java oracle-fusion oracle-fusion-cloud oracle-fusion-erp pandas perfect polars python reporting stream
Last synced: 30 Dec 2025
https://github.com/amoeba/arrow-cpp-wasm
Playing around with Arrow C++ and WASM, see Website for demo
apache-arrow cplusplus emscripten wasm
Last synced: 21 Feb 2025
https://github.com/amoeba/arrow-pybind11-example
Minimal example of passing Arrow objects from Python to a C++ extension
Last synced: 01 Mar 2025
https://github.com/tansudasli/tensorflow-sandbox
All about machine learning
apache-arrow keras machine-learning numpy pandas scikit-learn tensorflow2
Last synced: 01 Mar 2025
https://github.com/amoeba/arrow-cpp-examples
Various Arrow C++ examples
apache-arrow apache-parquet cmake cpp example-code
Last synced: 21 Feb 2025
https://github.com/amoeba/pyarrow-ipc-example
An example showing how to send compressed RecordBatches over HTTP with PyArrow.
Last synced: 21 Feb 2025
https://github.com/yoa95/mini-data-cloud
☁️ Build a containerized distributed data processing system to explore cloud database concepts through hands-on implementation and robust SQL support.
apache-arrow apache-calcite cloud cloud-database cloud-security cloud-storage-services columnar-storage containerization database-architecture distributed-systems docker grpc learning-platform online-storage parquet query-engine spring-boot sql-engine
Last synced: 07 Nov 2025
https://github.com/flowerinthenight/luna-go
Go SQL driver for Luna.
apache-arrow cache columnar-storage duckdb golang in-memory-database olap sql
Last synced: 08 Oct 2025
https://github.com/mridang/athena-mongodb
MongoDB connector for AWS Athena Federation
apache-arrow athena aws lambda mongodb trino
Last synced: 15 Oct 2025
https://github.com/imperial-genomics-facility/limsmetadataparsing
A PySpark-based codebase for fetching and formatting metadata from a LIMS database for IGF
apache-arrow apache-spark pandas pyodbc python-3-6 sparksql
Last synced: 29 Oct 2025