Projects in Awesome Lists tagged with datafusion
A curated list of projects in awesome lists tagged with datafusion.
https://github.com/apache/datafusion
Apache DataFusion SQL Query Engine
arrow big-data dataframe datafusion olap python query-engine rust sql
Last synced: 18 Jun 2026
https://github.com/ibis-project/ibis
the portable Python dataframe library
bigquery clickhouse database datafusion duckdb impala mssql mysql pandas polars postgresql pyarrow pyspark python snowflake sql sqlite trino
Last synced: 13 May 2025
https://github.com/roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
analytics arrow blob-storage cloud-native columnar datafusion datasets delta-lake graphql in-memory-database parquet query query-frontends rest-api rust s3 sql static-datasets
Last synced: 11 Feb 2026
https://github.com/lakesoul-io/lakesoul
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
arrow big-data datafusion datalake flink huggingface lakehouse lakesoul postgresql python pytorch rust spark sql streaming vectorized velox
Last synced: 14 May 2025
https://github.com/lakesoul-io/LakeSoul
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
arrow big-data datafusion datalake flink huggingface lakehouse lakesoul postgresql python pytorch rust spark sql streaming vectorized velox
Last synced: 27 Mar 2025
https://github.com/apache/auron
The Auron accelerator for distributed computing framework (e.g., Spark) leverages native vectorized execution to accelerate query processing
big-data datafusion rust-lang spark
Last synced: 28 Aug 2025
https://github.com/kwai/blaze
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
big-data datafusion rust-lang spark
Last synced: 14 May 2025
https://github.com/lakehq/sail
LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.
arrow artificial-intelligence big-data data data-engineering datafusion distributed-computing machine-learning pyspark python rust spark sql
Last synced: 14 Apr 2026
https://github.com/arkflow-rs/arkflow
High-performance Rust stream processing engine that seamlessly integrates AI capabilities, providing powerful real-time data processing and intelligent analysis.
ai arkflow datafusion deep-learning duckdb flow kafka machine-learning mysql nats postgresql redis rust rust-lang sql sqlite stream tokio tokio-rs websocket
Last synced: 24 Feb 2026
https://github.com/apache/datafusion-comet
Apache DataFusion Comet Spark Accelerator
Last synced: 01 Apr 2026
https://github.com/paradedb/pg_analytics
DuckDB-powered data lake analytics from Postgres
analytics arrow big-data columnar database datafusion datalake deltalake duckdb iceberg lakehouse lakehouse-platform object-storage olap paradedb parquet postgres postgresql realtime-analytics sql
Last synced: 24 Mar 2025
https://github.com/splitgraph/seafowl
Analytical database for data-driven Web applications 🪶
api database datafusion delta-lake delta-rs edge http rust serverless sql visualization
Last synced: 15 May 2025
https://github.com/ark-flow/arkflow
High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors.
datafusion duckdb flow kafka mysql postgresql rust rust-lang sql sqlite stream tokio tokio-rs
Last synced: 03 Apr 2025
https://github.com/chenquan/arkflow
High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors.
datafusion flow rust rust-lang sql stream
Last synced: 13 Oct 2025
https://github.com/kamu-data/kamu-cli
Next-generation decentralized data lakehouse and a multi-party stream processing network
blockchain data-as-code data-management data-science datafusion flink jupyter kamu open-data open-data-fabric spark sql
Last synced: 17 Feb 2026
https://github.com/jankaul/iceberg-rust
Rust implementation of Apache Iceberg with integration for Datafusion
Last synced: 15 May 2025
https://github.com/datafusion-contrib/datafusion-dft
Batteries included CLI, TUI, and server implementations for DataFusion.
arrow cli data database datafusion tui
Last synced: 10 May 2025
https://github.com/xiangpenghao/liquid-cache
10x lower latency for cloud-native DataFusion
arrow cache data-analytics datafusion object-store parquet query-engine
Last synced: 21 Feb 2026
https://github.com/datafusion-contrib/datafusion-java
Java binding to Apache DataFusion
arrow ballista datafusion java
Last synced: 14 Jan 2026
https://github.com/datafusion-contrib/datafusion-postgres
Postgres protocol frontend for DataFusion
Last synced: 16 Jan 2026
https://github.com/datafusion-contrib/datafusion-materialized-views
Incremental view maintenance & query rewriting for materialized views in DataFusion
arrow big-data datafusion materialized-views rust sql
Last synced: 12 May 2026
https://github.com/biodatageeks/polars-bio
Blazing-Fast Bioinformatic Operations on Python DataFrames
arrow bioinformatics dataframes datafusion genomic-intervals genomic-ranges genomics pandas polars rust-lang
Last synced: 29 Jul 2025
https://github.com/duo-rs/duo
A lightweight Logging and Tracing observability solution for Rust, built with Apache Arrow, Apache Parquet and Apache DataFusion.
apache-arrow apache-parquet datafusion logging observability rust tracing
Last synced: 29 Mar 2025
https://github.com/datafusion-contrib/datafusion-objectstore-s3
S3 as an ObjectStore for DataFusion
Last synced: 06 Apr 2026
https://github.com/datafusion-contrib/datafusion-distributed
Library for bringing distributed capabilities to Apache DataFusion
arrow datafusion distributed distributed-computing distributed-da distributed-systems query-e
Last synced: 25 Jan 2026
https://github.com/shauryashaurya/learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
arrow dask dask-distributed data-engineering datafusion jupyter numpy pandas polars pyspark ray spark
Last synced: 16 Apr 2025
https://github.com/metrico/influxdb3-community
Community InfluxDB 3.0 "IOx" static builds + containers + Examples for Developers & Integrators. Experiment with low-cost storage, unlimited cardinality and FlightSQL APIs
arrow datafusion flightsql flux influx influxdb influxdb3 iox lineprotocol musl rust
Last synced: 12 Apr 2025
https://github.com/splitgraph/seafowl-gcsfuse
Scale to zero Seafowl hosting with Cloud Run
datafusion faas gcp rust seafowl severless
Last synced: 14 Apr 2025
https://github.com/treebee/elixir-arrow
Experimental Elixir bindings for Apache Arrow including Parquet and DataFusion
arrow datafusion parquet query-engine
Last synced: 05 Mar 2026
https://github.com/apache/datafusion-java
Java bindings for Apache DataFusion
apache arrow datafusion java jni jvm query-engine sql
Last synced: 15 Jun 2026
https://github.com/grouzen/zio-apache-arrow
Scala ZIO-powered Apache Arrow library
apache-arrow arrow arrow-datafusion big-data bigdata datafusion scala zio zio-streams zio2
Last synced: 24 Dec 2025
https://github.com/datafusion-contrib/datafusion-c
C language bindings for DataFusion
apache-arrow c datafusion glib sql
Last synced: 25 Jan 2026
https://github.com/influxdata/datafusion-udf-wasm
DataFusion UDFs (User Defined Functions) via WebAssembly
Last synced: 04 Mar 2026
https://github.com/blaze-init/spark-blaze-extension
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
Last synced: 31 Mar 2025
https://github.com/jayendra13/zarr-datafusion
Extending DataFusion to do SQL queries on Zarr data.
datafusion geospatial-analysis olap-cube rust zarr-v3
Last synced: 06 Feb 2026
https://github.com/nazarii-piontko/datafusion-sharp
.NET bindings for Apache DataFusion query engine - execute blazing-fast SQL queries on Parquet, CSV, and JSON with Apache Arrow
analytics arrow binding csharp csv datafusion dotnet parquet query-engine rust sql
Last synced: 26 Apr 2026
https://github.com/hengfeiyang/how-query-engines-work-zh-cn
How Query Engines Work (Chinese edition)
arrow ballista datafusion parquet
Last synced: 13 Oct 2025
https://github.com/friendlymatthew/arrow-csv2
Vectorized CSV parsing for Apache Arrow
arrow csv datafusion object-storage rust simd
Last synced: 03 Apr 2026
https://github.com/roeap/flight-fusion
arrow data-science datafusion deltalake flight
Last synced: 12 Apr 2025
https://github.com/milenkovicm/wasaffi
Datafusion WASM User Defined Functions
datafusion sql userdefined-functions wasm wasm-bindgen wasmedge
Last synced: 07 May 2025
https://github.com/milenkovicm/torchfusion
Torchfusion is a very opinionated torch inference on datafusion.
batch-inference datafusion inference machine-learning pytorch rust sql torch userdefined-functions
Last synced: 08 Oct 2025
https://github.com/systemxlabs/datafusion-remote-table
A DataFusion table provider for executing SQL queries on remote databases.
datafusion mysql oracle postgresql rust sql sqlite
Last synced: 30 Apr 2025
https://github.com/macieklesiczka/azof
Lakehouse with time travel
datafusion datalake lakehouse parquet rust-lang
Last synced: 02 Mar 2026
https://github.com/splitgraph/experimental-datafusion-webassembly
proof-of-concept: compile datafusion to `wasm32-wasi` (run in `wasmedge`) and `wasm32-unknown-unknown` (run in browser)
arrow datafusion wasm32-unknown-unknown wasm32-wasi wasmedge webassembly
Last synced: 02 Jan 2026
https://github.com/milenkovicm/adhesive
Apache Datafusion JVM User Defined Functions (UDF), integration nobody asked for 😀
arrow bytecode-compiler compiler datafusion java jni jvm rust sql udf udf-libraries userdefined-functions
Last synced: 10 Apr 2025
https://github.com/datafusion-contrib/datafusion-tpch
Native Rust TPCH support for Datafusion using tpchgen
databases datafusion datafusion-testing tpch tpchgen-rs
Last synced: 06 Apr 2026
https://github.com/milenkovicm/ballista_delta
Datafusion Ballista support for Delta Table (showcase project)
ballista datafusion delta-lake deltatable distributed objectstore rust rustlang
Last synced: 10 Aug 2025
https://github.com/walker83/rorisdb
RorisDB is a real-time OLAP database reimagined in Rust. It is architecturally inspired by Apache Doris — adopting its proven MPP architecture, columnar storage, and materialized view design — while rebuilt from the ground up in Rust for memory safety, zero-cost abstractions, and fine-grained resource control.
analytics apache-doris columnar database datafusion local-development mysql olap parquet rust single-node sql
Last synced: 31 May 2026
https://github.com/matsadler/bishop
Query MongoDB via Apache Arrow and DataFusion
apache-arrow datafusion mongodb
Last synced: 11 May 2026
https://github.com/milenkovicm/lightfusion
LightGBM Inference on Datafusion
batch-inference datafusion inference lightgbm machine-learning rust sql udf userdefined-functions
Last synced: 06 May 2026
https://github.com/lostmygithubaccount/ibis-bench
A composable data system benchmark in a Python package.
Last synced: 27 Dec 2025
https://github.com/milenkovicm/ballista_python
Ballista cluster pyarrow udf support
arrow ballista datafusion distributed pyarrow pyo3 python rust rust-lang udf
Last synced: 29 Apr 2026
https://github.com/apache/datafusion-sandbox
Apache DataFusion SQL Query Engine
arrow big-data dataframe datafusion olap python query-engine rust sql
Last synced: 11 Apr 2026
https://github.com/f-aguzzi/ChemFuseKit
Chemometrics library for data fusion, model training and prediction of data from multiple sensor sources.
chemometrics datafusion knn lda pca plsda scikit-learn svm
Last synced: 21 Sep 2025
https://github.com/macieklesiczka/bazof
Lakehouse with time travel
datafusion datalake lakehouse parquet rust-lang
Last synced: 22 Mar 2025
https://github.com/hienphamlabs/fusionj
An Incomplete DataFusion Query Engine implemented in Java
Last synced: 23 Jun 2026
https://github.com/f-aguzzi/chemfusekit
Chemometrics library for data fusion, model training and prediction of data from multiple sensor sources.
chemometrics datafusion knn lda pca plsda scikit-learn svm
Last synced: 20 Jan 2026
https://github.com/duhanmin/arrow-sql-yarn
Execute SQL on the DataFusion/Polars engines via JNI
datafusion jni polars rust yarn
Last synced: 09 May 2026
https://github.com/mrasu/dataharpoon
An MCP-ready query engine that connects to your data — wherever it lives
database datafusion mcp mcp-client mcp-server query-engine
Last synced: 01 Aug 2025
https://github.com/hienduyph/fusionj
An Incomplete DataFusion Query Engine implemented in Java
Last synced: 13 Oct 2025
https://github.com/dkdc-dev/ibis-bench
A composable data system benchmark in a Python package.
Last synced: 25 May 2026
https://github.com/svenslaggare/gitrends
Web-based behavior code analysis tool.
behavior-code-analysis datafusion git sql trends
Last synced: 17 May 2026
https://github.com/apache/datafusion-testing
Apache DataFusion SQL Query Engine Testing
apache datafusion datafusion-testing
Last synced: 01 Aug 2025
https://github.com/duyet/ballista
Example of Ballista Rust
arrow ballista datafusion rust
Last synced: 21 Mar 2025
https://github.com/qizhipei/mathfusion
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion
Last synced: 28 Feb 2026
https://github.com/burhanahmed1/cryptosynth
Bitcoin Sentiment Forecast is a Multimodal approach to Bitcoin price forecasting using NLP and Time Series Analysis
datafusion datapreprocessing eda explainable-ai featureengineering machinelearning multimodal-deep-learning pca predictive-modeling
Last synced: 28 Oct 2025
https://github.com/jeroenflvr/datapress
A config-driven Rust server that publishes Parquet and Delta datasets as fast, typed HTTP APIs from local disk or object storage, with interchangeable DuckDB or Arrow+DataFusion backends, JSON and Arrow IPC output, and production-ready features like auth, metrics, and hot reloads.
api arrow arrow-ipc authz datafusion deltalake duckdb http in-memory parquet s3 sql
Last synced: 06 Jun 2026
https://github.com/pragmaai/yelp-datapipeline
🍽️ Yelp Data Pipeline & Analytics Dashboard End-to-end data engineering pipeline processing Yelp dataset with Rust transforms, Apache Airflow orchestration, and interactive Streamlit analytics. Features business insights, user engagement analysis, and city performance comparisons. 🚀 Docker-ready • 📊 Interactive Dashboard • ⚡ High-performance R
airflow data-engineering data-pipeline data-visualization datafusion docker rust streamlit yelp yelp-dataset
Last synced: 04 May 2026
https://github.com/james-ralph8555/1brc
1brc https://www.morling.dev/blog/one-billion-row-challenge/
cpp datafusion duckdb rust sql
Last synced: 18 Apr 2026