An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with datafusion

A curated list of projects in awesome lists tagged with datafusion.

https://github.com/apache/datafusion

Apache DataFusion SQL Query Engine

arrow big-data dataframe datafusion olap python query-engine rust sql

Last synced: 18 Jun 2026

https://github.com/roapi/roapi

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

analytics arrow blob-storage cloud-native columnar datafusion datasets delta-lake graphql in-memory-database parquet query query-frontends rest-api rust s3 sql static-datasets

Last synced: 11 Feb 2026

https://github.com/lakesoul-io/lakesoul

LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

arrow big-data datafusion datalake flink huggingface lakehouse lakesoul postgresql python pytorch rust spark sql streaming vectorized velox

Last synced: 14 May 2025

https://github.com/lakesoul-io/LakeSoul

LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

arrow big-data datafusion datalake flink huggingface lakehouse lakesoul postgresql python pytorch rust spark sql streaming vectorized velox

Last synced: 27 Mar 2025

https://github.com/apache/auron

The Auron accelerator for distributed computing frameworks (e.g., Spark) leverages native vectorized execution to accelerate query processing.

big-data datafusion rust-lang spark

Last synced: 28 Aug 2025

https://github.com/kwai/blaze

Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.

big-data datafusion rust-lang spark

Last synced: 14 May 2025

https://github.com/lakehq/sail

LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.

arrow artificial-intelligence big-data data data-engineering datafusion distributed-computing machine-learning pyspark python rust spark sql

Last synced: 14 Apr 2026

https://github.com/arkflow-rs/arkflow

High-performance Rust stream processing engine that seamlessly integrates AI capabilities, providing powerful real-time data processing and intelligent analysis.

ai arkflow datafusion deep-learning duckdb flow kafka machine-learning mysql nats postgresql redis rust rust-lang sql sqlite stream tokio tokio-rs websocket

Last synced: 24 Feb 2026

https://github.com/apache/datafusion-comet

Apache DataFusion Comet Spark Accelerator

arrow datafusion rust spark

Last synced: 01 Apr 2026

https://github.com/splitgraph/seafowl

Analytical database for data-driven Web applications 🪶

api database datafusion delta-lake delta-rs edge http rust serverless sql visualization

Last synced: 15 May 2025

https://github.com/ark-flow/arkflow

High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors.

datafusion duckdb flow kafka mysql postgresql rust rust-lang sql sqlite stream tokio tokio-rs

Last synced: 03 Apr 2025

https://github.com/chenquan/arkflow

High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors.

datafusion flow rust rust-lang sql stream

Last synced: 13 Oct 2025

https://github.com/kamu-data/kamu-cli

Next-generation decentralized data lakehouse and a multi-party stream processing network

blockchain data-as-code data-management data-science datafusion flink jupyter kamu open-data open-data-fabric spark sql

Last synced: 17 Feb 2026

https://github.com/jankaul/iceberg-rust

Rust implementation of Apache Iceberg with integration for Datafusion

arrow datafusion iceberg rust

Last synced: 15 May 2025

https://github.com/datafusion-contrib/datafusion-dft

Batteries included CLI, TUI, and server implementations for DataFusion.

arrow cli data database datafusion tui

Last synced: 10 May 2025

https://github.com/xiangpenghao/liquid-cache

10x lower latency for cloud-native DataFusion

arrow cache data-analytics datafusion object-store parquet query-engine

Last synced: 21 Feb 2026

https://github.com/datafusion-contrib/datafusion-java

Java binding to Apache DataFusion

arrow ballista datafusion java

Last synced: 14 Jan 2026

https://github.com/datafusion-contrib/datafusion-postgres

Postgres protocol frontend for DataFusion

datafusion postgresql rust

Last synced: 16 Jan 2026

https://github.com/datafusion-contrib/datafusion-materialized-views

Incremental view maintenance & query rewriting for materialized views in DataFusion

arrow big-data datafusion materialized-views rust sql

Last synced: 12 May 2026

https://github.com/biodatageeks/polars-bio

Blazing-Fast Bioinformatic Operations on Python DataFrames

arrow bioinformatics dataframes datafusion genomic-intervals genomic-ranges genomics pandas polars rust-lang

Last synced: 29 Jul 2025

https://github.com/duo-rs/duo

A lightweight Logging and Tracing observability solution for Rust, built with Apache Arrow, Apache Parquet and Apache DataFusion.

apache-arrow apache-parquet datafusion logging observability rust tracing

Last synced: 29 Mar 2025

https://github.com/datafusion-contrib/datafusion-objectstore-s3

S3 as an ObjectStore for DataFusion

datafusion rust

Last synced: 06 Apr 2026

https://github.com/datafusion-contrib/datafusion-distributed

Library for bringing distributed capabilities to Apache DataFusion

arrow datafusion distributed distributed-computing distributed-da distributed-systems query-e

Last synced: 25 Jan 2026

https://github.com/shauryashaurya/learn-data-munging

Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.

arrow dask dask-distributed data-engineering datafusion jupyter numpy pandas polars pyspark ray spark

Last synced: 16 Apr 2025

https://github.com/metrico/influxdb3-community

Community InfluxDB 3.0 "IOx" static builds + containers + Examples for Developers & Integrators. Experiment with low-cost storage, unlimited cardinality and FlightSQL APIs

arrow datafusion flightsql flux influx influxdb influxdb3 iox lineprotocol musl rust

Last synced: 12 Apr 2025

https://github.com/splitgraph/seafowl-gcsfuse

Scale to zero Seafowl hosting with Cloud Run

datafusion faas gcp rust seafowl severless

Last synced: 14 Apr 2025

https://github.com/treebee/elixir-arrow

Experimental Elixir bindings for Apache Arrow including Parquet and DataFusion

arrow datafusion parquet query-engine

Last synced: 05 Mar 2026

https://github.com/apache/datafusion-java

Java bindings for Apache DataFusion

apache arrow datafusion java jni jvm query-engine sql

Last synced: 15 Jun 2026

https://github.com/datafusion-contrib/datafusion-c

C language bindings for DataFusion

apache-arrow c datafusion glib sql

Last synced: 25 Jan 2026

https://github.com/influxdata/datafusion-udf-wasm

DataFusion UDFs (User Defined Functions) via WebAssembly

datafusion wasm

Last synced: 04 Mar 2026

https://github.com/blaze-init/spark-blaze-extension

Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.

arrow datafusion spark

Last synced: 31 Mar 2025

https://github.com/jayendra13/zarr-datafusion

Extending DataFusion to do SQL queries on Zarr data.

datafusion geospatial-analysis olap-cube rust zarr-v3

Last synced: 06 Feb 2026

https://github.com/nazarii-piontko/datafusion-sharp

.NET bindings for Apache DataFusion query engine - execute blazing-fast SQL queries on Parquet, CSV, and JSON with Apache Arrow

analytics arrow binding csharp csv datafusion dotnet parquet query-engine rust sql

Last synced: 26 Apr 2026

https://github.com/hengfeiyang/how-query-engines-work-zh-cn

How Query Engines Work (Chinese edition)

arrow ballista datafusion parquet

Last synced: 13 Oct 2025

https://github.com/friendlymatthew/arrow-csv2

Vectorized CSV parsing for Apache Arrow

arrow csv datafusion object-storage rust simd

Last synced: 03 Apr 2026

https://github.com/sal-openlab/datafusion-server

Rust DataFusion Server

arrow datafusion rust sql

Last synced: 15 Apr 2025

https://github.com/milenkovicm/wasaffi

Datafusion WASM User Defined Functions

datafusion sql userdefined-functions wasm wasm-bindgen wasmedge

Last synced: 07 May 2025

https://github.com/milenkovicm/torchfusion

Torchfusion is a very opinionated torch inference on datafusion.

batch-inference datafusion inference machine-learning pytorch rust sql torch userdefined-functions

Last synced: 08 Oct 2025

https://github.com/systemxlabs/datafusion-remote-table

A DataFusion table provider for executing SQL queries on remote databases.

datafusion mysql oracle postgresql rust sql sqlite

Last synced: 30 Apr 2025

https://github.com/macieklesiczka/azof

Lakehouse with time travel

datafusion datalake lakehouse parquet rust-lang

Last synced: 02 Mar 2026

https://github.com/splitgraph/experimental-datafusion-webassembly

proof-of-concept: compile datafusion to `wasm32-wasi` (run in `wasmedge`) and `wasm32-unknown-unknown` (run in browser)

arrow datafusion wasm32-unknown-unknown wasm32-wasi wasmedge webassembly

Last synced: 02 Jan 2026

https://github.com/milenkovicm/adhesive

Apache Datafusion JVM User Defined Functions (UDF), integration nobody asked for 😀

arrow bytecode-compiler compiler datafusion java jni jvm rust sql udf udf-libraries userdefined-functions

Last synced: 10 Apr 2025

https://github.com/datafusion-contrib/datafusion-tpch

Native Rust TPCH support for Datafusion using tpchgen

databases datafusion datafusion-testing tpch tpchgen-rs

Last synced: 06 Apr 2026

https://github.com/milenkovicm/ballista_delta

Datafusion Ballista support for Delta Table (showcase project)

ballista datafusion delta-lake deltatable distributed objectstore rust rustlang

Last synced: 10 Aug 2025

https://github.com/walker83/rorisdb

RorisDB is a real-time OLAP database reimagined in Rust. It is architecturally inspired by Apache Doris — adopting its proven MPP architecture, columnar storage, and materialized view design — while rebuilt from the ground up in Rust for memory safety, zero-cost abstractions, and fine-grained resource control.

analytics apache-doris columnar database datafusion local-development mysql olap parquet rust single-node sql

Last synced: 31 May 2026

https://github.com/matsadler/bishop

Query MongoDB via Apache Arrow and DataFusion

apache-arrow datafusion mongodb

Last synced: 11 May 2026

https://github.com/lostmygithubaccount/ibis-bench

A composable data system benchmark in a Python package.

datafusion duckdb ibis polars

Last synced: 27 Dec 2025

https://github.com/f-aguzzi/ChemFuseKit

Chemometrics library for data fusion, model training and prediction of data from multiple sensor sources.

chemometrics datafusion knn lda pca plsda scikit-learn svm

Last synced: 21 Sep 2025

https://github.com/macieklesiczka/bazof

Lakehouse with time travel

datafusion datalake lakehouse parquet rust-lang

Last synced: 22 Mar 2025

https://github.com/hienphamlabs/fusionj

An Incomplete DataFusion Query Engine implemented in Java

arrow datafusion query-engine

Last synced: 23 Jun 2026

https://github.com/f-aguzzi/chemfusekit

Chemometrics library for data fusion, model training and prediction of data from multiple sensor sources.

chemometrics datafusion knn lda pca plsda scikit-learn svm

Last synced: 20 Jan 2026

https://github.com/duhanmin/arrow-sql-yarn

Execute SQL on the DataFusion/Polars engines via JNI

datafusion jni polars rust yarn

Last synced: 09 May 2026

https://github.com/mrasu/dataharpoon

An MCP-ready query engine that connects to your data — wherever it lives

database datafusion mcp mcp-client mcp-server query-engine

Last synced: 01 Aug 2025

https://github.com/hienduyph/fusionj

An Incomplete DataFusion Query Engine implemented in Java

arrow datafusion query-engine

Last synced: 13 Oct 2025

https://github.com/dkdc-dev/ibis-bench

A composable data system benchmark in a Python package.

datafusion duckdb ibis polars

Last synced: 25 May 2026

https://github.com/svenslaggare/gitrends

Web-based behavior code analysis tool.

behavior-code-analysis datafusion git sql trends

Last synced: 17 May 2026

https://github.com/apache/datafusion-testing

Apache DataFusion SQL Query Engine Testing

apache datafusion datafusion-testing

Last synced: 01 Aug 2025

https://github.com/duyet/ballista

Example of Ballista Rust

arrow ballista datafusion rust

Last synced: 21 Mar 2025

https://github.com/qizhipei/mathfusion

MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion

datafusion math mathematics

Last synced: 28 Feb 2026

https://github.com/burhanahmed1/cryptosynth

Bitcoin Sentiment Forecast is a Multimodal approach to Bitcoin price forecasting using NLP and Time Series Analysis

datafusion datapreprocessing eda explainable-ai featureengineering machinelearning multimodal-deep-learning pca predictive-modeling

Last synced: 28 Oct 2025

https://github.com/jeroenflvr/datapress

A config-driven Rust server that publishes Parquet and Delta datasets as fast, typed HTTP APIs from local disk or object storage, with interchangeable DuckDB or Arrow+DataFusion backends, JSON and Arrow IPC output, and production-ready features like auth, metrics, and hot reloads.

api arrow arrow-ipc authz datafusion deltalake duckdb http in-memory parquet s3 sql

Last synced: 06 Jun 2026

https://github.com/pragmaai/yelp-datapipeline

🍽️ Yelp Data Pipeline & Analytics Dashboard End-to-end data engineering pipeline processing Yelp dataset with Rust transforms, Apache Airflow orchestration, and interactive Streamlit analytics. Features business insights, user engagement analysis, and city performance comparisons. 🚀 Docker-ready • 📊 Interactive Dashboard • ⚡ High-performance R

airflow data-engineering data-pipeline data-visualization datafusion docker rust streamlit yelp yelp-dataset

Last synced: 04 May 2026

https://github.com/james-ralph8555/1brc

1brc https://www.morling.dev/blog/one-billion-row-challenge/

cpp datafusion duckdb rust sql

Last synced: 18 Apr 2026