# awesome-datafusion

A collection of resources about DataFusion

# DataFusion projects

- [Apache DataFusion](https://datafusion.apache.org/)
- [Apache DataFusion Comet](https://datafusion.apache.org/comet/)
- [Apache DataFusion Ballista](https://datafusion.apache.org/ballista/)
- [Apache DataFusion Ray](https://github.com/apache/datafusion-ray)

# DataFusion-contrib projects

- [DataFusion DFT](https://github.com/datafusion-contrib/datafusion-dft)
- [DataFusion Federation](https://github.com/datafusion-contrib/datafusion-federation)
- [DataFusion Table Providers](https://github.com/datafusion-contrib/datafusion-table-providers)

# Projects built on DataFusion

- [Arroyo](https://www.arroyo.dev/)
- [GlareDB](https://github.com/GlareDB/glaredb)
- [LanceDB](https://lancedb.github.io/lancedb/)
- [Optd](https://cmu-db.github.io/optd/)
- [SpiceAI](https://www.spiceai.org/)
- [Synnada](https://www.synnada.ai/)

# DataFusion resources

- [Aggregating Millions of Groups Fast in Apache Arrow DataFusion](https://www.influxdata.com/blog/aggregating-millions-groups-fast-apache-arrow-datafusion/)
- [Announcing Apache Arrow DataFusion Comet](https://arrow.apache.org/blog/2024/03/06/comet-donation/)
- [Apache Arrow DataFusion: A Primer](https://www.work-bench.com/post/apache-arrow-datafusion-a-primer)
- [Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine](https://dl.acm.org/doi/10.1145/3626246.3653368)
- [DataFusion: A Rust-based Query Engine for Arrow](https://arrow.apache.org/blog/2020/07/07/datafusion-in-rust/)
- [Flight, DataFusion, Arrow and Parquet: Using the FDAP architecture to build InfluxDB 3.0](https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/)
- [GizmoSQL, An Arrow Flight SQL Server with DuckDB or SQLite back-end execution engines](https://github.com/gizmodata/gizmosql)
- [We built a new SQL engine on Arrow and DataFusion](https://www.arroyo.dev/blog/why-arrow-and-datafusion)

# Comet guides

- [Apache DataFusion Comet](https://makism.notion.site/Apache-DataFusion-Comet-13315bfb0437800c9ac9f9f7ca6baf6f)
- [DataFusion Comet EMR builder](https://github.com/edmondop/datafusion-comet-ami-builder/)

# Videos

- [Accelerating Apache Spark Workloads with Apache DataFusion Comet with Andy Grove](https://www.youtube.com/watch?v=o59s0d3HE1k)
- [Apache Arrow DataFusion Architecture with Andrew Lamb - Part 1](https://www.youtube.com/watch?v=NVKujPxwSBA)
- [Apache Arrow DataFusion Architecture with Andrew Lamb - Part 2](https://www.youtube.com/watch?v=EzZTLiSJnhY)
- [Apache Arrow DataFusion Architecture with Andrew Lamb - Part 3](https://www.youtube.com/watch?v=2jkWU3_w6z0)

# Projects that use DataFusion

- [Restate](https://restate.dev/)

# Slides from Meetups

- [Composable, Malleable & Lean Compute Infrastructure for the Future](https://docs.google.com/presentation/d/1i7l7bslZp3rRx0_S9ejFTC5ChyvS2bYS/edit#slide=id.p1): Mehmet Ozan Kabak, Synnada, CEO @ozankabak (these slides may not be publicly accessible)
- [Composing Data Systems At Datadog](https://github.com/user-attachments/files/17080025/Composing.Data.Systems.At.Datadog.pdf) (PDF): Wendell Smith and Alex Bianchi, Datadog
- [DataFusion: What, Why, How](https://docs.google.com/presentation/d/1zFh-ayH922k9Rvz2lZxYzjfoemKfr8mRLpw8BLHdw7k/edit#slide=id.g26bebde4fcc_3_7)
- [Database replication using the FDAP stack](https://docs.google.com/presentation/d/1hp0lRIwG8wpRlPMtdx-BPxXU3L2vqCBuPRkTFDgHoHo/edit#slide=id.p1): Marko Grujic, EnterpriseDB @gruuya
- [Embedding DataFusion in Postgres, Strengths and Limitations](https://docs.google.com/presentation/d/15yZBgAKSUB8nQGTOg9hNzuSAN7D9SPPAs_mpyxqoZyk/edit?usp=sharing): Philippe Noël, ParadeDB, CEO
- [Ibis + DataFusion playing (very) nicely together](https://ibis-project.org/presentations/datafusion-meetup-nyc-2024/talk): Gil Forsyth, Voltron Data, Senior Staff Software Engineer
- [Materialized Views and Query Rewriting in DataFusion](https://drive.google.com/open?id=1mHDw1uZcOwlpUO3mA8aqSyk7IqeovpSuXG27clowXWE): Matthew Cramerus, [Polygon.io](http://polygon.io/), Software Engineer
- [Reducing query latency in DataFusion via a caching object store layer](https://docs.google.com/presentation/d/1TiToVb5rVFrmuR9Dxej7HgWpyv0p_88Ise3CyQKZSzE/edit#slide=id.p1): Artjoms Iskovs, EnterpriseDB, Principal Engineer @mildbyte
- [The Types](https://docs.google.com/presentation/d/1VW_JCGbN22lrGUOMRvUXGpAmlJopbG02hn_SDYJouiY): Piotr Findeisen, SDF @findepi
- [Using DataFusion to build InfluxDB 3.0](https://docs.google.com/presentation/d/1dOLPAFPEMLhLv4NN6O9QSDIyyeiIySqAjky5cVgdWAE/edit#slide=id.g26bebde4fcc_3_7): Andrew Lamb, InfluxData, Staff Engineer
- [dft: A terminal application for DataFusion](https://docs.google.com/presentation/d/1u42k8ZBHObLx1Ph5CvQOkeRJ72ys81Jo7BkCxjCU5Ik/edit#slide=id.p): Matthew Turner