https://github.com/edmondop/awesome-datafusion
A collection of resources about DataFusion
- Host: GitHub
- URL: https://github.com/edmondop/awesome-datafusion
- Owner: edmondop
- Created: 2024-10-03T21:26:13.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-11-11T12:43:36.000Z (7 months ago)
- Last Synced: 2025-02-28T22:47:33.832Z (3 months ago)
- Size: 8.79 KB
- Stars: 16
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# awesome-datafusion
A collection of resources about DataFusion
# DataFusion projects
- [Apache DataFusion](https://datafusion.apache.org/)
- [Apache DataFusion Comet](https://datafusion.apache.org/comet/)
- [Apache DataFusion Ballista](https://datafusion.apache.org/ballista/)
- [Apache DataFusion Ray](https://github.com/apache/datafusion-ray)
# DataFusion-contrib projects
- [Apache DataFusion DFT](https://github.com/datafusion-contrib/datafusion-dft)
- [Apache DataFusion Federation](https://github.com/datafusion-contrib/datafusion-federation)
- [Apache DataFusion Table Providers](https://github.com/datafusion-contrib/datafusion-table-providers)
# Projects built on DataFusion
- [Arroyo](https://www.arroyo.dev/)
- [GlareDB](https://github.com/GlareDB/glaredb)
- [LanceDB](https://lancedb.github.io/lancedb/)
- [Optd](https://cmu-db.github.io/optd/)
- [SpiceAI](https://www.spiceai.org/)
- [Synnada](https://www.synnada.ai/)
# DataFusion resources
- [Aggregating Millions of Groups Fast in Apache Arrow DataFusion](https://www.influxdata.com/blog/aggregating-millions-groups-fast-apache-arrow-datafusion/)
- [Announcing Apache Arrow DataFusion Comet](https://arrow.apache.org/blog/2024/03/06/comet-donation/)
- [Apache Arrow DataFusion, primer](https://www.work-bench.com/post/apache-arrow-datafusion-a-primer)
- [Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine](https://dl.acm.org/doi/10.1145/3626246.3653368)
- [DataFusion: A Rust-based Query Engine for Arrow](https://arrow.apache.org/blog/2020/07/07/datafusion-in-rust/)
- [Flight, DataFusion, Arrow and Parquet: Using the FDAP architecture to build InfluxDB 3.0](https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/)
- [GizmoSQL, An Arrow Flight SQL Server with DuckDB or SQLite back-end execution engines](https://github.com/gizmodata/gizmosql)
- [We built a new SQL engine on Arrow and DataFusion](https://www.arroyo.dev/blog/why-arrow-and-datafusion)
# Comet guides
- [Apache DataFusion Comet](https://makism.notion.site/Apache-DataFusion-Comet-13315bfb0437800c9ac9f9f7ca6baf6f)
- [DataFusion Comet EMR builder](https://github.com/edmondop/datafusion-comet-ami-builder/)
# Videos
- [Accelerating Apache Spark Workloads with Apache DataFusion Comet with Andy Grove](https://www.youtube.com/watch?v=o59s0d3HE1k)
- [Apache Arrow DataFusion Architecture with Andrew Lamb - Part 1](https://www.youtube.com/watch?v=NVKujPxwSBA)
- [Apache Arrow DataFusion Architecture with Andrew Lamb - Part 2](https://www.youtube.com/watch?v=EzZTLiSJnhY)
- [Apache Arrow DataFusion Architecture with Andrew Lamb - Part 3](https://www.youtube.com/watch?v=2jkWU3_w6z0)
# Projects that use DataFusion
- [Restate](https://restate.dev/)
# Slides from Meetups
- [Composable, Malleable & Lean Compute Infrastructure for the Future](https://docs.google.com/presentation/d/1i7l7bslZp3rRx0_S9ejFTC5ChyvS2bYS/edit#slide=id.p1) Mehmet Ozan Kabak, Synnada CEO @ozankabak (I don't seem to have these slides)
- [Composing Data Systems At Datadog.pdf](https://github.com/user-attachments/files/17080025/Composing.Data.Systems.At.Datadog.pdf): Wendell Smith and Alex Bianchi, Datadog
- [DataFusion: What, Why, How](https://docs.google.com/presentation/d/1zFh-ayH922k9Rvz2lZxYzjfoemKfr8mRLpw8BLHdw7k/edit#slide=id.g26bebde4fcc_3_7)
- [Database replication using the FDAP stack](https://docs.google.com/presentation/d/1hp0lRIwG8wpRlPMtdx-BPxXU3L2vqCBuPRkTFDgHoHo/edit#slide=id.p1) Marko Grujic, EnterpriseDB @gruuya
- [Embedding DataFusion in Postgres, Strengths and Limitations](https://docs.google.com/presentation/d/15yZBgAKSUB8nQGTOg9hNzuSAN7D9SPPAs_mpyxqoZyk/edit?usp=sharing): Philippe Noël, ParadeDB, CEO
- [Ibis + DataFusion playing (very) nicely together.](https://ibis-project.org/presentations/datafusion-meetup-nyc-2024/talk): Gil Forsyth, Voltron Data, Senior Staff Software Engineer
- [Materialized Views and Query Rewriting in DataFusion](https://drive.google.com/open?id=1mHDw1uZcOwlpUO3mA8aqSyk7IqeovpSuXG27clowXWE): Matthew Cramerus, [Polygon.io](http://polygon.io/), Software Engineer
- [Reducing query latency in DataFusion via a caching object store layer](https://docs.google.com/presentation/d/1TiToVb5rVFrmuR9Dxej7HgWpyv0p_88Ise3CyQKZSzE/edit#slide=id.p1) Artjoms Iskovs, EnterpriseDB, Principal Engineer @mildbyte
- [The Types](https://docs.google.com/presentation/d/1VW_JCGbN22lrGUOMRvUXGpAmlJopbG02hn_SDYJouiY) Piotr Findeisen, SDF @findepi
- [Using DataFusion to build InfluxDB 3.0](https://docs.google.com/presentation/d/1dOLPAFPEMLhLv4NN6O9QSDIyyeiIySqAjky5cVgdWAE/edit#slide=id.g26bebde4fcc_3_7): Andrew Lamb, InfluxData, Staff Engineer
- [dft: A terminal application for DataFusion](https://docs.google.com/presentation/d/1u42k8ZBHObLx1Ph5CvQOkeRJ72ys81Jo7BkCxjCU5Ik/edit#slide=id.p) Matthew Turner