https://github.com/datafusion-contrib/datafusion-table-providers
DataFusion TableProviders for reading data from other systems
- Host: GitHub
- URL: https://github.com/datafusion-contrib/datafusion-table-providers
- Owner: datafusion-contrib
- License: apache-2.0
- Created: 2024-04-17T17:16:31.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-11-13T02:30:23.000Z (2 months ago)
- Last Synced: 2024-11-13T02:32:54.637Z (2 months ago)
- Language: Rust
- Size: 442 KB
- Stars: 59
- Watchers: 11
- Forks: 17
- Open Issues: 14
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-datafusion - Apache DataFusion Table Providers
README
# DataFusion Table Providers
Note: This is not an official Apache Software Foundation project.
The goal of this repo is to extend the capabilities of DataFusion to support additional data sources via implementations of the `TableProvider` trait.
Many of the table providers in this repo are for querying data from other database systems. Those providers also integrate with the [`datafusion-federation`](https://github.com/datafusion-contrib/datafusion-federation/) crate to allow for more efficient query execution, such as pushing down joins between multiple tables from the same database system, or efficiently implementing TopK style queries (`SELECT * FROM table ORDER BY foo LIMIT 10`).
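The push-down idea can be illustrated with a small standalone sketch. It uses only the Rust standard library; `RemoteTable`, `TopK`, `push_down_top_k`, and `can_push_down_join` are hypothetical names invented for this illustration and are not part of `datafusion-federation` or this repo. It only shows why sending the `ORDER BY ... LIMIT` to the remote database, or joining two tables inside the same remote system, avoids transferring whole tables.

```rust
// Illustrative only: every type and function here is hypothetical and is
// NOT the API of datafusion-federation or datafusion-table-providers.

/// A table reference tagged with the remote system it lives in.
#[derive(Debug, Clone, PartialEq)]
struct RemoteTable {
    /// Identifier of the remote system (assumed naming, e.g. "pg").
    source: String,
    /// Table name inside that system.
    name: String,
}

/// A minimal TopK query: SELECT * FROM table ORDER BY column LIMIT n.
struct TopK<'a> {
    table: &'a RemoteTable,
    order_by: &'a str,
    limit: usize,
}

/// Render the TopK query as SQL for the remote database to execute,
/// so that at most `limit` rows are returned instead of the whole table.
fn push_down_top_k(q: &TopK<'_>) -> String {
    format!(
        "SELECT * FROM {} ORDER BY {} LIMIT {}",
        q.table.name, q.order_by, q.limit
    )
}

/// In this sketch, a join is pushed down as one remote query only when
/// both sides live in the same remote system.
fn can_push_down_join(left: &RemoteTable, right: &RemoteTable) -> bool {
    left.source == right.source
}

fn main() {
    let orders = RemoteTable { source: "pg".to_string(), name: "orders".to_string() };
    let users = RemoteTable { source: "pg".to_string(), name: "users".to_string() };
    let events = RemoteTable { source: "duck".to_string(), name: "events".to_string() };

    let q = TopK { table: &orders, order_by: "foo", limit: 10 };
    println!("{}", push_down_top_k(&q));
    println!("orders JOIN users pushed down: {}", can_push_down_join(&orders, &users));
    println!("orders JOIN events pushed down: {}", can_push_down_join(&orders, &events));
}
```

In the real crates, this decision is made by the federation optimizer rule and query planner rather than by hand-written helpers like these.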
To use these table providers with efficient federation push-down, add the `datafusion-federation` crate and create a DataFusion `SessionContext` using the Federation optimizer rule and query planner with:
```rust
use datafusion::prelude::SessionContext;

let state = datafusion_federation::default_session_state();
let ctx = SessionContext::with_state(state);

// Register the specific table providers into ctx
// queries will now automatically be federated
```

## Table Providers
- PostgreSQL
- MySQL
- SQLite
- DuckDB
- Flight SQL

## Examples
Run the included examples to see how to use the table providers:
### DuckDB
```bash
# Read from a table in a DuckDB file
cargo run --example duckdb --features duckdb
# Create an external table backed by DuckDB directly in DataFusion
cargo run --example duckdb_external_table --features duckdb
# Use the result of a DuckDB function call as the source of a table
cargo run --example duckdb_function --features duckdb
```

### SQLite
```bash
cargo run --example sqlite --features sqlite
```

### Postgres
To run the Postgres example, you need a running Postgres server. The following commands start one in a Docker container that the example can use:
```bash
docker run --name postgres -e POSTGRES_PASSWORD=password -e POSTGRES_DB=postgres_db -p 5432:5432 -d postgres:16-alpine
# Wait for the Postgres server to start
sleep 30

# Create a table in the Postgres server and insert some data
docker exec -i postgres psql -U postgres -d postgres_db <