https://github.com/datafusion-contrib/datafusion-table-providers
DataFusion TableProviders for reading data from other systems
- Host: GitHub
- URL: https://github.com/datafusion-contrib/datafusion-table-providers
- Owner: datafusion-contrib
- License: apache-2.0
- Created: 2024-04-17T17:16:31.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-29T06:33:45.000Z (about 1 month ago)
- Last Synced: 2025-05-05T04:46:36.916Z (29 days ago)
- Language: Rust
- Size: 1.24 MB
- Stars: 110
- Watchers: 12
- Forks: 38
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-datafusion - Apache DataFusion Table Providers
README
# DataFusion Table Providers
Note: This is not an official Apache Software Foundation project.
The goal of this repo is to extend the capabilities of DataFusion to support additional data sources via implementations of the `TableProvider` trait.
Many of the table providers in this repo are for querying data from other database systems. Those providers also integrate with the [`datafusion-federation`](https://github.com/datafusion-contrib/datafusion-federation/) crate to allow for more efficient query execution, such as pushing down joins between multiple tables from the same database system, or efficiently implementing TopK style queries (`SELECT * FROM table ORDER BY foo LIMIT 10`).
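To make the benefit of push-down concrete, here is a minimal toy sketch of the TopK case using only the Rust standard library. It is not the `datafusion-federation` API: `RemoteTable`, `scan_all`, `scan_top_k`, and both `top_k_*` functions are hypothetical names invented for this illustration. The sketch assumes the remote system can evaluate `ORDER BY foo LIMIT k` itself, and it counts how many rows would cross the boundary between the source and the query engine in each case.

```rust
// Toy model only: none of these types come from DataFusion or
// datafusion-federation. All names here are hypothetical.

/// Stand-in for a table in a remote database, holding only `foo` values.
pub struct RemoteTable {
    pub rows: Vec<i64>,
}

impl RemoteTable {
    /// No push-down: the source ships every row to the query engine.
    pub fn scan_all(&self) -> Vec<i64> {
        self.rows.clone()
    }

    /// Push-down: the source evaluates `ORDER BY foo LIMIT k` itself
    /// and ships only the `k` smallest rows.
    pub fn scan_top_k(&self, k: usize) -> Vec<i64> {
        let mut sorted = self.rows.clone();
        sorted.sort();
        sorted.truncate(k);
        sorted
    }
}

/// Engine-side TopK: pull every row, then sort and truncate locally.
/// Returns the result and the number of rows transferred.
pub fn top_k_without_pushdown(table: &RemoteTable, k: usize) -> (Vec<i64>, usize) {
    let mut rows = table.scan_all();
    let transferred = rows.len();
    rows.sort();
    rows.truncate(k);
    (rows, transferred)
}

/// Pushed-down TopK: the engine receives only the final rows.
/// Returns the result and the number of rows transferred.
pub fn top_k_with_pushdown(table: &RemoteTable, k: usize) -> (Vec<i64>, usize) {
    let rows = table.scan_top_k(k);
    let transferred = rows.len();
    (rows, transferred)
}
```

Both functions return the same rows for the same table and `k`; only the transfer count differs. For a table with `n` rows, the first moves `n` rows and the second moves at most `k`, which is the saving the federation optimizer aims for when it hands the whole query to the source.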
To use these table providers with efficient federation push-down, add the `datafusion-federation` crate and create a DataFusion `SessionContext` using the Federation optimizer rule and query planner with:
```rust
use datafusion::prelude::SessionContext;

let state = datafusion_federation::default_session_state();
let ctx = SessionContext::with_state(state);

// Register the specific table providers into ctx
// queries will now automatically be federated
```

## Table Providers
- PostgreSQL
- MySQL
- SQLite
- DuckDB
- Flight SQL
- ODBC

## Examples (in Rust)
Run the included examples to see how to use the table providers:
### DuckDB
```bash
# Read from a table in a DuckDB file
cargo run --example duckdb --features duckdb
# Create an external table backed by DuckDB directly in DataFusion
cargo run --example duckdb_external_table --features duckdb
# Use the result of a DuckDB function call as the source of a table
cargo run --example duckdb_function --features duckdb
```

### SQLite
```bash
# Run from repo folder
cargo run --example sqlite --features sqlite
```

### Postgres
To run the Postgres example, you need a running Postgres server. The following commands start one in a Docker container that the example can use:
```bash
docker run --name postgres -e POSTGRES_PASSWORD=password -e POSTGRES_DB=postgres_db -p 5432:5432 -d postgres:16-alpine
# Wait for the Postgres server to start
sleep 30
# Create a table in the Postgres server and insert some data
docker exec -i postgres psql -U postgres -d postgres_db <