https://github.com/datafusion-contrib/datafusion-tpch
Native Rust TPCH support for Datafusion using tpchgen
databases datafusion datafusion-testing tpch tpchgen-rs
- Host: GitHub
- URL: https://github.com/datafusion-contrib/datafusion-tpch
- Owner: datafusion-contrib
- License: apache-2.0
- Created: 2025-04-19T19:04:26.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-03-24T23:39:11.000Z (3 months ago)
- Last Synced: 2026-03-26T05:13:46.486Z (3 months ago)
- Topics: databases, datafusion, datafusion-testing, tpch, tpchgen-rs
- Language: Rust
- Homepage:
- Size: 61.5 KB
- Stars: 3
- Watchers: 2
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# datafusion-tpch
[![Apache licensed][license-badge]][license-url]
[![Build Status][actions-badge]][actions-url]
[license-badge]: https://img.shields.io/badge/license-Apache%20v2-blue.svg
[license-url]: https://github.com/clflushopt/datafusion-tpch/blob/main/LICENSE
[actions-badge]: https://github.com/clflushopt/datafusion-tpch/actions/workflows/rust.yml/badge.svg
[actions-url]: https://github.com/clflushopt/datafusion-tpch/actions?query=branch%3Amain
Note: This is not an official Apache Software Foundation release.
This crate provides functions to generate the TPCH benchmark dataset for DataFusion
using the [tpchgen](https://github.com/clflushopt/tpchgen-rs) crates.
## Usage
The `datafusion-tpch` crate offers two ways to register the TPCH table
functions.
You can register a separate UDTF (user-defined table function) for each table.
```rust
use datafusion::error::Result;
use datafusion::prelude::SessionContext;
use datafusion_tpch::register_tpch_udtfs;

#[tokio::main]
async fn main() -> Result<()> {
    // Create a local execution context.
    let ctx = SessionContext::new();
    // Register one UDTF per TPCH table.
    register_tpch_udtfs(&ctx)?;
    // Generate the nation table with a scale factor of 1.
    let df = ctx.sql("SELECT * FROM tpch_nation(1.0);").await?;
    df.show().await?;
    Ok(())
}
```
Or you can register a single UDTF which generates all tables at once.
```rust
use datafusion::error::Result;
use datafusion::prelude::SessionContext;
use datafusion_tpch::register_tpch_udtf;

#[tokio::main]
async fn main() -> Result<()> {
    // Create a local execution context.
    let ctx = SessionContext::new();
    // Register the single UDTF that generates all tables.
    register_tpch_udtf(&ctx);
    // Generate every table with a scale factor of 1.
    let df = ctx.sql("SELECT * FROM tpch(1.0);").await?;
    df.show().await?;
    Ok(())
}
```
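The per-table functions take the scale factor as their only argument, so the query text can be built programmatically. The following is a minimal sketch using only the standard library. It assumes every per-table function follows the `tpch_<table>` naming seen in `tpch_nation` and `tpch_lineitem`; the other six names are an extrapolation, and `TPCH_TABLES` and `table_query` are hypothetical helpers, not part of the crate. The scale factor check is also the sketch's own choice, not documented behavior.

```rust
/// The eight tables defined by the TPC-H specification.
const TPCH_TABLES: [&str; 8] = [
    "nation", "region", "part", "supplier",
    "partsupp", "customer", "orders", "lineitem",
];

/// Build a query against a per-table UDTF (hypothetical helper).
/// Assumption: each function is named `tpch_<table>`.
/// Returns `None` for an unknown table or a non-positive / non-finite scale factor.
fn table_query(table: &str, scale_factor: f64) -> Option<String> {
    let valid_sf = scale_factor.is_finite() && scale_factor > 0.0;
    if !TPCH_TABLES.contains(&table) || !valid_sf {
        return None;
    }
    // `{:?}` keeps the decimal point for whole numbers (1.0 rather than 1).
    Some(format!("SELECT * FROM tpch_{table}({scale_factor:?});"))
}

fn main() {
    // Print one query per table at scale factor 1.
    for table in TPCH_TABLES {
        if let Some(sql) = table_query(table, 1.0) {
            println!("{sql}");
        }
    }
}
```

Each resulting string could be passed to `ctx.sql(...)` once the per-table UDTFs are registered. Checking the table name against a fixed list keeps arbitrary text out of the generated SQL.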
## Examples
To keep things simple, the table provider does not write Parquet files itself.
Instead, use the `COPY` command to write the generated tables to Parquet.
```rust
use datafusion::prelude::{SessionConfig, SessionContext};
use datafusion_tpch::{register_tpch_udtf, register_tpch_udtfs};

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Enable the information schema so that `SHOW TABLES` works.
    let ctx = SessionContext::new_with_config(SessionConfig::new().with_information_schema(true));

    // Generate all tables with a scale factor of 1.
    register_tpch_udtf(&ctx);
    let sql_df = ctx.sql("SELECT * FROM tpch(1.0);").await?;
    sql_df.show().await?;

    // List the tables available in the context.
    let sql_df = ctx.sql("SHOW TABLES;").await?;
    sql_df.show().await?;

    // Write the nation table to a Parquet file.
    let sql_df = ctx
        .sql("COPY nation TO './tpch_nation.parquet' STORED AS PARQUET")
        .await?;
    sql_df.show().await?;

    // Register the per-table UDTFs and write lineitem directly from its table function.
    register_tpch_udtfs(&ctx)?;
    let sql_df = ctx
        .sql("COPY (SELECT * FROM tpch_lineitem(1.0)) TO './tpch_lineitem_sf_10.parquet' STORED AS PARQUET")
        .await?;
    sql_df.show().await?;

    Ok(())
}
```
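To export several tables, the `COPY` statement from the example above can be generated per table. This is a rough sketch using only the standard library, under two assumptions: the per-table functions are named `tpch_<table>`, and output files are named `tpch_<table>.parquet`. `copy_statements` is a hypothetical helper, not part of the crate.

```rust
/// Build one `COPY ... STORED AS PARQUET` statement per table (hypothetical helper).
/// Assumptions: UDTFs are named `tpch_<table>`; files are named `tpch_<table>.parquet`.
/// Table names and the directory are interpolated verbatim, so pass trusted values only.
fn copy_statements(tables: &[&str], scale_factor: f64, out_dir: &str) -> Vec<String> {
    // Drop trailing slashes so the path never contains `//`.
    let dir = out_dir.trim_end_matches('/');
    tables
        .iter()
        .map(|table| {
            format!(
                "COPY (SELECT * FROM tpch_{table}({scale_factor:?})) \
                 TO '{dir}/tpch_{table}.parquet' STORED AS PARQUET"
            )
        })
        .collect()
}

fn main() {
    // Print the statements for two tables at scale factor 1.
    for stmt in copy_statements(&["nation", "lineitem"], 1.0, "./") {
        println!("{stmt}");
    }
}
```

Each statement could then be run with `ctx.sql(&stmt).await?` after calling `register_tpch_udtfs(&ctx)?`, as in the example above.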
You can find other examples in the [examples](examples/) directory.
### Running Examples
To quickly see the Parquet example in action, you can run the provided example directly from your terminal:
```bash
cargo run --example parquet
```
## License
The project is licensed under the [Apache 2.0](LICENSE) license.