https://github.com/clflushopt/tpcdsgen
WIP (out of tree) Rust implementation of TPC-DS generators.
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/clflushopt/tpcdsgen
- Owner: clflushopt
- Created: 2025-10-01T23:57:19.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-12-20T21:47:10.000Z (5 months ago)
- Last Synced: 2025-12-21T22:54:43.043Z (5 months ago)
- Language: Rust
- Size: 614 KB
- Stars: 10
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# tpcds but it's Rust instead
This is a work-in-progress port of [Trino's TPCDS](https://github.com/trinodb/tpcds) to Rust that is slowly
taking shape. It is developed out of tree for now, but will end up as part of
[tpchgen](https://github.com/clflushopt/tpchgen-rs) once I am satisfied with the port. I will
probably also rewrite a lot of it as more idiomatic Rust, replacing the Java-ism, OOP-heavy
mess it is right now.
Currently all 25 tables have been ported, with byte-for-byte compatibility verified against the Java
reference implementation.
## Usage
```bash
# Build the generator
cargo build --release
# Generate all tables at scale factor 1 (default)
./target/release/tpcdsgen
# Generate all tables at scale factor 10
./target/release/tpcdsgen --scale 10
# Generate a specific table
./target/release/tpcdsgen --table store_sales --scale 10
# Generate to a specific directory
./target/release/tpcdsgen --scale 10 --directory /path/to/output
```
## Generating Fixtures
Fixtures are pre-generated TPC-DS data files used for conformance testing.
### Directory Structure
```
tests/fixtures/
├── java/           # Java reference implementation output
│   ├── scale-1/    # 25 tables, ~1.2GB
│   └── scale-10/   # 25 tables, ~11GB
└── rust/           # Rust implementation output
    ├── scale-1/    # 25 tables, ~1.2GB
    └── scale-10/   # 25 tables, ~11GB
```
### Generating Java Fixtures
Requires the Java TPC-DS implementation to be built:
```bash
# Build Java implementation (if not already built)
cd ../tpcds && mvn clean package -DskipTests && cd -
# Generate Java fixtures for scale 1
java -jar ../tpcds/target/tpcds-1.5-SNAPSHOT-jar-with-dependencies.jar \
  --scale 1 \
  --directory tests/fixtures/java/scale-1 \
  --overwrite
# Generate Java fixtures for scale 10
java -jar ../tpcds/target/tpcds-1.5-SNAPSHOT-jar-with-dependencies.jar \
  --scale 10 \
  --directory tests/fixtures/java/scale-10 \
  --overwrite
```
### Generating Rust Fixtures
```bash
# Build Rust implementation
cargo build --release
# Generate Rust fixtures for scale 1
./target/release/tpcdsgen --scale 1 --directory tests/fixtures/rust/scale-1
# Generate Rust fixtures for scale 10
./target/release/tpcdsgen --scale 10 --directory tests/fixtures/rust/scale-10
```
### Conformance Testing
To verify that the Rust output matches the Java output byte for byte:
```bash
# Run conformance tests at scale 1
./scripts/test-all-tables.sh --scale 1
# Run conformance tests at scale 10
./scripts/test-all-tables.sh --scale 10
```
See [HASHES.md](HASHES.md) for the canonical MD5 hashes.
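As a rough illustration of what a byte-for-byte check involves, the sketch below compares each `.dat` file in a reference directory with the same-named file in a candidate directory using `cmp`. It is not the contents of `scripts/test-all-tables.sh`. The function name `compare_fixtures` is hypothetical, and the sketch assumes both directories hold a flat set of `.dat` files. `dbgen_version.dat` is skipped because it contains timestamps.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): compare every .dat file
# in a reference directory against the same-named file in a candidate
# directory, byte for byte. Returns non-zero if any file differs or is missing.
compare_fixtures() {
  local ref_dir="$1" cand_dir="$2" status=0 ref name
  for ref in "$ref_dir"/*.dat; do
    [ -e "$ref" ] || continue   # no .dat files: the glob stayed literal
    name="$(basename "$ref")"
    # dbgen_version.dat embeds timestamps, so it is expected to differ.
    [ "$name" = "dbgen_version.dat" ] && continue
    if cmp -s "$ref" "$cand_dir/$name"; then
      echo "$name: OK"
    else
      echo "$name: MISMATCH"
      status=1
    fi
  done
  return "$status"
}
```

With the fixture layout described above, a call might look like `compare_fixtures tests/fixtures/java/scale-1 tests/fixtures/rust/scale-1`.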
### Verifying Fixtures with MD5SUMS
Each fixture directory contains an `MD5SUMS` file for verification.
**On Linux:**
```bash
cd tests/fixtures/java/scale-1
md5sum -c MD5SUMS
```
**On macOS:**
```bash
cd tests/fixtures/java/scale-1
while read -r hash file; do
  [[ $(md5 -q "$file") == "$hash" ]] && echo "$file: OK" || echo "$file: FAILED"
done < MD5SUMS
```
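The two platform-specific recipes can also be folded into one script. The sketch below is a minimal version of that idea: it uses `md5sum` when available and falls back to `md5 -q` otherwise. The helper names `md5_of` and `verify_md5sums` are hypothetical, and the sketch assumes `MD5SUMS` uses the usual `md5sum` layout of one `hash  filename` pair per line.

```bash
#!/usr/bin/env bash
# Print the MD5 hex digest of a file using whichever tool is installed.
md5_of() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d' ' -f1
  else
    md5 -q "$1"
  fi
}

# Hypothetical helper: check every entry of the MD5SUMS file in a directory.
# Returns non-zero if any file is missing or its digest does not match.
verify_md5sums() {
  local dir="$1" status=0 hash file
  while read -r hash file; do
    [ -n "$hash" ] || continue   # skip blank lines
    file="${file#\*}"            # drop the binary-mode marker, if present
    if [ "$(md5_of "$dir/$file")" = "$hash" ]; then
      echo "$file: OK"
    else
      echo "$file: FAILED"
      status=1
    fi
  done < "$dir/MD5SUMS"
  return "$status"
}
```

For example, `verify_md5sums tests/fixtures/java/scale-1` would check that directory without needing to `cd` into it first.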
## Known Bugs
The TPC-DS reference implementation contains several bugs that must be replicated for benchmark compliance.
These bugs originated in the C implementation and were faithfully reproduced in the Java port. Our Rust implementation
also replicates these bugs to ensure byte-for-byte compatibility with the reference implementation.
See [BUGS.md](BUGS.md) for a detailed list of the bugs documented so far; more will be added.
## TPC-DS Reference MD5 Hashes
These are the canonical MD5 hashes for TPC-DS data generated by the Java reference implementation.
The Rust implementation must produce byte-for-byte identical output.
## Scale 1
Generated with: `java -jar tpcds-*.jar --scale 1`
| Table | MD5 Hash |
|-------|----------|
| call_center.dat | `cc9aabc63eb8603bd7330b6735ed0961` |
| catalog_page.dat | `0bbac1b8bdcf8ce2d5f0034980ee0196` |
| catalog_returns.dat | `8460b5abd6b6ceaf6107f217b016fb23` |
| catalog_sales.dat | `51a0bc401b4b64d94736634b54068240` |
| customer.dat | `3672ffdefac3cf00413ecef71a753636` |
| customer_address.dat | `abac2e3925ab9bf66cec3b527a0468ed` |
| customer_demographics.dat | `8831872c6d56ea9d4f24701f2feaef48` |
| date_dim.dat | `f3e77714328dcc57302777e72fd7747c` |
| dbgen_version.dat | `a430da74c2e44926c53deb74e35b23f1` * |
| household_demographics.dat | `dccf2ff17c5e420021fbf92bf9a0a5ec` |
| income_band.dat | `db8e8012be51ef81cf215774bec95533` |
| inventory.dat | `cfefc8724693ec9149f1d5b345fcecc2` |
| item.dat | `bebbcfd1acecdea16a5a3feb5e4deb96` |
| promotion.dat | `acb42558d0dc5e0ab6df5a664c1629cf` |
| reason.dat | `57fe9b8688095bd345cc846ec4400be0` |
| ship_mode.dat | `791d16af982a67ad170a6b6527e25a35` |
| store.dat | `80082d03e1b01340e19db3187d8edbd6` |
| store_returns.dat | `9009d804c02ee839e0b2ecd5fb4ae03f` |
| store_sales.dat | `f003b3810e042d6dd47f48506616d88d` |
| time_dim.dat | `a68339c5720d25380b53f6e0f2f72333` |
| warehouse.dat | `f56789e8b724b989d74e213e0686052f` |
| web_page.dat | `6feef91675c336d6f25e55ebbdf8c13c` |
| web_returns.dat | `e45390d32d1698fef71f05f474a4d748` |
| web_sales.dat | `15f9d835727f3a39a096c346f56e51f7` |
| web_site.dat | `de5fb00a80673cb44b4b508da75d4bcf` |
## Scale 10
Generated with: `java -jar tpcds-*.jar --scale 10`
| Table | MD5 Hash |
|-------|----------|
| call_center.dat | `235909679f4d125e769aa38eb16e9098` |
| catalog_page.dat | `a5daa0d93ecde8bd9f6ed79cd3b63916` |
| catalog_returns.dat | `982a8b96fa0d9487015cd137136c8f68` |
| catalog_sales.dat | `97d5351b430d6c15e3906518315f0787` |
| customer.dat | `486a030a55d468ef15ff2ff01583e6dc` |
| customer_address.dat | `860602fea368111009ef08b167e1e299` |
| customer_demographics.dat | `8831872c6d56ea9d4f24701f2feaef48` |
| date_dim.dat | `f3e77714328dcc57302777e72fd7747c` |
| dbgen_version.dat | `8553e926c33f4ad84e4d58fcfd20c48c` * |
| household_demographics.dat | `dccf2ff17c5e420021fbf92bf9a0a5ec` |
| income_band.dat | `db8e8012be51ef81cf215774bec95533` |
| inventory.dat | `4ad3640917c6567038f081bbe2cf0e3e` |
| item.dat | `bff29691c74ae66eb2dcc3af686fb2ba` |
| promotion.dat | `b8e8a7741f64edc5d09fdb0453c86705` |
| reason.dat | `a1fdcd35ca0eddd0d5f37b0e5c2fddb3` |
| ship_mode.dat | `791d16af982a67ad170a6b6527e25a35` |
| store.dat | `430a01467a2d55d0e9a1bebad4f1c44b` |
| store_returns.dat | `4ba001a6066db20066cd198242f92ca1` |
| store_sales.dat | `ecff92350fa0466e9b9407a1b5ad4020` |
| time_dim.dat | `a68339c5720d25380b53f6e0f2f72333` |
| warehouse.dat | `e0c56fe622774d09c9dec42029881ad5` |
| web_page.dat | `e55695fdb2b86f96cf46e2a55b6f3748` |
| web_returns.dat | `ac0197593d3f4cc3bb46c8ad7e6cd735` |
| web_sales.dat | `4da375300bcb0ce8785e1f100fb72efe` |
| web_site.dat | `4669d52e36cd112af10e137e5d8d7697` |
\* `dbgen_version.dat` contains timestamps and will differ between runs.
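To check a single generated file against these tables by hand, the expected hash can be pulled out of the Markdown with `awk`. The sketch below is only an illustration: the function name `expected_hash` is hypothetical, and it assumes the hashes file has the same layout as the tables above, with a `Scale N` heading followed by rows of the form ``| name | `hash` |``.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the hash recorded for a table under the
# "Scale N" heading of a Markdown file laid out like the tables above.
# Prints nothing if the table or the scale is not found.
expected_hash() {
  local hashes_file="$1" scale="$2" table="$3"
  awk -v scale="$scale" -v table="$table" '
    # Track which heading we are under, ignoring the heading level.
    /^#/ { h = $0; sub(/^#+ */, "", h); in_section = (h == ("Scale " scale)) }
    # Data row: field 2 is the file name, field 4 the back-quoted hash.
    in_section && $1 == "|" && $2 == table {
      gsub(/`/, "", $4); print $4; exit
    }
  ' "$hashes_file"
}
```

The result can then be compared with a freshly computed digest, for example `[ "$(md5sum < reason.dat | cut -d' ' -f1)" = "$(expected_hash HASHES.md 1 reason.dat)" ]` on Linux.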
## Verification
To verify that the Rust implementation's output matches these hashes:
```bash
# Verify at scale 1
./scripts/test-all-tables.sh --scale 1
# Verify at scale 10
./scripts/test-all-tables.sh --scale 10
```