https://github.com/distributedsystemsgroup/spark-tpc-ds
Spark job for the TPC-DS benchmark
- Host: GitHub
- URL: https://github.com/distributedsystemsgroup/spark-tpc-ds
- Owner: DistributedSystemsGroup
- License: apache-2.0
- Created: 2015-05-06T15:25:10.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2015-06-15T15:30:15.000Z (about 11 years ago)
- Last Synced: 2025-05-27T07:03:13.817Z (about 1 year ago)
- Language: Shell
- Size: 380 KB
- Stars: 7
- Watchers: 8
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Spark-TPC-DS
Spark job for the TPC-DS benchmark.
This code uses the spark-sql-perf library from Databricks: https://github.com/databricks/spark-sql-perf
To compile, put the jar built from that library in `lib/` and then run `build/sbt assembly`.
To execute, the following arguments must be provided:
1. HDFS data location ("/user/test/tpcds-data")
2. scale factor (10)
3. HDFS result location ("/user/test/tpcds-results")
4. number of iterations
5. query set to execute, one of:
   - impalakit
   - interactive
   - reporting
   - deepAnalytics
   - simple
6. dsdgenDir (location of the TPC-DS `dsdgen` data generator)
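
As a rough illustration of how the six arguments above fit together, the sketch below checks them and prints a `spark-submit` command line without running it. The jar path and main class are hypothetical placeholders (this README does not name them), and the iteration count and `dsdgen` directory in the sample call are illustrative values only.

```shell
#!/usr/bin/env bash
# Sketch: assemble the six positional arguments listed above and print the
# resulting spark-submit command (dry run, nothing is submitted).
# JAR and MAIN_CLASS are hypothetical placeholders, not names from this repository.

JAR="target/spark-tpc-ds-assembly.jar"   # assumption: point this at the jar produced by `build/sbt assembly`
MAIN_CLASS="your.main.Class"             # assumption: replace with the job's actual entry point

build_tpcds_cmd() {
  # Require exactly the six arguments described in the list above.
  if [ "$#" -ne 6 ]; then
    echo "usage: build_tpcds_cmd DATA_DIR SCALE RESULT_DIR ITERATIONS QUERY_SET DSDGEN_DIR" >&2
    return 1
  fi
  local data_dir="$1" scale="$2" result_dir="$3" iterations="$4" query_set="$5" dsdgen_dir="$6"

  # Accept only the query sets named in the README.
  case "$query_set" in
    impalakit|interactive|reporting|deepAnalytics|simple) ;;
    *) echo "unknown query set: $query_set" >&2; return 1 ;;
  esac

  # Print the command instead of executing it.
  echo spark-submit --class "$MAIN_CLASS" "$JAR" \
    "$data_dir" "$scale" "$result_dir" "$iterations" "$query_set" "$dsdgen_dir"
}

# Sample call; 3 iterations and the dsdgen path are made-up values.
build_tpcds_cmd /user/test/tpcds-data 10 /user/test/tpcds-results 3 interactive /opt/tpcds-kit/tools
```

Printing the command rather than running it makes it easy to check the argument order before submitting a real job.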