https://github.com/codingcat/xgboost4j-spark-scalability
a benchmark to test scalability of xgboost4j-spark and relevant projects
- Host: GitHub
- URL: https://github.com/codingcat/xgboost4j-spark-scalability
- Owner: CodingCat
- Created: 2017-04-05T02:42:17.000Z
- Default Branch: master
- Last Pushed: 2019-12-20T03:24:48.000Z
- Last Synced: 2025-03-25T15:15:05.588Z
- Language: Scala
- Homepage:
- Size: 30.4 MB
- Stars: 22
- Watchers: 6
- Forks: 9
- Open Issues: 0
Metadata Files:
- Readme: README.md
# xgboost4j-spark-scalability
A benchmark to test the scalability of xgboost4j-spark and related projects.
## Prerequisites
Make sure that Maven (3.0+) and CMake are installed and available on your `$PATH`.
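A quick way to confirm both tools are reachable before building is sketched below. The helper names `check_tool` and `check_prerequisites` are made up for this example and are not part of the repository. The sketch only checks that the commands exist on `$PATH`; it does not verify the Maven version.

```bash
#!/usr/bin/env bash
# Report whether a single tool is reachable via $PATH.
check_tool() {
  local tool="$1"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
    return 1
  fi
}

# Check every build prerequisite and remember whether any is missing.
check_prerequisites() {
  local status=0
  local tool
  for tool in mvn cmake; do
    check_tool "$tool" || status=1
  done
  return "$status"
}

check_prerequisites || echo "Install the missing tools before building."
```

Running `mvn -v` afterwards shows the installed Maven version, which should be 3.0 or newer.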
## Build Benchmark
1. Edit `build/build.sh` and define variables such as `TARGET_URL` and `TARGET_BRANCH`.
2. Run `build/build.sh`.
3. The benchmark jar is produced under `target/`.
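The run commands in the next section expect the jar at `target/scala-2.11/xgboost4j-spark-scalability-assembly-0.1-SNAPSHOT.jar`. If the build puts it elsewhere under `target/`, a small helper like the one below can locate it. The function name `find_benchmark_jars` is hypothetical, and the `*assembly*.jar` pattern is an assumption taken from that file name.

```bash
#!/usr/bin/env bash
# List assembly jars under a build output directory (defaults to target/).
find_benchmark_jars() {
  local dir="${1:-target}"
  # Nothing to list if the build has not produced the directory yet.
  [ -d "$dir" ] || return 0
  find "$dir" -type f -name '*assembly*.jar' | sort
}

find_benchmark_jars target
```

If nothing is printed, the build did not produce an assembly jar in that directory.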
## Run Benchmarks
1. Generate Data:
```bash
spark-submit --master yarn-cluster --num-executors 10 --executor-memory 6g --executor-cores 8 \
--class me.codingcat.xgboost4j.AirlineDataGenerator --files conf/airline_datagen.conf \
target/scala-2.11/xgboost4j-spark-scalability-assembly-0.1-SNAPSHOT.jar ./airline_datagen.conf
```
2. Run workload:
```bash
spark-submit --master yarn-cluster --num-executors 10 --executor-memory 6g --executor-cores 8 \
--class me.codingcat.xgboost4j.AirlineClassifier --files conf/airline.conf \
target/scala-2.11/xgboost4j-spark-scalability-assembly-0.1-SNAPSHOT.jar ./airline.conf
```
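The two commands above differ only in the main class and the config file, so they can be wrapped in one helper and run back to back. This is a minimal sketch, not a script shipped with the repository. `run_step` and the `SPARK_SUBMIT` override are names invented for this example, and the resource flags are copied unchanged from the commands above.

```bash
#!/usr/bin/env bash
JAR=target/scala-2.11/xgboost4j-spark-scalability-assembly-0.1-SNAPSHOT.jar

# Submit one benchmark step: $1 is the main class, $2 the config file name under conf/.
# SPARK_SUBMIT is a hypothetical override for this sketch (SPARK_SUBMIT=echo gives a dry run).
run_step() {
  local main_class="$1" conf_name="$2"
  "${SPARK_SUBMIT:-spark-submit}" --master yarn-cluster \
    --num-executors 10 --executor-memory 6g --executor-cores 8 \
    --class "$main_class" --files "conf/$conf_name" \
    "$JAR" "./$conf_name"
}

# Generate the data first, then run the workload only if generation succeeded:
# run_step me.codingcat.xgboost4j.AirlineDataGenerator airline_datagen.conf &&
#   run_step me.codingcat.xgboost4j.AirlineClassifier airline.conf
```

Setting `SPARK_SUBMIT=echo` before a call prints the arguments instead of submitting a job, which is handy for checking them before using cluster resources.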