https://github.com/combust/pachyderm-mleap-demo
Pachyderm/MLeap team up to provide versioned datasets + models
- Host: GitHub
- URL: https://github.com/combust/pachyderm-mleap-demo
- Owner: combust
- Created: 2017-02-16T19:20:20.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2017-06-07T15:19:38.000Z (about 9 years ago)
- Last Synced: 2025-03-24T16:42:23.609Z (over 1 year ago)
- Language: Scala
- Size: 29.3 KB
- Stars: 10
- Watchers: 4
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Pachyderm/MLeap Demo
This codebase supports the Pachyderm/MLeap training and scoring
demo. It builds the Docker images that the demo uses.
## Docker Image
The Docker images are located here:
1. [Training Image](https://hub.docker.com/r/combustml/pmd-training/)
2. [Scoring Image](https://hub.docker.com/r/combustml/pmd-scoring/)
## Building Locally
Build the Docker images locally with SBT.
1. Install SBT by following these [instructions](http://www.scala-sbt.org/0.13/docs/Setup.html)
2. Make sure Docker is running
3. Use SBT to publish the images locally
```
sbt training/docker:publishLocal
sbt scoring/docker:publishLocal
```
This publishes two Docker images, `combustml/pmd-training:0.1-SNAPSHOT` and
`combustml/pmd-scoring:0.1-SNAPSHOT`.
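To confirm that both images were published, the local image store can be queried. The following is a minimal sketch, not part of the demo: `pmd_check_images` is a hypothetical helper name, and it assumes a Docker CLI recent enough to support `docker image inspect`.

```shell
# Hypothetical helper (not part of the demo): report whether each demo image
# exists in the local Docker image store.
pmd_check_images() {
  status=0
  for image in "combustml/pmd-training:0.1-SNAPSHOT" "combustml/pmd-scoring:0.1-SNAPSHOT"; do
    # `docker image inspect` exits non-zero when the image is not found locally.
    if docker image inspect "$image" >/dev/null 2>&1; then
      echo "found: $image"
    else
      echo "missing: $image"
      status=1
    fi
  done
  return "$status"
}
```

Calling `pmd_check_images` after the two `sbt` commands returns a non-zero status if either image is absent.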
## Training
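Download the Airbnb training dataset, [airbnb.clean.avro](https://s3-us-west-2.amazonaws.com/mleap-demo/datasources/airbnb.clean.avro), and save it in `/tmp/pmd-in` on the host. The training command mounts that directory into the container as `/data-in`.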
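The training command mounts `/tmp/pmd-in` and `/tmp/pmd-out` from the host, so both directories need to exist and the dataset needs to sit in the first one. A minimal sketch of that preparation, assuming `curl` is installed; the helper names `pmd_prepare_dirs` and `pmd_fetch_dataset` are hypothetical and not part of the demo:

```shell
# Hypothetical helpers (not part of the demo). The optional first argument is
# the parent directory; it defaults to /tmp to match the docker run commands.

# Create the host directories that the training container mounts.
pmd_prepare_dirs() {
  base="${1:-/tmp}"
  mkdir -p "$base/pmd-in" "$base/pmd-out"
}

# Download the dataset into the input directory unless it is already there.
pmd_fetch_dataset() {
  base="${1:-/tmp}"
  target="$base/pmd-in/airbnb.clean.avro"
  [ -f "$target" ] || curl -fL -o "$target" \
    "https://s3-us-west-2.amazonaws.com/mleap-demo/datasources/airbnb.clean.avro"
}
```

Running `pmd_prepare_dirs && pmd_fetch_dataset` creates both directories and downloads the file once; later runs skip the download. With the file in place, the training container can be started: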
```
# Train a random forest model (-t) on the Airbnb dataset (-i), write the model
# file (-o) and the model summary (-s) to /data-out, and give Spark enough
# memory (-J-Xmx2048m). /data-out maps to /tmp/pmd-out on the host.
docker run -v /tmp/pmd-in:/data-in \
  -v /tmp/pmd-out:/data-out combustml/pmd-training:0.1-SNAPSHOT airbnb \
  -t random-forest \
  -i file:///data-in/airbnb.clean.avro \
  -o /data-out/model.zip \
  -s /data-out/summary.txt \
  -J-Xmx2048m
```
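The scoring step reads `model.zip` from `/tmp/pmd-out`, so it can help to confirm that training wrote its output before moving on. A small sketch, assuming the model and summary land in `/tmp/pmd-out` on the host; `pmd_check_training_output` is a hypothetical helper, not part of the demo:

```shell
# Hypothetical helper (not part of the demo): verify that the training run
# left a non-empty model file and summary in the output directory.
pmd_check_training_output() {
  out_dir="${1:-/tmp/pmd-out}"
  for f in model.zip summary.txt; do
    # -s is true only when the file exists and has a size greater than zero.
    if [ ! -s "$out_dir/$f" ]; then
      echo "missing or empty: $out_dir/$f"
      return 1
    fi
  done
  echo "training output looks complete in $out_dir"
}
```

Calling `pmd_check_training_output` with no argument checks `/tmp/pmd-out`; pass another directory if the output volume was mounted elsewhere.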
## Scoring
```
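# Score the input dataset (-i) with the trained model (-m) and write the
# results (-o) to /data-out, which maps to /tmp/pmd-scoring-out on the host.
docker run -v /tmp/pmd-out:/data-in1 \
  -v /tmp/pmd-training-in:/data-in2 \
  -v /tmp/pmd-scoring-out:/data-out combustml/pmd-scoring:0.1-SNAPSHOT \
  -m /data-in1/model.zip \
  -i /data-in2/good.avro \
  -o /data-out/test-docker.avro \
  -J-Xmx2048m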
```