https://github.com/srowen/cdsw-simple-serving
Modeling Lifecycle with ACME Occupancy Detection and Cloudera
- Host: GitHub
- URL: https://github.com/srowen/cdsw-simple-serving
- Owner: srowen
- License: apache-2.0
- Archived: true
- Created: 2017-03-15T16:41:48.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2017-09-06T15:19:48.000Z (almost 9 years ago)
- Last Synced: 2025-01-28T03:35:08.880Z (over 1 year ago)
- Topics: cloudera, cloudera-data-science, data-science, openscoring, pmml, workbench
- Language: Scala
- Size: 76 MB
- Stars: 14
- Watchers: 5
- Forks: 18
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Modeling Lifecycle with ACME Occupancy Detection and Cloudera
Data science is more than just modeling. The complete data science lifecycle also includes data
engineering and model deployment. This project offers a simplified yet credible example of
all three elements, as implemented using [Apache Spark](http://spark.apache.org), the
[Cloudera Data Science Workbench](https://www.cloudera.com/products/data-science-and-engineering/data-science-workbench.html),
and [JPMML / OpenScoring](https://github.com/openscoring/openscoring).
In this project, the ACME corporation is productionizing a connected-house platform. Part of this
service requires predicting the occupancy of a room given sensor readings.
This example project includes simplified examples of:

- Data Engineering
  - Ingest
  - Cleaning
- Data Science
  - Modeling
  - Tuning and evaluation
- Model Serving
  - Model management
  - Testing
  - REST API
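
As a rough illustration of how the three elements listed above fit together, here is a minimal, dependency-free Scala sketch. It is not the project's implementation, which uses Spark and JPMML / OpenScoring. The `Reading` fields, the CSV layout, the single-threshold "model", and the sample rows in `main` are all assumptions invented for this example.

```scala
object OccupancySketch {
  // Hypothetical sensor record; field names are assumptions, not taken from the project.
  final case class Reading(temperature: Double, light: Double, co2: Double,
                           occupied: Option[Boolean])

  // Data engineering: parse raw CSV lines, dropping malformed or incomplete rows.
  def ingest(lines: Seq[String]): Seq[Reading] =
    lines.flatMap { line =>
      line.split(",", -1).map(_.trim) match {
        case Array(t, l, c, o) =>
          scala.util.Try(Reading(t.toDouble, l.toDouble, c.toDouble, Some(o == "1"))).toOption
        case _ => None
      }
    }

  // Data science: a stand-in "model" that learns one light threshold, the midpoint
  // between the mean light level of occupied rooms and that of empty rooms.
  final case class ThresholdModel(lightThreshold: Double) {
    def predict(r: Reading): Boolean = r.light > lightThreshold
  }

  // Assumes the training data contains at least one row of each label;
  // otherwise the threshold is NaN and every prediction is false.
  def train(data: Seq[Reading]): ThresholdModel = {
    def meanLight(label: Boolean): Double = {
      val xs = data.filter(_.occupied.contains(label)).map(_.light)
      xs.sum / xs.size
    }
    ThresholdModel((meanLight(true) + meanLight(false)) / 2.0)
  }

  // Model serving: score a single unlabeled reading, as a REST endpoint might.
  def serve(model: ThresholdModel, temperature: Double, light: Double, co2: Double): Boolean =
    model.predict(Reading(temperature, light, co2, None))

  def main(args: Array[String]): Unit = {
    // Made-up sample rows: temperature, light, CO2, occupancy label.
    val raw = Seq(
      "23.1,420.0,720.5,1",
      "22.8,450.0,810.0,1",
      "20.5,0.0,430.0,0",
      "20.9,10.0,445.0,0",
      "not,a,valid,row"
    )
    val model = train(ingest(raw))
    println(s"threshold = ${model.lightThreshold}")
    println(s"occupied? ${serve(model, 22.0, 300.0, 600.0)}")
  }
}
```

In a real deployment each stage would be replaced by its heavier counterpart: distributed ingest and cleaning, a tuned and evaluated model, and a managed scoring service. The flow of data between the stages stays the same.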
## Requirements
- [Cloudera Data Science Workbench 1.0](https://www.cloudera.com/products/data-science-and-engineering/data-science-workbench.html)
- CDH 5.10+ cluster
- [Spark 2.1 CSD](https://www.cloudera.com/downloads/spark2/2-1.html) for CDH
- [Apache Maven](https://maven.apache.org) 3.2+
## Get Started
To continue, review the documentation for each of the three modules, which explains
what the module shows and how to run it.
- [Data Engineering](acme-dataeng/)
- [Data Science](acme-datasci/)
- [Model Serving](acme-serving/)
[Build Status](https://travis-ci.org/srowen/cdsw-simple-serving)