https://github.com/crawles/automl_service
Deploy AutoML as a service using Flask
- Host: GitHub
- URL: https://github.com/crawles/automl_service
- Owner: crawles
- License: MIT
- Created: 2017-08-05T20:25:21.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-09-16T22:33:27.000Z (about 8 years ago)
- Last Synced: 2024-10-29T21:59:03.750Z (12 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 3.45 MB
- Stars: 225
- Watchers: 21
- Forks: 53
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome_time_series_in_python - automl_service
README
# AutoML Service
Deploy automated machine learning (AutoML) as a service using `Flask`, for both pipeline training and pipeline serving.
The framework implements a fully automated time-series classification pipeline, automating both feature engineering and model selection/optimization with the Python libraries `TPOT` and `tsfresh`.
Check out the [blog post](https://content.pivotal.io/blog/automated-machine-learning-deploying-automl-to-the-cloud) for more info.
Resources:
- [TPOT](https://github.com/rhiever/tpot) – Automated feature preprocessing and model optimization tool
- [tsfresh](https://github.com/blue-yonder/tsfresh) – Automated time series feature engineering and selection
- [Flask](http://flask.pocoo.org/) – A web development microframework for Python
## Architecture
The application exposes both model training and model prediction through a RESTful API. For model training, input data and labels are sent via POST request; a pipeline is trained, and model predictions become accessible via a prediction route.
Pipelines are stored under unique keys, so live predictions can be made on the same data using different feature-construction and modeling pipelines.
*An automated pipeline for time-series classification.*
The model training logic is exposed as a REST endpoint. Raw, labeled training data is uploaded via a POST request and an optimal model is developed.
For serving, raw unlabeled data is uploaded via a POST request and model predictions are returned.
## Using the app
View the [Jupyter Notebook](https://github.com/crawles/automl_service/blob/master/modelling_and_usage.ipynb) for an example.
### Deploying
```bash
# deploy locally
python automl_service.py
```
```bash
# deploy on cloud foundry
cf push
```
### Usage
Train a pipeline:
```python
import json

import requests

train_url = 'http://0.0.0.0:8080/train_pipeline'
train_files = {'raw_data': open('data/data_train.json', 'rb'),
               'labels': open('data/label_train.json', 'rb'),
               'params': open('parameters/train_parameters_model2.yml', 'rb')}

# POST request to train a pipeline
r_train = requests.post(train_url, files=train_files)
result = json.loads(r_train.json())
```
returns:
```python
{'featureEngParams': {'default_fc_parameters': "['median', 'minimum', 'standard_deviation',
                                                 'sum_values', 'variance', 'maximum',
                                                 'length', 'mean']",
                      'impute_function': 'impute',
                      ...},
 'mean_cv_accuracy': 0.865,
 'mean_cv_roc_auc': 0.932,
 'modelId': 1,
 'modelType': "Pipeline(steps=[('stackingestimator', StackingEstimator(estimator=LinearSVC(...))),
                               ('logisticregression', LogisticRegressionClassifier(solver='liblinear',...))])",
 'trainShape': [1647, 8],
 'trainTime': 1.953}
```
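The metrics in that response can be pulled out directly, for example to log cross-validation quality and keep the model id for later prediction calls. The abbreviated dict below just mirrors the response shape shown above:

```python
# Abbreviated training response, mirroring the shape documented above.
result = {'mean_cv_accuracy': 0.865,
          'mean_cv_roc_auc': 0.932,
          'modelId': 1}

# Keep the model id for later prediction requests; log the CV metrics.
model_id = result['modelId']
summary = (f"model {model_id}: "
           f"acc={result['mean_cv_accuracy']:.3f}, "
           f"auc={result['mean_cv_roc_auc']:.3f}")
print(summary)  # model 1: acc=0.865, auc=0.932
```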
Serve pipeline predictions:
```python
import pandas as pd
import requests

serve_url = 'http://0.0.0.0:8080/serve_prediction'
test_files = {'raw_data': open('data/data_test.json', 'rb'),
              'params': open('parameters/test_parameters_model2.yml', 'rb')}

# POST request to serve predictions from the trained pipeline
r_test = requests.post(serve_url, files=test_files)
result = pd.read_json(r_test.json()).set_index('id')
```
| example_id | prediction |
| ------------- | ------------- |
| 1 | 0.853 |
| 2 | 0.991 |
| 3 | 0.060 |
| 4 | 0.995 |
| 5 | 0.003 |
| ... | ... |
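The served scores look like class probabilities, so a common follow-up (not part of this repo) is to threshold them into hard labels:

```python
import pandas as pd

# Scores as returned above, indexed by example id.
result = pd.DataFrame({'prediction': [0.853, 0.991, 0.060, 0.995, 0.003]},
                      index=pd.Index([1, 2, 3, 4, 5], name='example_id'))

# Threshold at 0.5 to get hard class labels.
result['label'] = (result['prediction'] >= 0.5).astype(int)
print(result['label'].tolist())  # [1, 1, 0, 1, 0]
```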
View all trained models:
```python
import json

import requests

r = requests.get('http://0.0.0.0:8080/models')
pipelines = json.loads(r.json())
```
```python
{'1':
  {'mean_cv_accuracy': 0.873,
   'modelType': "RandomForestClassifier(...)",
   ...},
 '2':
  {'mean_cv_accuracy': 0.895,
   'modelType': "GradientBoostingClassifier(...)",
   ...},
 '3':
  {'mean_cv_accuracy': 0.859,
   'modelType': "LogisticRegressionClassifier(...)",
   ...},
 ...}
```
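With that listing in hand, one natural use (illustrative, not repository code) is to pick the pipeline with the best cross-validation accuracy for serving:

```python
# Abbreviated /models output, mirroring the shape shown above.
pipelines = {
    '1': {'mean_cv_accuracy': 0.873, 'modelType': 'RandomForestClassifier(...)'},
    '2': {'mean_cv_accuracy': 0.895, 'modelType': 'GradientBoostingClassifier(...)'},
    '3': {'mean_cv_accuracy': 0.859, 'modelType': 'LogisticRegressionClassifier(...)'},
}

# Pick the model id with the highest mean CV accuracy.
best_id = max(pipelines, key=lambda k: pipelines[k]['mean_cv_accuracy'])
print(best_id)  # '2'
```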
## Running the tests
Supply the host as a command-line argument:
```bash
# use local app
py.test --host http://0.0.0.0:8080
```
```bash
# use cloud-deployed app
py.test --host http://ROUTE-HERE
```
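The custom `--host` flag implies a `pytest_addoption` hook. A minimal `conftest.py` along these lines would provide it (an assumption about the repo's test setup; the fixture name `host` is illustrative):

```python
# conftest.py -- registers the --host command-line option used above.
import pytest

def pytest_addoption(parser):
    parser.addoption('--host', action='store',
                     default='http://0.0.0.0:8080',
                     help='base URL of the running automl_service app')

@pytest.fixture
def host(request):
    # Tests take `host` as a fixture argument to reach the deployed app.
    return request.config.getoption('--host')
```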
## Scaling the architecture
For production, I would suggest splitting training and serving into separate applications and putting a facade API in front of them. It would also be best to use a shared cache, such as Redis or Pivotal Cloud Cache, so that other applications and multiple instances of the pipeline can access the trained model. Here is a potential architecture.
*A scalable model training and model serving architecture.*
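The shared-cache idea can be sketched as below. `save_pipeline`/`load_pipeline` and the key names are illustrative, and `client` stands for any Redis-style client (e.g. `redis.Redis`) exposing `set`/`get`:

```python
import pickle

def save_pipeline(client, model_id, pipeline):
    # Serialize the trained pipeline so any app instance can load it.
    client.set(f'pipeline:{model_id}', pickle.dumps(pipeline))

def load_pipeline(client, model_id):
    # Returns None when no pipeline has been stored under this id.
    blob = client.get(f'pipeline:{model_id}')
    return pickle.loads(blob) if blob is not None else None
```

With this in place, the training app writes each finished pipeline to the cache and every serving instance reads it back, instead of keeping pipelines in one process's memory.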
## Author
Chris Rawles