{"id":13467789,"url":"https://github.com/crawles/automl_service","last_synced_at":"2025-03-26T03:31:09.449Z","repository":{"id":78684770,"uuid":"99446051","full_name":"crawles/automl_service","owner":"crawles","description":"Deploy AutoML as a service using Flask","archived":false,"fork":false,"pushed_at":"2017-09-16T22:33:27.000Z","size":3615,"stargazers_count":225,"open_issues_count":4,"forks_count":53,"subscribers_count":21,"default_branch":"master","last_synced_at":"2024-10-29T21:59:03.750Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/crawles.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-08-05T20:25:21.000Z","updated_at":"2024-05-30T15:41:48.000Z","dependencies_parsed_at":"2023-07-11T15:15:47.145Z","dependency_job_id":null,"html_url":"https://github.com/crawles/automl_service","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crawles%2Fautoml_service","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crawles%2Fautoml_service/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crawles%2Fautoml_service/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crawles%2Fautoml_service/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/crawles","download_url":"https://codeload.github.com/crawles/automl_service/tar.gz/refs/heads/master","host":{"
name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245584682,"owners_count":20639604,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T15:01:00.570Z","updated_at":"2025-03-26T03:31:09.419Z","avatar_url":"https://github.com/crawles.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook","Examples or singular models","Libraries"],"sub_categories":[],"readme":"# AutoML Service\n\nDeploy automated machine learning (AutoML) as a service using `Flask`, for both pipeline training and pipeline serving. \n\nThe framework implements a fully automated time series classification pipeline, automating feature engineering as well as model selection and optimization using the Python libraries `TPOT` and `tsfresh`. \n\nCheck out the [blog post](https://content.pivotal.io/blog/automated-machine-learning-deploying-automl-to-the-cloud) for more info.\n\n\u003cp\u003e\n  \u003cimg src=\"https://github.com/crawles/Logos/blob/master/automl.gif?raw=true\" width = 80%\u003e\n\u003c/p\u003e\n\nResources:\n\n- [TPOT](https://github.com/rhiever/tpot) – Automated feature preprocessing and model optimization tool\n- [tsfresh](https://github.com/blue-yonder/tsfresh) – Automated time series feature engineering and selection\n- [Flask](http://flask.pocoo.org/) – A web development microframework for Python\n \n## Architecture\n\nThe application exposes both model training and model predictions with a RESTful API. 
For model training, input data and labels are sent via a POST request, a pipeline is trained, and model predictions are then accessible via a prediction route.\n\nEach pipeline is stored under a unique key, so live predictions can be made on the same data using different feature construction and modeling pipelines.\n\n\u003cp\u003e\n  \u003cimg src=\"/img/architecture.png?raw=true\" width = 55%\u003e\n\u003c/p\u003e\n\nAn automated pipeline for time-series classification.\n\n\u003cp\u003e\n  \u003cimg src=\"/img/training.png?raw=true\" width = 55%\u003e\n\u003c/p\u003e\nThe model training logic is exposed as a REST endpoint. Raw, labeled training data is uploaded via a POST request and an optimal model is developed.\n\u003cp\u003e\n  \u003cimg src=\"/img/serving.png?raw=true\" width = 55%\u003e\n\u003c/p\u003e\nRaw test data is uploaded via a POST request and a model prediction is returned.\n\n## Using the app\nView the [Jupyter Notebook](https://github.com/crawles/automl_service/blob/master/modelling_and_usage.ipynb) for an example.\n### Deploying\n\n```bash\n# deploy locally\npython automl_service.py\n```\n\n```bash\n# deploy on cloud foundry\ncf push\n```\n### Usage\n\nTrain a pipeline:\n\n```python\nimport json\nimport requests\n\ntrain_url = 'http://0.0.0.0:8080/train_pipeline'\ntrain_files = {'raw_data': open('data/data_train.json', 'rb'),\n               'labels'  : open('data/label_train.json', 'rb'),\n               'params'  : open('parameters/train_parameters_model2.yml', 'rb')}\n\n# post request to train pipeline\nr_train = requests.post(train_url, files=train_files)\nresult_df = json.loads(r_train.json())\n```\nreturns:\n```python\n{'featureEngParams': {'default_fc_parameters': \"['median', 'minimum', 'standard_deviation', \n                                                 'sum_values', 'variance', 'maximum', \n                                                 'length', 'mean']\",\n                      'impute_function': 'impute',\n                      ...},\n 'mean_cv_accuracy': 
0.865,\n 'mean_cv_roc_auc': 0.932,\n 'modelId': 1,\n 'modelType': \"Pipeline(steps=[('stackingestimator', StackingEstimator(estimator=LinearSVC(...))),\n                               ('logisticregression', LogisticRegressionClassifier(solver='liblinear',...))])\",\n 'trainShape': [1647, 8],\n 'trainTime': 1.953}\n ```\n\nServe pipeline predictions:\n```python\nimport pandas as pd\nimport requests\n\nserve_url = 'http://0.0.0.0:8080/serve_prediction'\ntest_files = {'raw_data': open('data/data_test.json', 'rb'),\n              'params' : open('parameters/test_parameters_model2.yml', 'rb')}\n\n# post request to serve predictions from trained pipeline\nr_test  = requests.post(serve_url, files=test_files)\nresult = pd.read_json(r_test.json()).set_index('id')\n```\n\n| example_id    | prediction    |\n| ------------- | ------------- |\n| 1             | 0.853         |\n| 2             | 0.991         |\n| 3             | 0.060         |\n| 4             | 0.995         |\n| 5             | 0.003         |\n| ...           | ...           |\n\nView all trained models:\n\n```python\nr = requests.get('http://0.0.0.0:8080/models')\npipelines = json.loads(r.json())\n```\n\n```python\n{'1':\n    {'mean_cv_accuracy': 0.873,\n     'modelType': \"RandomForestClassifier(...)\",\n     ...},\n '2':\n    {'mean_cv_accuracy': 0.895,\n     'modelType': \"GradientBoostingClassifier(...)\",\n     ...},\n '3':\n    {'mean_cv_accuracy': 0.859,\n     'modelType': \"LogisticRegressionClassifier(...)\",\n     ...},\n...}\n```\n\n## Running the tests\n\nSupply the host as a command-line argument.\n\n```bash\n# use local app\npy.test --host http://0.0.0.0:8080\n```\n\n```bash\n# use cloud-deployed app\npy.test --host http://ROUTE-HERE\n```\n\n## Scaling the architecture\n\nFor production, I would suggest splitting training and serving into separate applications and incorporating a facade API. 
It would also be best to use a shared cache, such as Redis or Pivotal Cloud Cache, so that other applications and multiple instances of the pipeline can access the trained model. Here is a potential architecture.\n\n\u003cp\u003e\n  \u003cimg src=\"/img/cloud_architecture.png?raw=true\" width = 55%\u003e\n\u003c/p\u003e\nA scalable model training and model serving architecture.\n\n## Author\n\n`Chris Rawles`\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrawles%2Fautoml_service","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcrawles%2Fautoml_service","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrawles%2Fautoml_service/lists"}