https://github.com/datadrivers/effective-guide-mlops
Example end-to-end ML pipeline built with the SageMaker Python SDK
- Host: GitHub
- URL: https://github.com/datadrivers/effective-guide-mlops
- Owner: datadrivers
- License: apache-2.0
- Created: 2021-08-24T08:52:20.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2021-12-16T14:36:11.000Z (over 4 years ago)
- Last Synced: 2025-09-06T05:40:52.253Z (9 months ago)
- Topics: aws, aws-api-gateway, aws-apigateway, aws-sagemaker, data-science, deep-learning, machine-learning, mlops, mlops-environment, mlops-workflow, python, scikit-learn, scikitlearn-machine-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 490 KB
- Stars: 4
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# effective-guide-mlops
#### End-to-end machine learning pipeline with the SageMaker Python SDK
This repository provides an example end-to-end machine learning pipeline on AWS, built using the SageMaker Python SDK. It draws on other resources (e.g. [here](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.ipynb) and [here](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-deployment.html)), but combines them into a single end-to-end example in one notebook, from data processing to deployment of a REST API. It is not production ready, but it should give you a good first intuition for how to orchestrate the ML lifecycle on AWS via the SageMaker SDK.
The main resource for this guide is the notebook `ml_pipeline.ipynb` in the folder `notebooks`. The easiest way to follow the tutorial is to launch a notebook instance on AWS SageMaker and pull the repository into your JupyterLab environment.
### 1. Data
The [Penguin Dataset](https://allisonhorst.github.io/palmerpenguins/articles/intro.html) from Allison Horst is an alternative to the well-known Iris dataset and can be used to demonstrate various ML tasks.
Read more [here](https://allisonhorst.github.io/palmerpenguins/articles/intro.html).

| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|----|-----------|-----------|------------------|-----------------|---------------------|---------------|--------|--------|
| 1 | Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
| 2 | Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
| 3 | Adelie | Torgersen | 40.3 | 18 | 195 | 3250 | female | 2007 |
### 2. Objective
The goal is to train a classifier that predicts the sex of a penguin (the `sex` column) from all other available variables.
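To make the objective concrete, here is a minimal sketch of how the data splits into features and a target. It uses only the three sample rows from the table in the Data section and the Python standard library. The names `SAMPLE_CSV`, `NUMERIC` and `split_features_target` are illustrative and not taken from the notebook, and treating `year` as a numeric feature is an assumption.

```python
import csv
import io

# The three sample rows shown in the Data section, as CSV text.
SAMPLE_CSV = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18,195,3250,female,2007
"""

TARGET = "sex"
# Assumption: these columns are treated as numeric features.
NUMERIC = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g", "year"]


def split_features_target(csv_text):
    """Return (features, labels): one feature dict per row plus its sex label."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Remove the target from the row so it cannot leak into the features.
        labels.append(row.pop(TARGET))
        # Numeric columns become floats; species and island stay categorical strings.
        features.append({k: float(v) if k in NUMERIC else v for k, v in row.items()})
    return features, labels


features, labels = split_features_target(SAMPLE_CSV)
print(labels)       # ['male', 'female', 'female']
print(features[0])  # first row without the 'sex' column
```

The full dataset also contains rows with missing values, so a real preprocessing step would have to drop or impute those before training.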
### 3. Resources
##### Notebooks:
- stored in `/notebooks`
- `eda.ipynb`: visual exploration of the data
- `ml_pipeline.ipynb`: orchestrates preprocessing of the data, model training and deployment of the model as an endpoint
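The three stages that `ml_pipeline.ipynb` orchestrates can be outlined roughly as follows. This is a sketch and not the notebook's actual code: it assumes version 2.x of the SageMaker Python SDK, and the script names `preprocessing.py` and `train.py`, the `penguins` prefix, the S3 layout, the instance types and the helper names `s3_uri` and `run_pipeline` are all hypothetical.

```python
def s3_uri(bucket, *parts):
    """Join a bucket name and key parts into an S3 URI."""
    return "s3://" + "/".join([bucket, *parts])


def run_pipeline(role, bucket, prefix="penguins"):
    """Preprocess, train and deploy; returns the predictor for the endpoint.

    Calling this starts billable AWS jobs and needs valid AWS credentials.
    """
    # Imported here so the module still loads where the SDK is not installed.
    from sagemaker.processing import ProcessingInput, ProcessingOutput
    from sagemaker.sklearn.estimator import SKLearn
    from sagemaker.sklearn.processing import SKLearnProcessor

    train_uri = s3_uri(bucket, prefix, "train")

    # 1. Preprocessing: run a (hypothetical) script as a processing job.
    processor = SKLearnProcessor(
        framework_version="0.23-1",
        role=role,
        instance_type="ml.m5.xlarge",
        instance_count=1,
    )
    processor.run(
        code="preprocessing.py",
        inputs=[ProcessingInput(source=s3_uri(bucket, prefix, "raw"),
                                destination="/opt/ml/processing/input")],
        outputs=[ProcessingOutput(output_name="train",
                                  source="/opt/ml/processing/train",
                                  destination=train_uri)],
    )

    # 2. Training: fit a scikit-learn model with a (hypothetical) training script.
    estimator = SKLearn(
        entry_point="train.py",
        framework_version="0.23-1",
        instance_type="ml.m5.xlarge",
        role=role,
    )
    estimator.fit({"train": train_uri})

    # 3. Deployment: host the trained model behind a real-time endpoint.
    return estimator.deploy(initial_instance_count=1, instance_type="ml.t2.medium")
```

On a SageMaker notebook instance, the role could come from `sagemaker.get_execution_role()`. Calling `delete_endpoint()` on the returned predictor removes the endpoint once you are done, which avoids ongoing charges.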
### 4. Tutorial Walkthrough
- head over to `notebooks/ml_pipeline.ipynb` and follow the steps in the notebook