https://github.com/benitomartin/mlops-aws-stroke
MLOps Stroke Prediction
- Host: GitHub
- URL: https://github.com/benitomartin/mlops-aws-stroke
- Owner: benitomartin
- Created: 2024-05-27T10:30:32.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-05T06:58:25.000Z (over 1 year ago)
- Last Synced: 2024-12-31T14:28:46.016Z (9 months ago)
- Topics: aws, aws-ecr, aws-lambda, aws-s3, docker, flask-api, jupyter-notebook, python, sagemaker
- Language: Jupyter Notebook
- Homepage:
- Size: 905 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# MLOPS STROKE PREDICTION ⚱️
This is a personal MLOps project based on a [Kaggle](https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset) dataset for stroke prediction.
Feel free to ⭐ and clone this repo 😉
## Tech Stack
AWS (S3, ECR, Lambda, SageMaker) · Docker · Flask · Jupyter Notebook · Python
## Project Structure
The project has been structured with the following folders and files:
- `data`: raw and clean data
- `src`: source code, divided into:
  - notebooks with the EDA, baseline model, and AWS pipelines, incl. unit testing
  - `code_scripts`: processing, training, evaluation, Docker container, serving, and Lambda scripts
- `requirements.txt`: project requirements

## Project Description
The dataset was obtained from Kaggle and contains 5,110 rows and 10 columns for stroke prediction. To prepare the data for modelling, an **Exploratory Data Analysis** was conducted, which showed that the dataset is highly imbalanced (95% no stroke, 5% stroke). For modelling, the categorical features were encoded and XGBoost was used as the model; because of the imbalance, threshold-moving was applied and the threshold with the best ROC-AUC was selected for the predictions. The learning rate was additionally tuned to find the best value for the deployed model.
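A minimal, self-contained sketch of the threshold-moving step. It uses synthetic data with a ~5% positive class and a logistic model as a stand-in for the Kaggle dataset and the XGBoost model, and picks the decision threshold from the ROC curve (here via Youden's J statistic) instead of the default 0.5:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the stroke data: ~95% negative / ~5% positive.
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Stand-in classifier (the project uses XGBoost).
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]

# Threshold-moving: scan ROC thresholds and keep the one
# maximizing tpr - fpr rather than predicting at 0.5.
fpr, tpr, thresholds = roc_curve(y_te, probs)
best = thresholds[np.argmax(tpr - fpr)]
preds = (probs >= best).astype(int)
```

On imbalanced data the selected threshold typically lands well below 0.5, trading some precision for much better recall on the rare stroke class.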
## Pipeline Deployment
All pipelines were deployed on AWS SageMaker, together with the Model Registry and Endpoints. The following pipelines were created:
- ✅ Preprocessing
- ✅ Training
- ✅ Tuning
- ✅ Evaluation
- ✅ Model Registry
- ✅ Model Conditional Registry
- ✅ Deployment

Additionally, the experiments were tracked on Comet ML.
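For the serving side, a sketch of how a Lambda function can forward a request to a deployed SageMaker endpoint via `boto3`. The endpoint name, the feature order, and the event shape are all assumptions for illustration, not the project's actual values:

```python
import json

def build_payload(record: dict) -> str:
    """Serialize one patient record as CSV in a fixed feature order
    (hypothetical order -- the real pipeline defines its own)."""
    feature_order = ["age", "avg_glucose_level", "bmi"]  # illustrative subset
    return ",".join(str(record[k]) for k in feature_order)

def lambda_handler(event, context):
    # boto3 is imported lazily so the module also loads outside AWS.
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName="stroke-endpoint",  # hypothetical endpoint name
        ContentType="text/csv",
        Body=build_payload(json.loads(event["body"])),
    )
    prediction = response["Body"].read().decode()
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

The CSV serialization is kept in a separate pure function so it can be unit-tested without AWS credentials, matching the unit-testing approach mentioned in the project structure.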