Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/camm93/waterqualitysystem
End-to-End Machine Learning MLOps Project
https://github.com/camm93/waterqualitysystem
ci-cd docker fastapi github-actions machine-learning python render
Last synced: 12 days ago
JSON representation
End-to-End Machine Learning MLOps Project
- Host: GitHub
- URL: https://github.com/camm93/waterqualitysystem
- Owner: camm93
- Created: 2024-12-18T02:00:04.000Z (24 days ago)
- Default Branch: main
- Last Pushed: 2024-12-28T17:10:18.000Z (14 days ago)
- Last Synced: 2024-12-28T18:18:51.613Z (14 days ago)
- Topics: ci-cd, docker, fastapi, github-actions, machine-learning, python, render
- Language: Jupyter Notebook
- Homepage:
- Size: 2.14 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: Readme.md
Awesome Lists containing this project
README
# π End-to-End Machine Learning MLOps Project
**Building, Deploying, and Managing a Machine Learning Pipeline with CI/CD, Docker, and Cloud**## π― **Project Overview**
This project demonstrates the implementation of a complete Machine Learning Operations (MLOps) pipeline by building and deploying a classifier to predict water quality as **drinkable** or **not drinkable** based on various water properties (e.g., pH, Turbidity, Chloramines).During the model selection phase, **Logistic Regression** and **K-Nearest Neighbors (KNN)** models were trained and evaluated alongside a **Decision Tree Classifier**, with the Decision Tree ultimately selected for deployment based on its performance and interpretability.
This project integrates **Exploratory Data Analysis**, **model training**, **model selection and hyperparameter tuning**, **deployment**, **containerization**, **CI/CD automation**, and **cloud deployment**, simulating a real-world production environment.
## π οΈ **Tools and Technologies Used**
| **Category** | **Tools** |
|----------------------------|-----------------------------------------|
| **Programming** | Python (numpy, pandas, scikit-learn) |
| **Model Training** | LogisticRegressionClassifier, KNN, **DecisionTreeClassifier** (scikit-learn) |
| **Data Preprocessing** | Pipelines (Imputation, Scaling) |
| **API Development** | FastAPI |
| **Containerization** | Docker, DockerHub |
| **CI/CD** | GitHub Actions |
| **Cloud Deployment** | Render |
| **Version Control** | Git, GitHub |
| **Security** | GitHub Secrets, DockerHub Access Tokens|## π **Project Workflow**
### πΉ **1οΈβ£ Data Preprocessing**
- Imported dataset, handled missing values using **mean imputation**.
- Built a **data preprocessing pipeline** to automate feature scaling and transformation.### πΉ **2οΈβ£ Model Training**
- Trained a **Decision Tree Classifier** for binary classification.
- Fine-tuned hyperparameters (`max_depth`, `min_samples_split`) using grid search.
- Saved the trained model as a serialized file (`.pkl`) using `joblib`.### πΉ **3οΈβ£ API Development**
- Built a **FastAPI application** to serve the trained model.
- Exposed a `/predict` endpoint for predictions.
- API accepts JSON input, preprocesses the data, and returns predictions in real-time.### πΉ **4οΈβ£ Dockerization**
- Packaged the application into a **Docker container** for easy deployment.
- Created a `Dockerfile` to ensure the app runs consistently across environments.
- Pushed the container image to **DockerHub** for reuse and deployment.### πΉ **5οΈβ£ CI/CD Pipeline**
- Configured **GitHub Actions** to automate:
- Building the Docker image.
- Pushing the image to DockerHub.
- Triggering redeployment on Render.### πΉ **6οΈβ£ Cloud Deployment**
- Deployed the containerized application to **Render**, a cloud hosting platform.
- Configured environment variables for security (e.g., DockerHub credentials).
- Monitored application logs for debugging and performance insights.## π **Key Features**
- **End-to-End MLOps Workflow**: Covers every step from data preprocessing to model deployment.
- **Cloud Deployment**: Real-world deployment using **Render**.
- **Reproducibility**: Automated CI/CD ensures consistent builds and deployments.
- **Scalability**: Dockerized application enables horizontal scaling and portability.
- **Security**: Managed sensitive credentials with GitHub Secrets and DockerHub tokens.## π οΈ **How to Run the Project Locally**
### πΉ **1οΈβ£ Clone the Repository**
```bash
git clone https://github.com/camm93/WaterQualitySystem.gitcd WaterQualitySystem
```### πΉ **2οΈβ£ Build and Run the Docker Container**
```bash
docker build -t WaterQualitySystem .docker run -d -p 8000:8000 WaterQualitySystem
```### πΉ **3οΈβ£ Test the API**
Use **Postman** or `curl` to test the `/predict` endpoint.Example `curl` command:
```bash
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{
"pH": 7.13,
"Dureza": 173.69,
"SΓ³lidos": 19309.57,
"Cloraminas": 6.53,
"Sulfatos": 372.54,
"Conductividad": 295.39,
"Carbono_orgΓ‘nico": 7.27,
"Trihalometanos": 88.79,
"Turbidez": 3.40
}'
```Expected Response:
```json
{
"prediction": "NO",
"probability": [
0.9414033798677441,
0.058596620132255695
]
}
```## π οΈ Folder Structure
```
.
βββ app.py # FastAPI application
βββ model.pkl # Serialized Decision Tree model
βββ Dockerfile # Docker container configuration
βββ requirements.txt # Python dependencies
βββ README.md # Project documentation
βββ .github/workflows # CI/CD configuration
βββ deploy.yml # GitHub Actions workflow
```
---## π To Recap
- **Model Deployment**: Transitioning from Jupyter notebooks to a production-ready API.
- **Docker**: Containerizing applications for consistent deployments.
- **CI/CD**: Automating builds, tests, and deployments using GitHub Actions.
- **Cloud Hosting**: Deploying machine learning APIs to cloud platforms.
- **Security**: Managing secrets and sensitive credentials for DockerHub and Render.
---## π Potential Improvements
1. Add model monitoring (e.g., API latency, prediction drift) using Prometheus + Grafana.
2. Incorporate MLflow for experiment tracking and model versioning.
3. Deploy the app to AWS (EC2, Lambda) or GCP for real-world scalability.
4. Add unit tests for API endpoints using pytest.
5. Enhance the CI/CD pipeline with rollback mechanisms and more extensive test coverage.---
## π¬ Contact
For questions or collaboration opportunities, feel free to reach out:- **Email**: crismur_93hotmail.com
- **LinkedIn**: [Cristian Murillo](https://www.linkedin.com/in/cristianmurillom/)