Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/camm93/waterqualitysystem

End-to-End Machine Learning MLOps Project
https://github.com/camm93/waterqualitysystem

ci-cd docker fastapi github-actions machine-learning python render

Last synced: 12 days ago
JSON representation

End-to-End Machine Learning MLOps Project

Awesome Lists containing this project

README

        

# πŸš€ End-to-End Machine Learning MLOps Project
**Building, Deploying, and Managing a Machine Learning Pipeline with CI/CD, Docker, and Cloud**

## 🎯 **Project Overview**
This project demonstrates the implementation of a complete Machine Learning Operations (MLOps) pipeline by building and deploying a classifier to predict water quality as **drinkable** or **not drinkable** based on various water properties (e.g., pH, Turbidity, Chloramines).

During the model selection phase, **Logistic Regression** and **K-Nearest Neighbors (KNN)** models were trained and evaluated alongside a **Decision Tree Classifier**, with the Decision Tree ultimately selected for deployment based on its performance and interpretability.

This project integrates **Exploratory Data Analysis**, **model training**, **model selection and hyperparameter tuning**, **deployment**, **containerization**, **CI/CD automation**, and **cloud deployment**, simulating a real-world production environment.

## πŸ› οΈ **Tools and Technologies Used**
| **Category** | **Tools** |
|----------------------------|-----------------------------------------|
| **Programming** | Python (numpy, pandas, scikit-learn) |
| **Model Training** | LogisticRegressionClassifier, KNN, **DecisionTreeClassifier** (scikit-learn) |
| **Data Preprocessing** | Pipelines (Imputation, Scaling) |
| **API Development** | FastAPI |
| **Containerization** | Docker, DockerHub |
| **CI/CD** | GitHub Actions |
| **Cloud Deployment** | Render |
| **Version Control** | Git, GitHub |
| **Security** | GitHub Secrets, DockerHub Access Tokens|

## πŸ“ **Project Workflow**

### πŸ”Ή **1️⃣ Data Preprocessing**
- Imported dataset, handled missing values using **mean imputation**.
- Built a **data preprocessing pipeline** to automate feature scaling and transformation.

### πŸ”Ή **2️⃣ Model Training**
- Trained a **Decision Tree Classifier** for binary classification.
- Fine-tuned hyperparameters (`max_depth`, `min_samples_split`) using grid search.
- Saved the trained model as a serialized file (`.pkl`) using `joblib`.

### πŸ”Ή **3️⃣ API Development**
- Built a **FastAPI application** to serve the trained model.
- Exposed a `/predict` endpoint for predictions.
- API accepts JSON input, preprocesses the data, and returns predictions in real-time.

### πŸ”Ή **4️⃣ Dockerization**
- Packaged the application into a **Docker container** for easy deployment.
- Created a `Dockerfile` to ensure the app runs consistently across environments.
- Pushed the container image to **DockerHub** for reuse and deployment.

### πŸ”Ή **5️⃣ CI/CD Pipeline**
- Configured **GitHub Actions** to automate:
- Building the Docker image.
- Pushing the image to DockerHub.
- Triggering redeployment on Render.

### πŸ”Ή **6️⃣ Cloud Deployment**
- Deployed the containerized application to **Render**, a cloud hosting platform.
- Configured environment variables for security (e.g., DockerHub credentials).
- Monitored application logs for debugging and performance insights.

## πŸ“Š **Key Features**
- **End-to-End MLOps Workflow**: Covers every step from data preprocessing to model deployment.
- **Cloud Deployment**: Real-world deployment using **Render**.
- **Reproducibility**: Automated CI/CD ensures consistent builds and deployments.
- **Scalability**: Dockerized application enables horizontal scaling and portability.
- **Security**: Managed sensitive credentials with GitHub Secrets and DockerHub tokens.

## πŸ› οΈ **How to Run the Project Locally**

### πŸ”Ή **1️⃣ Clone the Repository**
```bash
git clone https://github.com/camm93/WaterQualitySystem.git

cd WaterQualitySystem
```

### πŸ”Ή **2️⃣ Build and Run the Docker Container**
```bash
docker build -t WaterQualitySystem .

docker run -d -p 8000:8000 WaterQualitySystem
```

### πŸ”Ή **3️⃣ Test the API**
Use **Postman** or `curl` to test the `/predict` endpoint.

Example `curl` command:

```bash
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{
"pH": 7.13,
"Dureza": 173.69,
"SΓ³lidos": 19309.57,
"Cloraminas": 6.53,
"Sulfatos": 372.54,
"Conductividad": 295.39,
"Carbono_orgΓ‘nico": 7.27,
"Trihalometanos": 88.79,
"Turbidez": 3.40
}'
```

Expected Response:

```json
{
"prediction": "NO",
"probability": [
0.9414033798677441,
0.058596620132255695
]
}
```

## πŸ› οΈ Folder Structure

```
.
β”œβ”€β”€ app.py # FastAPI application
β”œβ”€β”€ model.pkl # Serialized Decision Tree model
β”œβ”€β”€ Dockerfile # Docker container configuration
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ README.md # Project documentation
└── .github/workflows # CI/CD configuration
└── deploy.yml # GitHub Actions workflow
```
---

## πŸ“ˆ To Recap
- **Model Deployment**: Transitioning from Jupyter notebooks to a production-ready API.
- **Docker**: Containerizing applications for consistent deployments.
- **CI/CD**: Automating builds, tests, and deployments using GitHub Actions.
- **Cloud Hosting**: Deploying machine learning APIs to cloud platforms.
- **Security**: Managing secrets and sensitive credentials for DockerHub and Render.
---

## πŸš€ Potential Improvements
1. Add model monitoring (e.g., API latency, prediction drift) using Prometheus + Grafana.
2. Incorporate MLflow for experiment tracking and model versioning.
3. Deploy the app to AWS (EC2, Lambda) or GCP for real-world scalability.
4. Add unit tests for API endpoints using pytest.
5. Enhance the CI/CD pipeline with rollback mechanisms and more extensive test coverage.

---

## πŸ’¬ Contact
For questions or collaboration opportunities, feel free to reach out:

- **Email**: crismur_93hotmail.com
- **LinkedIn**: [Cristian Murillo](https://www.linkedin.com/in/cristianmurillom/)