{"id":23703302,"url":"https://github.com/camm93/waterqualitysystem","last_synced_at":"2026-02-09T15:36:01.619Z","repository":{"id":270133669,"uuid":"905017662","full_name":"camm93/WaterQualitySystem","owner":"camm93","description":"End-to-End Machine Learning MLOps Project","archived":false,"fork":false,"pushed_at":"2024-12-28T17:10:18.000Z","size":2241,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-20T05:17:41.231Z","etag":null,"topics":["ci-cd","docker","fastapi","github-actions","machine-learning","python","render"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/camm93.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-18T02:00:04.000Z","updated_at":"2024-12-28T17:10:21.000Z","dependencies_parsed_at":"2024-12-28T18:18:55.362Z","dependency_job_id":"d3fef311-b0de-4bdc-9f2a-d0e6a998f14c","html_url":"https://github.com/camm93/WaterQualitySystem","commit_stats":null,"previous_names":["camm93/waterqualitysystem"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/camm93%2FWaterQualitySystem","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/camm93%2FWaterQualitySystem/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/camm93%2FWaterQualitySystem/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/camm93%2FWaterQualityS
ystem/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/camm93","download_url":"https://codeload.github.com/camm93/WaterQualitySystem/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239780142,"owners_count":19695736,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ci-cd","docker","fastapi","github-actions","machine-learning","python","render"],"created_at":"2024-12-30T13:00:59.106Z","updated_at":"2026-01-31T04:30:16.719Z","avatar_url":"https://github.com/camm93.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🚀 End-to-End Machine Learning MLOps Project\n**Building, Deploying, and Managing a Machine Learning Pipeline with CI/CD, Docker, and Cloud**  \n\n\n## 🎯 **Project Overview**  \nThis project demonstrates the implementation of a complete Machine Learning Operations (MLOps) pipeline by building and deploying a classifier to predict water quality as **drinkable** or **not drinkable** based on various water properties (e.g., pH, Turbidity, Chloramines).  \n\nDuring the model selection phase, **Logistic Regression** and **K-Nearest Neighbors (KNN)** models were trained and evaluated alongside a **Decision Tree Classifier**, with the Decision Tree ultimately selected for deployment based on its performance and interpretability.  
\n\nThis project integrates **Exploratory Data Analysis**, **model training**, **model selection and hyperparameter tuning**, **deployment**, **containerization**, **CI/CD automation**, and **cloud deployment**, simulating a real-world production environment.\n\n\n\n## 🛠️ **Tools and Technologies Used**  \n| **Category**               | **Tools**                               |\n|----------------------------|-----------------------------------------|\n| **Programming**            | Python (numpy, pandas, scikit-learn)    |\n| **Model Training**         | LogisticRegression, KNeighborsClassifier, **DecisionTreeClassifier** (scikit-learn)   |\n| **Data Preprocessing**     | Pipelines (Imputation, Scaling)         |\n| **API Development**        | FastAPI                                |\n| **Containerization**       | Docker, DockerHub                      |\n| **CI/CD**                  | GitHub Actions                         |\n| **Cloud Deployment**       | Render                                 |\n| **Version Control**        | Git, GitHub                            |\n| **Security**               | GitHub Secrets, DockerHub Access Tokens|\n\n\n\n## 📝 **Project Workflow**  \n\n### 🔹 **1️⃣ Data Preprocessing**\n- Imported the dataset and handled missing values using **mean imputation**.  \n- Built a **data preprocessing pipeline** to automate feature scaling and transformation.  \n\n### 🔹 **2️⃣ Model Training**\n- Trained a **Decision Tree Classifier** for binary classification.  \n- Fine-tuned hyperparameters (`max_depth`, `min_samples_split`) using grid search.  \n- Saved the trained model as a serialized file (`.pkl`) using `joblib`.  \n\n### 🔹 **3️⃣ API Development**\n- Built a **FastAPI application** to serve the trained model.  \n- Exposed a `/predict` endpoint for predictions.  \n- The API accepts JSON input, preprocesses the data, and returns predictions in real time.  
\n\n### 🔹 **4️⃣ Dockerization**\n- Packaged the application into a **Docker container** for easy deployment.  \n- Created a `Dockerfile` to ensure the app runs consistently across environments.  \n- Pushed the container image to **DockerHub** for reuse and deployment.  \n\n### 🔹 **5️⃣ CI/CD Pipeline**\n- Configured **GitHub Actions** to automate:  \n  - Building the Docker image.  \n  - Pushing the image to DockerHub.  \n  - Triggering redeployment on Render.  \n\n### 🔹 **6️⃣ Cloud Deployment**\n- Deployed the containerized application to **Render**, a cloud hosting platform.  \n- Configured environment variables for security (e.g., DockerHub credentials).  \n- Monitored application logs for debugging and performance insights.  \n\n\n\n## 📊 **Key Features**\n- **End-to-End MLOps Workflow**: Covers every step from data preprocessing to model deployment.  \n- **Cloud Deployment**: Real-world deployment using **Render**.  \n- **Reproducibility**: Automated CI/CD ensures consistent builds and deployments.  \n- **Scalability**: Dockerized application enables horizontal scaling and portability.  \n- **Security**: Managed sensitive credentials with GitHub Secrets and DockerHub tokens.  
\n\n\n\n## 🛠️ **How to Run the Project Locally**  \n\n### 🔹 **1️⃣ Clone the Repository**  \n```bash\ngit clone https://github.com/camm93/WaterQualitySystem.git\n\ncd WaterQualitySystem\n```\n\n### 🔹 **2️⃣ Build and Run the Docker Container**\n```bash\n# Docker image names must be lowercase\ndocker build -t waterqualitysystem .\n\ndocker run -d -p 8000:8000 waterqualitysystem\n```\n\n### 🔹 **3️⃣ Test the API**\nUse **Postman** or `curl` to test the `/predict` endpoint.\n\nExample `curl` command:\n\n```bash\ncurl -X POST http://localhost:8000/predict \\\n-H \"Content-Type: application/json\" \\\n-d '{\n  \"pH\": 7.13,\n  \"Dureza\": 173.69,\n  \"Sólidos\": 19309.57,\n  \"Cloraminas\": 6.53,\n  \"Sulfatos\": 372.54,\n  \"Conductividad\": 295.39,\n  \"Carbono_orgánico\": 7.27,\n  \"Trihalometanos\": 88.79,\n  \"Turbidez\": 3.40\n}'\n```\n\nExpected response:\n\n```json\n{\n    \"prediction\": \"NO\",\n    \"probability\": [\n        0.9414033798677441,\n        0.058596620132255695\n    ]\n}\n```\n\n\n## 🛠️ Folder Structure\n\n```\n.\n├── app.py                  # FastAPI application\n├── model.pkl               # Serialized Decision Tree model\n├── Dockerfile              # Docker container configuration\n├── requirements.txt        # Python dependencies\n├── README.md               # Project documentation\n└── .github/workflows       # CI/CD configuration\n    └── deploy.yml          # GitHub Actions workflow\n```\n---\n\n## 📈 To Recap\n- **Model Deployment**: Transitioning from Jupyter notebooks to a production-ready API.\n- **Docker**: Containerizing applications for consistent deployments.\n- **CI/CD**: Automating image builds, pushes, and redeployments using GitHub Actions.\n- **Cloud Hosting**: Deploying machine learning APIs to cloud platforms.\n- **Security**: Managing secrets and sensitive credentials for DockerHub and Render.\n---\n\n## 🚀 Potential Improvements\n1. Add model monitoring (e.g., API latency, prediction drift) using Prometheus + Grafana.\n2. 
Incorporate MLflow for experiment tracking and model versioning.\n3. Deploy the app to AWS (EC2, Lambda) or GCP for real-world scalability.\n4. Add unit tests for API endpoints using pytest.\n5. Enhance the CI/CD pipeline with rollback mechanisms and more extensive test coverage.\n\n---\n\n## 💬 Contact\nFor questions or collaboration opportunities, feel free to reach out:\n\n- **Email**: crismur_93hotmail.com\n- **LinkedIn**: [Cristian Murillo](https://www.linkedin.com/in/cristianmurillom/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamm93%2Fwaterqualitysystem","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcamm93%2Fwaterqualitysystem","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcamm93%2Fwaterqualitysystem/lists"}