{"id":19829999,"url":"https://github.com/2003harsh/text-classification-using-mlops","last_synced_at":"2026-05-17T00:47:20.435Z","repository":{"id":261062210,"uuid":"882968773","full_name":"2003HARSH/Text-Classification-using-MLOps","owner":"2003HARSH","description":"The Text Classification using MLOps project delivers a production-ready MLOps pipeline with MLflow for experiment tracking, DVC for data versioning on Amazon S3, and CI/CD automation via GitHub Actions. It features AWS Auto Scaling Groups, Load Balancers, and Launch Templates for scalable and resilient deployments. ","archived":false,"fork":false,"pushed_at":"2024-11-22T04:33:57.000Z","size":92,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-03T03:42:12.942Z","etag":null,"topics":["auto-scaling-groups","aws-ec2","aws-s3","blue-green-deployment","cicd","code-deploy","dvc","dvc-pipeline","ecr","flask","load-balancer","mlflow"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/2003HARSH.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-04T06:19:30.000Z","updated_at":"2024-11-24T12:35:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"7f66ecc5-bc9f-43aa-b313-48885079221d","html_url":"https://github.com/2003HARSH/Text-Classification-using-MLOps","commit_stats":null,"previous_names":["2003harsh/text-classification-using-mlops"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/2003HARSH/Text-Classification-using-MLOps","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2003HARSH%2FText-Classification-using-MLOps","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2003HARSH%2FText-Classification-using-MLOps/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2003HARSH%2FText-Classification-using-MLOps/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2003HARSH%2FText-Classification-using-MLOps/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/2003HARSH","download_url":"https://codeload.github.com/2003HARSH/Text-Classification-using-MLOps/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/2003HARSH%2FText-Classification-using-MLOps/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33124058,"icon_url":"https://github.com/github.png","version":null,"
created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-16T18:38:32.183Z","status":"ssl_error","status_checked_at":"2026-05-16T18:38:29.903Z","response_time":115,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["auto-scaling-groups","aws-ec2","aws-s3","blue-green-deployment","cicd","code-deploy","dvc","dvc-pipeline","ecr","flask","load-balancer","mlflow"],"created_at":"2024-11-12T11:21:13.395Z","updated_at":"2026-05-17T00:47:20.420Z","avatar_url":"https://github.com/2003HARSH.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Text Classification using MLOps\n\nThis project demonstrates a complete MLOps pipeline for a text classification task, implementing end-to-end practices for model experimentation, tracking, packaging, and deployment. The project incorporates advanced features such as AWS CodeDeploy for automated blue-green deployment and AWS Elastic Container Registry (ECR) for model storage. 
To ensure scalability, reliability, and fault tolerance, it also utilizes **AWS Auto Scaling Groups (ASGs)**, **Load Balancers**, and **Launch Templates**.\n\n---\n\n## Project Overview\n\nThis repository includes:\n- **Experiment Tracking**: Logs all training runs with parameters, metrics, and artifacts in MLflow.\n- **Hyperparameter Tuning**: Uses MLflow to log and compare performance during hyperparameter optimization.\n- **ML Pipeline with DVC**: Structures and manages machine learning pipelines, ensuring reproducibility.\n- **Model Registration**: Registers the best-performing models for deployment using MLflow.\n- **Data Versioning**: Tracks and versions datasets with DVC, storing them in Amazon S3.\n- **Remote Experiment Tracking**: Hosts a centralized MLflow tracking server on DagsHub.\n- **Automated CI/CD Pipelines**: Leverages GitHub Actions to automate testing, pipeline execution, and deployment.\n- **Unit Testing**: Validates API endpoints, model loading, and configurations to ensure robust deployments.\n- **AWS CodeDeploy with Blue-Green Deployment**: Deploys the application using AWS CodeDeploy to minimize downtime.\n- **AWS ECR Integration**: Stores and retrieves Docker images for deployment.\n- **Production Deployment**: Automates testing and model promotion to production, ensuring deployment readiness.\n- **Scalability Features**:\n  - **Auto Scaling Groups (ASGs)**: Automatically adjusts the number of EC2 instances based on traffic and system load.\n  - **Load Balancers**: Distributes traffic evenly across instances to ensure high availability and fault tolerance.\n  - **Launch Templates**: Defines instance configurations for easy scaling and reproducibility.\n\n---\n\n## Key Features\n\n### 1. **Experiment Tracking with MLflow**\n   - Logs all experiments, hyperparameters, metrics, and artifacts to an MLflow server hosted on DagsHub.\n   - Simplifies comparison and selection of the best-performing models.\n\n### 2. 
**Hyperparameter Tuning**\n   - Uses MLflow’s tracking capabilities for hyperparameter tuning.\n   - Tracks each experiment run and selects the best configuration for deployment.\n\n### 3. **Structured ML Pipeline with DVC**\n   - Employs DVC to define and manage an end-to-end ML pipeline from data ingestion to model training.\n   - Tracks all pipeline stages, ensuring reproducibility and efficient updates.\n\n### 4. **Model Registration in MLflow**\n   - Registers the best-performing models to MLflow Model Registry.\n   - Supports staging and production environments for model lifecycle management.\n\n### 5. **Data Versioning with DVC and S3**\n   - Tracks data changes and stores datasets securely in an Amazon S3 bucket.\n   - Allows easy rollback and version comparison for datasets.\n\n### 6. **Scalable Deployment with AWS**\n   - **Auto Scaling Groups (ASGs)**:\n     - Dynamically adjusts the number of EC2 instances based on predefined scaling policies (e.g., CPU usage, memory usage).\n     - Ensures cost efficiency by scaling down during low traffic and scaling up during peak traffic.\n   - **Load Balancers**:\n     - Elastic Load Balancer (ELB) ensures that incoming traffic is evenly distributed across all running instances.\n     - Provides fault tolerance by automatically routing traffic away from unhealthy instances.\n   - **Launch Templates**:\n     - Predefined configurations for EC2 instances, including AMIs, instance types, security groups, and networking settings.\n     - Simplifies instance management and ensures consistency across scaling operations.\n\n### 7. **Automated Deployment with AWS CodeDeploy**\n   - Implements blue-green deployment for seamless updates to Auto Scaling Groups (ASGs) behind a load balancer.\n   - Ensures minimal downtime and safe transitions between application versions.\n\n### 8. 
**Integration with AWS ECR**\n   - After passing all tests, Docker images are built and pushed to AWS Elastic Container Registry (ECR).\n   - CodeDeploy pulls these images for deployment to ASGs.\n\n### 9. **CI/CD with GitHub Actions**\n   - Automates the workflow for testing, building, and deploying updates.\n   - Triggers deployment only after passing all unit tests and validations.\n\n### 10. **Unit Testing**\n   - Comprehensive unit tests for:\n     - Flask API endpoints\n     - Model loading\n     - Model signature validation\n   - Ensures reliability before deployment.\n\n---\n\n## Deployment Workflow\n\n1. **Build and Push to AWS ECR**:\n   - After successful testing, the application is containerized using Docker.\n   - The Docker image is pushed to AWS ECR for centralized storage.\n\n2. **Automated Deployment**:\n   - AWS CodeDeploy retrieves the Docker image from ECR.\n   - Deploys the application to ASGs using blue-green deployment to minimize downtime.\n   - Load balancers ensure high availability, routing traffic only to healthy instances.\n\n3. **Scaling and Traffic Management**:\n   - ASGs adjust the number of instances based on traffic patterns.\n   - Load balancers handle incoming requests and distribute them to available instances, ensuring optimal performance.\n\n4. **Continuous Integration/Delivery**:\n   - GitHub Actions automatically trigger the deployment pipeline on new commits.\n\n---\n\n## Setup\n\n1. **Clone the Repository**:\n   ```bash\n   git clone https://github.com/2003HARSH/Text-Classification-using-MLOps.git\n   cd Text-Classification-using-MLOps\n   ```\n\n2. **Install Dependencies**:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. **Set Up AWS Services**:\n   - **Auto Scaling Groups (ASGs)**: Define scaling policies for EC2 instances.\n   - **Load Balancers**: Configure ELB to distribute traffic across instances.\n   - **Launch Templates**: Create templates for consistent instance configurations.\n\n4. 
**Configure AWS CodeDeploy**:\n   - Set up a CodeDeploy application with blue-green deployment using ASGs and a load balancer.\n\n5. **Push Docker Image to ECR**:\n   ```bash\n   aws ecr get-login-password --region \u003cregion\u003e | docker login --username AWS --password-stdin \u003caccount_id\u003e.dkr.ecr.\u003cregion\u003e.amazonaws.com\n   docker build -t text-classification .\n   docker tag text-classification:latest \u003caccount_id\u003e.dkr.ecr.\u003cregion\u003e.amazonaws.com/text-classification:latest\n   docker push \u003caccount_id\u003e.dkr.ecr.\u003cregion\u003e.amazonaws.com/text-classification:latest\n   ```\n\n---\n\n## Usage\n\n1. **Run the ML Pipeline**:\n   - Execute the pipeline defined in `dvc.yaml`:\n     ```bash\n     dvc repro\n     ```\n\n2. **Experiment Tracking**:\n   - Start and track experiments using MLflow:\n     ```python\n     import mlflow\n\n     # The context manager ends the run automatically, even if an error occurs\n     with mlflow.start_run():\n         pass  # log parameters, metrics, and artifacts here\n     ```\n\n3. **Hyperparameter Tuning**:\n   - Use the hyperparameter tuning code in `mlflow_experiment/` to find the best model.\n\n4. **Deploy Model**:\n   - After all tests pass, push the DVC-tracked data to S3; the deployment itself is triggered by GitHub Actions on new commits:\n     ```bash\n     dvc push  # Uploads DVC-tracked data to the S3 remote; does not deploy the application\n     ```\n\n---\n\n## Testing\n\n- Run unit tests locally:\n  ```bash\n  python -m unittest \u003ctest_file_name\u003e.py\n  ```\n- CI/CD workflows execute these tests automatically.\n\n---\n\n## Future Enhancements\n\n1. **Enhanced Deployment**:\n   - Deploy the application on Amazon Elastic Container Service (ECS) for scaling and fault tolerance.\n   - Integrate with AWS CodePipeline to orchestrate the end-to-end deployment process.\n\n2. 
**Model Monitoring**:\n   - Integrate tools to monitor model performance in production and detect drift.\n\n---\n\n## Contact\n\nFeel free to reach out at [harshnkgupta@gmail.com](mailto:harshnkgupta@gmail.com) or create an issue in the repository for questions or collaboration opportunities!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F2003harsh%2Ftext-classification-using-mlops","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F2003harsh%2Ftext-classification-using-mlops","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F2003harsh%2Ftext-classification-using-mlops/lists"}