{"id":28480473,"url":"https://github.com/taimoorkhan10/mlops-forge","last_synced_at":"2026-05-02T11:31:56.861Z","repository":{"id":296454834,"uuid":"993432140","full_name":"TaimoorKhan10/MLOps-Forge","owner":"TaimoorKhan10","description":"A complete production-ready MLOps framework with built-in distributed training, monitoring, and CI/CD. Deploy ML models to production with confidence using our battle-tested infrastructure.","archived":false,"fork":false,"pushed_at":"2025-05-30T21:57:34.000Z","size":226,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-06-07T19:06:32.613Z","etag":null,"topics":["ai-infrastructure","ci-cd","datascience","devops","distributed-training","docker","feature-store","kubernetes","machine-learning","ml-engineering","ml-platform","mlflow","mlops","mlops-framework","mlops-pipeline","mlops-tools","model-deployment","model-monitoring","python","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TaimoorKhan10.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-30T19:27:29.000Z","updated_at":"2025-05-30T21:57:35.000Z","dependencies_parsed_at":"2025-05-31T06:14:25.796Z","dependency_job_id":"82294aa2-1cad-4a9e-b910-c0e1cea739b0","html_url":"https://github.com/TaimoorKhan10/MLOps-Forge","commit_stats":null,"previous_names":["taimoorkhan10/mlops-forge"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Tai
moorKhan10/MLOps-Forge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TaimoorKhan10%2FMLOps-Forge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TaimoorKhan10%2FMLOps-Forge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TaimoorKhan10%2FMLOps-Forge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TaimoorKhan10%2FMLOps-Forge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TaimoorKhan10","download_url":"https://codeload.github.com/TaimoorKhan10/MLOps-Forge/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TaimoorKhan10%2FMLOps-Forge/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263388668,"owners_count":23459247,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-infrastructure","ci-cd","datascience","devops","distributed-training","docker","feature-store","kubernetes","machine-learning","ml-engineering","ml-platform","mlflow","mlops","mlops-framework","mlops-pipeline","mlops-tools","model-deployment","model-monitoring","python","pytorch"],"created_at":"2025-06-07T19:06:32.568Z","updated_at":"2026-05-02T11:31:56.825Z","avatar_url":"https://github.com/TaimoorKhan10.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MLOps-Forge \n\n[![CI/CD](https://img.shields.io/badge/CI/CD-passing-success?style=flat-square)](https://github.com/TaimoorKhan10/MLOps-Forge/actions) [![GITHUB 
ACTIONS](https://img.shields.io/badge/GITHUB_ACTIONS-enabled-blue?style=flat-square)](https://github.com/TaimoorKhan10/MLOps-Forge/actions) [![COVERAGE](https://img.shields.io/badge/COVERAGE-80%25-success?style=flat-square)](https://github.com/TaimoorKhan10/MLOps-Forge) [![CODECOV](https://img.shields.io/badge/CODECOV-enabled-ff69b4?style=flat-square)](https://codecov.io/gh/TaimoorKhan10/MLOps-Forge) [![PYTHON](https://img.shields.io/badge/PYTHON-3.9_|_3.10-blue?style=flat-square)](https://www.python.org/) [![LICENSE](https://img.shields.io/badge/LICENSE-MIT-yellowgreen?style=flat-square)](https://github.com/TaimoorKhan10/MLOps-Forge/blob/master/LICENSE) [![GITHUB](https://img.shields.io/badge/GITHUB-repo-black?style=flat-square)](https://github.com/TaimoorKhan10/MLOps-Forge) [![REPOSITORY](https://img.shields.io/badge/REPOSITORY-MLOps--Forge-black?style=flat-square)](https://github.com/TaimoorKhan10/MLOps-Forge)\n\nA complete production-ready MLOps framework with built-in distributed training, monitoring, and CI/CD. Deploy ML models to production with confidence using our battle-tested infrastructure. 
This project implements an end-to-end ML pipeline that follows industry best practices for developing, deploying, and maintaining ML models in production environments at scale.\n\n## 📋 Table of Contents\n- [Features](#-features)\n- [Architecture](#️-architecture)\n  - [Component Details](#component-details)\n  - [System Flow](#system-flow)\n- [Getting Started](#-getting-started)\n  - [Prerequisites](#prerequisites)\n  - [Installation](#installation)\n  - [Configuration](#configuration)\n- [Usage](#-usage)\n  - [Data Pipeline](#data-pipeline)\n  - [Model Training](#model-training)\n  - [Model Deployment](#model-deployment)\n  - [Monitoring](#monitoring)\n- [CI/CD Pipeline](#-cicd-pipeline)\n- [Development](#-development)\n  - [Project Structure](#project-structure)\n  - [Contributing](#contributing)\n- [Advanced Usage](#-advanced-usage)\n  - [Distributed Training](#distributed-training)\n  - [A/B Testing](#ab-testing)\n  - [Drift Detection](#drift-detection)\n- [Security](#-security)\n- [License](#-license)\n\n## 🚀 Features\n\n- **Automated Data Pipeline**: Robust data validation, cleaning, and feature engineering\n- **Experiment Tracking**: Comprehensive version control for models, datasets, and hyperparameters with MLflow\n- **Distributed Training**: GPU-accelerated training across multiple nodes for large models\n- **Model Registry**: Centralized model storage and versioning with lifecycle management\n- **Continuous Integration/Deployment**: Automated testing, validation, and deployment pipelines\n- **Model Serving API**: Fast and scalable REST API with input validation and automatic documentation\n- **Model Monitoring**: Performance tracking, drift detection, and automated retraining triggers\n- **A/B Testing**: Framework for model experimentation and controlled rollouts\n- **Infrastructure as Code**: Docker containers and Kubernetes configurations for reliable deployments\n\n## 🏗️ Architecture\n\nThis system follows a modular microservice architecture with the 
following components:\n\n```mermaid\ngraph TD\n    %% Main title and styles\n    classDef pipeline fill:#f0f6ff,stroke:#3273dc,color:#3273dc,stroke-width:2px\n    classDef component fill:#ffffff,stroke:#209cee,color:#209cee,stroke-width:1.5px\n    classDef note fill:#fffaeb,stroke:#ffdd57,color:#946c00,stroke-width:1px,stroke-dasharray:5 5\n    classDef infra fill:#e3fcf7,stroke:#00d1b2,color:#00d1b2,stroke-width:1.5px,stroke-dasharray:5 5\n    \n    %% Infrastructure\n    subgraph K8S[\"Kubernetes Cluster\"]\n        %% Data Pipeline\n        subgraph DP[\"Data Pipeline\"]\n            DI[Data Ingestion]:::component\n            DV[Data Validation]:::component\n            FE[Feature Engineering]:::component\n            FSN[Feature Store Integration]:::note\n            \n            DI --\u003e DV\n            DV --\u003e FE\n        end\n        \n        %% Model Training\n        subgraph MT[\"Model Training\"]\n            ET[Experiment Tracking - MLflow]:::component\n            DT[Distributed Training]:::component\n            ME[Model Evaluation]:::component\n            ABN[A/B Testing Framework]:::note\n            \n            ET --\u003e DT\n            DT --\u003e ME\n        end\n        \n        %% Model Registry\n        subgraph MR[\"Model Registry\"]\n            MV[Model Versioning]:::component\n            MS[Metadata Storage]:::component\n            MCI[CI/CD Integration]:::note\n            \n            MV --\u003e MS\n        end\n        \n        %% API Layer\n        subgraph API[\"API Layer\"]\n            FA[FastAPI Application]:::component\n            PE[Prediction Endpoints]:::component\n            HM[Health \u0026 Metadata APIs]:::component\n            HPA[Horizontal Pod Autoscaling]:::note\n            \n            FA --\u003e PE\n            FA --\u003e HM\n        end\n        \n        %% Monitoring\n        subgraph MON[\"Monitoring\"]\n            PM[Prometheus Metrics]:::component\n            GD[Grafana 
Dashboards]:::component\n            DD[Feature-level Drift Detection]:::component\n            RT[Automated Retraining Triggers]:::component\n            AM[Alert Manager Integration]:::note\n            \n            MPT[Model Performance Tracking]:::component\n            DQM[Data Quality Monitoring]:::component\n            ABT[A/B Testing Analytics]:::component\n            LA[Log Aggregation]:::component\n            DT2[Distributed Tracing]:::note\n            \n            PM --\u003e GD\n            PM --\u003e DD\n            DD --\u003e RT\n            MPT --\u003e DQM\n            DQM --\u003e ABT\n            ABT --\u003e LA\n        end\n        \n        %% Component relationships\n        DP --\u003e|Training Data| MT\n        DP --\u003e|Metadata| MR\n        MT --\u003e|Model Artifacts| MR\n        MR --\u003e|Latest Model| API\n        API --\u003e|Metrics| MON\n        MT --\u003e|Performance Metrics| MON\n    end\n    \n    %% CI/CD Pipeline\n    CICD[CI/CD Pipeline: GitHub Actions]:::infra\n    CICD --\u003e|Deploy| K8S\n    \n    %% Apply classes\n    class DP,MT,MR,API,MON pipeline\n```\n\n### Component Details\n\n1. **Data Pipeline**\n   - **Data Ingestion**: Connectors for various data sources (databases, object storage, streaming)\n   - **Data Validation**: Schema validation, data quality checks, and anomaly detection\n   - **Feature Engineering**: Feature transformation, normalization, and feature store integration\n\n2. **Model Training**\n   - **Experiment Tracking**: MLflow integration for tracking parameters, metrics, and artifacts\n   - **Distributed Training**: PyTorch distributed training for efficient model training\n   - **Model Evaluation**: Comprehensive metrics calculation and validation\n\n3. 
**Model Registry**\n   - **Model Versioning**: Storage and versioning of models with metadata\n   - **Artifact Management**: Efficient storage of model artifacts and associated files\n   - **Deployment Management**: Tracking of model deployment status\n\n4. **API Layer**\n   - **FastAPI Application**: High-performance API with automatic OpenAPI documentation\n   - **Prediction Endpoints**: RESTful endpoints for model inference\n   - **Health \u0026 Metadata**: Endpoints for system health checks and model metadata\n\n5. **Monitoring System**\n   - **Metrics Collection**: Prometheus integration for metrics collection\n   - **Drift Detection**: Statistical methods to detect data and concept drift\n   - **Performance Tracking**: Continuous monitoring of model performance metrics\n   - **Automated Retraining**: Triggers for retraining based on drift detection\n\n### System Flow\n\n1. **Development Workflow**:\n   ```mermaid\n   flowchart LR\n       DS[Data Scientist] --\u003e |Develops Model| DEV[Development Environment]\n       DEV --\u003e |Commits Code| GIT[Git Repository]\n       GIT --\u003e |Triggers| CI[CI/CD Pipeline]\n       CI --\u003e |Runs Tests| TEST[Test Suite]\n       TEST --\u003e |Validates Model| VAL[Model Validation]\n       VAL --\u003e |Performance Testing| PERF[Performance Tests]\n       PERF --\u003e |Builds| BUILD[Docker Image]\n       BUILD --\u003e |Deploys| DEPLOY[Kubernetes Cluster]\n   ```\n\n2. 
**Production Data Flow**:\n   ```mermaid\n   flowchart LR\n       DATA[Data Sources] --\u003e |Ingestion| PIPE[Data Pipeline]\n       PIPE --\u003e |Validated Data| TRAIN[Training Pipeline]\n       TRAIN --\u003e |Trained Model| REG[Model Registry]\n       REG --\u003e |Latest Model| API[API Service]\n       API --\u003e |Predictions| USERS[End Users]\n       API --\u003e |Metrics| MON[Monitoring]\n       MON --\u003e |Drift Detected| RETRAIN[Retraining Trigger]\n       RETRAIN --\u003e TRAIN\n   ```\n\n## 🚀 Quick Start\n\n### Installation\n\nInstall the latest stable version from PyPI:\n\n```bash\npip install mlops-forge\n```\n\nFor development, install from source:\n\n```bash\n# Clone the repository\ngit clone https://github.com/TaimoorKhan10/MLOps-Forge.git\ncd MLOps-Forge\n\n# Create and activate virtual environment\npython -m venv venv\n# On Windows: .\\venv\\Scripts\\activate\n# On macOS/Linux: source venv/bin/activate\n\n# Install in development mode with all dependencies\npip install -e \".[dev]\"\n```\n\n### Prerequisites\n\n- Python 3.9 or 3.10\n- Docker and Docker Compose (for containerization)\n- Kubernetes (for production deployment)\n- Cloud provider account (AWS/GCP/Azure) for cloud deployments\n\n### Configuration\n\n1. **Environment Variables**:\n   Create a `.env` file based on the provided `.env.example`:\n\n   ```\n   # MLflow Configuration\n   MLFLOW_TRACKING_URI=http://mlflow:5000\n   MLFLOW_S3_ENDPOINT_URL=http://minio:9000\n   \n   # AWS Configuration for Deployment\n   AWS_ACCESS_KEY_ID=your-access-key\n   AWS_SECRET_ACCESS_KEY=your-secret-key\n   AWS_REGION=us-west-2\n   \n   # Kubernetes Configuration\n   K8S_NAMESPACE=mlops-production\n   ```\n\n2. 
**Infrastructure Setup**:\n   ```bash\n   # For local development with Docker Compose\n   docker-compose up -d\n   \n   # For Kubernetes deployment\n   kubectl apply -f infrastructure/kubernetes/\n   ```\n\n## 🧰 Usage\n\n### Data Pipeline\n\n```python\nfrom mlops_production_system.pipeline import DataPipeline\n\n# Initialize the pipeline\npipeline = DataPipeline(config_path=\"config/pipeline_config.yaml\")\n\n# Run the pipeline\nprocessed_data = pipeline.run(input_data_path=\"data/raw/training_data.csv\")\n```\n\n### Model Training\n\n```python\nfrom mlops_production_system.models import ModelTrainer\nfrom mlops_production_system.training import distributed_trainer\n\n# For single-node training\ntrainer = ModelTrainer(model_config=\"config/model_config.yaml\")\nmodel = trainer.train(X_train, y_train)\nmetrics = trainer.evaluate(X_test, y_test)\n\n# For distributed training\ndistributed_trainer.run(\n    model_class=\"mlops_production_system.models.CustomModel\",\n    data_path=\"data/processed/training_data.parquet\",\n    num_nodes=4\n)\n```\n\n### Model Deployment\n\n```bash\n# Deploy model using CLI\nmlops deploy --model-name=\"my-model\" --model-version=1 --environment=production\n\n# Or using the Python API\nfrom mlops_production_system.deployment import ModelDeployer\n\ndeployer = ModelDeployer()\ndeployer.deploy(model_name=\"my-model\", model_version=1, environment=\"production\")\n```\n\n### Monitoring\n\n```python\nfrom mlops_production_system.monitoring import DriftDetector, PerformanceMonitor\n\n# Monitor for drift\ndrift_detector = DriftDetector(reference_data=\"data/reference.parquet\")\ndrift_results = drift_detector.detect(new_data=\"data/production_data.parquet\")\n\n# Monitor model performance\nperformance_monitor = PerformanceMonitor(model_name=\"my-model\", model_version=1)\nperformance_metrics = performance_monitor.get_metrics(timeframe=\"last_24h\")\n```\n\n## 🔄 CI/CD Pipeline\n\nThe system uses GitHub Actions for CI/CD pipeline, configured in 
`.github/workflows/main.yml`. The pipeline includes:\n\n1. **Code Quality**:\n   - Linting with flake8\n   - Type checking with mypy\n   - Security scanning with bandit\n\n2. **Testing**:\n   - Unit tests with pytest\n   - Integration tests\n   - Code coverage reporting\n\n3. **Model Validation**:\n   - Performance benchmarking\n   - Model quality checks\n   - Validation against baseline metrics\n\n4. **Deployment**:\n   - Docker image building\n   - Image pushing to container registry\n   - Kubernetes deployment updates\n\nAll secrets and credentials are stored securely in GitHub Secrets and only accessed during workflow execution.\n\n## 👨‍💻 Development\n\n### Project Structure\n\n```\nMLOps-Production-System/\n├── .github/                  # GitHub Actions workflows\n├── config/                   # Configuration files\n├── data/                     # Data directories (gitignored)\n├── docs/                     # Documentation\n├── infrastructure/           # Infrastructure as code\n│   ├── docker/               # Docker configurations\n│   ├── kubernetes/           # Kubernetes manifests\n│   └── terraform/            # Terraform for cloud resources\n├── notebooks/                # Jupyter notebooks\n├── scripts/                  # Utility scripts\n├── src/                      # Source code\n│   └── mlops_production_system/\n│       ├── api/              # FastAPI application\n│       ├── models/           # ML models\n│       ├── pipeline/         # Data pipeline\n│       ├── training/         # Training code\n│       ├── monitoring/       # Monitoring tools\n│       └── utils/            # Utilities\n├── tests/                    # Test suite\n├── .env.example              # Example environment variables\n├── Dockerfile                # Main Dockerfile\n├── pyproject.toml            # Project metadata\n└── README.md                 # This file\n```\n\n### Contributing\n\nWe follow the GitFlow branching model:\n\n1. 
Create a feature branch from `develop`: `git checkout -b feature/your-feature`\n2. Make your changes and commit: `git commit -m \"Add feature\"`\n3. Push your branch: `git push origin feature/your-feature`\n4. Open a Pull Request against the `develop` branch\n\nAll PRs must pass CI checks and code review before being merged.\n\n## 🔬 Advanced Usage\n\n### Distributed Training\n\nThe system supports distributed training using PyTorch's DistributedDataParallel for efficient multi-node training:\n\n```yaml\n# Example Kubernetes configuration in infrastructure/kubernetes/distributed-training.yaml\napiVersion: batch/v1\nkind: Job\nmetadata:\n  name: distributed-training\nspec:\n  parallelism: 4\n  template:\n    spec:\n      containers:\n      - name: trainer\n        image: your-registry/mlops-trainer:latest\n        resources:\n          limits:\n            nvidia.com/gpu: 1\n        env:\n        - name: WORLD_SIZE\n          value: \"4\"\n```\n\n### A/B Testing\n\nThe A/B testing framework allows comparing multiple models in production:\n\n```python\nfrom mlops_production_system.monitoring import ABTestingFramework\n\n# Set up A/B test between two models\nab_test = ABTestingFramework()\nab_test.create_experiment(\n    name=\"pricing_model_comparison\",\n    models=[\"pricing_model_v1\", \"pricing_model_v2\"],\n    traffic_split=[0.5, 0.5],\n    evaluation_metric=\"conversion_rate\"\n)\n\n# Get results\nresults = ab_test.get_results(experiment_name=\"pricing_model_comparison\")\n```\n\n### Drift Detection\n\nDetect data drift to trigger model retraining:\n\n```python\nfrom mlops_production_system.monitoring import DriftDetector\n\n# Initialize with reference data distribution\ndetector = DriftDetector(\n    reference_data=\"s3://bucket/reference_data.parquet\",\n    features=[\"feature1\", \"feature2\", \"feature3\"],\n    drift_method=\"wasserstein\",\n    threshold=0.1\n)\n\n# Check for drift in new data\ndrift_detected, drift_metrics = detector.detect(\n    
current_data=\"s3://bucket/production_data.parquet\"\n)\n\nif drift_detected:\n    # Trigger retraining\n    from mlops_production_system.training import trigger_retraining\n    trigger_retraining(model_name=\"my-model\")\n```\n\n## 🔒 Security\n\nThis project follows security best practices:\n\n- Secrets management via environment variables and Kubernetes secrets\n- Regular dependency scanning for vulnerabilities\n- Least privilege principle for all service accounts\n- Network policies to restrict pod-to-pod communication\n- Encryption of data at rest and in transit\n\n## 📜 License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n---\n\n*MLOps-Forge was created to demonstrate end-to-end machine learning operations and follows industry best practices for deploying ML models in production environments. Star us on [GitHub](https://github.com/TaimoorKhan10/MLOps-Forge) if you find this project useful!*\n\n## 🔧 Technologies\n\n- **ML Framework**: scikit-learn, PyTorch\n- **Feature Store**: Feast\n- **Experiment Tracking**: MLflow\n- **API**: FastAPI\n- **Containerization**: Docker\n- **Orchestration**: Kubernetes\n- **CI/CD**: GitHub Actions\n- **Infrastructure as Code**: Terraform\n- **Monitoring**: Prometheus, Grafana\n\n## 🛠️ Installation\n\n### Prerequisites\n\n- Python 3.9 or 3.10\n- Docker and Docker Compose\n- Kubernetes (optional for local development)\n\n### Setup\n\n1. Clone the repository\n   ```bash\n   git clone https://github.com/TaimoorKhan10/MLOps-Forge.git\n   cd MLOps-Forge\n   ```\n\n2. Create a virtual environment and install dependencies\n   ```bash\n   python -m venv venv\n   source venv/bin/activate  # On Windows: venv\\Scripts\\activate\n   pip install -r requirements.txt\n   ```\n\n3. Set up environment variables\n   ```bash\n   cp .env.example .env\n   # Edit .env with your configuration\n   ```\n\n4. 
Start the development environment\n   ```bash\n   docker-compose up -d\n   ```\n\n## 📊 Demo\n\nAccess the demo application at http://localhost:8000 after starting the containers.\n\nThe demo includes:\n- Model training dashboard\n- Real-time inference API\n- Performance monitoring\n\n## 📚 Documentation\n\nComprehensive documentation is available in the `/docs` directory:\n\n- [Data Pipeline](docs/data_pipeline.md)\n- [Model Training](docs/model_training.md)\n- [API Reference](docs/api_reference.md)\n- [Deployment Guide](docs/deployment.md)\n- [Monitoring](docs/monitoring.md)\n\n## 🧪 Testing\n\nRun the test suite:\n\n```bash\npytest\n```\n\n## 🚢 Deployment\n\n### Local Deployment\n\n```bash\ndocker-compose up -d\n```\n\n### Cloud Deployment (AWS)\n\n```bash\ncd infrastructure/terraform\nterraform init\nterraform apply\n```\n\n## 📈 Monitoring\n\nAccess the monitoring dashboard at http://localhost:3000 after deployment.\n\n## 🤝 Contributing\n\nContributions are welcome! Please check out our [contribution guidelines](CONTRIBUTING.md).\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaimoorkhan10%2Fmlops-forge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftaimoorkhan10%2Fmlops-forge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaimoorkhan10%2Fmlops-forge/lists"}