https://github.com/devsuthar-ai/mlops-pipeline-framework

🔧 Production ML pipeline framework for model training, deployment, and monitoring. Features: MLflow, Airflow, FastAPI serving, Prometheus monitoring, Kubernetes deployment.
https://github.com/devsuthar-ai/mlops-pipeline-framework
airflow ci-cd data-science docker fastapi kubernetes machine-learning ml-pipeline mlflow mlops model-deployment python
Last synced: 7 months ago
JSON representation
🔧 Production ML pipeline framework for model training, deployment, and monitoring. Features: MLflow, Airflow, FastAPI serving, Prometheus monitoring, Kubernetes deployment.
Host: GitHub
URL: https://github.com/devsuthar-ai/mlops-pipeline-framework
Owner: devsuthar-ai
Created: 2025-11-08T06:40:42.000Z (8 months ago)
Default Branch: master
Last Pushed: 2025-11-08T11:11:59.000Z (8 months ago)
Last Synced: 2025-11-08T13:09:45.536Z (8 months ago)
Topics: airflow, ci-cd, data-science, docker, fastapi, kubernetes, machine-learning, ml-pipeline, mlflow, mlops, model-deployment, python
Language: Python
Homepage:
Size: 7.81 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project

README

          


# 🔧 MLOps Pipeline Framework

### *Production Machine Learning Operations Platform*

[![Python](https://img.shields.io/badge/Python-3.11+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/)

[![FastAPI](https://img.shields.io/badge/FastAPI-0.104+-009688?style=for-the-badge&logo=fastapi&logoColor=white)](https://fastapi.tiangolo.com/)

[![MLflow](https://img.shields.io/badge/MLflow-2.8+-0194E2?style=for-the-badge&logo=mlflow&logoColor=white)](https://mlflow.org/)

[![Kubernetes](https://img.shields.io/badge/Kubernetes-326CE5?style=for-the-badge&logo=kubernetes&logoColor=white)](https://kubernetes.io/)

[![Docker](https://img.shields.io/badge/Docker-Ready-2496ED?style=for-the-badge&logo=docker&logoColor=white)](https://www.docker.com/)

[![License](https://img.shields.io/badge/License-MIT-yellow?style=for-the-badge)](LICENSE)

**[Live Demo](#) • [Documentation](#) • [API Docs](#) • [Report Bug](../../issues) • [Request Feature](../../issues)**

---

### 🎯 *End-to-end ML pipeline orchestration for production deployments*

Built with ❤️ by [Dev Suthar](https://github.com/devsuthar-ai) | ⭐ **Star us on GitHub!**



---

## 📋 Table of Contents

- [✨ Features](#-features)

- [🎬 Demo](#-demo)

- [🏗️ Architecture](#️-architecture)

- [🚀 Quick Start](#-quick-start)

- [📖 Documentation](#-documentation)

- [🔧 Pipeline Components](#-pipeline-components)

- [🧪 Model Training](#-model-training)

- [🚢 Model Deployment](#-model-deployment)

- [📊 Monitoring](#-monitoring)

- [🛠️ Tech Stack](#️-tech-stack)

- [🤝 Contributing](#-contributing)

- [📄 License](#-license)

---

## ✨ Features

### 🎯 **Core Capabilities**

- 📊 **Data Pipeline** - Automated ingestion & validation

- 🤖 **Model Training** - Distributed training orchestration

- 🎯 **Hyperparameter Tuning** - Automated optimization

- 📈 **Experiment Tracking** - MLflow integration

- 🚀 **Model Deployment** - One-click deployment

- 📉 **Model Monitoring** - Real-time performance tracking

### 🛠️ **Technical Excellence**

- ☸️ **Kubernetes Native** - Auto-scaling & orchestration

- 🔄 **CI/CD Integration** - Automated ML workflows

- 📊 **Observability** - Prometheus + Grafana

- 🧪 **A/B Testing** - Model comparison

- 🔄 **Auto-Retraining** - Scheduled model updates

- 📦 **Model Registry** - Version management

---

## 🎬 Demo



### 🖼️ **Platform Screenshots**

**📊 Pipeline Dashboard**

![Pipeline](https://via.placeholder.com/500x300/667eea/ffffff?text=Pipeline+Orchestration)

**📈 Model Tracking**

![Training](https://via.placeholder.com/500x300/764ba2/ffffff?text=Model+Training)

**🚀 Deployment Manager**

![Deploy](https://via.placeholder.com/500x300/f093fb/ffffff?text=Model+Deployment)

**📉 Performance Monitoring**

![Monitor](https://via.placeholder.com/500x300/4facfe/ffffff?text=Real-time+Monitoring)



---

## 🏗️ Architecture

```mermaid

graph TB

    subgraph "Data Layer"

        A[Data Sources] --> B[Data Ingestion Service]

        B --> C[Data Validation]

        C --> D[(Feature Store)]

    end

    

    subgraph "Training Layer"

        D --> E[Training Pipeline]

        E --> F[Hyperparameter Tuning]

        F --> G[Model Evaluation]

        G --> H[(Model Registry)]

    end

    

    subgraph "Serving Layer"

        H --> I[Model Deployment]

        I --> J[A/B Testing]

        J --> K[Prediction Service]

        K --> L[Load Balancer]

    end

    

    subgraph "Monitoring Layer"

        K --> M[Performance Monitor]

        M --> N[Drift Detection]

        N --> O[Alert System]

        O --> E

    end

    

    subgraph "Orchestration"

        P[Airflow] -.-> E

        P -.-> I

        P -.-> M

    end

    

    subgraph "Observability"

        Q[Prometheus] --> R[Grafana]

        K --> Q

        M --> Q

    end

    

    style E fill:#667eea

    style I fill:#764ba2

    style M fill:#f093fb

    style H fill:#4facfe

```

### 📐 **System Components**

| Component | Technology | Purpose |

|-----------|-----------|---------|

| **Orchestration** | Airflow | Workflow management |

| **Experiment Tracking** | MLflow | Model versioning |

| **Model Serving** | FastAPI | High-performance API |

| **Feature Store** | Feast | Feature management |

| **Model Registry** | MLflow | Model storage |

| **Container Runtime** | Docker | Containerization |

| **Orchestration** | Kubernetes | Container orchestration |

| **Monitoring** | Prometheus + Grafana | Metrics & visualization |

| **Streaming** | Kafka | Real-time data |

| **Storage** | MinIO | Model artifacts |

---

## 🚀 Quick Start

### Prerequisites

```bash

# Required

- Python 3.11+

- Docker & Docker Compose

- Kubernetes cluster (optional)

# Optional for production

- MLflow server

- Airflow instance

- Prometheus + Grafana

```

### ⚡ One-Command Setup

```bash

# Clone repository

git clone https://github.com/devsuthar-ai/mlops-pipeline-framework.git

cd mlops-pipeline-framework

# Start all services

docker-compose up -d

# 🎉 Done! Access services:

# API: http://localhost:8001

# MLflow UI: http://localhost:5000

# Airflow UI: http://localhost:8080

# Grafana: http://localhost:3000

```

### 🐍 Local Development

```bash

# Create virtual environment

python -m venv venv

source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies

pip install -r requirements.txt

# Start API server

python src/main.py

# In new terminal, start MLflow

mlflow server --host 0.0.0.0 --port 5000

# In new terminal, start Airflow

airflow standalone

```

---

## 📖 Documentation

### 📚 **Complete Guides**

- [🏗️ Architecture Overview](docs/ARCHITECTURE.md)

- [📊 Data Pipeline Guide](docs/DATA_PIPELINE.md)

- [🤖 Model Training](docs/TRAINING.md)

- [🚀 Deployment Guide](docs/DEPLOYMENT.md)

- [📈 Monitoring & Alerts](docs/MONITORING.md)

- [🔧 Configuration](docs/CONFIGURATION.md)

---

## 🔧 Pipeline Components

### 📊 **1. Data Ingestion Pipeline**

```python

from src.data.ingestion import DataPipeline

# Initialize pipeline

pipeline = DataPipeline(

    source="s3://my-bucket/data",

    destination="feature_store",

    validation_rules={"schema": "v1.0"}

)

# Run ingestion

result = pipeline.ingest()

print(f"Ingested {result['rows']} rows")

```

**Features:**

- ✅ Multi-source support (S3, GCS, local, databases)

- ✅ Schema validation

- ✅ Data quality checks

- ✅ Incremental loading

- ✅ Error handling & retry logic

---

### 🤖 **2. Model Training Pipeline**

```python

from src.models.training import TrainingPipeline

# Configure training

config = {

    "model_type": "random_forest",

    "hyperparameters": {

        "n_estimators": 100,

        "max_depth": 10,

        "min_samples_split": 5

    },

    "training_data": "feature_store://train_v1",

    "validation_split": 0.2

}

# Initialize and run

pipeline = TrainingPipeline(config)

model = pipeline.train()

# Track with MLflow

pipeline.log_metrics({

    "accuracy": 0.95,

    "f1_score": 0.93,

    "training_time": 120.5

})

```

**Features:**

- ✅ Distributed training (PyTorch, TensorFlow)

- ✅ Automated hyperparameter tuning

- ✅ Cross-validation

- ✅ Early stopping

- ✅ Checkpointing

- ✅ MLflow integration

---

### 📈 **3. Model Evaluation**

```python

from src.models.evaluation import ModelEvaluator

# Evaluate model

evaluator = ModelEvaluator(model)

metrics = evaluator.evaluate(test_data)

print(f"""

Evaluation Results:

- Accuracy: {metrics['accuracy']:.3f}

- Precision: {metrics['precision']:.3f}

- Recall: {metrics['recall']:.3f}

- F1 Score: {metrics['f1']:.3f}

- AUC-ROC: {metrics['auc']:.3f}

""")

# Generate reports

evaluator.generate_report(output_path="reports/")

```

**Metrics Tracked:**

- Accuracy, Precision, Recall, F1

- ROC-AUC, PR-AUC

- Confusion Matrix

- Feature Importance

- Prediction Distribution

---

### 🚀 **4. Model Deployment**

```python

from src.serving.deployment import ModelDeployer

# Deploy model

deployer = ModelDeployer(

    model_uri="models:/production/RandomForest/v3",

    environment="production",

    replicas=3,

    resources={

        "cpu": "2",

        "memory": "4Gi"

    }

)

deployment = deployer.deploy()

print(f"Deployed at: {deployment['endpoint']}")

# Test endpoint

response = requests.post(

    deployment['endpoint'],

    json={"features": [1.2, 3.4, 5.6]}

)

print(f"Prediction: {response.json()['prediction']}")

```

**Deployment Features:**

- ✅ Rolling updates (zero downtime)

- ✅ Canary deployments

- ✅ A/B testing

- ✅ Auto-scaling

- ✅ Health checks

- ✅ Load balancing

---

### 📊 **5. Monitoring & Alerting**

```python

from src.monitoring.monitor import ModelMonitor

# Setup monitoring

monitor = ModelMonitor(

    model_name="RandomForest",

    metrics=["accuracy", "latency", "throughput"],

    alert_thresholds={

        "accuracy_drop": 0.05,

        "latency_p95": 500  # ms

    }

)

# Start monitoring

monitor.start()

# View dashboard

monitor.show_dashboard()

```

**Monitored Metrics:**

- Model performance (accuracy, F1, etc.)

- Prediction latency (p50, p95, p99)

- Throughput (predictions/sec)

- Resource usage (CPU, memory)

- Data drift detection

- Concept drift detection

---

## 🧪 Model Training

### Training Script Example

```python

# train.py

import mlflow

from sklearn.ensemble import RandomForestClassifier

from src.data import load_data

from src.models import train_model, evaluate_model

# Load data

X_train, y_train, X_test, y_test = load_data()

# Start MLflow run

with mlflow.start_run():

    # Train model

    model = RandomForestClassifier(

        n_estimators=100,

        max_depth=10,

        random_state=42

    )

    model.fit(X_train, y_train)

    

    # Evaluate

    metrics = evaluate_model(model, X_test, y_test)

    

    # Log to MLflow

    mlflow.log_params({

        "n_estimators": 100,

        "max_depth": 10

    })

    mlflow.log_metrics(metrics)

    mlflow.sklearn.log_model(model, "model")

    

    print(f"Model trained! Accuracy: {metrics['accuracy']:.3f}")

```

### Run Training

```bash

# Local training

python train.py

# Distributed training

python -m torch.distributed.launch train_distributed.py

# With Airflow

airflow dags trigger training_pipeline

```

---

## 🚢 Model Deployment

### Deployment Configuration

```yaml

# deployment.yaml

apiVersion: apps/v1

kind: Deployment

metadata:

  name: ml-model-serving

spec:

  replicas: 3

  selector:

    matchLabels:

      app: ml-model

  template:

    metadata:

      labels:

        app: ml-model

    spec:

      containers:

      - name: model-server

        image: ml-model:v1.0

        ports:

        - containerPort: 8001

        resources:

          requests:

            memory: "2Gi"

            cpu: "1"

          limits:

            memory: "4Gi"

            cpu: "2"

        env:

        - name: MODEL_URI

          value: "models:/production/latest"

```

### Deploy to Kubernetes

```bash

# Apply deployment

kubectl apply -f deployment.yaml

# Check status

kubectl get pods -l app=ml-model

# Expose service

kubectl expose deployment ml-model-serving --type=LoadBalancer --port=80 --target-port=8001

# Get endpoint

kubectl get svc ml-model-serving

```

---

## 📊 Monitoring

### Prometheus Metrics

```python

from prometheus_client import Counter, Histogram, Gauge

# Define metrics

prediction_counter = Counter(

    'model_predictions_total',

    'Total predictions made'

)

prediction_latency = Histogram(

    'model_prediction_latency_seconds',

    'Prediction latency'

)

model_accuracy = Gauge(

    'model_accuracy',

    'Current model accuracy'

)

# Use in code

@prediction_latency.time()

def predict(features):

    prediction_counter.inc()

    result = model.predict(features)

    return result

```

### Grafana Dashboards

Access dashboards at `http://localhost:3000`

**Available Dashboards:**

1. **Model Performance**

   - Accuracy over time

   - Precision/Recall trends

   - Confusion matrix heatmap

2. **System Metrics**

   - CPU/Memory usage

   - Request rate

   - Error rate

3. **Prediction Analytics**

   - Latency distribution

   - Throughput

   - Feature distribution

4. **Data Drift**

   - Feature drift detection

   - Concept drift alerts

   - Distribution changes

---

## 🛠️ Tech Stack



### **ML & Data**

![Python](https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white)

![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white)

![TensorFlow](https://img.shields.io/badge/TensorFlow-FF6F00?style=for-the-badge&logo=tensorflow&logoColor=white)

![scikit-learn](https://img.shields.io/badge/scikit--learn-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white)

### **MLOps Tools**

![MLflow](https://img.shields.io/badge/MLflow-0194E2?style=for-the-badge&logo=mlflow&logoColor=white)

![Airflow](https://img.shields.io/badge/Airflow-017CEE?style=for-the-badge&logo=apache-airflow&logoColor=white)

![Feast](https://img.shields.io/badge/Feast-6E40C9?style=for-the-badge)

### **Infrastructure**

![Docker](https://img.shields.io/badge/Docker-2496ED?style=for-the-badge&logo=docker&logoColor=white)

![Kubernetes](https://img.shields.io/badge/Kubernetes-326CE5?style=for-the-badge&logo=kubernetes&logoColor=white)

![Prometheus](https://img.shields.io/badge/Prometheus-E6522C?style=for-the-badge&logo=prometheus&logoColor=white)

![Grafana](https://img.shields.io/badge/Grafana-F46800?style=for-the-badge&logo=grafana&logoColor=white)

### **API & Serving**

![FastAPI](https://img.shields.io/badge/FastAPI-009688?style=for-the-badge&logo=fastapi&logoColor=white)

![Kafka](https://img.shields.io/badge/Kafka-231F20?style=for-the-badge&logo=apache-kafka&logoColor=white)



---

## 📁 Project Structure

```

mlops-pipeline-framework/

├── src/

│   ├── data/                    # Data pipelines

│   │   ├── ingestion/          # Data ingestion

│   │   ├── preprocessing/      # Data preprocessing

│   │   └── validation/         # Data validation

│   ├── models/                  # Model code

│   │   ├── training/           # Training logic

│   │   ├── evaluation/         # Evaluation

│   │   └── registry/           # Model registry

│   ├── serving/                 # Model serving

│   │   ├── api/               # FastAPI endpoints

│   │   ├── batch/             # Batch inference

│   │   └── streaming/         # Stream processing

│   ├── monitoring/             # Monitoring

│   │   ├── metrics.py         # Metrics collection

│   │   └── alerts.py          # Alert rules

│   └── orchestration/          # Workflow orchestration

│       └── dags/              # Airflow DAGs

├── pipelines/                  # Pipeline definitions

│   ├── training_pipeline.py   # Training workflow

│   ├── inference_pipeline.py  # Inference workflow

│   └── retraining_pipeline.py # Auto-retraining

├── tests/                      # Tests

│   ├── unit/                  # Unit tests

│   ├── integration/           # Integration tests

│   └── e2e/                   # End-to-end tests

├── configs/                    # Configuration files

│   ├── model_config.yaml      # Model configs

│   ├── pipeline_config.yaml   # Pipeline configs

│   └── deployment_config.yaml # Deployment configs

├── deployments/                # Deployment manifests

│   ├── kubernetes/            # K8s manifests

│   └── docker/                # Docker configs

├── docs/                       # Documentation

├── monitoring/                 # Monitoring configs

│   ├── prometheus/            # Prometheus setup

│   └── grafana/              # Grafana dashboards

├── scripts/                    # Utility scripts

├── main.py                     # Application entry

├── requirements.txt           # Dependencies

└── README.md                  # This file

```

---

## 🚀 Usage Examples

### Complete ML Pipeline

```python

from src.pipeline import MLPipeline

# Initialize pipeline

pipeline = MLPipeline(

    name="fraud_detection",

    config_path="configs/fraud_model.yaml"

)

# Run full pipeline

results = pipeline.run(

    data_source="s3://data/transactions.csv",

    experiment_name="fraud_detection_v2"

)

print(f"""

Pipeline Results:

- Model: {results['model_uri']}

- Accuracy: {results['metrics']['accuracy']:.3f}

- Deployment: {results['deployment']['endpoint']}

""")

```

### Batch Inference

```python

from src.serving.batch import BatchPredictor

# Initialize predictor

predictor = BatchPredictor(

    model_uri="models:/production/fraud_model/latest"

)

# Run batch predictions

predictions = predictor.predict_batch(

    input_path="s3://data/new_transactions.csv",

    output_path="s3://predictions/results.csv"

)

print(f"Processed {len(predictions)} predictions")

```

### Real-time Serving

```python

from fastapi import FastAPI

from src.serving import ModelServer

app = FastAPI()

model_server = ModelServer("models:/production/latest")

@app.post("/predict")

async def predict(features: dict):

    prediction = model_server.predict(features)

    return {

        "prediction": prediction,

        "model_version": model_server.version,

        "latency_ms": model_server.last_latency

    }

```

---

## 📊 Performance

### Benchmarks

| Metric | Value | Target |

|--------|-------|--------|

| **Training Time** | 15 min | < 20 min |

| **Inference Latency (p50)** | 25ms | < 50ms |

| **Inference Latency (p95)** | 45ms | < 100ms |

| **Throughput** | 2000 pred/sec | > 1000 pred/sec |

| **Model Accuracy** | 96.5% | > 95% |

| **Deployment Time** | 2 min | < 5 min |

---

## 🤝 Contributing

Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

---

## 📄 License

MIT License - see [LICENSE](LICENSE) file.

---

## 📞 Contact



[![GitHub](https://img.shields.io/badge/GitHub-devsuthar--ai-181717?style=for-the-badge&logo=github)](https://github.com/devsuthar-ai)

[![LinkedIn](https://img.shields.io/badge/LinkedIn-Dev_Suthar-0A66C2?style=for-the-badge&logo=linkedin)](https://linkedin.com/in/devsuthar)

[![Email](https://img.shields.io/badge/Email-Contact-D14836?style=for-the-badge&logo=gmail&logoColor=white)](mailto:dev.suthar@example.com)

---

**Made with ❤️ by Dev Suthar**

*Building production ML systems at scale*

⭐ **Star this repo if you find it helpful!**
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/devsuthar-ai/mlops-pipeline-framework

Awesome Lists containing this project

README