https://github.com/devsuthar-ai/mlops-pipeline-framework
π§ Production ML pipeline framework for model training, deployment, and monitoring. Features: MLflow, Airflow, FastAPI serving, Prometheus monitoring, Kubernetes deployment.
https://github.com/devsuthar-ai/mlops-pipeline-framework
airflow ci-cd data-science docker fastapi kubernetes machine-learning ml-pipeline mlflow mlops model-deployment python
Last synced: 7 months ago
JSON representation
π§ Production ML pipeline framework for model training, deployment, and monitoring. Features: MLflow, Airflow, FastAPI serving, Prometheus monitoring, Kubernetes deployment.
- Host: GitHub
- URL: https://github.com/devsuthar-ai/mlops-pipeline-framework
- Owner: devsuthar-ai
- Created: 2025-11-08T06:40:42.000Z (8 months ago)
- Default Branch: master
- Last Pushed: 2025-11-08T11:11:59.000Z (8 months ago)
- Last Synced: 2025-11-08T13:09:45.536Z (8 months ago)
- Topics: airflow, ci-cd, data-science, docker, fastapi, kubernetes, machine-learning, ml-pipeline, mlflow, mlops, model-deployment, python
- Language: Python
- Homepage:
- Size: 7.81 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# π§ MLOps Pipeline Framework
### *Production Machine Learning Operations Platform*
[](https://www.python.org/)
[](https://fastapi.tiangolo.com/)
[](https://mlflow.org/)
[](https://kubernetes.io/)
[](https://www.docker.com/)
[](LICENSE)
**[Live Demo](#) β’ [Documentation](#) β’ [API Docs](#) β’ [Report Bug](../../issues) β’ [Request Feature](../../issues)**
---
### π― *End-to-end ML pipeline orchestration for production deployments*
Built with β€οΈ by [Dev Suthar](https://github.com/devsuthar-ai) | β **Star us on GitHub!**
---
## π Table of Contents
- [β¨ Features](#-features)
- [π¬ Demo](#-demo)
- [ποΈ Architecture](#οΈ-architecture)
- [π Quick Start](#-quick-start)
- [π Documentation](#-documentation)
- [π§ Pipeline Components](#-pipeline-components)
- [π§ͺ Model Training](#-model-training)
- [π’ Model Deployment](#-model-deployment)
- [π Monitoring](#-monitoring)
- [π οΈ Tech Stack](#οΈ-tech-stack)
- [π€ Contributing](#-contributing)
- [π License](#-license)
---
## β¨ Features
### π― **Core Capabilities**
- π **Data Pipeline** - Automated ingestion & validation
- π€ **Model Training** - Distributed training orchestration
- π― **Hyperparameter Tuning** - Automated optimization
- π **Experiment Tracking** - MLflow integration
- π **Model Deployment** - One-click deployment
- π **Model Monitoring** - Real-time performance tracking
### π οΈ **Technical Excellence**
- βΈοΈ **Kubernetes Native** - Auto-scaling & orchestration
- π **CI/CD Integration** - Automated ML workflows
- π **Observability** - Prometheus + Grafana
- π§ͺ **A/B Testing** - Model comparison
- π **Auto-Retraining** - Scheduled model updates
- π¦ **Model Registry** - Version management
---
## π¬ Demo
### πΌοΈ **Platform Screenshots**
**π Pipeline Dashboard**

**π Model Tracking**

**π Deployment Manager**

**π Performance Monitoring**

---
## ποΈ Architecture
```mermaid
graph TB
subgraph "Data Layer"
A[Data Sources] --> B[Data Ingestion Service]
B --> C[Data Validation]
C --> D[(Feature Store)]
end
subgraph "Training Layer"
D --> E[Training Pipeline]
E --> F[Hyperparameter Tuning]
F --> G[Model Evaluation]
G --> H[(Model Registry)]
end
subgraph "Serving Layer"
H --> I[Model Deployment]
I --> J[A/B Testing]
J --> K[Prediction Service]
K --> L[Load Balancer]
end
subgraph "Monitoring Layer"
K --> M[Performance Monitor]
M --> N[Drift Detection]
N --> O[Alert System]
O --> E
end
subgraph "Orchestration"
P[Airflow] -.-> E
P -.-> I
P -.-> M
end
subgraph "Observability"
Q[Prometheus] --> R[Grafana]
K --> Q
M --> Q
end
style E fill:#667eea
style I fill:#764ba2
style M fill:#f093fb
style H fill:#4facfe
```
### π **System Components**
| Component | Technology | Purpose |
|-----------|-----------|---------|
| **Orchestration** | Airflow | Workflow management |
| **Experiment Tracking** | MLflow | Model versioning |
| **Model Serving** | FastAPI | High-performance API |
| **Feature Store** | Feast | Feature management |
| **Model Registry** | MLflow | Model storage |
| **Container Runtime** | Docker | Containerization |
| **Orchestration** | Kubernetes | Container orchestration |
| **Monitoring** | Prometheus + Grafana | Metrics & visualization |
| **Streaming** | Kafka | Real-time data |
| **Storage** | MinIO | Model artifacts |
---
## π Quick Start
### Prerequisites
```bash
# Required
- Python 3.11+
- Docker & Docker Compose
- Kubernetes cluster (optional)
# Optional for production
- MLflow server
- Airflow instance
- Prometheus + Grafana
```
### β‘ One-Command Setup
```bash
# Clone repository
git clone https://github.com/devsuthar-ai/mlops-pipeline-framework.git
cd mlops-pipeline-framework
# Start all services
docker-compose up -d
# π Done! Access services:
# API: http://localhost:8001
# MLflow UI: http://localhost:5000
# Airflow UI: http://localhost:8080
# Grafana: http://localhost:3000
```
### π Local Development
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start API server
python src/main.py
# In new terminal, start MLflow
mlflow server --host 0.0.0.0 --port 5000
# In new terminal, start Airflow
airflow standalone
```
---
## π Documentation
### π **Complete Guides**
- [ποΈ Architecture Overview](docs/ARCHITECTURE.md)
- [π Data Pipeline Guide](docs/DATA_PIPELINE.md)
- [π€ Model Training](docs/TRAINING.md)
- [π Deployment Guide](docs/DEPLOYMENT.md)
- [π Monitoring & Alerts](docs/MONITORING.md)
- [π§ Configuration](docs/CONFIGURATION.md)
---
## π§ Pipeline Components
### π **1. Data Ingestion Pipeline**
```python
from src.data.ingestion import DataPipeline
# Initialize pipeline
pipeline = DataPipeline(
source="s3://my-bucket/data",
destination="feature_store",
validation_rules={"schema": "v1.0"}
)
# Run ingestion
result = pipeline.ingest()
print(f"Ingested {result['rows']} rows")
```
**Features:**
- β
Multi-source support (S3, GCS, local, databases)
- β
Schema validation
- β
Data quality checks
- β
Incremental loading
- β
Error handling & retry logic
---
### π€ **2. Model Training Pipeline**
```python
from src.models.training import TrainingPipeline
# Configure training
config = {
"model_type": "random_forest",
"hyperparameters": {
"n_estimators": 100,
"max_depth": 10,
"min_samples_split": 5
},
"training_data": "feature_store://train_v1",
"validation_split": 0.2
}
# Initialize and run
pipeline = TrainingPipeline(config)
model = pipeline.train()
# Track with MLflow
pipeline.log_metrics({
"accuracy": 0.95,
"f1_score": 0.93,
"training_time": 120.5
})
```
**Features:**
- β
Distributed training (PyTorch, TensorFlow)
- β
Automated hyperparameter tuning
- β
Cross-validation
- β
Early stopping
- β
Checkpointing
- β
MLflow integration
---
### π **3. Model Evaluation**
```python
from src.models.evaluation import ModelEvaluator
# Evaluate model
evaluator = ModelEvaluator(model)
metrics = evaluator.evaluate(test_data)
print(f"""
Evaluation Results:
- Accuracy: {metrics['accuracy']:.3f}
- Precision: {metrics['precision']:.3f}
- Recall: {metrics['recall']:.3f}
- F1 Score: {metrics['f1']:.3f}
- AUC-ROC: {metrics['auc']:.3f}
""")
# Generate reports
evaluator.generate_report(output_path="reports/")
```
**Metrics Tracked:**
- Accuracy, Precision, Recall, F1
- ROC-AUC, PR-AUC
- Confusion Matrix
- Feature Importance
- Prediction Distribution
---
### π **4. Model Deployment**
```python
from src.serving.deployment import ModelDeployer
# Deploy model
deployer = ModelDeployer(
model_uri="models:/production/RandomForest/v3",
environment="production",
replicas=3,
resources={
"cpu": "2",
"memory": "4Gi"
}
)
deployment = deployer.deploy()
print(f"Deployed at: {deployment['endpoint']}")
# Test endpoint
response = requests.post(
deployment['endpoint'],
json={"features": [1.2, 3.4, 5.6]}
)
print(f"Prediction: {response.json()['prediction']}")
```
**Deployment Features:**
- β
Rolling updates (zero downtime)
- β
Canary deployments
- β
A/B testing
- β
Auto-scaling
- β
Health checks
- β
Load balancing
---
### π **5. Monitoring & Alerting**
```python
from src.monitoring.monitor import ModelMonitor
# Setup monitoring
monitor = ModelMonitor(
model_name="RandomForest",
metrics=["accuracy", "latency", "throughput"],
alert_thresholds={
"accuracy_drop": 0.05,
"latency_p95": 500 # ms
}
)
# Start monitoring
monitor.start()
# View dashboard
monitor.show_dashboard()
```
**Monitored Metrics:**
- Model performance (accuracy, F1, etc.)
- Prediction latency (p50, p95, p99)
- Throughput (predictions/sec)
- Resource usage (CPU, memory)
- Data drift detection
- Concept drift detection
---
## π§ͺ Model Training
### Training Script Example
```python
# train.py
import mlflow
from sklearn.ensemble import RandomForestClassifier
from src.data import load_data
from src.models import train_model, evaluate_model
# Load data
X_train, y_train, X_test, y_test = load_data()
# Start MLflow run
with mlflow.start_run():
# Train model
model = RandomForestClassifier(
n_estimators=100,
max_depth=10,
random_state=42
)
model.fit(X_train, y_train)
# Evaluate
metrics = evaluate_model(model, X_test, y_test)
# Log to MLflow
mlflow.log_params({
"n_estimators": 100,
"max_depth": 10
})
mlflow.log_metrics(metrics)
mlflow.sklearn.log_model(model, "model")
print(f"Model trained! Accuracy: {metrics['accuracy']:.3f}")
```
### Run Training
```bash
# Local training
python train.py
# Distributed training
python -m torch.distributed.launch train_distributed.py
# With Airflow
airflow dags trigger training_pipeline
```
---
## π’ Model Deployment
### Deployment Configuration
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: ml-model-serving
spec:
replicas: 3
selector:
matchLabels:
app: ml-model
template:
metadata:
labels:
app: ml-model
spec:
containers:
- name: model-server
image: ml-model:v1.0
ports:
- containerPort: 8001
resources:
requests:
memory: "2Gi"
cpu: "1"
limits:
memory: "4Gi"
cpu: "2"
env:
- name: MODEL_URI
value: "models:/production/latest"
```
### Deploy to Kubernetes
```bash
# Apply deployment
kubectl apply -f deployment.yaml
# Check status
kubectl get pods -l app=ml-model
# Expose service
kubectl expose deployment ml-model-serving --type=LoadBalancer --port=80 --target-port=8001
# Get endpoint
kubectl get svc ml-model-serving
```
---
## π Monitoring
### Prometheus Metrics
```python
from prometheus_client import Counter, Histogram, Gauge
# Define metrics
prediction_counter = Counter(
'model_predictions_total',
'Total predictions made'
)
prediction_latency = Histogram(
'model_prediction_latency_seconds',
'Prediction latency'
)
model_accuracy = Gauge(
'model_accuracy',
'Current model accuracy'
)
# Use in code
@prediction_latency.time()
def predict(features):
prediction_counter.inc()
result = model.predict(features)
return result
```
### Grafana Dashboards
Access dashboards at `http://localhost:3000`
**Available Dashboards:**
1. **Model Performance**
- Accuracy over time
- Precision/Recall trends
- Confusion matrix heatmap
2. **System Metrics**
- CPU/Memory usage
- Request rate
- Error rate
3. **Prediction Analytics**
- Latency distribution
- Throughput
- Feature distribution
4. **Data Drift**
- Feature drift detection
- Concept drift alerts
- Distribution changes
---
## π οΈ Tech Stack
### **ML & Data**




### **MLOps Tools**



### **Infrastructure**




### **API & Serving**


---
## π Project Structure
```
mlops-pipeline-framework/
βββ src/
β βββ data/ # Data pipelines
β β βββ ingestion/ # Data ingestion
β β βββ preprocessing/ # Data preprocessing
β β βββ validation/ # Data validation
β βββ models/ # Model code
β β βββ training/ # Training logic
β β βββ evaluation/ # Evaluation
β β βββ registry/ # Model registry
β βββ serving/ # Model serving
β β βββ api/ # FastAPI endpoints
β β βββ batch/ # Batch inference
β β βββ streaming/ # Stream processing
β βββ monitoring/ # Monitoring
β β βββ metrics.py # Metrics collection
β β βββ alerts.py # Alert rules
β βββ orchestration/ # Workflow orchestration
β βββ dags/ # Airflow DAGs
βββ pipelines/ # Pipeline definitions
β βββ training_pipeline.py # Training workflow
β βββ inference_pipeline.py # Inference workflow
β βββ retraining_pipeline.py # Auto-retraining
βββ tests/ # Tests
β βββ unit/ # Unit tests
β βββ integration/ # Integration tests
β βββ e2e/ # End-to-end tests
βββ configs/ # Configuration files
β βββ model_config.yaml # Model configs
β βββ pipeline_config.yaml # Pipeline configs
β βββ deployment_config.yaml # Deployment configs
βββ deployments/ # Deployment manifests
β βββ kubernetes/ # K8s manifests
β βββ docker/ # Docker configs
βββ docs/ # Documentation
βββ monitoring/ # Monitoring configs
β βββ prometheus/ # Prometheus setup
β βββ grafana/ # Grafana dashboards
βββ scripts/ # Utility scripts
βββ main.py # Application entry
βββ requirements.txt # Dependencies
βββ README.md # This file
```
---
## π Usage Examples
### Complete ML Pipeline
```python
from src.pipeline import MLPipeline
# Initialize pipeline
pipeline = MLPipeline(
name="fraud_detection",
config_path="configs/fraud_model.yaml"
)
# Run full pipeline
results = pipeline.run(
data_source="s3://data/transactions.csv",
experiment_name="fraud_detection_v2"
)
print(f"""
Pipeline Results:
- Model: {results['model_uri']}
- Accuracy: {results['metrics']['accuracy']:.3f}
- Deployment: {results['deployment']['endpoint']}
""")
```
### Batch Inference
```python
from src.serving.batch import BatchPredictor
# Initialize predictor
predictor = BatchPredictor(
model_uri="models:/production/fraud_model/latest"
)
# Run batch predictions
predictions = predictor.predict_batch(
input_path="s3://data/new_transactions.csv",
output_path="s3://predictions/results.csv"
)
print(f"Processed {len(predictions)} predictions")
```
### Real-time Serving
```python
from fastapi import FastAPI
from src.serving import ModelServer
app = FastAPI()
model_server = ModelServer("models:/production/latest")
@app.post("/predict")
async def predict(features: dict):
prediction = model_server.predict(features)
return {
"prediction": prediction,
"model_version": model_server.version,
"latency_ms": model_server.last_latency
}
```
---
## π Performance
### Benchmarks
| Metric | Value | Target |
|--------|-------|--------|
| **Training Time** | 15 min | < 20 min |
| **Inference Latency (p50)** | 25ms | < 50ms |
| **Inference Latency (p95)** | 45ms | < 100ms |
| **Throughput** | 2000 pred/sec | > 1000 pred/sec |
| **Model Accuracy** | 96.5% | > 95% |
| **Deployment Time** | 2 min | < 5 min |
---
## π€ Contributing
Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
---
## π License
MIT License - see [LICENSE](LICENSE) file.
---
## π Contact
[](https://github.com/devsuthar-ai)
[](https://linkedin.com/in/devsuthar)
[](mailto:dev.suthar@example.com)
---
**Made with β€οΈ by Dev Suthar**
*Building production ML systems at scale*
β **Star this repo if you find it helpful!**