{"id":29973321,"url":"https://github.com/mohitgupta0123/fraud_detection_mlops","last_synced_at":"2026-04-11T17:37:09.223Z","repository":{"id":307970967,"uuid":"1029655794","full_name":"MohitGupta0123/Fraud_Detection_MLOps","owner":"MohitGupta0123","description":"End-to-end Fraud Detection MLOps pipeline integrating MLflow, FastAPI, Streamlit, Docker, Kubernetes, Prometheus, and Grafana for real-time fraud prediction, experiment tracking, and monitoring.","archived":false,"fork":false,"pushed_at":"2025-08-03T10:57:08.000Z","size":11617,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-03T11:35:16.891Z","etag":null,"topics":["anomaly-detection","docker","end-to-end-pipeline","fastapi","fraud-detection","grafana","imbalanced-data","kubernetes","machine-learning","mlflow","mlops-project","prometheus","python","streamlit"],"latest_commit_sha":null,"homepage":"https://frauddetectionmlops.streamlit.app/","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MohitGupta0123.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-31T11:20:49.000Z","updated_at":"2025-08-03T10:57:12.000Z","dependencies_parsed_at":"2025-08-03T11:52:52.463Z","dependency_job_id":null,"html_url":"https://github.com/MohitGupta0123/Fraud_Detection_MLOps","commit_stats":null,"previous_names":["mohitgupta0123/fraud_detection_mlops"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/MohitGupta0123/Fraud_Detection_MLOps","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MohitGupta0123%2FFraud_Detection_MLOps","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MohitGupta0123%2FFraud_Detection_MLOps/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MohitGupta0123%2FFraud_Detection_MLOps/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MohitGupta0123%2FFraud_Detection_MLOps/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MohitGupta0123","download_url":"https://codeload.github.com/MohitGupta0123/Fraud_Detection_MLOps/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MohitGupta0123%2FFraud_Detection_MLOps/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268660071,"owners_count":24286009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"
2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-04T02:00:09.867Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","docker","end-to-end-pipeline","fastapi","fraud-detection","grafana","imbalanced-data","kubernetes","machine-learning","mlflow","mlops-project","prometheus","python","streamlit"],"created_at":"2025-08-04T07:00:43.030Z","updated_at":"2026-04-11T17:37:09.195Z","avatar_url":"https://github.com/MohitGupta0123.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# **Fraud Detection MLOps Pipeline**\n\nThe **Fraud Detection MLOps Pipeline** is an end-to-end system designed to identify potentially fraudulent financial transactions with high accuracy and scalability. 
This project integrates **Machine Learning (ML) with MLOps principles** to ensure robust experimentation, deployment, and real-time monitoring of fraud detection models.\n\n\u003c!-- Tech Stack Badges --\u003e\n![Python](https://img.shields.io/badge/Python-3.12-blue?logo=python)\n![MLflow](https://img.shields.io/badge/MLflow-Tracking-orange?logo=mlflow)\n![Streamlit](https://img.shields.io/badge/Streamlit-UI-red?logo=streamlit)\n![FastAPI](https://img.shields.io/badge/FastAPI-Backend-green?logo=fastapi)\n![Docker](https://img.shields.io/badge/Docker-Container-blue?logo=docker)\n![Kubernetes](https://img.shields.io/badge/Kubernetes-Orchestration-blue?logo=kubernetes)\n![Prometheus](https://img.shields.io/badge/Prometheus-Monitoring-orange?logo=prometheus)\n![Grafana](https://img.shields.io/badge/Grafana-Dashboards-yellow?logo=grafana)\n\n\u003c!-- Project Info Badges --\u003e\n![Contributions](https://img.shields.io/badge/Contributions-Welcome-brightgreen)\n![Status](https://img.shields.io/badge/Status-Active-success)\n![Issues](https://img.shields.io/github/issues/MohitGupta0123/Fraud_Detection_MLOps)\n![Last Commit](https://img.shields.io/github/last-commit/MohitGupta0123/Fraud_Detection_MLOps)\n![Repo Size](https://img.shields.io/github/repo-size/MohitGupta0123/Fraud_Detection_MLOps)\n\u003c!-- ![Stars](https://img.shields.io/github/stars/MohitGupta0123/Fraud_Detection_MLOps)\n![Forks](https://img.shields.io/github/forks/MohitGupta0123/Fraud_Detection_MLOps) --\u003e\n\n## DEMO LINK - [LINK](https://frauddetectionmlops.streamlit.app/)\n\n---\n\n## **Table of Contents**\n\n1. [Project Overview](#1-project-overview)\n2. [Tech Stack](#2-tech-stack)\n3. [Architecture Diagrams](#3-architecture-diagrams)\n4. [Features](#4-features)\n5. [Directory Structure](#5-directory-structure)\n6. [Setup Instructions](#6-setup-instructions)\n7. [Running the Streamlit App](#7-running-the-streamlit-app)\n8. [Running the FastAPI Service](#8-running-the-fastapi-service)\n9. 
[Experiment Tracking with MLflow](#9-experiment-tracking-with-mlflow)\n10. [Monitoring with Prometheus \u0026 Grafana](#10-monitoring-with-prometheus--grafana)\n11. [Model Details](#11-model-details)\n12. [Results \u0026 Metrics](#12-results--metrics)\n13. [Screenshots](#13-screenshots)\n14. [Future Work](#14-future-work)\n15. [Author / Contact](#15-author--contact)\n\n## **1. Project Overview**\n\n### **Objectives**\n\n* Implement a **complete MLOps pipeline** for **fraud detection** on transactional data, covering the **entire ML lifecycle**.\n* Build a modular **FraudPipeline** capable of feature engineering, preprocessing, resampling (SMOTE), and threshold tuning.\n* Track experiments using **MLflow** for reproducibility and comparative analysis.\n* Deploy the model using **FastAPI** for REST API services and **Streamlit** for an interactive UI.\n* Containerize and orchestrate services using **Docker** and **Kubernetes (Minikube)**.\n* Monitor system health and metrics using **Prometheus** and **Grafana** dashboards.\n\n### **Goal**\n\nDetect fraudulent transactions in real time with **high recall** while minimizing false positives.\n\n---\n\n## **2. 
Tech Stack**\n\n### **Languages**\n\n* **Python 3.12+**\n\n### **Core ML \u0026 Data Libraries**\n\n* **Scikit-learn**: Model building, preprocessing, metrics.\n* **Imbalanced-learn**: SMOTE for class imbalance handling.\n* **Pandas / NumPy**: Data manipulation and numerical operations.\n\n### **MLOps \u0026 Deployment Tools**\n\n* **MLflow**: Experiment tracking, logging metrics, model registry.\n* **FastAPI**: Serving the fraud detection model via REST API.\n* **Streamlit**: Interactive web UI for predictions and model insights.\n* **Docker**: Containerization of the FastAPI and Streamlit apps.\n* **Kubernetes (Minikube)**: Local orchestration and scaling of microservices.\n\n### **Monitoring Tools**\n\n* **Prometheus**: Metrics scraping for FastAPI endpoints.\n* **Grafana**: Visualization dashboards for system and API monitoring.\n\n---\n\n## **3. Architecture Diagrams**\n\n### **MLOps Pipeline**\n\nThe complete pipeline involves:\n\n1. **Data Ingestion \u0026 Preprocessing**\n2. **Model Training \u0026 Threshold Optimization**\n3. **Experiment Tracking with MLflow**\n4. **Model Deployment via FastAPI \u0026 Streamlit**\n5. **Containerization with Docker**\n6. **Orchestration using Kubernetes (Minikube)**\n7. **Monitoring using Prometheus + Grafana**\n\n![MLOps Architecture](Images/MLOps_Architecture/image.png)\n\n---\n\n### **Model Pipeline**\n\n1. **Feature Engineering**: Interaction, ratio, binning, time-of-day categorization.\n2. **Preprocessing**: Imputation, encoding, log transform, scaling.\n3. **Resampling**: SMOTE to address class imbalance.\n4. **Model Training**: Logistic Regression (configurable to RandomForest/XGBoost).\n5. **Threshold Tuning**: Optimize precision-recall trade-off for fraud detection.\n\n![Model Architecture](Images/Model_Architecture/image.png)\n\n---\n\n## **4. 
Features**\n\n* **Real-Time Fraud Prediction**:\n\n  * Streamlit UI for quick predictions.\n  * FastAPI endpoint for programmatic integration.\n\n* **Experiment Tracking**:\n\n  * MLflow logs parameters, metrics, artifacts (confusion matrix, PR curve).\n\n* **Scalable Deployment**:\n\n  * Dockerized microservices deployed on Kubernetes (Minikube).\n\n* **Robust Monitoring**:\n\n  * Prometheus scrapes real-time metrics from FastAPI.\n  * Grafana dashboards visualize system health and request patterns.\n\n* **Data Handling**:\n\n  * Automatic preprocessing (missing values, scaling, encoding).\n  * SMOTE resampling for highly imbalanced fraud datasets.\n\n* **Threshold Optimization**:\n\n  * Dynamically finds the best threshold balancing recall and precision.\n\n## **5. Directory Structure**\n\nThe project follows a modular structure separating API, model, monitoring, and visualization components:\n\n```\nFRAUD_MLOPS_PROJECT/\n│\n├── API/                         # FastAPI microservice\n│   ├── main.py                   # API entry point\n│   ├── schemas.py                # Pydantic models for request/response\n│   ├── services.py               # Core service logic\n│   └── mlruns/                   # MLflow experiment tracking logs\n│\n├── Data/                         # Datasets\n│   ├── payment_fraud.csv\n│   └── combined_holdout.csv\n│\n├── Images/                       # Project diagrams \u0026 screenshots\n│   ├── Docker/\n│   ├── FastAPI/\n│   ├── Grafana/\n│   ├── MLFlow/\n│   ├── MLOps_Architecture/\n│   ├── Model_Architecture/\n│   └── Prometheus/\n│\n├── K8s/                          # Kubernetes manifests\n│   ├── fraud-api-deployment.yaml\n│   ├── fraud-api-service.yaml\n│   ├── grafana-deployment.yaml\n│   └── prometheus-deployment.yaml\n│\n├── Notebooks/                    # Jupyter Notebooks\n│   ├── EDA.ipynb\n│   ├── training_model.ipynb\n│   ├── test_files.ipynb\n│   └── artifacts/                # Trained model artifacts\n│       ├── 
confusion_matrix.png\n│       ├── pr_curve.png\n│       └── fraud_pipeline_deployed.pkl\n│\n├── Pages/                        # Streamlit multi-page app\n│   ├── home.py\n│   ├── about_model.py\n│   ├── metrics_page.py\n│   └── about_me.py\n│\n├── Src/                          # Core ML pipeline code\n│   ├── model.py                   # FraudPipeline, FeatureEngineering, Preprocessing\n│   ├── utils.py                   # Helper functions\n│   ├── config.py                  # Configurations\n│   └── artifacts/                 # MLflow model logs\n│\n├── app.py                         # Streamlit entry point\n├── Dockerfile                      # Docker setup for Streamlit/FastAPI\n├── requirements.txt                # Dependencies\n├── .gitignore\n└── README.md\n```\n\n---\n\n## **6. Setup Instructions**\n\n### **Prerequisites**\n\n* Python 3.10 or higher\n* Docker Desktop\n* Minikube (for Kubernetes)\n* kubectl CLI\n* Prometheus \u0026 Grafana (installed via Helm or K8s manifests)\n\n---\n\n### **Local Development Setup**\n\n1. **Clone the repository**\n\n```bash\ngit clone https://github.com/MohitGupta0123/Fraud_Detection_MLOps.git\ncd Fraud_Detection_MLOps\n```\n\n2. **Create virtual environment \u0026 install dependencies**\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate    # Linux/Mac\n.venv\\Scripts\\activate       # Windows\npip install -r requirements.txt\n```\n\n3. **Run Streamlit app locally**\n\n```bash\nstreamlit run app.py\n```\n\n4. **Run FastAPI service locally**\n\n```bash\ncd API\nuvicorn main:app --reload --host 0.0.0.0 --port 8000\n```\n\n---\n\n### **Docker Setup**\n\n1. **Build Docker images**\n\n```bash\ndocker build -t fraud-streamlit -f Dockerfile .\ndocker build -t fraud-fastapi -f Dockerfile ./API\n```\n\n2. **Run containers**\n\n```bash\ndocker run -p 8501:8501 fraud-streamlit\ndocker run -p 8000:8000 fraud-fastapi\n```\n\n---\n\n### **Kubernetes Deployment (Minikube)**\n\n1. 
**Start Minikube**\n\n```bash\nminikube start --driver=docker\n```\n\n2. **Apply Kubernetes manifests**\n\n```bash\nkubectl apply -f K8s/fraud-api-deployment.yaml\nkubectl apply -f K8s/fraud-api-service.yaml\nkubectl apply -f K8s/prometheus-deployment.yaml\nkubectl apply -f K8s/grafana-deployment.yaml\n```\n\n3. **Access services**\n\n```bash\nminikube service fraud-api-service\nminikube service prometheus -n monitoring\nminikube service grafana -n monitoring\n```\n\n## **7. Running the Streamlit App**\n\nThe Streamlit app provides an interactive interface to test fraud detection predictions and visualize model metrics.\n\n### **Local Run**\n\n```bash\nstreamlit run app.py\n```\n\n* Access at: `http://localhost:8501`\n\n### **Features**\n\n* Input transaction details (Category, Payment Method, Account Age, etc.)\n* Auto-fill examples for **Legitimate** and **Fraudulent** transactions\n* Real-time prediction with threshold-based confidence\n* Navigation to **About Model**, **Metrics**, and **About Me** pages\n\n---\n\n## **8. Running the FastAPI Service**\n\nFastAPI serves the fraud prediction model as a REST API, useful for production-grade deployment and integration with external systems.\n\n### **Local Run**\n\n```bash\ncd API\nuvicorn main:app --reload --host 0.0.0.0 --port 8000\n```\n\n* Access API docs at: `http://localhost:8000/docs`\n\n### **Key Endpoints**\n\n* `POST /predict` – Accepts JSON payload and returns prediction\n* `GET /health` – Health check endpoint\n\n### **Docker Run**\n\n```bash\ndocker build -t fraud-fastapi -f Dockerfile ./API\ndocker run -p 8000:8000 fraud-fastapi\n```\n\n---\n\n## **9. 
Experiment Tracking with MLflow**\n\nMLflow is integrated to log experiments, parameters, metrics, and artifacts (PR curve, confusion matrix, models).\n\n### **Usage**\n\n* Automatically tracks during training via `FraudPipeline`\n* Logs include:\n\n  * Parameters: Steps applied, resampling method, model type\n  * Metrics: Accuracy, Precision, Recall, F1-score, PR-AUC\n  * Artifacts: PR Curve, Confusion Matrix, Serialized Model\n\n### **Access MLflow UI**\n\n```bash\nmlflow ui\n```\n\n* Opens at `http://127.0.0.1:5000`\n* Explore experiment runs and compare metrics visually\n\n---\n\n## **10. Monitoring with Prometheus \u0026 Grafana**\n\nThe deployed FastAPI service exposes metrics for Prometheus, visualized via Grafana dashboards.\n\n### **Prometheus**\n\n* Scrapes FastAPI metrics (request counts, response latency, error rates)\n* Runs on port **9090** in `monitoring` namespace\n\n### **Grafana**\n\n* Visualizes Prometheus data using pre-built dashboards\n* Runs on port **3000** in `monitoring` namespace\n* Import your **saved JSON dashboard** via Grafana UI\n\n### **Steps to Access**\n\n```bash\nminikube service prometheus -n monitoring\nminikube service grafana -n monitoring\n```\n\n## **11. Model Details**\n\nThe fraud detection model is built using a **custom pipeline** with multiple stages:\n\n### **Pipeline Steps**\n\n1. **Feature Engineering**\n\n   * Interaction: `Category x PaymentMethod`\n   * Ratio: `paymentMethodAgeDays / accountAgeDays`\n   * Binning: `accountAgeDays` into `new/medium/old`\n   * Time Feature: Categorize `localTime` into time-of-day bins\n\n2. **Preprocessing**\n\n   * Imputation for missing values (median/mode)\n   * One-hot encoding for categorical variables\n   * Log transformation for skewed features\n   * Scaling: StandardScaler (skewed) + MinMaxScaler (symmetric)\n\n3. **Resampling**\n\n   * **SMOTE** to handle extreme class imbalance\n\n4. 
**Model Training**\n\n   * Logistic Regression (default)\n   * Supports other models like RandomForest, XGBoost\n\n5. **Threshold Tuning**\n\n   * Optimal threshold found via precision-recall curve\n   * Current best threshold: **0.8370** (Precision = 0.955, Recall = 0.991)\n\n---\n\n## **12. Results \u0026 Metrics**\n\n### **Hold-out Set Performance**\n\n* **Hold-out A**: Accuracy 97%, Recall 100%, Precision 25% (imbalanced case)\n* **Hold-out B**: Accuracy 99%, Recall 100%, Precision 50% (imbalanced case)\n* **Hold-out C**: Accuracy 98%, Recall 98%, Precision 98%\n\n### **PR Curve \u0026 Confusion Matrix**\n\n* Stored in `Notebooks/artifacts/`\n* PR Curve demonstrates strong precision-recall balance\n* Confusion Matrix confirms minimal false negatives (critical for fraud detection)\n\n---\n\n## **13. Screenshots**\n\n---\n\n### 1. **MLflow**\n- ![MLflow Experiment 1](Images/MLFlow/1.png)\n- ![MLflow Experiment 2](Images/MLFlow/2.png)\n\n---\n\n### 2. **Docker**\n- ![Docker Setup](Images/Docker/Docker1.png)\n- ![Docker Running](Images/Docker/Docker2.png)\n- ![Docker Hub](Images/Docker/Dockerhub.png)\n\n---\n\n### 3. **FastAPI**\n- ![FastAPI Endpoint 1](Images/FastAPI/1.png)\n- ![FastAPI Endpoint 2](Images/FastAPI/2.png)\n- ![FastAPI Endpoint 3](Images/FastAPI/3.png)\n\n---\n\n### 4. **Prometheus**\n- ![Prometheus Monitoring](Images/Prometheus/1.png)\n\n---\n\n### 5. **Grafana**\n- ![Grafana Dashboard 1](Images/Grafana/1.png)\n- ![Grafana Dashboard 2](Images/Grafana/2.png)\n- ![Grafana Dashboard 3](Images/Grafana/3.png)\n- ![Grafana Dashboard 4](Images/Grafana/4.png)\n- ![Grafana Dashboard 5](Images/Grafana/5.png)\n\n---\n\n## **14. 
Future Work**\n\n* Integrate **CI/CD pipelines** with GitHub Actions or Jenkins\n* Add **model registry** using MLflow's registry or Seldon Core\n* Deploy **cloud-native** on AWS/GCP/Azure (EKS/GKE/AKS)\n* Implement **real-time streaming predictions** with Kafka\n* Add **explainability (SHAP/LIME)** for fraud predictions\n\n---\n\n## **15. Author / Contact**\n\n**Author:** Mohit Gupta\n\n* [Mail](mailto:mgmohit1111@gmail.com)\n* [GitHub](https://github.com/MohitGupta0123)\n* [LinkedIn](https://www.linkedin.com/in/mohitgupta012/)\n\nFeel free to connect for feedback, contributions, or collaborations.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmohitgupta0123%2Ffraud_detection_mlops","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmohitgupta0123%2Ffraud_detection_mlops","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmohitgupta0123%2Ffraud_detection_mlops/lists"}