https://github.com/shubhamgoyal575/credit-card-fraud-detection
https://github.com/shubhamgoyal575/credit-card-fraud-detection
Last synced: 3 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/shubhamgoyal575/credit-card-fraud-detection
- Owner: shubhamgoyal575
- Created: 2025-03-14T14:00:57.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-03-14T15:07:47.000Z (7 months ago)
- Last Synced: 2025-03-14T15:24:09.391Z (7 months ago)
- Size: 2.93 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Credit Card Fraud Detection
## 🚀 Overview
This project aims to **detect fraudulent credit card transactions** using **machine learning** techniques. Given the highly imbalanced nature of fraud detection, we apply various resampling techniques and model evaluation metrics to ensure robust predictions.## 📂 Dataset
The dataset used for this project is the **Credit Card Fraud Detection dataset** available on [Kaggle](https://www.kaggle.com/mlg-ulb/creditcardfraud). It consists of:
- **284,807** transactions
- **492** fraudulent transactions (**0.172% of total data**)
- **Features**: 30 columns (**V1-V28 are PCA-transformed**), `Time`, `Amount`, and `Class` (target variable: **0 = Legitimate, 1 = Fraudulent**)## 🛠️ Approach
### 1️⃣ Data Preprocessing
- **Handling class imbalance** using **SMOTE (Synthetic Minority Over-sampling Technique)** and **undersampling**.
- **Scaling numerical features** using `StandardScaler` or `MinMaxScaler`.
- **Feature engineering** to extract meaningful transaction patterns.### 2️⃣ Exploratory Data Analysis (EDA)
- Distribution of transaction amounts and time.
- Fraudulent vs. legitimate transaction patterns.
- Correlation analysis of PCA-transformed features.### 3️⃣ Model Selection & Training
We experimented with various machine learning models:
✅ **Random Forest**
✅ **AdaBoost**
✅ **XGBoost**
✅ **Lightbgm**
✅ **Neural Networks (Deep Learning Approach)**### 4️⃣ Evaluation Metrics
Since fraud detection is an **imbalanced classification problem**, we focus on:
- **Precision, Recall, F1-score** (to minimize false negatives)
- **AUC-ROC Curve** (to evaluate the model’s discriminatory power)
- **Confusion Matrix** (to analyze misclassification rates)## ☁️ Deployment with AWS SageMaker
This project utilizes **AWS SageMaker** for deploying the trained fraud detection model. The deployment steps include:### **1️⃣ Model Training on SageMaker**
- Using **built-in SageMaker algorithms** or **custom scripts**.
- Training the model with **SageMaker’s managed Jupyter notebooks**.### **2️⃣ Model Deployment**
- Deploying the trained model as a **real-time endpoint**.
- Using **SageMaker Inference** for making predictions on new transactions.## 📊 Results
- The **best model achieved:**
- **88.1% roc auc score**
- **85.2% F1-score**
- **Random Forest** and **XGBoost** were the most balanced models in terms of performance.
- **Neural Networks** performed exceptionally well in high-computational environments with **AUC close to 0.999**.## 🔮 Future Improvements
- Implementing **deep learning architectures** for better accuracy.
- Deploying the model using **Flask or FastAPI** for API-based fraud detection.
- Enhancing **real-time fraud detection** using **streaming data** (e.g., Apache Kafka).## 🤝 Contributing
Contributions are **welcome**! Feel free to submit **pull requests** or raise **issues**.## 📝 License
This project is licensed under the **MIT License**.## 📢 Acknowledgments
- Open-source **Machine Learning & Data Science Communities**---
### ⭐ **Feel free to star this repository if you find it useful!** 🌟