https://github.com/ashmod/fraud-detection
A Python implementation of Chung & Lee's 2023 fraud detection ensemble approach. Optimized for high recall (≥0.93) on the PaySim dataset
- Host: GitHub
- URL: https://github.com/ashmod/fraud-detection
- Owner: ashmod
- License: mit
- Created: 2025-04-23T12:17:40.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-24T10:37:51.000Z (about 1 year ago)
- Last Synced: 2025-08-14T16:49:42.654Z (10 months ago)
- Topics: algorithms, fraud-detection, machine-learning, optimization, paper, python, research
- Language: Jupyter Notebook
- Homepage:
- Size: 1.34 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Credit-Card Fraud Detection (Recall-First, Chung & Lee 2023, PaySim)
This repository implements and extends the high-recall ensemble approach for fraud detection from **Chung & Lee (2023, Sensors 23(18), 7788)** using the [PaySim](https://www.kaggle.com/datasets/ealaxi/paysim1) dataset. The solution is optimized for **high recall** (target ≥0.93): it aims to catch as many fraudulent transactions as possible, on the principle that missing fraud is far more costly than raising a false alarm.
## 🚀 Getting Started
Clone the repo and ensure your environment has Python 3.9+ and the required packages (see `requirements.txt`).
Typical workflow:
```bash
make setup # install dependencies
make preprocess # preprocess data (encoding, split)
make train # fit key models and save them
make ensemble # apply ensemble voting (Algorithm 1)
make evaluate # compute metrics, visualize and save results
```
Artifacts will be saved in `artifacts/`, processed data in `data/processed/`, and results (metrics, confusion matrix) in `results/`.
You can run `make clean` to wipe all outputs and start fresh.
## 🗂️ Project Structure
- **notebooks/**: Main notebook (`fraud-detection.ipynb`) with code, analysis, and visualizations
- **data/**: Raw and processed PaySim data
- **artifacts/**: Saved models (e.g., `knn.pkl`, `lda.pkl`, `lr.pkl`)
- **results/**: Metrics, figures, confusion matrices, etc.
- **docs/**: Literature notes
- **slides/**: Presentation slides
## 🏆 Methodology
- **Dataset:** [PaySim](https://www.kaggle.com/datasets/ealaxi/paysim1) (6.3M mobile money transactions, highly imbalanced)
- **Models:**
  - K-Nearest Neighbors (KNN)
  - Linear Discriminant Analysis (LDA)
  - Linear Regression (thresholded)
  - Logistic Regression, Decision Tree, Random Forest, Naive Bayes (for comparison)
- **Ensemble Logic:**
  - Inspired by Chung & Lee (2023)
  - Prioritizes **recall** (fraud detection), combining KNN, LDA, and Linear Regression predictions using a voting/thresholding strategy
- **Metrics:**
  - **Primary:** Recall (for fraud, label=0)
  - **Also reported:** Precision, Accuracy, Confusion Matrix (visualized as a heatmap)
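As a rough illustration of the voting/thresholding idea, the sketch below flags a transaction as fraud whenever any of the three models does. This is one plausible recall-first rule, not necessarily Algorithm 1 from the paper or the notebook's exact logic. The function names (`threshold_scores`, `recall_first_vote`), the 0.5 threshold, and the toy predictions are all hypothetical; the label convention (fraud = 0) follows the Metrics note above.

```python
FRAUD, LEGIT = 0, 1  # label convention from the Metrics note (fraud = 0)


def threshold_scores(scores, threshold=0.5):
    """Turn continuous regression outputs into labels.

    Assumes the regression was fitted on the 0/1 label, so a score
    below the (hypothetical) threshold is treated as FRAUD.
    """
    return [FRAUD if s < threshold else LEGIT for s in scores]


def recall_first_vote(*model_preds):
    """OR-vote: a transaction is FRAUD if any model flags it as FRAUD."""
    return [FRAUD if FRAUD in votes else LEGIT for votes in zip(*model_preds)]


# Toy predictions for four transactions (illustrative values only).
knn_pred = [1, 0, 1, 1]
lda_pred = [1, 1, 1, 0]
linreg_scores = [0.9, 0.7, 0.2, 0.8]

ensemble = recall_first_vote(knn_pred, lda_pred, threshold_scores(linreg_scores))
print(ensemble)  # [1, 0, 0, 0]
```

An OR-vote can only add fraud flags relative to each individual model, so recall never drops below that of the best member, while precision may fall because every member's false alarms are kept.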
## 📈 Results
Summary of model performance (see notebook for details):
| Model | Recall | Precision | Accuracy |
|--------------------|----------|-----------|-----------|
| Decision Tree | 0.9998 | 0.9998 | 0.9997 |
| Naive Bayes | 0.9971 | 0.9988 | 0.9960 |
| **Ensemble** | 0.9998 | 0.7508 | 0.9991 |
> The ensemble matches the Decision Tree's recall (0.9998) at lower precision (0.7508), a trade-off consistent with the recall-first design.
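For readers less familiar with the reported metrics, here is a minimal sketch of how recall, precision, and accuracy follow from confusion-matrix counts when fraud is the class of interest. The helper `fraud_metrics` and the toy labels are hypothetical and are not taken from the notebook; only the fraud = 0 convention comes from the Methodology section.

```python
def fraud_metrics(y_true, y_pred, fraud_label=0):
    """Compute recall, precision, and accuracy with fraud as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == fraud_label and p == fraud_label for t, p in pairs)  # fraud caught
    fn = sum(t == fraud_label and p != fraud_label for t, p in pairs)  # fraud missed
    fp = sum(t != fraud_label and p == fraud_label for t, p in pairs)  # false alarms
    tn = sum(t != fraud_label and p != fraud_label for t, p in pairs)  # legit passed
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Toy example: both frauds are caught, at the cost of one false alarm.
y_true = [0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1]
metrics = fraud_metrics(y_true, y_pred)
print(metrics["recall"], metrics["accuracy"])  # 1.0 0.875
```

In this toy case recall is perfect while precision is 2/3, which mirrors the shape of the trade-off in the table: a recall-first model accepts extra false alarms to avoid missed fraud.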
## 📚 References
- **Chung, H., & Lee, J. (2023).** “A High-Recall Ensemble Approach for Fraud Detection in Financial Transactions.” [Sensors 23(18), 7788](https://www.mdpi.com/1424-8220/23/18/7788)
- [PaySim Dataset on Kaggle](https://www.kaggle.com/datasets/ealaxi/paysim1)
- Scikit-learn documentation: https://scikit-learn.org/stable/documentation.html
## 👥 Contributors
- [Shehab Mahmoud Salah](https://github.com/dizzydroid)
- [Abdelrahman Hany Mohamed](https://github.com/dopebiscuit)
- [Youssef Ahmed Mohamed](https://github.com/unauthorised-401)
- [Omar Mamon Hamed](https://github.com/Spafic)
- [Seif El Din Tamer Shawky](https://github.com/SeifT101)
- [Seif Eldeen Ahmed Abdulaziz](https://github.com/seifelwarwary)
- [Habiba El-sayed Mowafy](https://github.com/Lucifer3224)
- [Aya Tarek Salem](https://github.com/AyaTarekS)
- [Moaz Ragab](https://github.com/moazragab12)
- [Ahmed Ashraf Ali](https://github.com/AshrafByte)
---
For details, see the [notebook](notebooks/fraud-detection.ipynb) and [docs/](docs/).