https://github.com/ashmod/fraud-detection

A Python implementation of Chung & Lee's 2023 fraud detection ensemble approach. Optimized for high recall (≥0.93) on the PaySim dataset
Host: GitHub
URL: https://github.com/ashmod/fraud-detection
Owner: ashmod
License: mit
Created: 2025-04-23T12:17:40.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-05-24T10:37:51.000Z (about 1 year ago)
Last Synced: 2025-08-14T16:49:42.654Z (10 months ago)
Topics: algorithms, fraud-detection, machine-learning, optimization, paper, python, research
Language: Jupyter Notebook
Homepage:
Size: 1.34 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Credit-Card Fraud Detection (Recall-First, Chung & Lee 2023, PaySim)

This repository implements and extends the high-recall ensemble approach for fraud detection from **Chung & Lee (2023, Sensors 23-7788)** using the [PaySim](https://www.kaggle.com/datasets/ealaxi/paysim1) dataset. The solution is tuned for **high recall** (≥0.93): it aims to catch as many fraudulent transactions as possible, on the principle that a missed fraud costs far more than a false alarm.

## 🚀 Getting Started

Clone the repo and ensure your environment has Python 3.9+ and the required packages (see `requirements.txt`).  

Typical workflow:

```bash
make setup         # install dependencies
make preprocess    # preprocess data (encoding, split)
make train         # fit key models and save them
make ensemble      # apply ensemble voting (Algorithm 1)
make evaluate      # compute metrics, visualize and save results
```

Trained models are saved in `artifacts/`, processed data in `data/processed/`, and results (metrics, confusion matrix) in `results/`.

You can run `make clean` to wipe all outputs and start fresh.
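
As a rough illustration of what the `make preprocess` step (encoding, split) might involve, the sketch below one-hot encodes PaySim's `type` column and makes a stratified train/test split in plain Python. The helper names `one_hot` and `stratified_split`, the 20% test fraction, and the seed are assumptions for illustration; the notebook may do this differently (for example with pandas and scikit-learn).

```python
import random
from collections import defaultdict

# Transaction types found in PaySim's `type` column.
PAYSIM_TYPES = ["CASH_IN", "CASH_OUT", "DEBIT", "PAYMENT", "TRANSFER"]


def one_hot(values, categories=PAYSIM_TYPES):
    """Encode each categorical value as a 0/1 vector over `categories`."""
    return [[1 if value == cat else 0 for cat in categories] for value in values]


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) that keep each class's share roughly equal.

    Hypothetical helper: with a heavily imbalanced dataset, a plain random
    split can leave very few fraud rows in the test set, so each class is
    shuffled and split separately.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one row per class goes to the test set.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

Stratifying matters here because fraud rows are rare: splitting per class guarantees the minority class appears in both partitions.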

## 🗂️ Project Structure

- **notebooks/**: Main notebook (`fraud-detection.ipynb`) with code, analysis, and visualizations

- **data/**: Raw and processed PaySim data

- **artifacts/**: Saved models (e.g., `knn.pkl`, `lda.pkl`, `lr.pkl`)

- **results/**: Metrics, figures, confusion matrices, etc.

- **docs/**: Literature notes

- **slides/**: Presentation slides

## 🏆 Methodology

- **Dataset:** [PaySim](https://www.kaggle.com/datasets/ealaxi/paysim1) (6.3M mobile money transactions, highly imbalanced)

- **Models:**  

  - K-Nearest Neighbors (KNN)

  - Linear Discriminant Analysis (LDA)

  - Linear Regression (thresholded)

  - Logistic Regression, Decision Tree, Random Forest, Naive Bayes (for comparison)

- **Ensemble Logic:**  

  - Inspired by Chung & Lee (2023)

  - Prioritizes **recall** (fraud detection), combining KNN, LDA, and Linear Regression predictions using a voting/thresholding strategy

- **Metrics:**  

  - **Primary:** Recall (for fraud, label=0)

  - **Also reported:** Precision, Accuracy, Confusion Matrix (visualized as a heatmap)
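
The voting idea can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1 or the notebook's exact code: the function names, the 0.5 cutoff, and the `min_votes` rule are assumptions. It follows the label convention in the metrics above, where fraud is label 0.

```python
FRAUD = 0  # README reports recall for label=0, taken here as the fraud label
LEGIT = 1


def threshold_scores(scores, cutoff=0.5):
    """Map continuous linear-regression outputs to class labels.

    Assumption: the regressor was fit on 0/1 labels, so scores below
    `cutoff` are closer to the fraud label (0).
    """
    return [FRAUD if score < cutoff else LEGIT for score in scores]


def recall_first_vote(knn_pred, lda_pred, lr_pred, min_votes=1):
    """Flag a transaction as fraud if at least `min_votes` models say so.

    `min_votes=1` is the most recall-oriented setting: one fraud vote
    from any model is enough to raise an alert.
    """
    if not (len(knn_pred) == len(lda_pred) == len(lr_pred)):
        raise ValueError("all prediction lists must have the same length")

    combined = []
    for votes in zip(knn_pred, lda_pred, lr_pred):
        fraud_votes = sum(1 for vote in votes if vote == FRAUD)
        combined.append(FRAUD if fraud_votes >= min_votes else LEGIT)
    return combined


# Toy predictions for four transactions (made-up values).
knn = [1, 1, 0, 1]
lda = [1, 0, 0, 1]
lr = threshold_scores([0.9, 0.8, 0.1, 0.4])
ensemble = recall_first_vote(knn, lda, lr, min_votes=1)
```

With `min_votes=1` a transaction is flagged whenever any single model suspects fraud, which favors recall at the expense of precision; raising `min_votes` trades some recall back for precision.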

## 📈 Results

Summary of model performance (see notebook for details):

| Model              | Recall   | Precision | Accuracy  |
|--------------------|----------|-----------|-----------|
| Decision Tree      | 0.9998   | 0.9998    | 0.9997    |
| Naive Bayes        | 0.9971   | 0.9988    | 0.9960    |
| **Ensemble**       | 0.9998   | 0.7508    | 0.9991    |

> The ensemble matches the Decision Tree's recall (0.9998) with comparable accuracy (0.9991), at the cost of lower precision (0.7508), which is the trade-off a recall-first design accepts.
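
For reference, the three reported metrics can be derived from confusion-matrix counts as in the sketch below. It assumes fraud is the positive class with label 0, as stated under Metrics; the function names are hypothetical. scikit-learn's `recall_score` and `precision_score` accept a `pos_label` argument for the same purpose.

```python
def confusion_counts(y_true, y_pred, pos_label=0):
    """Return (tp, fp, fn, tn), treating `pos_label` as the fraud class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == pos_label and truth == pos_label:
            tp += 1  # fraud correctly flagged
        elif pred == pos_label:
            fp += 1  # false alarm
        elif truth == pos_label:
            fn += 1  # missed fraud
        else:
            tn += 1  # legitimate transaction correctly passed
    return tp, fp, fn, tn


def summarize(y_true, y_pred, pos_label=0):
    """Compute recall, precision, and accuracy from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, pos_label)
    total = tp + fp + fn + tn
    return {
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Toy example (made-up labels): both frauds are caught, with one false alarm.
metrics = summarize([0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1])
```

In the toy example recall is perfect while precision drops to two thirds, the same pattern as the ensemble row in the table: every false alarm lowers precision but leaves recall untouched.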

## 📚 References

- **Chung, H., & Lee, J. (2023).** “A High-Recall Ensemble Approach for Fraud Detection in Financial Transactions.” [Sensors 23(18), 7788](https://www.mdpi.com/1424-8220/23/18/7788)

- [PaySim Dataset on Kaggle](https://www.kaggle.com/datasets/ealaxi/paysim1)

- Scikit-learn documentation: https://scikit-learn.org/stable/documentation.html

## 👥 Contributors

- [Shehab Mahmoud Salah](https://github.com/dizzydroid)

- [Abdelrahman Hany Mohamed](https://github.com/dopebiscuit)

- [Youssef Ahmed Mohamed](https://github.com/unauthorised-401)

- [Omar Mamon Hamed](https://github.com/Spafic)

- [Seif El Din Tamer Shawky](https://github.com/SeifT101)

- [Seif Eldeen Ahmed Abdulaziz](https://github.com/seifelwarwary)

- [Habiba El-sayed Mowafy](https://github.com/Lucifer3224)

- [Aya Tarek Salem](https://github.com/AyaTarekS)

- [Moaz Ragab](https://github.com/moazragab12)

- [Ahmed Ashraf Ali](https://github.com/AshrafByte)

---

For details, see the [notebook](notebooks/fraud-detection.ipynb) and [docs/](docs/).
