{"id":49557301,"url":"https://github.com/ashmod/fraud-detection","last_synced_at":"2026-05-03T05:35:07.461Z","repository":{"id":290656620,"uuid":"971366649","full_name":"ashmod/fraud-detection","owner":"ashmod","description":"A Python implementation of Chung \u0026 Lee's 2023 fraud detection ensemble approach. Optimized for high recall (≥0.93) on the PaySim dataset","archived":false,"fork":false,"pushed_at":"2025-05-24T10:37:51.000Z","size":1407,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-14T16:49:42.654Z","etag":null,"topics":["algorithms","fraud-detection","machine-learning","optimization","paper","python","research"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ashmod.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-23T12:17:40.000Z","updated_at":"2025-05-24T10:37:54.000Z","dependencies_parsed_at":"2025-06-23T11:38:28.696Z","dependency_job_id":"63d1028b-3eb1-4dfc-9faa-e8560eab5b63","html_url":"https://github.com/ashmod/fraud-detection","commit_stats":null,"previous_names":["dizzydroid/fraud-detection","ashmod/fraud-detection"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ashmod/fraud-detection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashmod%2Ffraud-detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashmod%2Ffraud-detection/tags","releases_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashmod%2Ffraud-detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashmod%2Ffraud-detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ashmod","download_url":"https://codeload.github.com/ashmod/fraud-detection/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashmod%2Ffraud-detection/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32559716,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T03:21:47.309Z","status":"ssl_error","status_checked_at":"2026-05-03T03:21:43.884Z","response_time":103,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","fraud-detection","machine-learning","optimization","paper","python","research"],"created_at":"2026-05-03T05:35:06.922Z","updated_at":"2026-05-03T05:35:07.455Z","avatar_url":"https://github.com/ashmod.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Credit-Card Fraud Detection (Recall-First, Chung \u0026 Lee 2023, PaySim)\n\nThis repository implements and extends the high-recall ensemble approach for fraud detection from **Chung \u0026 Lee (2023, Sensors 23-7788)** using the 
[PaySim](https://www.kaggle.com/datasets/ealaxi/paysim1) dataset. The solution is optimized for **perfect or near-perfect recall** (≥0.93), aiming to catch every fraudulent transaction, following the principle that missing fraud is much more costly than a false alarm.\n\n## 🚀 Getting Started\n\nClone the repo and ensure your environment has Python 3.9+ and the required packages (see `requirements.txt`).  \nTypical workflow:\n\n```bash\nmake setup         # install dependencies\nmake preprocess    # preprocess data (encoding, split)\nmake train         # fit key models and save them\nmake ensemble      # apply ensemble voting (Algorithm 1)\nmake evaluate      # compute metrics, visualize and save results\n```\n\nArtifacts will be saved in `artifacts/`, processed data in `data/processed/`, and results (metrics, confusion matrix) in `results/`.\n\nYou can run `make clean` to wipe all outputs and start fresh.\n\n## 🗂️ Project Structure\n\n- **notebooks/**: Main notebook (`fraud-detection.ipynb`) with code, analysis, and visualizations\n- **data/**: Raw and processed PaySim data\n- **artifacts/**: Saved models (e.g., `knn.pkl`, `lda.pkl`, `lr.pkl`)\n- **results/**: Metrics, figures, confusion matrices, etc.\n- **docs/**: Literature notes\n- **slides/**: Presentation slides\n\n## 🏆 Methodology\n\n- **Dataset:** [PaySim](https://www.kaggle.com/datasets/ealaxi/paysim1) (6.3M mobile money transactions, highly imbalanced)\n- **Models:**  \n  - K-Nearest Neighbors (KNN)\n  - Linear Discriminant Analysis (LDA)\n  - Linear Regression (thresholded)\n  - Logistic Regression, Decision Tree, Random Forest, Naive Bayes (for comparison)\n- **Ensemble Logic:**  \n  - Inspired by Chung \u0026 Lee (2023)\n  - Prioritizes **recall** (fraud detection), combining KNN, LDA, and Linear Regression predictions using a voting/thresholding strategy\n- **Metrics:**  \n  - **Primary:** Recall (for fraud, label=0)\n  - **Also reported:** Precision, Accuracy, Confusion Matrix (visualized as a 
heatmap)\n\n## 📈 Results\n\nSummary of model performance (see notebook for details):\n\n| Model              | Recall   | Precision | Accuracy  |\n|--------------------|----------|-----------|-----------|\n| Decision Tree      | 0.9998   | 0.9998    | 0.9997    |\n| Naive Bayes        | 0.9971   | 0.9988    | 0.9960    |\n| **Ensemble**       | 0.9998   | 0.7508    | 0.9991    |\n\n\u003e The ensemble matches the Decision Tree's recall (0.9998) and keeps accuracy high (0.9991), but its precision (0.7508) is markedly lower, which is the expected trade-off of a recall-first design.\n\n## 📚 References\n\n- **Chung, H., \u0026 Lee, J. (2023).** “A High-Recall Ensemble Approach for Fraud Detection in Financial Transactions.” [Sensors 23(18), 7788](https://www.mdpi.com/1424-8220/23/18/7788)\n- [PaySim Dataset on Kaggle](https://www.kaggle.com/datasets/ealaxi/paysim1)\n- Scikit-learn documentation: https://scikit-learn.org/stable/documentation.html\n\n## 👥 Contributors\n\n- [Shehab Mahmoud Salah](https://github.com/dizzydroid)\n- [Abdelrahman Hany Mohamed](https://github.com/dopebiscuit)\n- [Youssef Ahmed Mohamed](https://github.com/unauthorised-401)\n- [Omar Mamon Hamed](https://github.com/Spafic)\n- [Seif El Din Tamer Shawky](https://github.com/SeifT101)\n- [Seif Eldeen Ahmed Abdulaziz](https://github.com/seifelwarwary)\n- [Habiba El-sayed Mowafy](https://github.com/Lucifer3224)\n- [Aya Tarek Salem](https://github.com/AyaTarekS)\n- [Moaz Ragab](https://github.com/moazragab12)\n- [Ahmed Ashraf Ali](https://github.com/AshrafByte)\n\n---\n\nFor details, see the [notebook](fraud-detection.ipynb) and [docs/](docs/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashmod%2Ffraud-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fashmod%2Ffraud-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashmod%2Ffraud-detection/lists"}