https://github.com/samkazan/fraud-detection-ml
Machine learning models for enhanced fraud detection in e-commerce transactions, exploring feature engineering, distance prediction, and clustering analysis.
clustering data-science data-visualization dataanalytics dbscan eda hierarchical-clustering kmeans-clustering knn-imputer matplotlib mlxtend python scikit-learn seaborn xgboost
- Host: GitHub
- URL: https://github.com/samkazan/fraud-detection-ml
- Owner: SamKazan
- Created: 2024-06-27T01:35:17.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-06-27T04:19:43.000Z (4 months ago)
- Last Synced: 2024-06-27T05:55:09.143Z (4 months ago)
- Topics: clustering, data-science, data-visualization, dataanalytics, dbscan, eda, hierarchical-clustering, kmeans-clustering, knn-imputer, matplotlib, mlxtend, python, scikit-learn, seaborn, xgboost
- Language: Jupyter Notebook
- Homepage:
- Size: 11.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# Machine Learning for Fraud Detection in E-commerce Transactions
## Overview
This project investigates the application of machine learning techniques to enhance fraud detection in e-commerce transactions. By leveraging a comprehensive dataset from Vesta, we explore feature engineering, distance prediction, and clustering analysis to identify fraudulent activities.

## Problem Statement
The increasing sophistication of financial fraud poses significant challenges to businesses and consumers. Traditional rule-based fraud detection systems often struggle to keep pace with evolving fraudulent tactics. This project aims to develop more robust and accurate fraud detection models using machine learning.

## Methodology
This project addresses three key research questions:

**RQ1: Feature Engineering and Selection**
* **Objective:** Improve fraud detection accuracy by identifying and engineering the most predictive features.
* **Techniques:** Recursive Feature Elimination (RFE), Feature Importance from Gradient Boosting, Principal Component Analysis (PCA).

**RQ2: Predicting Transaction Distances**
* **Objective:** Develop models to predict transaction distances and identify geographic anomalies indicative of fraud.
* **Techniques:** Linear Regression, XGBoost.

**RQ3: Clustering for Coordinated Fraud Detection**
* **Objective:** Utilize clustering techniques to uncover groups of transactions potentially associated with coordinated fraud.
* **Techniques:** K-Means Clustering, HDBSCAN, Hierarchical Clustering.

## Results
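As a rough illustration of how two of the methodology's techniques might be combined (PCA from RQ1, then K-Means and hierarchical clustering from RQ3), here is a minimal sketch using scikit-learn. It is not the notebook's actual code: the function name `compare_clusterers`, the synthetic `make_blobs` data standing in for transaction features, the number of clusters and components, and the use of the silhouette score as a separation measure are all assumptions made for the example.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def compare_clusterers(X, n_clusters=3, n_components=5, random_state=0):
    """Scale features, reduce them with PCA, then score two clusterers.

    Hypothetical helper for illustration; returns a dict mapping a
    model name to its silhouette score on the PCA-reduced data.
    """
    # Standardize so that no single feature dominates the distances.
    X_scaled = StandardScaler().fit_transform(X)
    # Keep only the leading principal components.
    X_reduced = PCA(
        n_components=n_components, random_state=random_state
    ).fit_transform(X_scaled)

    models = {
        "kmeans": KMeans(
            n_clusters=n_clusters, n_init=10, random_state=random_state
        ),
        "hierarchical": AgglomerativeClustering(
            n_clusters=n_clusters, linkage="ward"
        ),
    }

    scores = {}
    for name, model in models.items():
        labels = model.fit_predict(X_reduced)
        # Silhouette lies in [-1, 1]; higher means better-separated clusters.
        scores[name] = silhouette_score(X_reduced, labels)
    return scores


# Synthetic stand-in for transaction features (NOT the Vesta dataset).
X_demo, _ = make_blobs(
    n_samples=600, n_features=20, centers=3, cluster_std=1.0, random_state=0
)
scores = compare_clusterers(X_demo)
for name, score in scores.items():
    print(f"{name}: silhouette = {score:.3f}")
```

On real transaction data the number of clusters is not known in advance, so a comparison like this would typically be repeated over several values of `n_clusters` before judging which method separates the data best.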
* **Feature Engineering:** PCA significantly enhanced model accuracy, highlighting its effectiveness in capturing relevant data structures.
* **Distance Prediction:** XGBoost models demonstrated promising results in predicting transaction distances, aiding in the identification of high-risk transactions.
* **Clustering Analysis:** K-Means Clustering provided the most interpretable and well-separated clusters, potentially revealing patterns of coordinated fraud.

## Data
* **Source:** "IEEE-CIS Fraud Detection" dataset from Kaggle, provided by Vesta.
* **Size:** Over 140,000 transactions with 434 features (transaction details, card information, addresses, Vesta-engineered features).

## Contact
Cem Kazan - [email protected]

# fraud-detection-ml