https://github.com/danishayman/credit-card-fraud-detection
This project implements various machine learning models to detect fraudulent credit card transactions.
- Host: GitHub
- URL: https://github.com/danishayman/credit-card-fraud-detection
- Owner: danishayman
- Created: 2025-01-12T16:06:25.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-01-12T16:29:29.000Z (10 months ago)
- Last Synced: 2025-05-18T09:11:32.621Z (6 months ago)
- Topics: credit-card, credit-card-fraud, ipynb, ipynb-jupyter-notebook, jupyter-notebook, mahcine-learning, python
- Language: HTML
- Homepage:
- Size: 1010 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Credit Card Fraud Detection
## Overview
This project implements various machine learning models to detect fraudulent credit card transactions. The system uses a dataset of credit card transactions to train and evaluate different classification algorithms, helping to identify potentially fraudulent activities.
## Dataset
The project uses the [Credit Card Fraud Detection dataset](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud) from Kaggle. The dataset contains credit card transactions made by European cardholders in September 2013.
Dataset characteristics:
- 284,807 transactions
- 492 frauds (0.172% of all transactions)
- 31 columns: 30 input features (28 principal components V1 to V28, plus Time and Amount) and the Class label
- Highly imbalanced dataset
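
The counts above show why plain accuracy is a weak yardstick here. The short sketch below uses only the two figures quoted in the list; it shows that fraud makes up roughly 0.17% of the data, so a model that labels every transaction as legitimate would still score above 99.8% accuracy.

```python
# Figures taken from the dataset characteristics listed above.
total_transactions = 284_807
fraud_transactions = 492

fraud_rate = fraud_transactions / total_transactions

# Accuracy of a trivial model that labels every transaction as legitimate.
baseline_accuracy = (total_transactions - fraud_transactions) / total_transactions

print(f"fraud rate: {fraud_rate:.4%}")
print(f"always-legitimate baseline accuracy: {baseline_accuracy:.4%}")
```

This is why precision, recall, and F1 are usually reported alongside accuracy for this kind of data.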
## Features
- Comprehensive Exploratory Data Analysis (EDA)
- Data preprocessing and feature scaling
- Implementation of multiple machine learning models:
  - Logistic Regression
  - Decision Tree Classifier
  - Random Forest Classifier
  - Support Vector Classifier (SVC)
  - K-Nearest Neighbors (KNN)
  - Gaussian Naive Bayes
  - AdaBoost Classifier
  - Gradient Boosting Classifier
  - Bagging Classifier
  - Extra Trees Classifier
  - Stochastic Gradient Descent Classifier
  - Voting Classifier
- Handling imbalanced data using:
  - Undersampling
  - Oversampling (SMOTE)
- Model evaluation and comparison
- Model persistence for future use
## Technologies Used
- Python
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Imbalanced-learn (SMOTE)
- Joblib
## Key Findings
- Best-performing models after SMOTE:
  - Decision Tree Classifier (99.82% accuracy)
  - Logistic Regression (94.52% accuracy)
- Class imbalance was addressed with undersampling and SMOTE oversampling
- Transaction time has little effect on fraud detection
- Fraudulent transactions typically involve smaller amounts
## Getting Started
### Prerequisites
```bash
pip install pandas numpy matplotlib scikit-learn imbalanced-learn joblib
```
### Usage
1. Clone the repository
2. Download the dataset from Kaggle
3. Run the Jupyter notebook
4. Use the saved model for predictions
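
The last step can be sketched as follows. This is a hedged illustration, not the notebook's code. The file name `fraud_model.joblib` is hypothetical, the data is synthetic, and the scaler-plus-Logistic-Regression pipeline is just one possible model to persist.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 30 input columns (Time, V1..V28, Amount).
X, y = make_classification(n_samples=500, n_features=30, random_state=0)

# Bundle scaling and the classifier so the saved artifact is self-sufficient.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# "fraud_model.joblib" is a hypothetical name, not one taken from the repository.
model_path = Path(tempfile.mkdtemp()) / "fraud_model.joblib"
joblib.dump(model, model_path)

# Later, or in another process: load the model and score new transactions.
loaded = joblib.load(model_path)
new_transactions = X[:5]
predictions = loaded.predict(new_transactions)  # 0 = legitimate, 1 = fraud
fraud_probability = loaded.predict_proba(new_transactions)[:, 1]
print(predictions, fraud_probability.round(3))
```

Saving the scaler together with the classifier avoids having to repeat the preprocessing by hand at prediction time.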
## Model Performance
### After SMOTE:
- Decision Tree Classifier:
  - Accuracy: 99.82%
  - Precision: 99.74%
  - Recall: 99.89%
  - F1 Score: 99.82%
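
For reference, the four metrics above are all derived from confusion-matrix counts. The sketch below shows the standard formulas in plain Python. The counts are made up for illustration and are not the notebook's results.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of flagged transactions that are fraud
    recall = tp / (tp + fn)     # share of actual fraud that was flagged
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

In scikit-learn, the same quantities are available through `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` in `sklearn.metrics`.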
## Future Improvements
1. Engineer new, relevant features
2. Implement deep learning models
3. Build a real-time transaction monitoring system
4. Add more evaluation metrics
5. Create a web interface for predictions
## Contributing
Feel free to fork the project and submit pull requests. For major changes, please open an issue first to discuss the proposed changes.
## Acknowledgments
- Dataset provided by Kaggle and the Machine Learning Group of ULB (Université Libre de Bruxelles)
- Thanks to all contributors and the open-source community
## Contact
Feel free to reach out for any questions or suggestions!
---