https://github.com/obirikan/ml_model_fraud_detection
This project demonstrates how to use Logistic Regression with SMOTE to detect fraudulent transactions in an imbalanced dataset.
imbalanced-data logistic-regression smote-oversampler
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/obirikan/ml_model_fraud_detection
- Owner: obirikan
- Created: 2025-06-27T14:31:39.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-01T14:35:54.000Z (12 months ago)
- Last Synced: 2025-07-01T15:40:19.550Z (12 months ago)
- Topics: imbalanced-data, logistic-regression, smote-oversampler
- Language: Jupyter Notebook
- Homepage:
- Size: 6.11 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
# 💳 Logistic Regression Fraud Detection
This dataset provides a small but representative sample of anonymized financial transactions intended for building and testing **fraud detection models**.
Each record represents a **single transaction**, including:
- Transaction type (e.g., `CASH_OUT`, `TRANSFER`)
- Transaction amount
- Sender and receiver account balances before and after the transaction
- Fraud indicator flags
It is suitable for:
- Binary classification
- Anomaly detection
- Machine learning tasks related to **financial security**
---
## 📦 Dataset Structure
| Column Name | Description |
|------------------|--------------------------------------------------------------|
| `step` | Time step of the transaction |
| `type` | Type of transaction (e.g., `TRANSFER`, `CASH_OUT`) |
| `amount` | Amount involved in the transaction |
| `nameOrig` | ID of sender account |
| `oldbalanceOrg` | Sender’s balance before the transaction |
| `newbalanceOrig` | Sender’s balance after the transaction |
| `nameDest` | ID of receiver account |
| `oldbalanceDest` | Receiver’s balance before the transaction |
| `newbalanceDest` | Receiver’s balance after the transaction |
| `isFraud` | **Target variable**: 1 if fraudulent, 0 otherwise |
| `isPayment` | Indicates whether the transaction is a payment |
| `isMovement` | Indicates whether the transaction involved a balance change |
| `accountDiff` | Difference in account balances (derived feature) |
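
The columns above can be turned into a feature matrix and a target vector in a few lines. The sketch below assumes pandas and uses a handful of made-up rows with a subset of the columns, not real records from the dataset. Dropping the account ID columns and one-hot encoding `type` are illustrative choices, not necessarily what the notebook does.

```python
import pandas as pd

# Made-up rows for illustration only; they are not taken from the dataset.
df = pd.DataFrame({
    "step": [1, 1, 2, 2],
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "amount": [9839.64, 181.00, 181.00, 1864.28],
    "nameOrig": ["C1", "C2", "C3", "C4"],
    "oldbalanceOrg": [170136.0, 181.0, 181.0, 21249.0],
    "newbalanceOrig": [160296.36, 0.0, 0.0, 19384.72],
    "nameDest": ["M1", "C5", "C6", "M2"],
    "oldbalanceDest": [0.0, 0.0, 21182.0, 0.0],
    "newbalanceDest": [0.0, 0.0, 0.0, 0.0],
    "isFraud": [0, 1, 1, 0],
})

# Target variable: 1 if fraudulent, 0 otherwise.
y = df["isFraud"]

# Assumption: account IDs are dropped because they are identifiers, not numeric features.
X = df.drop(columns=["isFraud", "nameOrig", "nameDest"])

# One-hot encode the categorical transaction type so every feature is numeric.
X = pd.get_dummies(X, columns=["type"], dtype=int)

# Share of each class; on the real dataset this shows how rare fraud is.
print(y.value_counts(normalize=True))
print(X.columns.tolist())
```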
---
## ⚠️ Class Imbalance Notice
> **Important:**
> This dataset is **highly imbalanced**: fraudulent transactions (`isFraud = 1`) are far rarer than non-fraudulent ones.
> This mirrors real-world financial data, and a model trained without accounting for it can score high accuracy while missing most fraud.
To improve results, consider:
- **Resampling techniques** like SMOTE or undersampling
- Using **evaluation metrics** like precision, recall, F1-score, or ROC-AUC instead of just accuracy
---
## 💡 Inspiration
This dataset can help you explore:
- How fraud differs from legitimate behavior
- Techniques to detect rare but critical patterns
- How to evaluate models fairly when fraud is rare
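
As a small illustration of the last point, the sketch below compares accuracy and recall for a baseline that always predicts "not fraud". It assumes scikit-learn and NumPy, and the 1% fraud rate and random features are made up for the example. The baseline reaches 99% accuracy while catching no fraud at all, which is why accuracy alone is a poor yardstick here.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(0)

# Made-up labels: 100 fraudulent rows out of 10,000 (1% fraud).
n = 10_000
y = np.zeros(n, dtype=int)
y[:100] = 1

# Random features; the baseline below ignores them anyway.
X = rng.normal(size=(n, 3))

# Baseline that always predicts the majority class (non-fraud).
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)

acc = accuracy_score(y, y_pred)
rec = recall_score(y, y_pred)
# No positive predictions are made, so precision is undefined; report it as 0.
prec = precision_score(y, y_pred, zero_division=0)

print(f"accuracy={acc:.3f} recall={rec:.3f} precision={prec:.3f}")
# prints: accuracy=0.990 recall=0.000 precision=0.000
```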
---