https://github.com/sushantdhumak/credit-card-fraud-detection

Demonstrates the use of ML for Anomaly Detection for Credit Card Transactions: Identifying Fraudulent Activity using Imbalanced Data

anamoly-detection correlation-analysis data-scaling data-visualization decision-tree-classifier exploratory-data-analysis gridsearchcv imbalanced-data knn-classifier logistic-regression near-miss outlier-detection outlier-removal precision-recall-curve random-forest-classifier roc-auc roc-auc-curve svc under-sampling

Host: GitHub
URL: https://github.com/sushantdhumak/credit-card-fraud-detection
Owner: sushantdhumak
Created: 2025-01-07T05:51:51.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-01-07T05:56:46.000Z (9 months ago)
Last Synced: 2025-01-30T23:05:56.921Z (8 months ago)
Topics: anamoly-detection, correlation-analysis, data-scaling, data-visualization, decision-tree-classifier, exploratory-data-analysis, gridsearchcv, imbalanced-data, knn-classifier, logistic-regression, near-miss, outlier-detection, outlier-removal, precision-recall-curve, random-forest-classifier, roc-auc, roc-auc-curve, svc, under-sampling
Language: Jupyter Notebook
Homepage:
Size: 11.9 MB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

---

### **Credit Card Fraud Detection**

---

#### **Context:**

Credit card fraud detection is crucial to protect customers from unauthorized charges and ensure the integrity of financial transactions. Accurate and timely identification of fraudulent transactions helps prevent financial losses and maintains customer trust.

#### **Dataset Overview:**

This dataset contains credit card transaction records from European cardholders during a two-day period in September 2013. It includes 284,807 transactions, of which only 492 (0.172%) are fraudulent, making it highly imbalanced.

Key characteristics of the dataset:

**Features:**

1. All input variables are numerical.

2. V1, V2, … V28: Principal components obtained from a Principal Component Analysis (PCA) transformation. The original features are unavailable due to confidentiality constraints.

3. Time: Seconds elapsed between each transaction and the first transaction in the dataset. This feature is not PCA-transformed.

4. Amount: Transaction amount, which can be used for cost-sensitive learning. This feature is not PCA-transformed.

**Target Variable:**

Class: Indicates whether a transaction is fraudulent (1) or not (0).
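
The column layout described above can be checked before any modelling. The following is a minimal sketch, assuming the Kaggle download is a single CSV file with a header row (the helper name `check_header` and the one-line sample are hypothetical and stand in for the real file so the snippet runs offline):

```python
import csv
import io

# Column layout described above: Time, 28 PCA components, Amount, then the label.
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def check_header(csv_file):
    """Return (missing, extra) column names found in the file's header row."""
    header = next(csv.reader(csv_file))
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    extra = [c for c in header if c not in EXPECTED_COLUMNS]
    return missing, extra


# Hypothetical stand-in for the real file: a header row with no data.
sample = io.StringIO(",".join(EXPECTED_COLUMNS) + "\n")
missing, extra = check_header(sample)
print(len(EXPECTED_COLUMNS), missing, extra)  # 31 [] []
```

With the real dataset, the `io.StringIO` sample would be replaced by an opened file handle, for example `open(path, newline="")`, where `path` points at the downloaded CSV.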

#### **Challenge:**

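The extreme class imbalance is the main difficulty: fraud cases account for just 0.172% of all transactions, so a model that labels every transaction as legitimate would be about 99.83% accurate while detecting no fraud at all.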
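
The effect of this imbalance on plain accuracy can be worked out from the counts given in the dataset overview. The sketch below is only arithmetic on those two numbers; the "always predict legitimate" baseline is an illustrative assumption, not a model from this repository:

```python
# Counts quoted in the dataset overview.
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492

# Share of transactions that are fraudulent.
fraud_rate = FRAUD_TRANSACTIONS / TOTAL_TRANSACTIONS

# A trivial baseline that labels every transaction as legitimate (Class = 0)
# is right on all non-fraud rows and wrong on every fraud row.
majority_accuracy = (TOTAL_TRANSACTIONS - FRAUD_TRANSACTIONS) / TOTAL_TRANSACTIONS
majority_recall = 0 / FRAUD_TRANSACTIONS  # it never flags a fraud

print(f"fraud rate:        {fraud_rate:.4%}")         # 0.1727%
print(f"baseline accuracy: {majority_accuracy:.2%}")  # 99.83%
print(f"baseline recall:   {majority_recall:.0%}")    # 0%
```

A classifier can therefore score above 99.8% accuracy without catching a single fraudulent transaction, which is why accuracy alone says little on this dataset.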

#### **Objective:**

Because the classes are so imbalanced, we recommend using the Area Under the Precision-Recall Curve (AUPRC) as the primary performance metric rather than accuracy. This measure is well suited to datasets with skewed class distributions: it reflects the model's ability to identify frauds (recall) while keeping false positives low (precision).
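
To make the metric concrete, here is a minimal pure-Python sketch of the step-wise area under the precision-recall curve (often called average precision). The function name `average_precision` and the toy labels and scores are hypothetical illustrations, not results from this project; in practice a library routine such as scikit-learn's `average_precision_score` would normally be used instead.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    y_true: 0/1 labels (1 = fraud); scores: higher means more suspicious.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive (fraud) label")

    # Rank transactions from most to least suspicious.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    tp = fp = 0
    prev_recall = 0.0
    area = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Treat all transactions tied at this score as one threshold step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        # Add the rectangle between the previous and current recall levels.
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Toy example: two frauds among four transactions.
print(round(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]), 4))  # 0.8333

# A scorer that cannot rank at all (identical scores) only reaches the fraud rate.
print(average_precision([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5]))  # 0.25
```

The second call shows why AUPRC is informative here: an uninformative scorer achieves an AUPRC equal to the fraction of frauds, which for this dataset is about 0.0017, so any useful model has to rise well above that floor.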

**Dataset Link:**

https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
