https://github.com/dipeshdimi/credit_card_fraud_detection
- Host: GitHub
- URL: https://github.com/dipeshdimi/credit_card_fraud_detection
- Owner: dipeshdimi
- Created: 2024-01-31T01:39:28.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-03T02:02:01.000Z (over 2 years ago)
- Last Synced: 2025-03-21T20:46:37.092Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 28.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Credit Card Fraud Detection
## Introduction
This repository contains a Jupyter Notebook (`Credit_Card_Fraud_Detection.ipynb`) that focuses on detecting fraudulent credit card transactions using logistic regression. The notebook includes steps for data loading, exploration, preprocessing, model training, and evaluation.
- [Colab Link](https://colab.research.google.com/drive/1B1IJKPkXi4PAGA9NjJeBSmEQlck9u55-?usp=sharing)
## Dataset
This project uses the [Credit Card Fraud Detection dataset](https://www.kaggle.com/mlg-ulb/creditcardfraud) from Kaggle. Each transaction is labeled as legitimate (Class 0) or fraudulent (Class 1). The dataset is relatively large (roughly 285,000 transactions), so the initial load may take some time.
## Dependencies
The notebook depends on NumPy, pandas, and scikit-learn. Make sure they are installed; the notebook then uses the following imports:
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score, precision_score, recall_score
```
## Exploratory Data Analysis
The notebook explores the dataset before modeling: it previews the data, prints summary information about the columns, checks for missing values, and examines the class distribution.
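The exploration steps can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: it runs on a small synthetic stand-in frame so that it works without the Kaggle download, and the `Amount` column and the `creditcard.csv` file name are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Stand-in data so the sketch runs without the Kaggle file.
# With the real data you would load the CSV instead, e.g.
# df = pd.read_csv("creditcard.csv")  (file name assumed here).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Amount": rng.uniform(1, 500, size=1000),  # hypothetical feature column
    "Class": [1] * 5 + [0] * 995,              # 5 fraudulent, 995 legitimate rows
})

print(df.head())   # data overview: first few rows
df.info()          # column dtypes and non-null counts

missing = df.isnull().sum()                 # missing values per column
class_counts = df["Class"].value_counts()   # rows per class
fraud_share = class_counts.loc[1] / len(df) # fraction of fraudulent rows

print(missing)
print(class_counts)
print(f"Fraud share: {fraud_share:.3%}")
```

A very small fraud share at this stage is the signal that the classes need rebalancing before training.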
## Data Balancing
The dataset is highly imbalanced: legitimate transactions (Class 0) vastly outnumber fraudulent ones (Class 1). To balance the classes for training, the notebook under-samples the legitimate transactions.
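One common way to under-sample is to keep every minority-class row and draw an equally sized random sample from the majority class. The sketch below shows that idea on synthetic data; the helper name `undersample`, its parameters, and the `Amount` column are hypothetical, and the notebook may implement the step differently.

```python
import numpy as np
import pandas as pd


def undersample(df, label_col="Class", random_state=0):
    """Return a frame in which every class has as many rows as the smallest class."""
    n_minority = int(df[label_col].value_counts().min())
    # Draw n_minority rows from each class; the minority class is kept whole.
    parts = [
        group.sample(n=n_minority, random_state=random_state)
        for _, group in df.groupby(label_col)
    ]
    # Concatenate, then shuffle so the classes are not stored in blocks.
    balanced = pd.concat(parts).sample(frac=1, random_state=random_state)
    return balanced.reset_index(drop=True)


# Synthetic stand-in: 995 legitimate rows and 5 fraudulent rows.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Amount": rng.uniform(1, 500, size=1000),  # hypothetical feature column
    "Class": [0] * 995 + [1] * 5,
})

balanced_df = undersample(df)
print(balanced_df["Class"].value_counts())
```

Under-sampling discards most of the legitimate transactions, so the balanced frame is much smaller than the original; that trade-off is worth keeping in mind when reading the evaluation metrics.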
## Model Training
Logistic Regression is chosen as the classification algorithm for this task. The notebook includes code for training the logistic regression model using the balanced dataset.
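A minimal sketch of the training step is shown below, using the `train_test_split` and `LogisticRegression` imports listed under Dependencies. The synthetic features, the 80/20 split, the stratification, and `max_iter=1000` are assumptions chosen for the example rather than the notebook's actual settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Balanced synthetic stand-in: 200 "legitimate" and 200 "fraudulent" rows,
# four features each, with the fraud class shifted so it is separable.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 4)),  # Class 0
    rng.normal(2.0, 1.0, size=(200, 4)),  # Class 1
])
y = np.array([0] * 200 + [1] * 200)

# Hold out 20% for testing; stratify keeps both classes equally represented.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# max_iter is raised above the default to give the solver room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))
```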
## Model Evaluation
The notebook evaluates the trained model on both the training and testing datasets, providing metrics such as accuracy, confusion matrix, precision, recall, and F1 score.
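The listed metrics can be computed with the scikit-learn functions imported under Dependencies. The sketch below uses a small hand-written set of labels and predictions so the numbers are easy to check by hand; the helper name `evaluate` is hypothetical and the values are illustrative, not results from the notebook.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)


def evaluate(y_true, y_pred):
    """Collect the classification metrics for one set of predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        # Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }


# Illustrative labels: 4 true negatives, 1 false positive,
# 1 false negative, and 4 true positives.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0, 1, 0]

metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(name, value, sep=": ")
```

Calling the same helper once with the training predictions and once with the test predictions gives the side-by-side comparison described above. For fraud detection, recall on Class 1 is usually the metric to watch, since a missed fraud is a false negative.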
## Results
After training and evaluation, the notebook prints the performance metrics for the training and testing datasets side by side, so the two sets of scores can be compared directly.