https://github.com/darshjasani/fraud-detection
- Host: GitHub
- URL: https://github.com/darshjasani/fraud-detection
- Owner: darshjasani
- Created: 2024-06-09T15:57:50.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-06-09T16:10:55.000Z (7 months ago)
- Last Synced: 2024-06-09T17:38:50.063Z (7 months ago)
- Language: Jupyter Notebook
- Size: 188 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Fraud Prediction Models and Threshold Analysis
This repository contains a Jupyter Notebook for analyzing and predicting fraud with several machine learning models. The notebook applies logistic regression, a Support Vector Classifier (SVC), and XGBoost, and explores how changing the decision threshold affects recall and F1 score. These are key metrics in fraud detection, where the goal is to minimize false negatives.
## Table of Contents
1. [Introduction](#introduction)
2. [Installation](#installation)
3. [Usage](#usage)
4. [Dataset](#dataset)
5. [Modeling](#modeling)
6. [Evaluation](#evaluation)
7. [Results](#results)
## Introduction
Fraud detection is a critical process in the financial industry, particularly for credit card transactions. Fraudsters can misuse credit cards in various ways, such as stealing card information or making unauthorized purchases. Effective fraud detection systems are essential to protect customers, financial institutions, and merchants from financial losses and to maintain the integrity of the payment ecosystem. This project builds and evaluates machine learning models for fraud detection and examines how adjusting the decision threshold can reduce false negatives.
## Installation
To run this project locally, you will need to install the following dependencies:
```bash
pip install numpy pandas scikit-learn xgboost jupyter
```
## Usage
1. Clone the repository:
```bash
git clone https://github.com/darshjasani/fraud-detection.git
cd fraud-detection
```
2. Open the Jupyter Notebook:
```bash
jupyter notebook fraud-prediction-and-threshold-analysis.ipynb
```
3. Execute the cells sequentially to reproduce the analysis and results.
## Dataset
The dataset used in this project is sourced from Kaggle's fraud detection dataset. It consists of two files:
- `fraudTrain.csv`
- `fraudTest.csv`

Ensure these files are placed in the appropriate directory before running the notebook.
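As a minimal sketch of loading the two files, the snippet below assumes they sit in a hypothetical `data/` directory next to the notebook; the README does not specify the actual location. The `load_fraud_data` helper and its `data_dir` parameter are illustrative names, not part of the repository.

```python
from pathlib import Path

import pandas as pd


def load_fraud_data(data_dir="data"):
    """Load the train and test CSVs from data_dir (hypothetical location)."""
    data_dir = Path(data_dir)
    paths = {name: data_dir / f"{name}.csv" for name in ("fraudTrain", "fraudTest")}

    # Fail early with a clear message if either file is missing.
    missing = [str(p) for p in paths.values() if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Missing dataset file(s): {', '.join(missing)}")

    train_df = pd.read_csv(paths["fraudTrain"])
    test_df = pd.read_csv(paths["fraudTest"])
    return train_df, test_df
```

Calling `train_df, test_df = load_fraud_data("data")` would then return the two DataFrames, or raise a `FileNotFoundError` naming whichever file is absent.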
## Modeling
The notebook includes the following steps for modeling:
1. **Data Loading and Preprocessing:** Loading the dataset and performing necessary preprocessing steps.
2. **Model Training:** Training logistic regression, SVC, and XGBoost models on the training dataset.
3. **Threshold Adjustment:** Changing the decision threshold to analyze its impact on recall and F1 score.
## Evaluation
The models are evaluated based on the following metrics:
- **Recall:** The fraction of actual fraud cases that the model correctly identifies, TP / (TP + FN). Higher recall means fewer false negatives.
- **F1 Score:** The harmonic mean of precision and recall, giving a single metric that balances the two.
## Results
The results section of the notebook analyzes model performance and highlights the impact of threshold adjustments on recall and F1 score. Plots and tables are included to visualize the findings.
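To illustrate the kind of threshold analysis described above, here is a minimal sketch using scikit-learn. It is not the notebook's code: it assumes a synthetic, imbalanced dataset from `make_classification` as a stand-in for the Kaggle files, uses only logistic regression, and the `sweep_thresholds` helper is a hypothetical name introduced for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the fraud data (roughly 5% positives).
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predicted probability of the positive (fraud) class for each test sample.
fraud_proba = model.predict_proba(X_test)[:, 1]


def sweep_thresholds(y_true, proba, thresholds):
    """Return a (threshold, recall, precision, f1) tuple for each threshold."""
    rows = []
    for t in thresholds:
        # Flag a sample as fraud when its probability reaches the threshold.
        y_pred = (proba >= t).astype(int)
        rows.append((
            float(t),
            recall_score(y_true, y_pred, zero_division=0),
            precision_score(y_true, y_pred, zero_division=0),
            f1_score(y_true, y_pred, zero_division=0),
        ))
    return rows


results = sweep_thresholds(y_test, fraud_proba, np.linspace(0.1, 0.9, 9))
for t, rec, prec, f1 in results:
    print(f"threshold={t:.1f}  recall={rec:.3f}  precision={prec:.3f}  f1={f1:.3f}")
```

Lowering the threshold can only add predicted fraud cases, so recall never decreases as the threshold drops, while precision and F1 may move in either direction. The same sweep should work for any classifier that exposes `predict_proba`; for scikit-learn's `SVC` that requires `probability=True`, or the scores from `decision_function` can be thresholded instead.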