Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/amir-tav/fraud-detection

Last synced: about 2 months ago
JSON representation

Host: GitHub
URL: https://github.com/amir-tav/fraud-detection
Owner: Amir-Tav
Created: 2024-11-13T20:00:32.000Z (2 months ago)
Default Branch: main
Last Pushed: 2024-11-13T20:17:20.000Z (2 months ago)
Last Synced: 2024-11-13T21:22:36.654Z (2 months ago)
Language: Jupyter Notebook
Size: 270 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Fraud Detection using Autoencoder

This project applies an unsupervised Autoencoder model to detect fraudulent transactions within a dataset. Autoencoders are ideal for anomaly detection tasks, such as fraud detection, because they learn compressed representations of data, allowing the identification of outliers through reconstruction error.

## Project Overview
- **Goal**: To develop a model that can accurately identify potentially fraudulent transactions based on reconstruction errors using an Autoencoder neural network.
- **Dataset**: Transactions data containing features related to financial transactions, where each entry is labeled as either legitimate or fraudulent.

## Contents of the Notebook

### 1. Data Loading and Preparation
- **Data Import**: The dataset is loaded, and libraries such as Pandas and Numpy are imported.
- **Exploratory Analysis**: Initial analysis to understand class distribution and identify any class imbalance.
- **Data Cleaning and Preprocessing**:
- Handling missing values (if any) and standardizing/normalizing features for model compatibility.
- Data is split into training and testing sets, ensuring a fair distribution of classes.

### 2. Building the Autoencoder Model
- **Model Structure**: A neural network with three main components:
- **Encoder**: Reduces the input dimension, learning key features.
- **Bottleneck**: Central, compressed layer where essential information is retained.
- **Decoder**: Reconstructs the input from compressed information.
- **Compilation**: The model is compiled with Mean Squared Error loss, commonly used for reconstruction tasks.

```python
input_dim = X_train.shape[1]

# Define the Autoencoder model
input_layer = Input(shape=(input_dim,))
encoder = Dense(14, activation="relu")(input_layer)
encoder = Dense(7, activation="relu")(encoder)
encoder = Dense(5, activation="relu")(encoder)
decoder = Dense(7, activation="relu")(encoder)
decoder = Dense(14, activation="relu")(decoder)
decoder = Dense(input_dim, activation="sigmoid")(decoder)

autoencoder = Model(inputs=input_layer, outputs=decoder)
autoencoder.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
autoencoder.summary()

```

### 3. Model Training
- **Training Process**: The model is trained on non-fraudulent transactions to learn typical patterns, using a validation set to track reconstruction error.
- **Epochs and Early Stopping**: To prevent overfitting, training is monitored with early stopping.

### 4. Model Evaluation and Fraud Detection
- **Reconstruction Error**: Transactions are passed through the Autoencoder, and reconstruction error is calculated. Higher errors indicate anomalies (possible fraud).
- **Thresholding**: Based on reconstruction error, a threshold is set to distinguish between normal and anomalous transactions.
- **Evaluation Metrics**:
- **Precision**: Measures how many identified fraud cases are actual frauds.
- **Recall**: Measures the coverage of actual fraud cases detected.

```python
# Calculating reconstruction error and applying threshold
reconstruction_error = np.mean(np.square(X_test - model.predict(X_test)), axis=1)
threshold = np.percentile(reconstruction_error, 95) # Example threshold based on 95th percentile
```

### 5. Results and Analysis
- **Performance**: Metrics like accuracy, precision, and recall provide insight into the model's effectiveness in distinguishing fraudulent from non-fraudulent transactions.
- **Observations**: Discusses strengths, weaknesses, and areas for improvement.

## Usage
To run this notebook:
1. Ensure required libraries (TensorFlow, Numpy, Pandas) are installed.
2. Load the notebook and execute each cell sequentially.
3. Adjust the threshold value based on specific needs or dataset characteristics.

## Conclusion
This project demonstrates the potential of Autoencoders in fraud detection, utilizing unsupervised learning to detect anomalies without labeled data. With fine-tuning, Autoencoder-based anomaly detection offers a flexible approach to identify rare events, such as fraud.