Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/amir-tav/fraud-detection
- Host: GitHub
- URL: https://github.com/amir-tav/fraud-detection
- Owner: Amir-Tav
- Created: 2024-11-13T20:00:32.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-13T20:17:20.000Z (2 months ago)
- Last Synced: 2024-11-13T21:22:36.654Z (2 months ago)
- Language: Jupyter Notebook
- Size: 270 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Fraud Detection using Autoencoder
This project applies an unsupervised Autoencoder model to detect fraudulent transactions within a dataset. Autoencoders are ideal for anomaly detection tasks, such as fraud detection, because they learn compressed representations of data, allowing the identification of outliers through reconstruction error.
## Project Overview
- **Goal**: To develop a model that can accurately identify potentially fraudulent transactions based on reconstruction errors using an Autoencoder neural network.
- **Dataset**: Transaction data containing features related to financial transactions, where each entry is labeled as either legitimate or fraudulent.
## Contents of the Notebook
### 1. Data Loading and Preparation
- **Data Import**: The dataset is loaded, and libraries such as Pandas and NumPy are imported.
- **Exploratory Analysis**: Initial analysis to understand class distribution and identify any class imbalance.
- **Data Cleaning and Preprocessing**:
  - Handling missing values (if any) and standardizing/normalizing features for model compatibility.
  - Data is split into training and testing sets, ensuring a fair distribution of classes.
### 2. Building the Autoencoder Model
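The preparation steps described above can be sketched as follows. This is a minimal sketch, not the notebook's exact code: the synthetic DataFrame stands in for the real transaction file, and the `Class` label column name is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real transaction data; in the notebook this
# would be loaded from file, e.g. with pd.read_csv (column names assumed)
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(1000, 5)), columns=[f"V{i}" for i in range(5)])
df["Class"] = (rng.random(1000) < 0.02).astype(int)  # ~2% fraud labels

X = df.drop(columns=["Class"])
y = df["Class"]

# Stratified split keeps the fraud ratio similar in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize features so all inputs are on comparable scales
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```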
- **Model Structure**: A neural network with three main components:
  - **Encoder**: Reduces the input dimension, learning key features.
  - **Bottleneck**: Central, compressed layer where essential information is retained.
  - **Decoder**: Reconstructs the input from compressed information.
- **Compilation**: The model is compiled with Mean Squared Error loss, commonly used for reconstruction tasks.

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Define the Autoencoder model
input_dim = X_train.shape[1]
input_layer = Input(shape=(input_dim,))
encoder = Dense(14, activation="relu")(input_layer)
encoder = Dense(7, activation="relu")(encoder)
encoder = Dense(5, activation="relu")(encoder)  # bottleneck layer
decoder = Dense(7, activation="relu")(encoder)
decoder = Dense(14, activation="relu")(decoder)
decoder = Dense(input_dim, activation="sigmoid")(decoder)

autoencoder = Model(inputs=input_layer, outputs=decoder)
autoencoder.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
autoencoder.summary()
```
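The compiled model can then be fitted to its own input, since an autoencoder learns to reconstruct what it sees. The sketch below is illustrative rather than the notebook's exact setup: the synthetic `X_train`, the small architecture, and the epoch/batch values are assumptions.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Synthetic stand-in for the standardized, non-fraudulent training features
rng = np.random.default_rng(0)
X_train = rng.normal(size=(512, 14)).astype("float32")

input_dim = X_train.shape[1]
input_layer = Input(shape=(input_dim,))
encoded = Dense(7, activation="relu")(input_layer)
decoded = Dense(input_dim, activation="linear")(encoded)
autoencoder = Model(inputs=input_layer, outputs=decoded)
autoencoder.compile(optimizer=Adam(learning_rate=0.001), loss="mse")

# Stop when validation loss stops improving, keeping the best weights
early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

# Input and target are the same array: the model reconstructs its input
history = autoencoder.fit(
    X_train, X_train,
    epochs=20,            # illustrative; the notebook may use more
    batch_size=32,
    validation_split=0.1,
    callbacks=[early_stop],
    verbose=0,
)
```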
### 3. Model Training
- **Training Process**: The model is trained on non-fraudulent transactions to learn typical patterns, using a validation set to track reconstruction error.
- **Epochs and Early Stopping**: To prevent overfitting, training is monitored with early stopping.
### 4. Model Evaluation and Fraud Detection
- **Reconstruction Error**: Transactions are passed through the Autoencoder, and reconstruction error is calculated. Higher errors indicate anomalies (possible fraud).
- **Thresholding**: Based on reconstruction error, a threshold is set to distinguish between normal and anomalous transactions.
- **Evaluation Metrics**:
- **Precision**: Measures how many identified fraud cases are actual frauds.
- **Recall**: Measures the coverage of actual fraud cases detected.

```python
# Calculate reconstruction error and apply a threshold
reconstruction_error = np.mean(np.square(X_test - autoencoder.predict(X_test)), axis=1)
threshold = np.percentile(reconstruction_error, 95)  # example threshold: 95th percentile
```
### 5. Results and Analysis
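Given such a threshold, flagged transactions can be scored against the true labels, for example with scikit-learn. The sketch below uses synthetic stand-ins for the reconstruction errors and ground-truth labels; only the thresholding and metric calls reflect the approach described here.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Synthetic stand-ins for quantities computed in the notebook
rng = np.random.default_rng(1)
reconstruction_error = rng.exponential(scale=1.0, size=1000)
y_test = (reconstruction_error > 2.5).astype(int)  # toy ground-truth labels

# Flag transactions whose reconstruction error exceeds the threshold
threshold = np.percentile(reconstruction_error, 95)
y_pred = (reconstruction_error > threshold).astype(int)

precision = precision_score(y_test, y_pred)  # flagged cases that are real fraud
recall = recall_score(y_test, y_pred)        # real fraud cases that were flagged
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

Raising the threshold trades recall for precision; the right balance depends on the cost of missed fraud versus false alarms.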
- **Performance**: Metrics like accuracy, precision, and recall provide insight into the model's effectiveness in distinguishing fraudulent from non-fraudulent transactions.
- **Observations**: Discusses strengths, weaknesses, and areas for improvement.
## Usage
To run this notebook:
1. Ensure required libraries (TensorFlow, NumPy, Pandas) are installed.
2. Load the notebook and execute each cell sequentially.
3. Adjust the threshold value based on specific needs or dataset characteristics.
## Conclusion
This project demonstrates the potential of Autoencoders in fraud detection, using unsupervised learning to detect anomalies without labeled data. With fine-tuning, Autoencoder-based anomaly detection offers a flexible approach to identifying rare events such as fraud.