{"id":21864116,"url":"https://github.com/linsanity03/fraud_detection_using_autoencoders","last_synced_at":"2026-04-28T23:38:09.171Z","repository":{"id":264167852,"uuid":"892568735","full_name":"LINSANITY03/Fraud_detection_using_Autoencoders","owner":"LINSANITY03","description":"Leverage autoencoders, a type of neural network, for detecting fraudulent financial transactions by identifying anomalies in transaction patterns that deviate from the norm.","archived":false,"fork":false,"pushed_at":"2024-11-25T10:26:35.000Z","size":106,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-03T05:44:42.682Z","etag":null,"topics":["matplotlib","numpy","pandas","pytorch","seaborn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LINSANITY03.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-22T11:08:56.000Z","updated_at":"2024-11-25T10:26:39.000Z","dependencies_parsed_at":"2024-11-22T12:20:54.897Z","dependency_job_id":"ff6b1401-69d6-4529-812f-9eb80f12b134","html_url":"https://github.com/LINSANITY03/Fraud_detection_using_Autoencoders","commit_stats":null,"previous_names":["linsanity03/fraud_detection_using_autoencoders"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/LINSANITY03/Fraud_detection_using_Autoencoders","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LINSANITY03%2FFraud_detection_using_Autoencoders","tags_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/LINSANITY03%2FFraud_detection_using_Autoencoders/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LINSANITY03%2FFraud_detection_using_Autoencoders/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LINSANITY03%2FFraud_detection_using_Autoencoders/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LINSANITY03","download_url":"https://codeload.github.com/LINSANITY03/Fraud_detection_using_Autoencoders/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LINSANITY03%2FFraud_detection_using_Autoencoders/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273397843,"owners_count":25098234,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-03T02:00:09.631Z","response_time":76,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["matplotlib","numpy","pandas","pytorch","seaborn"],"created_at":"2024-11-28T04:07:32.662Z","updated_at":"2025-10-05T02:37:32.411Z","avatar_url":"https://github.com/LINSANITY03.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Fraud Detection in Financial Transactions Using Autoencoders\n\nAutoencoders are unsupervised neural networks used for anomaly detection. 
The idea behind using them for fraud detection is that they are trained to learn a compressed representation of normal transactions. When a fraudulent transaction is passed through the autoencoder, its reconstruction error is expected to be significantly higher, signaling that the transaction is anomalous.\n\n## 1. Data Collection\n\nI used the [Credit Card Fraud Detection](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud?resource=download) dataset from Kaggle.\n\nThe dataset includes:\n\n- Time: Time elapsed since the first transaction.\n- V1, V2, ..., V28: 28 anonymized features representing transaction characteristics.\n- Amount: The transaction amount.\n- Class: Fraudulent (1) or non-fraudulent (0) transaction.\n\n## Directory Structure\n\n```\n    Fraud_Detection/\n    │\n    ├── data/\n    │   ├── raw/               # Raw data files (unmodified, original data)\n    │   ├── processed/         # Preprocessed data files ready for modeling\n    │\n    ├── notebooks/\n    │   ├── 01_data_exploration.ipynb  # Notebook for EDA (Exploratory Data Analysis)\n    │   ├── 02_preprocessing.ipynb     # Notebook for data cleaning and preprocessing\n    │   ├── 03_modeling.ipynb          # Notebook for model training\n    │   └── 04_evaluation.ipynb        # Notebook for model evaluation and results\n    │\n    ├── scripts/\n    │   ├── data_processing.py   # Python script for data preprocessing\n    │   ├── train_model.py       # Python script for model training\n    │   └── evaluate_model.py    # Python script for evaluation\n    │\n    ├── models/\n    │   ├── saved_models/        # Serialized models (e.g., .pkl, .h5)\n    │   └── model_logs/          # Logs and checkpoints from training\n    │\n    ├── results/\n    │   ├── figures/             # Visualizations, plots\n    │   ├── reports/             # Analysis reports, markdowns\n    │   └── metrics/             # Saved evaluation metrics (e.g., CSV, JSON)\n    │\n    ├── requirements.txt         # Python dependencies\n  
  ├── README.md                # Project overview and instructions\n    └── config.yaml              # Configuration file for project-wide parameters\n```\n\n## 2. Data Preprocessing\n\nClean and preprocess the data for modelling, then save the result to a CSV file.\n\n## 3. Define the model\n\nAn autoencoder consists of an encoder, which compresses the data, and a decoder, which reconstructs it. We'll use fully connected layers for both the encoder and decoder.\n\n```\n    import torch.nn as nn\n\n    class Autoencoder(nn.Module):\n        def __init__(self):\n            super(Autoencoder, self).__init__()\n            \n            # Encoder part\n            self.encoder = nn.Sequential(\n                nn.Linear(29, 14),  # Input layer (29 features) -\u003e hidden layer (14 features)\n                nn.ReLU(),\n                nn.Linear(14, 7),   # Hidden layer -\u003e smaller hidden layer (7 features)\n                nn.ReLU(),\n                nn.Linear(7, 3),    # Bottleneck layer (compressed representation)\n                nn.ReLU()\n            )\n            \n            # Decoder part\n            self.decoder = nn.Sequential(\n                nn.Linear(3, 7),    # Bottleneck layer -\u003e hidden layer (7 features)\n                nn.ReLU(),\n                nn.Linear(7, 14),   # Hidden layer -\u003e hidden layer (14 features)\n                nn.ReLU(),\n                nn.Linear(14, 29),  # Hidden layer -\u003e output layer (29 features)\n                nn.Sigmoid()        # To bring the output in range [0,1] (same as input range)\n            )\n        \n        def forward(self, x):\n            x = self.encoder(x)\n            x = self.decoder(x)\n            return x\n```\n\n## 4. Train model\n\nWe train the autoencoder on normal transactions only (unsupervised). The goal is for the autoencoder to learn to reconstruct normal transaction data well. 
The model is trained with Mean Squared Error (MSE) as the loss function because we want to minimize the difference between the original and reconstructed data. We use the Adam optimizer for efficient training.\n\n## 5. Anomaly Detection and Thresholding\n\nAfter the model is trained, we calculate the reconstruction error for every data point. A high reconstruction error suggests an anomaly (a potentially fraudulent transaction). The threshold is set to the 95th percentile of the reconstruction errors from normal transactions.\n\n## 6. Evaluate model\n\nSince we're working with a labeled dataset, we can evaluate performance using a confusion matrix, precision, recall, and F1-score.\n\n## Classification report\n\nWe can calculate the performance metrics using scikit-learn.\n\n```\n    from sklearn.metrics import classification_report\n\n    # Generate classification report as a dictionary\n    report = classification_report(y_true, y_pred, output_dict=True)\n```\n\nThe report generated from the model:\n\n```\n                precision    recall  f1-score        support\n    0              1.000000  0.949999  0.974359  284315.000000\n    1              0.000000  0.000000  0.000000       0.000000\n    accuracy       0.949999  0.949999  0.949999       0.949999\n    macro avg      0.500000  0.475000  0.487179  284315.000000\n    weighted avg   1.000000  0.949999  0.974359  284315.000000\n```\n\n## Confusion matrix\n\nSince we tested the autoencoder on a dataset that contained no fraud samples, the bottom row (Fraud) of the confusion matrix is naturally empty: the model was never evaluated on fraud cases. For the same reason, the class 1 metrics in the report are all zero, and the recall of about 0.95 for class 0 is what a 95th-percentile threshold on normal reconstruction errors would be expected to produce.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"results/figures/confusion_matrix.png\" alt=\"Algorithm_game\" height=\"360\" width=\"640\"\u003e\n\u003c/p\u003e\n\n## Reconstruction Error\n\nIn an autoencoder, the model learns to compress input data into a lower-dimensional representation (latent space) and then reconstruct it 
back to its original form. The reconstruction error measures the difference between the original input and the reconstructed output.\n\n- Blue indicates that the autoencoder reconstructs normal transactions with minimal error.\n- Red indicates that the autoencoder struggles to reconstruct anomalous transactions, resulting in significantly higher errors. Because the autoencoder is trained only on non-fraudulent data, inputs that deviate from that pattern produce higher reconstruction errors.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"results/figures/reconstruction_error.png\" alt=\"Algorithm_game\" height=\"360\" width=\"640\"\u003e\n\u003c/p\u003e\n\n## Collaboration\n\nFeel free to use this codebase for your projects. If you want to talk more about this project or have found bugs, create a pull request or contact me at **pujantamang92@gmail.com**.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinsanity03%2Ffraud_detection_using_autoencoders","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinsanity03%2Ffraud_detection_using_autoencoders","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinsanity03%2Ffraud_detection_using_autoencoders/lists"}