Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/aegrah/ae-vae-pca-anomaly_detection-ml

Anomaly Detection in OT datasets through machine learning (AE/VAE/PCA)

Host: GitHub
URL: https://github.com/aegrah/ae-vae-pca-anomaly_detection-ml
Owner: Aegrah
Created: 2022-01-11T09:01:45.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2023-04-25T07:19:09.000Z (over 1 year ago)
Last Synced: 2024-10-29T21:24:57.737Z (about 2 months ago)
Language: Jupyter Notebook
Size: 2.32 MB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# AE, VAE and PCA anomaly detection in operational technology infrastructure datasets
This repository contains the Jupyter Notebooks and Python code used for my master's thesis, "Anomaly Detection in Operational Technology Infrastructures Using Artificial Neural Networks". The models are implemented using TensorFlow and Keras. The thesis compared the following models:

- Auto-Encoders (AE)
- Variational Auto-Encoders (VAE)
- Principal Component Analysis (PCA)

The thesis concluded that the original Auto-Encoder model performed best at anomaly detection on operational technology infrastructure datasets.
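All three models are commonly applied to anomaly detection in the same way: fit the model on (mostly) normal data, then flag samples that the model reconstructs poorly. The following is a minimal sketch of that idea using PCA with plain NumPy. It is not code from this repository: the function names (`fit_pca`, `reconstruction_error`, `detect_anomalies`), the synthetic data, and the 99% quantile threshold are all assumptions chosen for illustration.

```python
import numpy as np


def fit_pca(X_train, n_components):
    """Fit PCA on (assumed mostly normal) training data via SVD."""
    mean = X_train.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(X, mean, components):
    """Mean squared error between each row and its PCA reconstruction."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.mean((centered - reconstructed) ** 2, axis=1)


def detect_anomalies(X_train, X_test, n_components=2, quantile=0.99):
    """Flag test rows whose error exceeds a quantile of the training errors."""
    mean, components = fit_pca(X_train, n_components)
    train_errors = reconstruction_error(X_train, mean, components)
    threshold = np.quantile(train_errors, quantile)  # assumed thresholding rule
    errors = reconstruction_error(X_test, mean, components)
    return errors > threshold, errors, threshold


# Synthetic stand-in for sensor readings: 5 features driven by 2 latent factors.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(2, 5))
X_train = rng.normal(size=(500, 2)) @ mixing + 0.01 * rng.normal(size=(500, 5))
X_normal = rng.normal(size=(20, 2)) @ mixing + 0.01 * rng.normal(size=(20, 5))
# Hypothetical "attack": noise that breaks the correlations between features.
X_attack = X_normal + rng.normal(scale=3.0, size=(20, 5))

flags_normal, _, _ = detect_anomalies(X_train, X_normal)
flags_attack, _, _ = detect_anomalies(X_train, X_attack)
print("flagged normal rows:", int(flags_normal.sum()))
print("flagged attack rows:", int(flags_attack.sum()))
```

An Auto-Encoder can be used in the same scheme by replacing the PCA projection with the network's encode and decode pass and keeping the reconstruction-error threshold unchanged.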

The following datasets were used to analyze the models during this research:
- Water Distribution (WADI)
- Secure Water Treatment (SWaT)
- Battle of Attack Detection Algorithms (BATADAL)

These datasets can be requested from iTrust, Centre for Research in Cyber Security, at the following URL:
```https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/```

The credit card fraud dataset was used for initial testing, to ensure that the different models worked correctly. This dataset is available at:
```https://www.kaggle.com/mlg-ulb/creditcardfraud```

Although this repository contains almost all of the code used for my research, I did not test all of it before pushing it to GitHub: I wrote my thesis two years ago and no longer have an up-to-date machine learning development environment. Most of the code should still work and can serve as a starting point for developing your own models.