Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Stream-AD/MemStream
MemStream: Memory-Based Streaming Anomaly Detection
https://github.com/Stream-AD/MemStream
anomaly-detection concept-drift denial-of-service fraud-detection intrusion-dectection memory-based multi-aspect-record real-time streaming
Last synced: 15 days ago
JSON representation
MemStream: Memory-Based Streaming Anomaly Detection
- Host: GitHub
- URL: https://github.com/Stream-AD/MemStream
- Owner: Stream-AD
- License: apache-2.0
- Created: 2021-06-06T12:55:26.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-01-10T03:19:06.000Z (10 months ago)
- Last Synced: 2024-08-01T16:33:35.616Z (3 months ago)
- Topics: anomaly-detection, concept-drift, denial-of-service, fraud-detection, intrusion-dectection, memory-based, multi-aspect-record, real-time, streaming
- Language: Python
- Homepage:
- Size: 312 KB
- Stars: 82
- Watchers: 11
- Forks: 17
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# MemStream
Implementation of
- [MemStream: Memory-Based Streaming Anomaly Detection](https://dl.acm.org/doi/pdf/10.1145/3485447.3512221). *Siddharth Bhatia, Arjit Jain, Shivin Srivastava, Kenji Kawaguchi, Bryan Hooi*. The Web Conference (formerly WWW), 2022.
MemStream detects anomalies from a multi-aspect data stream. We output an anomaly score for each record. MemStream is a memory augmented feature extractor, allows for quick retraining, gives a theoretical bound on the memory size for effective drift handling, is robust to memory poisoning, and outperforms 11 state-of-the-art streaming anomaly detection baselines.
![](MemStream.png)
After an initial training of the feature extractor on a small subset of normal data, MemStream processes records in two steps: (i) It outputs anomaly scores for each record by querying the memory for K-nearest neighbours to the record encoding and calculating a discounted distance and (ii) It updates the memory, in a FIFO manner, if the anomaly score is within an update threshold β.## Demo
1. KDDCUP99: Run `python3 memstream.py --dataset KDD --beta 1 --memlen 256`
2. NSL-KDD: Run `python3 memstream.py --dataset NSL --beta 0.1 --memlen 2048`
3. UNSW-NB 15: Run `python3 memstream.py --dataset UNSW --beta 0.1 --memlen 2048`
4. CICIDS-DoS: Run `python3 memstream.py --dataset DOS --beta 0.1 --memlen 2048`
5. SYN: Run `python3 memstream-syn.py --dataset SYN --beta 1 --memlen 16`
6. Ionosphere: Run `python3 memstream.py --dataset ionosphere --beta 0.001 --memlen 4`
7. Cardiotocography: Run `python3 memstream.py --dataset cardio --beta 1 --memlen 64`
8. Statlog Landsat Satellite: Run `python3 memstream.py --dataset statlog --beta 0.01 --memlen 32`
9. Satimage-2: Run `python3 memstream.py --dataset satimage-2 --beta 10 --memlen 256`
10. Mammography: Run `python3 memstream.py --dataset mammography --beta 0.1 --memlen 128`
11. Pima Indians Diabetes: Run `python3 memstream.py --dataset pima --beta 0.001 --memlen 64`
12. Covertype: Run `python3 memstream.py --dataset cover --beta 0.0001 --memlen 2048`## Command line options
* `--dataset`: The dataset to be used for training. Choices 'NSL', 'KDD', 'UNSW', 'DOS'. (default 'NSL')
* `--beta`: The threshold beta to be used. (default: 0.1)
* `--memlen`: The size of the Memory Module (default: 2048)
* `--dev`: Pytorch device to be used for training like "cpu", "cuda:0" etc. (default: 'cuda:0')
* `--lr`: Learning rate (default: 0.01)
* `--epochs`: Number of epochs (default: 5000)## Input file format
MemStream expects the input multi-aspect record stream to be stored in a contains `,` separated file.## Datasets
Processed Datasets can be downloaded from [here](https://drive.google.com/file/d/1JNrhOr8U3Nqef1hBOqvHQPzBNWzDOFdl/view). Please unzip and place the files in the data folder of the repository.1. [KDDCUP99](http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html)
2. [NSL-KDD](https://www.unb.ca/cic/datasets/nsl.html)
3. [UNSW-NB 15](https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/)
4. [CICIDS-DoS](https://www.unb.ca/cic/datasets/ids-2018.html)
5. Synthetic Dataset (Introduced in paper)
6. [Ionosphere](https://archive.ics.uci.edu/ml/index.php)
7. [Cardiotocography](https://archive.ics.uci.edu/ml/index.php)
8. [Statlog Landsat Satellite](https://archive.ics.uci.edu/ml/index.php)
9. [Satimage-2](http://odds.cs.stonybrook.edu)
10. [Mammography](http://odds.cs.stonybrook.edu)
11. [Pima Indians Diabetes](https://archive.ics.uci.edu/ml/index.php)
12. [Covertype](https://archive.ics.uci.edu/ml/index.php)## Environment
This code has been tested on Debian GNU/Linux 9 with a 12GB Nvidia GeForce RTX 2080 Ti GPU, CUDA Version 10.2 and PyTorch 1.5.## Citation
If you use this code for your research, please consider citing our WWW paper.
```bibtex
@inproceedings{bhatia2022memstream,
title={MemStream: Memory-Based Streaming Anomaly Detection},
author={Siddharth Bhatia and Arjit Jain and Shivin Srivastava and Kenji Kawaguchi and Bryan Hooi},
booktitle={The Web Conference (formerly WWW)},
year={2022}
}