{"id":13473636,"url":"https://github.com/Stream-AD/MemStream","last_synced_at":"2025-03-26T19:34:28.967Z","repository":{"id":45209877,"uuid":"374360266","full_name":"Stream-AD/MemStream","owner":"Stream-AD","description":"MemStream: Memory-Based Streaming Anomaly Detection","archived":false,"fork":false,"pushed_at":"2024-01-10T03:19:06.000Z","size":319,"stargazers_count":83,"open_issues_count":0,"forks_count":18,"subscribers_count":11,"default_branch":"main","last_synced_at":"2024-10-30T06:33:02.933Z","etag":null,"topics":["anomaly-detection","concept-drift","denial-of-service","fraud-detection","intrusion-dectection","memory-based","multi-aspect-record","real-time","streaming"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Stream-AD.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-06-06T12:55:26.000Z","updated_at":"2024-10-17T13:44:49.000Z","dependencies_parsed_at":"2024-01-13T18:24:12.008Z","dependency_job_id":"dbc7ea76-fa19-4a17-be16-a5300d320354","html_url":"https://github.com/Stream-AD/MemStream","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stream-AD%2FMemStream","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stream-AD%2FMemStream/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stream-AD%2FMemStream/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stream-AD%2FMemStream/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Stream-AD","download_url":"https://codeload.github.com/Stream-AD/MemStream/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245722877,"owners_count":20661842,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","concept-drift","denial-of-service","fraud-detection","intrusion-dectection","memory-based","multi-aspect-record","real-time","streaming"],"created_at":"2024-07-31T16:01:05.566Z","updated_at":"2025-03-26T19:34:27.716Z","avatar_url":"https://github.com/Stream-AD.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# MemStream\n\n\u003cp\u003e\n  \u003ca href=\"https://arxiv.org/pdf/2106.03837.pdf\"\u003e\u003cimg src=\"http://img.shields.io/badge/Paper-PDF-brightgreen.svg\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/Stream-AD/MemStream/blob/master/LICENSE\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/License-Apache%202.0-blue.svg\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\nImplementation of\n\n- [MemStream: Memory-Based Streaming Anomaly Detection](https://dl.acm.org/doi/pdf/10.1145/3485447.3512221). *Siddharth Bhatia, Arjit Jain, Shivin Srivastava, Kenji Kawaguchi, Bryan Hooi*. The Web Conference (formerly WWW), 2022.\n\nMemStream detects anomalies from a multi-aspect data stream. We output an anomaly score for each record. 
MemStream is a memory-augmented feature extractor, allows for quick retraining, gives a theoretical bound on the memory size for effective drift handling, is robust to memory poisoning, and outperforms 11 state-of-the-art streaming anomaly detection baselines.\n\n![](MemStream.png)\nAfter an initial training of the feature extractor on a small subset of normal data, MemStream processes records in two steps: (i) it outputs an anomaly score for each record by querying the memory for the K nearest neighbours of the record encoding and calculating a discounted distance, and (ii) it updates the memory, in a FIFO manner, if the anomaly score is within an update threshold β.\n\n\n## Demo\n\n1. KDDCUP99: Run `python3 memstream.py --dataset KDD --beta 1 --memlen 256`\n2. NSL-KDD: Run `python3 memstream.py --dataset NSL --beta 0.1 --memlen 2048`\n3. UNSW-NB 15: Run `python3 memstream.py --dataset UNSW --beta 0.1 --memlen 2048`\n4. CICIDS-DoS: Run `python3 memstream.py --dataset DOS --beta 0.1 --memlen 2048`\n5. SYN: Run `python3 memstream-syn.py --dataset SYN --beta 1 --memlen 16`\n6. Ionosphere: Run `python3 memstream.py --dataset ionosphere --beta 0.001 --memlen 4`\n7. Cardiotocography: Run `python3 memstream.py --dataset cardio --beta 1 --memlen 64`\n8. Statlog Landsat Satellite: Run `python3 memstream.py --dataset statlog --beta 0.01 --memlen 32`\n9. Satimage-2: Run `python3 memstream.py --dataset satimage-2 --beta 10 --memlen 256`\n10. Mammography: Run `python3 memstream.py --dataset mammography --beta 0.1 --memlen 128`\n11. Pima Indians Diabetes: Run `python3 memstream.py --dataset pima --beta 0.001 --memlen 64`\n12. Covertype: Run `python3 memstream.py --dataset cover --beta 0.0001 --memlen 2048`\n\n\n## Command line options\n  * `--dataset`: The dataset to be used for training. Choices 'NSL', 'KDD', 'UNSW', 'DOS'. (default: 'NSL')\n  * `--beta`: The threshold beta to be used. 
(default: 0.1)\n  * `--memlen`: The size of the Memory Module (default: 2048)\n  * `--dev`: PyTorch device to be used for training, such as \"cpu\" or \"cuda:0\" (default: 'cuda:0')\n  * `--lr`: Learning rate (default: 0.01)\n  * `--epochs`: Number of epochs (default: 5000)\n\n## Input file format\nMemStream expects the input multi-aspect record stream to be stored in a comma-separated (`,`) file.\n\n## Datasets\nProcessed datasets can be downloaded from [here](https://drive.google.com/file/d/1JNrhOr8U3Nqef1hBOqvHQPzBNWzDOFdl/view). Please unzip and place the files in the data folder of the repository.\n\n1. [KDDCUP99](http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html)\n2. [NSL-KDD](https://www.unb.ca/cic/datasets/nsl.html)\n3. [UNSW-NB 15](https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/)\n4. [CICIDS-DoS](https://www.unb.ca/cic/datasets/ids-2018.html)\n5. Synthetic Dataset (Introduced in paper)\n6. [Ionosphere](https://archive.ics.uci.edu/ml/index.php)\n7. [Cardiotocography](https://archive.ics.uci.edu/ml/index.php)\n8. [Statlog Landsat Satellite](https://archive.ics.uci.edu/ml/index.php)\n9. [Satimage-2](http://odds.cs.stonybrook.edu)\n10. [Mammography](http://odds.cs.stonybrook.edu)\n11. [Pima Indians Diabetes](https://archive.ics.uci.edu/ml/index.php)\n12. 
[Covertype](https://archive.ics.uci.edu/ml/index.php)\n\n## Environment\nThis code has been tested on Debian GNU/Linux 9 with a 12GB Nvidia GeForce RTX 2080 Ti GPU, CUDA Version 10.2 and PyTorch 1.5.\n\n## Citation\n\nIf you use this code for your research, please consider citing our WWW paper.\n\n```bibtex\n@inproceedings{bhatia2022memstream,\n    title={MemStream: Memory-Based Streaming Anomaly Detection},\n    author={Siddharth Bhatia and Arjit Jain and Shivin Srivastava and Kenji Kawaguchi and Bryan Hooi},\n    booktitle={The Web Conference (formerly WWW)},\n    year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FStream-AD%2FMemStream","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FStream-AD%2FMemStream","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FStream-AD%2FMemStream/lists"}