{"id":13415307,"url":"https://github.com/KDD-OpenSource/DeepADoTS","last_synced_at":"2025-03-14T22:33:15.092Z","repository":{"id":36734979,"uuid":"130369068","full_name":"KDD-OpenSource/DeepADoTS","owner":"KDD-OpenSource","description":"Repository of the paper \"A Systematic Evaluation of Deep Anomaly Detection Methods for Time Series\".","archived":false,"fork":false,"pushed_at":"2022-05-25T06:27:30.000Z","size":3760,"stargazers_count":559,"open_issues_count":11,"forks_count":115,"subscribers_count":17,"default_branch":"master","last_synced_at":"2024-07-31T21:53:38.394Z","etag":null,"topics":["anomaly-detection","deep-learning","pytorch","tensorflow","time-series","timeseries"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KDD-OpenSource.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-04-20T13:54:52.000Z","updated_at":"2024-07-27T13:49:09.000Z","dependencies_parsed_at":"2022-09-03T13:51:17.680Z","dependency_job_id":null,"html_url":"https://github.com/KDD-OpenSource/DeepADoTS","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KDD-OpenSource%2FDeepADoTS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KDD-OpenSource%2FDeepADoTS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KDD-OpenSource%2FDeepADoTS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KDD-OpenSource%2FDeepADoTS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
owners/KDD-OpenSource","download_url":"https://codeload.github.com/KDD-OpenSource/DeepADoTS/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243658057,"owners_count":20326459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","deep-learning","pytorch","tensorflow","time-series","timeseries"],"created_at":"2024-07-30T21:00:46.885Z","updated_at":"2025-03-14T22:33:15.086Z","avatar_url":"https://github.com/KDD-OpenSource.png","language":"Python","funding_links":[],"categories":["异常检测包","AI for *Ops","Time-series anomaly detection **(need to survey more..)**","Anomaly Detection Software","工具箱与数据集"],"sub_categories":["Observability \u0026 Monitoring with AI","3.2 时间序列异常检测"],"readme":"\n# Anomaly Detection on Time Series: An Evaluation of Deep Learning Methods. 
[![CircleCI](https://circleci.com/gh/KDD-OpenSource/DeepADoTS/tree/master.svg?style=svg\u0026circle-token=2f20af2255f5f2d1ca22193c1b896d1c97b270d3)](https://circleci.com/gh/KDD-OpenSource/DeepADoTS/tree/master)\n\nThe goal of this repository is to provide a pipeline for benchmarking multiple state-of-the-art deep learning methods for anomaly detection on time series data.\n\n\n## Implemented Algorithms\n\n| Name | Paper |\n|------|-------|\n| LSTM-AD | [Long short term memory networks for anomaly detection in time series](https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2015-56), ESANN 2015 |\n| LSTM-ED | [LSTM-based encoder-decoder for multi-sensor anomaly detection](https://arxiv.org/pdf/1607.00148.pdf), ICML 2016 |\n| Autoencoder | [Outlier detection using replicator neural networks](https://link.springer.com/content/pdf/10.1007%2F3-540-46145-0_17.pdf), DaWaK 2002 |\n| Donut | [Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications](https://arxiv.org/pdf/1802.03903.pdf), WWW 2018 |\n| REBM | [Deep structured energy based models for anomaly detection](http://proceedings.mlr.press/v48/zhai16.pdf), ICML 2016 |\n| DAGMM | [Deep autoencoding gaussian mixture model for unsupervised anomaly detection](https://openreview.net/pdf?id=BJJLHbb0-), ICLR 2018 |\n| LSTM-DAGMM | Extension of [DAGMM](https://openreview.net/pdf?id=BJJLHbb0-) using an [LSTM](https://www.bioinf.jku.at/publications/older/2604.pdf)-Autoencoder instead of a Neural Network Autoencoder |\n\n## Usage\n\n```bash\ngit clone https://github.com/KDD-OpenSource/DeepADoTS.git\nvirtualenv venv -p /usr/bin/python3\nsource venv/bin/activate\npip install -r requirements.txt\npython3 main.py\n```\n\n\n## Example\nWe follow the [scikit-learn API](http://scikit-learn.org/dev/developers/contributing.html#different-objects) by offering the interface methods `fit(X)` and `predict(X)`. 
The former estimates the data distribution in an unsupervised way, while the latter returns an anomaly score for each instance: the higher the score, the more certain the model is that the instance is an anomaly. To compare the performance of methods, we use the [ROC AUC](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html) value.\n\n\nWe use MNIST to demonstrate the usage of a model since it is already available in TensorFlow and does not require downloading external data (even though the data has no temporal aspect).\n\n```python\nimport pandas as pd\nimport tensorflow as tf\nfrom sklearn.metrics import roc_auc_score\n\nfrom src.algorithms import AutoEncoder\nfrom src.datasets import Dataset\n\n\nclass MNIST(Dataset):\n    \"\"\"0 is the outlier class. The training set is free of outliers.\"\"\"\n\n    def __init__(self, seed):\n        super().__init__(name=\"MNIST\", file_name='')  # We do not need to load data from a file\n        self.seed = seed\n\n    def load(self):\n        # 0 is the outlier, all other digits are normal\n        OUTLIER_CLASS = 0\n        mnist = tf.keras.datasets.mnist\n        (x_train, y_train), (x_test, y_test) = mnist.load_data()\n        # Label outliers with True and normal digits with False\n        y_train, y_test = (y_train == OUTLIER_CLASS), (y_test == OUTLIER_CLASS)\n        # Remove outliers from the training set, keeping the labels aligned with the samples\n        x_train, y_train = x_train[~y_train], y_train[~y_train]\n        x_train, x_test = x_train / 255, x_test / 255\n        x_train, x_test = x_train.reshape(-1, 784), x_test.reshape(-1, 784)\n        self._data = tuple(pd.DataFrame(data=data) for data in [x_train, y_train, x_test, y_test])\n\n\nx_train, y_train, x_test, y_test = MNIST(seed=0).data()\n# Use fewer instances for demonstration purposes\nx_train, y_train = x_train[:1000], y_train[:1000]\nx_test, y_test = x_test[:100], y_test[:100]\n\nmodel = AutoEncoder(sequence_length=1, num_epochs=40, hidden_size=10, lr=1e-4)\nmodel.fit(x_train)\n\nerror = 
model.predict(x_test)\nprint(roc_auc_score(y_test, error))  # e.g. 0.8614\n```\nWe can visualize the samples together with their anomaly scores as follows:\n```python\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom matplotlib import offsetbox\n\n# Borrowed from https://github.com/scikit-learn/scikit-learn/blob/master/examples/manifold/plot_lle_digits.py#L44\nerror = (error - error.min()) / (error.max() - error.min())  # Normalize error to [0, 1]\nx_test = x_test.values\ny_random = np.random.rand(len(x_test)) * 2 - 1  # Random vertical jitter in [-1, 1)\nplt.figure(figsize=(20, 10))\nax = plt.subplot(111)\nif hasattr(offsetbox, 'AnnotationBbox'):\n    shown_images = np.array([[1., 1.]])\n    for i in range(len(x_test)):\n        X_instance = [error[i], y_random[i]]\n        dist = np.sum((X_instance - shown_images) ** 2, 1)\n        if np.min(dist) \u003c 4e-5:\n            # don't show points that are too close\n            continue\n        shown_images = np.r_[shown_images, [X_instance]]\n        imagebox = offsetbox.AnnotationBbox(offsetbox.OffsetImage(x_test[i].reshape(28, 28), cmap=plt.cm.gray_r), X_instance)\n        ax.add_artist(imagebox)\nplt.xlim((0, 1.1))\nplt.ylim((-1.2, 1.2))\nplt.xlabel(\"Anomaly Score\")\nplt.title(\"Predicted Anomaly Score for the Test Set\")\nplt.show()\n```\nThis creates a plot like the following:\n![](https://user-images.githubusercontent.com/6676439/48005276-51ee4c80-e113-11e8-8887-ac887e2cdde4.png)\nWe can see that global outliers (zeros) and local outliers (strangely written digits) receive high anomaly scores.\n\n\n## Deployment\n\n- `docker build -t deep-adots .`\n- `docker run -ti deep-adots /bin/bash -c \"python3.6 /repo/main.py\"`\n\n\n## Authors/Contributors\nTeam:\n* [Maxi Fischer](https://github.com/maxifischer)\n* [Willi Gierke](https://github.com/WGierke)\n* [Thomas Kellermeier](https://github.com/Chaoste)\n* [Ajay Kesar](https://github.com/weaslbe)\n* [Axel Stebner](https://github.com/xasetl)\n* [Daniel 
Thevessen](https://github.com/danthe96)\n\nSupervisors:\n* [Lukas Ruff](https://github.com/lukasruff)\n* [Fabian Geier](https://github.com/fabiangei)\n* [Emmanuel Müller](https://github.com/emmanuel-mueller)\n\n\n## Credits\n[Base implementation for DAGMM](https://github.com/danieltan07/dagmm)  \n[Base implementation for Donut](https://github.com/haowen-xu/donut)  \n[Base implementation for Recurrent EBM](https://github.com/dshieble/Music_RNN_RBM)  \n[Downloader for real-world datasets](https://github.com/chickenbestlover/RNN-Time-series-Anomaly-Detection/blob/master/0_download_dataset.py)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKDD-OpenSource%2FDeepADoTS","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FKDD-OpenSource%2FDeepADoTS","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKDD-OpenSource%2FDeepADoTS/lists"}