{"id":15292582,"url":"https://github.com/boniolp/tsb-kit","last_synced_at":"2025-04-13T11:12:00.151Z","repository":{"id":243359478,"uuid":"812207167","full_name":"boniolp/tsb-kit","owner":"boniolp","description":"Univariate Time-Series Anomaly Detection algorithms from TSB-UAD","archived":false,"fork":false,"pushed_at":"2024-09-24T11:47:24.000Z","size":10750,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-10-01T16:18:54.266Z","etag":null,"topics":["anomaly-detection","benchmarking","data-mining","data-science","jupyter-notebook","machine-learning","machine-learning-algorithms","numpy","pandas","pypi-package","python","python3","time-series","time-series-analysis","time-series-anomaly-detection","unsupervised-learning"],"latest_commit_sha":null,"homepage":"https://tsb-kit.readthedocs.io/en/latest/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/boniolp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-08T08:36:13.000Z","updated_at":"2024-09-24T11:47:27.000Z","dependencies_parsed_at":"2024-06-08T10:42:55.118Z","dependency_job_id":null,"html_url":"https://github.com/boniolp/tsb-kit","commit_stats":null,"previous_names":["boniolp/tsb-kit"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boniolp%2Ftsb-kit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boniolp%2Ftsb-kit/tags","releases_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/boniolp%2Ftsb-kit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boniolp%2Ftsb-kit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/boniolp","download_url":"https://codeload.github.com/boniolp/tsb-kit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":219846720,"owners_count":16556429,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","benchmarking","data-mining","data-science","jupyter-notebook","machine-learning","machine-learning-algorithms","numpy","pandas","pypi-package","python","python3","time-series","time-series-analysis","time-series-anomaly-detection","unsupervised-learning"],"created_at":"2024-09-30T16:18:56.592Z","updated_at":"2024-10-14T21:03:14.642Z","avatar_url":"https://github.com/boniolp.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n\u003cimg width=\"300\" src=\"./images/logo-tsb-kit.png\"/\u003e\n\u003c/p\u003e\n\u003ch2 align=\"center\"\u003eUnivariate Time-Series Anomaly Detection algorithms from TSB-UAD benchmark\u003c/h2\u003e\n\u003cdiv align=\"center\"\u003e\n\u003cp\u003e\n\u003cimg alt=\"PyPI - Downloads\" src=\"https://pepy.tech/badge/tsb_kit\"\u003e \u003cimg alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/tsb-kit\"\u003e \u003cimg alt=\"License\" src=\"https://img.shields.io/github/license/boniolp/tsb-kit\"\u003e \u003cimg alt=\"GitHub issues\" 
src=\"https://img.shields.io/github/issues/boniolp/tsb-kit\"\u003e \u003cimg alt=\"PyPI - Python Version\" src=\"https://img.shields.io/pypi/pyversions/tsb-kit\"\u003e \u003cimg alt=\"ReadTheDocs Status\" src=\"https://readthedocs.org/projects/tsb-kit/badge/?version=latest\"\u003e \n\u003c/p\u003e\n\u003c/div\u003e\n\n\nTSB-kit is a library of univariate time-series anomaly detection methods from the [TSB-UAD benchmark](https://github.com/TheDatumOrg/TSB-UAD). Overall, TSB-kit\ncontains 14 anomaly detection methods, and 15 evaluation measures. \nIf you use TSB-kit in your project or research, cite the following two papers:\n\n* [VLDB 2022a](https://www.paparrizos.org/papers/PaparrizosVLDB22a.pdf)\n* [VLDB 2022b](https://www.paparrizos.org/papers/PaparrizosVLDB22b.pdf)\n\n## Quick start\n\nTSB-kit's last version is now included in TSB-UAD. You can now directly install tsb-uad with the following command:\n\n```\npip install tsb-uad\n```\n\nThe old package of TSB-kit can be installed as follows (please read the installation section for more details).\n\n```\npip install tsb-kit\n```\n\n\n\n## References\n\n\u003e \"TSB-UAD: An End-to-End Benchmark Suite for Univariate Time-Series Anomaly Detection\"\u003cbr/\u003e\n\u003e John Paparrizos, Yuhao Kang, Paul Boniol, Ruey Tsay, Themis Palpanas, and Michael Franklin.\u003cbr/\u003e\n\u003e Proceedings of the VLDB Endowment (**PVLDB 2022**) Journal, Volume 15, pages 1697–1711\u003cbr/\u003e\n\n```bibtex\n@article{paparrizos2022tsb,\n  title={Tsb-uad: an end-to-end benchmark suite for univariate time-series anomaly detection},\n  author={Paparrizos, John and Kang, Yuhao and Boniol, Paul and Tsay, Ruey S and Palpanas, Themis and Franklin, Michael J},\n  journal={Proceedings of the VLDB Endowment},\n  volume={15},\n  number={8},\n  pages={1697--1711},\n  year={2022},\n  publisher={VLDB Endowment}\n}\n```\n\n\u003e \"Volume Under the Surface: A New Accuracy Evaluation Measure for Time-Series Anomaly 
Detection\"\u003cbr/\u003e\n\u003e John Paparrizos, Paul Boniol, Themis Palpanas, Ruey Tsay, Aaron Elmore, and Michael Franklin\u003cbr/\u003e\n\u003e Proceedings of the VLDB Endowment (**PVLDB 2022**) Journal, Volume 15, pages 2774‑2787\u003cbr/\u003e\n\n```bibtex\n@article{paparrizos2022volume,\n  title={{Volume Under the Surface: A New Accuracy Evaluation Measure for Time-Series Anomaly Detection}},\n  author={Paparrizos, John and Boniol, Paul and Palpanas, Themis and Tsay, Ruey S and Elmore, Aaron and Franklin, Michael J},\n  journal={Proceedings of the VLDB Endowment},\n  volume={15},\n  number={11},\n  pages={2774--2787},\n  year={2022},\n  publisher={VLDB Endowment}\n}\n\n```\n\n\u003e \"Local Evaluation of Time Series Anomaly Detection Algorithms\", \n\u003e Accepted in KDD 2022 Research Track: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.\n\u003e  [Affiliation Metrics](https://github.com/ahstat/affiliation-metrics-py)\n\n## Contributors\n\n* Paul Boniol (Inria, ENS)\n* John Paparrizos (Ohio State University)\n* Emmanouil Sylligardos (Inria, ENS)\n* Ashwin Krishna (IIT Madras)\n* Yuhao Kang (University of Chicago)\n* Alex Wu (University of Chicago)\n* Teja Bogireddy (University of Chicago)\n* Themis Palpanas (Université Paris Cité)\n\n## Installation\n\nThe following tools are required to install TSB-kit from source:\n\n- git\n- conda (anaconda or miniconda)\n\n#### Steps\n\n1. You can download the datasets of TSB-UAD using the following links:\n\n- Public: https://www.thedatum.org/datasets/TSB-UAD-Public.zip\n\n- Synthetic: https://www.thedatum.org/datasets/TSB-UAD-Synthetic.zip\n\n- Artificial: https://www.thedatum.org/datasets/TSB-UAD-Artificial.zip\n\n- - The UCR classification datasets used to generate the Artificial datasets: https://www.thedatum.org/datasets/UCR2022_DATASETS.zip\n\n2. 
Clone this repository using git and change into its root directory.\n\n```bash\ngit clone https://github.com/boniolp/tsb-kit.git\ncd tsb-kit/\n```\n\n3. Create and activate a conda environment 'tsb-kit'.\n\n```bash\nconda env create --file environment.yml\nconda activate tsb-kit\n```\n\n4. Install TSB-kit:\n\nYou can install TSB-kit with pip.\n\n```\npip install tsb-kit\n```\n\nPlease note that NormA and Series2Graph are not available in the pip package. To use them, please unlock the corresponding zip files and install the package locally:\n\n```\npip install .\n```\n\n## Anomaly Detectors\n\nWe use 14 anomaly detection methods proposed for univariate time series. The following table lists and describes the methods considered in our benchmark:\n\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"1000\" src=\"./images/taxonomy_short.png\"/\u003e\n\u003c/p\u003e\n\n| Anomaly Detection Method    | Description|\n|:--|:---------:|\n|Isolation Forest (IForest) | This method constructs binary trees by recursively splitting the space; points in nodes with shorter path lengths to the root are more likely to be anomalies. |\n|The Local Outlier Factor (LOF)| This method computes the ratio of the neighboring density to the local density. |\n|The Histogram-based Outlier Score (HBOS)| This method constructs a histogram for the data, and the inverse of the height of the bin is used as the outlier score of the data point. |\n|Matrix Profile (MP)| This method flags as anomalous the subsequence with the largest 1-NN distance. |\n|NORMA| This method identifies the normal pattern based on clustering and calculates each point's effective distance to the normal pattern. |\n|Principal Component Analysis (PCA)| This method projects data to a lower-dimensional hyperplane, and data points with a significant distance from this plane can be identified as outliers. 
|\n|Autoencoder (AE)|This method projects data to a lower-dimensional latent space and reconstructs the data; outliers are expected to have a larger reconstruction deviation. |\n|LSTM-AD| This method builds a non-linear relationship between current and previous time-series values (using Long Short-Term Memory cells), and the outliers are detected by the deviation between the predicted and actual values. |\n|Polynomial Approximation (POLY)| This method builds a non-linear relationship between current and previous time-series values (using polynomial decomposition), and the outliers are detected by the deviation between the predicted and actual values. |\n| CNN | This method builds a non-linear relationship between current and previous time-series values (using a convolutional neural network), and the outliers are detected by the deviation between the predicted and actual values. |\n|One-class Support Vector Machines (OCSVM)| This method fits the dataset to find the normal data's boundary. |\n|*Discord Aware Matrix Profile (DAMP)*| *This method is a scalable Matrix Profile-based approach proposed to solve the twin-freak problem.* |\n|*SAND*| *This method identifies the normal pattern based on clustering updated through arriving batches (i.e., subsequences) and calculates each point's effective distance to the normal pattern. This method can be used either online or offline.* |\n|*Series2Graph*| *This method converts the time series into a directed graph representing the evolution of subsequences in time. The anomalies are detected using the degree of the nodes and the weight of the edges.* |\n\nYou may find more details (and the references) in our [paper](https://www.paparrizos.org/papers/PaparrizosVLDB22b.pdf). 
Methods in italics are available but not yet evaluated.\n\n## Usage\n\n\nThe code snippet below demonstrates how to use one anomaly detector (in this example, IForest).\n\n```python\nimport math\nimport numpy as np\nimport pandas as pd\nfrom sklearn.preprocessing import MinMaxScaler\nfrom tsb_kit.models.iforest import IForest\nfrom tsb_kit.models.feature import Window\nfrom tsb_kit.utils.slidingWindows import find_length\nfrom tsb_kit.vus.metrics import get_metrics\n\n# Load the series (column 0) and its anomaly labels (column 1)\ndf = pd.read_csv('data/benchmark/ECG/MBA_ECG805_data.out', header=None).to_numpy()\ndata = df[:, 0].astype(float)\nlabel = df[:, 1]\n\n# Estimate a sliding-window length and convert the series into subsequences\nslidingWindow = find_length(data)\nX_data = Window(window = slidingWindow).convert(data).to_numpy()\n\n# Fit the detector and retrieve one score per subsequence\nclf = IForest(n_jobs=1)\nclf.fit(X_data)\nscore = clf.decision_scores_\n\n# Rescale the scores to [0, 1], then pad both ends so there is one score per point\nscore = MinMaxScaler(feature_range=(0,1)).fit_transform(score.reshape(-1,1)).ravel()\nscore = np.array([score[0]]*math.ceil((slidingWindow-1)/2) + list(score) + [score[-1]]*((slidingWindow-1)//2))\n\n# Compute all evaluation measures\nresults = get_metrics(score, label, metric=\"all\", slidingWindow=slidingWindow)\nfor metric in results.keys():\n    print(metric, ':', results[metric])\n```\n\n```\nAUC_ROC : 0.9216216369841076\nAUC_PR : 0.6608577550833885\nPrecision : 0.7342093339374717\nRecall : 0.4010891089108911\nF : 0.5187770129662238\nPrecision_at_k : 0.4010891089108911\nRprecision : 0.7486112853253205\nRrecall : 0.3097733542316151\nRF : 0.438214653167952\nR_AUC_ROC : 0.989123018780308\nR_AUC_PR : 0.9435238401582703\nVUS_ROC : 0.9734357459251715\nVUS_PR : 0.8858037295594041\nAffiliation_Precision : 0.9630674176380548\nAffiliation_Recall : 0.9809813654809071\n```\n\nYou may find more details on how to run each anomaly detection method in the example folder.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboniolp%2Ftsb-kit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fboniolp%2Ftsb-kit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboniolp%2Ftsb-kit/lists"}