{"id":13471766,"url":"https://github.com/mononitogoswami/tsad-model-selection","last_synced_at":"2025-03-26T14:32:28.868Z","repository":{"id":155339657,"uuid":"535886093","full_name":"mononitogoswami/tsad-model-selection","owner":"mononitogoswami","description":"Code for \"Unsupervised Model Selection for Time-series Anomaly Detection\", ICLR 2023.","archived":false,"fork":false,"pushed_at":"2023-12-14T15:46:18.000Z","size":11022,"stargazers_count":66,"open_issues_count":2,"forks_count":12,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-30T04:09:42.768Z","etag":null,"topics":["anomaly-detection","machine-learning","model-selection","time-series-analysis","unsupervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mononitogoswami.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-12T23:32:40.000Z","updated_at":"2024-10-25T09:36:06.000Z","dependencies_parsed_at":null,"dependency_job_id":"74f92161-163a-4be7-8d24-2b21dae7d377","html_url":"https://github.com/mononitogoswami/tsad-model-selection","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mononitogoswami%2Ftsad-model-selection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mononitogoswami%2Ftsad-model-selection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mononitogoswami%2Ftsad-mo
del-selection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mononitogoswami%2Ftsad-model-selection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mononitogoswami","download_url":"https://codeload.github.com/mononitogoswami/tsad-model-selection/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245670997,"owners_count":20653467,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","machine-learning","model-selection","time-series-analysis","unsupervised-learning"],"created_at":"2024-07-31T16:00:49.097Z","updated_at":"2025-03-26T14:32:27.783Z","avatar_url":"https://github.com/mononitogoswami.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook","2023"],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eUnsupervised Model Selection for Time-series Anomaly Detection\u003c/h1\u003e\n\u003ch3 align=\"center\"\u003eMost time-series anomaly detection models don't need labels for training. So why should we need labels to select good models? 
\u003c/h3\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003cimg alt=\"License\" src=\"https://img.shields.io/badge/License-Apache_2.0-blue.svg\"\u003e\n\u003c!--     \u003cimg alt=\"Visitors\" src=\"https://visitor-badge.glitch.me/badge?page_id=mononitogoswami/tsad-model-selection\"\u003e --\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\nTL;DR: We introduce `tsadams` for unsupervised \u003cb\u003et\u003c/b\u003eime-\u003cb\u003es\u003c/b\u003eeries \u003cb\u003ea\u003c/b\u003enomaly \u003cb\u003ed\u003c/b\u003eetection \u003cb\u003em\u003c/b\u003eodel \u003cb\u003es\u003c/b\u003eelection!\n\u003c/p\u003e\n\nHundreds of models for anomaly detection in time-series are available to practitioners, but no method exists to select the best model and its hyperparameters for a given dataset when labels are not available. We construct three classes of surrogate metrics which we show to be correlated with common supervised anomaly detection accuracy metrics such as the F1 score. The three classes of metrics are prediction accuracy, centrality, and performance on injected synthetic anomalies. We show that some of the surrogate metrics are useful for unsupervised model selection but not sufficient by themselves. To this end, we treat metric combinations as a rank aggregation problem and propose a robust rank aggregation approach. Large scale experiments on multiple real-world datasets demonstrate that our proposed unsupervised aggregation approach is as effective as selecting the best model based on collecting anomaly labels.\n\n\u003cp align=\"center\"\u003e\n\u003cimg height =\"300px\" src=\"assets/methods.png\"\u003e\n\u003c/p\u003e\n\nFigure 1: *The Model Selection Workflow.* We identify three classes of surrogate metrics of model quality, and propose a novel robust rank aggregation framework to combine multiple rankings from metrics. 
\n\nIf you use this code, please consider citing our work: \n\u003e [Unsupervised Model Selection for Time-series Anomaly Detection](https://openreview.net/pdf?id=gOZ_pKANaPW)\\\nMononito Goswami, Cristian Ignacio Challu, Laurent Callot, Lenon Minorics, Andrey Kan\\\nInternational Conference on Learning Representations (ICLR), 2023\n\n----\n\n## Contents\n\n1. [Datasets](#datasets)\n2. [Installation](#installation)\n3. [Reproduce Results](#reproduction)\n4. [Citation](#citation)\n\n\u003ca id=\"datasets\"\u003e\u003c/a\u003e\n## Datasets\n\nWe carry out experiments on two popular and widely used real-world collections with diverse time-series and anomalies: (1) UCR Anomaly Archive (UCR) (Wu \u0026 Keogh, 2021), and (2) Server Machine Dataset (SMD) (Su et al., 2019). \n\nThese datasets can be downloaded using the `download_data.py` script in the `scripts` directory and loaded using the `tsadams.datasets.load.load_data(...)` function. \n\nTo load the UCR dataset: \n\n```python  \n    from tsadams.datasets.load import load_data\n\n    # Load the data\n    ENTITY = 'anomaly_archive' # 'anomaly_archive' OR 'smd' \n    \n    DATASET = '028_UCR_Anomaly_DISTORTEDInternalBleeding17' # Name of timeseries in UCR or machine in SMD\n    \n    train_data = load_data(dataset=DATASET, \n                           group='train', \n                           entities=[ENTITY], \n                           downsampling=None, \n                           root_dir='/path_to_dataset_dir', \n                           normalize=True, \n                           verbose=True)\n    \n    test_data = load_data(dataset=DATASET, \n                          group='test', \n                          entities=[ENTITY], \n                          downsampling=None, \n                          root_dir='/path_to_dataset_dir', \n                          normalize=True, \n                          verbose=True)\n\n```\n\n----\n\n\u003ca id=\"installation\"\u003e\u003c/a\u003e\n## 
Installation\n\nWe recommend installing [Anaconda](https://conda.io/projects/conda/en/latest/index.html) to run our code. To install Anaconda, follow the installation instructions [here](https://docs.anaconda.com/anaconda/install/). \n\nTo set up the environment using [`conda`](https://conda.io/projects/conda/en/latest/index.html) (recommended, but optional), run the following commands:\n\n```console\n    # To create the environment from the environment_explicit.yml file\n    foo@bar:~$ conda env create -f environment_explicit.yml\n    \n    # To activate the environment\n    foo@bar:~$ conda activate modelselect \n    \n    # To verify that the new environment was installed correctly\n    foo@bar:~$ conda env list \n\n```\n\nFor an editable installation of our code from source, run the following commands:\n\n```console\n\n    foo@bar:~$ git clone https://github.com/mononitogoswami/tsad-model-selection.git\n    foo@bar:~$ cd tsad-model-selection/src/\n    foo@bar:~$ pip install -e .\n\n```\n\n----\n\n\u003ca id=\"reproduction\"\u003e\u003c/a\u003e\n## Reproduce Results\nTo reproduce the results presented in the paper, follow these steps in order. All the necessary scripts are in the `src/scripts` directory of this repository:\n1. Run `download_data.py` to download the Server Machine Dataset and the UCR Anomaly Archive.\n2. Train multiple anomaly detection models for each dataset using `train_all_models.py`. You can track the number of trained models using `check_number_of_trained_models.py`. After this stage, each dataset in SMD and the UCR Anomaly Archive should have a set of trained anomaly detection models. Note that some models may not finish training because of errors.\n3. Next, get predictions (i.e., use each model to reconstruct the time-series in each dataset) for all models and datasets using `evaluate_all_models.py`. 
The progress of this step can be tracked using `check_number_of_evaluated_models.py`.\n4. In the paper, we pool the predictions of a particular model on multiple related datasets. This gives us a more robust measure of performance. To pool the predictions, run `compute_pooled_results.py`. With this, we should be all set to perform model selection!\n5. Perform model selection for each pooled dataset and evaluate it using the `results.ipynb` notebook.\n\n----\n\u003ca id=\"citation\"\u003e\u003c/a\u003e\n## Citation\n\nIf you use our code, please cite our paper: \n\n```bibtex\n\n    @article{\n        goswami2023unsupervised,\n        title={Unsupervised Model Selection for Time-series Anomaly Detection},\n        author={Goswami, Mononito and Challu, Cristian and Callot, Laurent and Minorics, Lenon and Kan, Andrey},\n        journal={International Conference on Learning Representations.},\n        year={2023},\n    }\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmononitogoswami%2Ftsad-model-selection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmononitogoswami%2Ftsad-model-selection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmononitogoswami%2Ftsad-model-selection/lists"}