{"id":13473320,"url":"https://github.com/datamllab/tods","last_synced_at":"2025-05-15T18:08:37.017Z","repository":{"id":37347929,"uuid":"293719013","full_name":"datamllab/tods","owner":"datamllab","description":"TODS: An Automated Time-series Outlier Detection System","archived":false,"fork":false,"pushed_at":"2023-09-11T15:48:11.000Z","size":32670,"stargazers_count":1546,"open_issues_count":79,"forks_count":198,"subscribers_count":28,"default_branch":"master","last_synced_at":"2025-03-31T22:19:09.479Z","etag":null,"topics":["anomaly-detection","automl","machine-learning","outlier-detection","time-series","time-series-analysis","time-series-anomaly-detection"],"latest_commit_sha":null,"homepage":"http://tods-doc.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datamllab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-09-08T06:18:12.000Z","updated_at":"2025-03-27T18:17:12.000Z","dependencies_parsed_at":"2024-01-18T03:48:11.578Z","dependency_job_id":null,"html_url":"https://github.com/datamllab/tods","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datamllab%2Ftods","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datamllab%2Ftods/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datamllab%2Ftods/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datamllab%2Ftods/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/owners/datamllab","download_url":"https://codeload.github.com/datamllab/tods/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247744334,"owners_count":20988783,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","automl","machine-learning","outlier-detection","time-series","time-series-analysis","time-series-anomaly-detection"],"created_at":"2024-07-31T16:01:02.645Z","updated_at":"2025-04-07T23:08:42.911Z","avatar_url":"https://github.com/datamllab.png","language":"Python","funding_links":[],"categories":["Python","Papers","📦 Packages","AI for *Ops","Industry-strength AD","2021","Outlier and noise detection"],"sub_categories":["Outiler","Python","Observability \u0026 Monitoring with AI"],"readme":"# TODS: Automated Time-series Outlier Detection System\n\n\u003cimg width=\"500\" src=\"./docs/source/img/tods_logo.png\" alt=\"Logo\" /\u003e\n\n[![Actions Status](https://github.com/datamllab/tods/workflows/Build/badge.svg)](https://github.com/datamllab/tods/actions)\n[![codecov](https://codecov.io/gh/datamllab/tods/branch/master/graph/badge.svg?token=M90ZCVTRBF)](https://codecov.io/gh/datamllab/tods)\n\n[中文文档](README.zh-CN.md)\n\nTODS is a full-stack automated machine learning system for outlier detection on multivariate time-series data. TODS provides exhaustive modules for building machine learning-based outlier detection systems, including: data processing, time series processing, feature analysis (extraction), detection algorithms, and reinforcement module. 
The functionality provided by these modules includes general-purpose data preprocessing, time-series smoothing and transformation, feature extraction from the time and frequency domains, a variety of detection algorithms, and a way to involve human expertise in calibrating the system. TODS supports three common outlier detection scenarios on time-series data: point-wise detection (time points as outliers), pattern-wise detection (subsequences as outliers), and system-wise detection (sets of time series as outliers), and it provides a wide range of algorithms for each. This package is developed by [DATA Lab @ Rice University](https://cs.rice.edu/~xh37/index.html).\n\nTODS features:\n* **Full Stack Machine Learning System**, which supports a comprehensive set of components, from preprocessing and feature extraction to detection algorithms and a human-in-the-loop interface.\n\n* **Wide Range of Algorithms**, including all of the point-wise detection algorithms supported by [PyOD](https://github.com/yzhao062/pyod), state-of-the-art pattern-wise (collective) detection algorithms such as [DeepLog](https://www.cs.utah.edu/~lifeifei/papers/deeplog.pdf) and [Telemanom](https://arxiv.org/pdf/1802.04431.pdf), and various ensemble algorithms for system-wise detection.\n\n* **Automated Machine Learning**, which aims to provide a knowledge-free process that constructs an optimal pipeline for the given data by automatically searching for the best combination of the existing modules.\n\n## Examples and Tutorials\n* General Usage: [View in Colab](https://colab.research.google.com/drive/1oKKRqAQnkATsALffaf54zkDGpRseNVGZ?usp=sharing)\n* Fraud Detection: [View in Colab](https://colab.research.google.com/drive/15c1Rj60XESwkC2P-BVXUocsXaBJ3M1sr?usp=sharing)\n* Blockchain: [View in Colab](https://colab.research.google.com/drive/1fm6yTayjTssSMb6t0VcplBBHl5MrgLFR?usp=sharing)\n\n## Resources\n* API Documentation: [http://tods-doc.github.io](http://tods-doc.github.io)\n* Paper: 
[https://arxiv.org/abs/2009.09822](https://arxiv.org/abs/2009.09822)\n* Related Project: [AutoVideo: An Automated Video Action Recognition System](https://github.com/datamllab/autovideo)\n* :loudspeaker: Do you want to learn more about data pipeline search? Please check out our [data-centric AI survey](https://arxiv.org/abs/2303.10158) and [data-centric AI resources](https://github.com/daochenzha/data-centric-AI)!\n\n## Cite this Work:\nIf you find this work useful, please cite it:\n```\n@article{Lai_Zha_Wang_Xu_Zhao_Kumar_Chen_Zumkhawaka_Wan_Martinez_Hu_2021, \n\ttitle={TODS: An Automated Time Series Outlier Detection System}, \n\tvolume={35}, \n\tnumber={18}, \n\tjournal={Proceedings of the AAAI Conference on Artificial Intelligence}, \n\tauthor={Lai, Kwei-Herng and Zha, Daochen and Wang, Guanchu and Xu, Junjie and Zhao, Yue and Kumar, Devesh and Chen, Yile and Zumkhawaka, Purav and Wan, Minyang and Martinez, Diego and Hu, Xia}, \n\tyear={2021}, month={May}, \n\tpages={16060-16062} \n}\n\n```\n\n## Installation\n\nThis package works with **Python 3.7+** and pip 19+. The following system packages must be installed first (shown for Debian/Ubuntu):\n```\nsudo apt-get install libssl-dev libcurl4-openssl-dev libyaml-dev build-essential libopenblas-dev libcap-dev ffmpeg\n```\n\nClone the repository (if you are in China and GitHub is slow, you can use the mirror on [Gitee](https://gitee.com/daochenzha/tods)):\n```\ngit clone https://github.com/datamllab/tods.git\n```\nInstall locally with `pip`:\n```\ncd tods\npip install -e .\n```\n\n# Examples\nExamples are available in [/examples](examples/). For basic usage, you can evaluate a pipeline on a given dataset. 
Here, we provide an example that loads our default pipeline and evaluates it on a subset of the Yahoo dataset.\n```python\nimport pandas as pd\n\nfrom tods import schemas as schemas_utils\nfrom tods import generate_dataset, evaluate_pipeline\n\ntable_path = 'datasets/anomaly/raw_data/yahoo_sub_5.csv'\ntarget_index = 6  # index of the target column\nmetric = 'F1_MACRO'  # macro-averaged F1 over labels 0 and 1\n\n# Read data and generate dataset\ndf = pd.read_csv(table_path)\ndataset = generate_dataset(df, target_index)\n\n# Load the default pipeline\npipeline = schemas_utils.load_default_pipeline()\n\n# Run the pipeline\npipeline_result = evaluate_pipeline(dataset, pipeline, metric)\nprint(pipeline_result)\n```\nWe also provide AutoML support to help you automatically find a good pipeline for your data.\n```python\nimport pandas as pd\n\nfrom axolotl.backend.simple import SimpleRunner\n\nfrom tods import generate_dataset, generate_problem\nfrom tods.searcher import BruteForceSearch\n\n# Search settings\ntable_path = 'datasets/yahoo_sub_5.csv'\ntarget_index = 6  # index of the target column\ntime_limit = 30  # search time budget in seconds\nmetric = 'F1_MACRO'  # macro-averaged F1 over labels 0 and 1\n\n# Read data and generate dataset and problem\ndf = pd.read_csv(table_path)\ndataset = generate_dataset(df, target_index=target_index)\nproblem_description = generate_problem(dataset, metric)\n\n# Start backend\nbackend = SimpleRunner(random_seed=0)\n\n# Start search algorithm\nsearch = BruteForceSearch(problem_description=problem_description,\n                          backend=backend)\n\n# Find the best pipeline\nbest_runtime, best_pipeline_result = search.search_fit(input_data=[dataset], time_limit=time_limit)\nbest_pipeline = best_runtime.pipeline\nbest_output = best_pipeline_result.output\n\n# Evaluate the best pipeline\nbest_scores = search.evaluate(best_pipeline).scores\n```\n# Acknowledgement\nWe gratefully acknowledge the Data Driven Discovery of Models (D3M) program of the Defense Advanced 
Research Projects Agency (DARPA).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatamllab%2Ftods","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatamllab%2Ftods","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatamllab%2Ftods/lists"}