{"id":13415322,"url":"https://github.com/datamllab/pyodds","last_synced_at":"2025-03-14T22:33:17.988Z","repository":{"id":35153104,"uuid":"213480284","full_name":"datamllab/pyodds","owner":"datamllab","description":"An End-to-end Outlier Detection System","archived":false,"fork":false,"pushed_at":"2023-03-25T00:15:53.000Z","size":632,"stargazers_count":253,"open_issues_count":7,"forks_count":39,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-02-17T09:39:36.521Z","etag":null,"topics":["anomaly-detection","database","deep-learning","machine-learning","outlier-detection","tdengine","time-series","time-series-analysis"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datamllab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-07T20:29:16.000Z","updated_at":"2025-02-07T10:15:54.000Z","dependencies_parsed_at":"2024-10-26T11:22:20.122Z","dependency_job_id":"08e93b4d-2afb-47c0-8b3f-3c1ea18ac793","html_url":"https://github.com/datamllab/pyodds","commit_stats":{"total_commits":101,"total_committers":7,"mean_commits":"14.428571428571429","dds":"0.39603960396039606","last_synced_commit":"b79ea797dca104b12df5ff03ba701a024e36deac"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datamllab%2Fpyodds","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datamllab%2Fpyodds/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/datamllab%2Fpyodds/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datamllab%2Fpyodds/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datamllab","download_url":"https://codeload.github.com/datamllab/pyodds/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243658057,"owners_count":20326459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","database","deep-learning","machine-learning","outlier-detection","tdengine","time-series","time-series-analysis"],"created_at":"2024-07-30T21:00:47.119Z","updated_at":"2025-03-14T22:33:17.981Z","avatar_url":"https://github.com/datamllab.png","language":"Python","funding_links":[],"categories":["异常检测包","Anomaly Detection Software","Algorithm"],"sub_categories":[],"readme":"# PyODDS\n[![Build Status](https://travis-ci.com/datamllab/PyODDS.svg?branch=master)](https://travis-ci.com/datamllab/PyODDS)\n[![Coverage Status](https://coveralls.io/repos/github/datamllab/PyODDS/badge.svg?branch=master)](https://coveralls.io/github/datamllab/PyODDS?branch=master)\n[![Documentation Status](https://readthedocs.org/projects/pyodds-handbook/badge/?version=latest)](https://pyodds.github.io/)\n[![Codacy Badge](https://api.codacy.com/project/badge/Grade/3456033f37744ae2a5a69da448ee430d)](https://www.codacy.com/manual/pyodds/PyODDS?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=pyodds/PyODDS\u0026amp;utm_campaign=Badge_Grade)\n[![PyPI 
version](https://badge.fury.io/py/pyodds.svg)](https://badge.fury.io/py/pyodds)\n\nOfficial Website: [http://pyodds.com/](http://pyodds.com/)\n\n##\n\n**PyODDS** is an end-to-end **Python** system for **outlier** **detection** with **database** **support**. PyODDS provides outlier detection algorithms that meet the needs of users in different fields, with or without a data science or machine learning background. PyODDS can execute machine learning algorithms in-database, without moving data out of the database server or over the network. It also provides access to a wide range of outlier detection algorithms, including statistical analysis and more recent deep learning based approaches. It is developed by [`DATA Lab`](http://faculty.cs.tamu.edu/xiahu/index.html) at Texas A\u0026M University.\n\nPyODDS features:\n\n  - **Full Stack Service**, which supports operation and maintenance from a lightweight SQL-based database to back-end machine learning algorithms and improves throughput;\n\n  - **State-of-the-art Anomaly Detection Approaches**, including **Statistical/Machine Learning/Deep Learning** models with unified APIs and detailed documentation;\n\n  - **Powerful Data Analysis Mechanism**, which supports both **static and time-series data** analysis with flexible time-slice (sliding-window) segmentation.  
\n  \n  - **Automated Machine Learning**: PyODDS describes the first attempt to incorporate automated machine learning into outlier detection, and is among the first attempts to extend automated machine learning concepts to real-world data mining tasks.\n\nThe full API reference can be found in the [`handbook`](https://pyodds.github.io/).\n\n## API Demo:\n\n\n```python\nfrom utils.import_algorithm import algorithm_selection\nfrom utils.utilities import output_performance,connect_server,query_data\n\n# connect to the database\nconn,cursor=connect_server(host, user, password)\n\n# query data from a specific time range\ndata = query_data(conn,cursor,database_name,table_name,start_time,end_time)\n\n# train the anomaly detection algorithm\n# (X_train and X_test are assumed to be prepared from the queried data)\nclf = algorithm_selection(algorithm_name)\nclf.fit(X_train)\n\n# get outlier result and scores\nprediction_result = clf.predict(X_test)\noutlierness_score = clf.decision_function(X_test)\n\n# visualize the prediction_result\nvisualize_distribution(X_test,prediction_result,outlierness_score)\n\n```\n\n## Cite this work\n\n\nYuening Li, Daochen Zha, Praveen Kumar Venugopal, Na Zou, Xia Hu. 
\"PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning\"  ([Download](https://dl.acm.org/doi/abs/10.1145/3366424.3383530))\n\nBibTeX entry:\n\n```bibtex\n@inproceedings{10.1145/3366424.3383530,\n    author = {Li, Yuening and Zha, Daochen and Venugopal, Praveen and Zou, Na and Hu, Xia},\n    title = {PyODDS: An End-to-End Outlier Detection System with Automated Machine Learning},\n    year = {2020},\n    isbn = {9781450370240},\n    publisher = {Association for Computing Machinery},\n    address = {New York, NY, USA},\n    url = {https://doi.org/10.1145/3366424.3383530},\n    doi = {10.1145/3366424.3383530},\n    booktitle = {Companion Proceedings of the Web Conference 2020},\n    pages = {153--157},\n    numpages = {5},\n    keywords = {Automated Machine Learning, Outlier Detection, Open Source Package, End-to-end System},\n    location = {Taipei, Taiwan},\n    series = {WWW '20}\n  }\n```  \n\n\n\n## Quick Start\n```sh\npython demo.py --ground_truth --visualize_distribution\n```\n\n### Results are shown as\n```text\nconnect to TDengine success\nLoad dataset and table\nLoading cost: 0.151061 seconds\nLoad data successful\nStart processing:\n100%|████████████████████████████████████| 10/10 [00:00\u003c00:00, 14.02it/s]\n==============================\nResults in Algorithm dagmm are:\naccuracy_score: 0.98\nprecision_score: 0.99\nrecall_score: 0.99\nf1_score: 0.99\nroc_auc_score: 0.99\nprocessing time: 15.330137 seconds\n==============================\nconnection is closed\n\n```\n\u003cimg src=\"https://github.com/datamllab/PyODDS/blob/master/output/img/Result.png\" width=\"50%\" height=\"45%\"\u003e\n\n## Installation\n\nTo install the package, use [`pip`](https://pip.pypa.io/en/stable/installing/) in one of the following ways:\n\n```sh\n# install the released package from PyPI\npip install pyodds\n# or install the latest source from GitHub\npip install git+https://github.com/datamllab/PyODDS.git\n```\n**Note:** PyODDS is only compatible with **Python 3.6** and above.\n\n### Required Dependencies\n\n```sh\n- 
pandas\u003e=0.25.0\n- taos==1.4.15\n- tensorflow==2.0.0b1\n- numpy\u003e=1.16.4\n- seaborn\u003e=0.9.0\n- torch\u003e=1.1.0\n- luminol==0.4\n- tqdm\u003e=4.35.0\n- matplotlib\u003e=3.1.1\n- scikit_learn\u003e=0.21.3\n```\nTo compile and package the JDBC driver source code, you need JDK 8 or higher and Apache Maven 2.7 or higher installed. To install openjdk-8 on Ubuntu:\n\n```sh\nsudo apt-get install openjdk-8-jdk\n```\n\nTo install Apache Maven on Ubuntu:\n\n```sh\nsudo apt-get install maven\n```\nTo install TDengine as the back-end database service, please refer to [these instructions](https://www.taosdata.com/en/getting-started/#Install-from-Package).\n\nTo enable the Python client APIs for TDengine, please follow [this handbook](https://www.taosdata.com/en/documentation/connector/#Python-Connector).\n\nTo ensure that the locale in the config file is valid:\n\n```sh\nsudo locale-gen \"en_US.UTF-8\"\nexport LC_ALL=\"en_US.UTF-8\"\nlocale\n\n```\nTo start the service after installation, run the following in a terminal:\n```sh\ntaosd\n```\n\n## Implemented Algorithms\n### Statistical Based Methods\nMethods | Algorithm | Class API\n------------ | -------------|-------------\nCBLOF | Clustering-Based Local Outlier Factor | :class:`algo.cblof.CBLOF`\nHBOS | Histogram-based Outlier Score | :class:`algo.hbos.HBOS`\nIFOREST | Isolation Forest | :class:`algo.iforest.IFOREST`\nKNN | k-Nearest Neighbors | :class:`algo.knn.KNN`\nLOF | Local Outlier Factor | :class:`algo.lof.LOF`\nOCSVM | One-Class Support Vector Machines | :class:`algo.ocsvm.OCSVM`\nPCA | Principal Component Analysis | :class:`algo.pca.PCA`\nRobustCovariance | Robust Covariance | :class:`algo.robustcovariance.RCOV`\nSOD | Subspace Outlier Detection | :class:`algo.sod.SOD`\n\n### Deep Learning Based Methods\nMethods | Algorithm | Class API\n------------ | -------------|-------------\nautoencoder | Outlier detection using replicator neural networks | :class:`algo.autoencoder.AUTOENCODER`\ndagmm | Deep 
autoencoding Gaussian mixture model for unsupervised anomaly detection | :class:`algo.dagmm.DAGMM`\n\n### Time Series Methods\nMethods | Algorithm | Class API\n------------ | -------------|-------------\nlstmad | Long short term memory networks for anomaly detection in time series | :class:`algo.lstm_ad.LSTMAD`\nlstmencdec | LSTM-based encoder-decoder for multi-sensor anomaly detection | :class:`algo.lstm_enc_dec_axl.LSTMED`\nluminol | LinkedIn's luminol | :class:`algo.luminol.LUMINOL`\n\n## APIs Cheatsheet\n\nThe full API reference can be found in the [`handbook`](https://pyodds.github.io/).\n\n  - **connect_server(hostname,username,password)**: Connect to the TDengine back-end service.\n\n  - **query_data(connection,cursor,database_name,table_name,start_time,end_time)**: Query data from table *table_name* in database *database_name* within a given time range.\n\n  - **algorithm_selection(algorithm_name,contamination)**: Select an algorithm as the detector.\n\n  - **fit(X)**: Fit the detector on *X*.\n\n  - **predict(X)**: Predict whether each instance in *X* is an outlier.\n\n  - **decision_function(X)**: Output the anomaly score of each instance in *X*.\n\n  - **output_performance(algorithm_name,ground_truth,prediction_result,outlierness_score)**: Report the prediction result as evaluation metrics: *Accuracy*, *Precision*, *Recall*, *F1 Score*, *ROC-AUC Score*, and *Cost time*.\n\n  - **visualize_distribution(X,prediction_result,outlierness_score)**: Visualize the detection result together with the data distribution.\n\n  - **visualize_outlierscore(outlierness_score,prediction_result,contamination)**: Visualize the detection result together with the outlier score.\n\n\n## License\n\nYou may use this software under the MIT 
License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatamllab%2Fpyodds","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatamllab%2Fpyodds","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatamllab%2Fpyodds/lists"}