{"id":13473998,"url":"https://github.com/yzhao062/pytod","last_synced_at":"2025-10-23T15:47:35.304Z","repository":{"id":36992518,"uuid":"421571276","full_name":"yzhao062/pytod","owner":"yzhao062","description":"TOD: GPU-accelerated Outlier Detection via Tensor Operations","archived":false,"fork":false,"pushed_at":"2023-03-02T16:22:06.000Z","size":13710,"stargazers_count":179,"open_issues_count":8,"forks_count":24,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-03-31T06:07:51.072Z","etag":null,"topics":["anomaly-detection","gpu-acceleration","gpu-systems","machine-learning","outlier-detection","unsupervised-learning"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2110.14007","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yzhao062.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES.txt","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-26T20:15:54.000Z","updated_at":"2025-03-30T06:35:12.000Z","dependencies_parsed_at":"2024-10-26T21:17:48.840Z","dependency_job_id":"665bacb3-1e78-4ce1-a9ad-32b245c6133c","html_url":"https://github.com/yzhao062/pytod","commit_stats":{"total_commits":39,"total_committers":4,"mean_commits":9.75,"dds":"0.28205128205128205","last_synced_commit":"ec43433ad1a0ab939195a5eda0c1a6ab01b96ad2"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yzhao062%2Fpytod","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yzhao062%2Fpytod/tags","releases_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/yzhao062%2Fpytod/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yzhao062%2Fpytod/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yzhao062","download_url":"https://codeload.github.com/yzhao062/pytod/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247616807,"owners_count":20967477,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","gpu-acceleration","gpu-systems","machine-learning","outlier-detection","unsupervised-learning"],"created_at":"2024-07-31T16:01:08.675Z","updated_at":"2025-10-23T15:47:35.215Z","avatar_url":"https://github.com/yzhao062.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"(Py)TOD: GPU-accelerated Outlier Detection via Tensor Operations\n================================================================\n\n\n**Deployment \u0026 Documentation \u0026 Stats \u0026 License**\n\n.. image:: https://img.shields.io/pypi/v/pytod.svg?color=brightgreen\n   :target: https://pypi.org/project/pytod/\n   :alt: PyPI version\n\n\n.. image:: https://img.shields.io/github/stars/yzhao062/pytod.svg\n   :target: https://github.com/yzhao062/pytod/stargazers\n   :alt: GitHub stars\n\n\n.. image:: https://img.shields.io/github/forks/yzhao062/pytod.svg?color=blue\n   :target: https://github.com/yzhao062/pytod/network\n   :alt: GitHub forks\n\n.. 
image:: https://github.com/yzhao062/pytod/actions/workflows/testing.yml/badge.svg\n   :target: https://github.com/yzhao062/pytod/actions/workflows/testing.yml\n   :alt: testing\n\n.. image:: https://img.shields.io/github/license/yzhao062/pytod.svg\n   :target: https://github.com/yzhao062/pytod/blob/master/LICENSE\n   :alt: License\n\n-----\n\n\n**Background**: Outlier detection (OD) is a key data mining task for identifying abnormal objects from general samples, with numerous high-stakes applications including fraud detection and intrusion detection.\n\nWe propose **TOD**, a system for efficient and scalable outlier detection (OD) on distributed multi-GPU machines.\nA key idea behind TOD is *decomposing OD applications into basic tensor algebra operations for GPU acceleration*.\n\n\n**Citing TOD**\\ : Check out `the design paper \u003chttps://www.andrew.cmu.edu/user/yuezhao2/papers/22-preprint-tod.pdf\u003e`_.\nIf you use TOD in a scientific publication, we would appreciate\ncitations to the following paper::\n\n\n    @article{zhao2021tod,\n      title={TOD: GPU-accelerated Outlier Detection via Tensor Operations},\n      author={Zhao, Yue and Chen, George H and Jia, Zhihao},\n      journal={arXiv preprint arXiv:2110.14007},\n      year={2021}\n    }\n\nor::\n\n    Zhao, Y., Chen, G.H. and Jia, Z., 2021. TOD: GPU-accelerated Outlier Detection via Tensor Operations. arXiv preprint arXiv:2110.14007.\n\n\n\n----\n\n\nOne Reason to Use It:\n^^^^^^^^^^^^^^^^^^^^^\n\nOn average, **TOD is 11 times faster than PyOD** on a diverse group of OD algorithms!\n\nIf you need another reason: it can handle much larger datasets, running OD on more than **a million samples** within an hour!\n\n**GPU-accelerated Outlier Detection with 5 Lines of Code**\\ :\n\n\n.. 
code-block:: python\n\n\n    # train the kNN detector\n    from pytod.models.knn import KNN\n    clf = KNN()  # default GPU device is used\n    clf.fit(X_train)\n\n    # get outlier scores\n    y_train_scores = clf.decision_scores_  # raw outlier scores on the train data\n    y_test_scores = clf.decision_function(X_test)  # predict raw outlier scores on test\n\n\n\n**TOD is featured for**:\n\n* **Unified APIs, detailed documentation, and examples** for ease of use (under construction)\n* **More than 5 different OD algorithms**, with more being added\n* **Support for multi-GPU acceleration**\n* **Advanced techniques**, including *provable quantization* and *automatic batching*\n\n\n**Table of Contents**\\ :\n\n\n* `Installation \u003c#installation\u003e`_\n* `Implemented Algorithms \u003c#implemented-algorithms\u003e`_\n* `A Motivating Example PyOD vs. PyTOD \u003c#a-motivating-example-pyod-vs-pytod\u003e`_\n* `Paper Reproducibility \u003c#paper-reproducibility\u003e`_\n* `Programming Model Interface \u003c#programming-model-interface\u003e`_\n* `End-to-end Performance Comparison with PyOD \u003c#end-to-end-performance-comparison-with-pyod\u003e`_\n\n----\n\nInstallation\n^^^^^^^^^^^^\n\nIt is recommended to use **pip** for installation. Please make sure\n**the latest version** is installed, as PyTOD is updated frequently:\n\n.. code-block:: bash\n\n   pip install pytod            # normal install\n   pip install --upgrade pytod  # or update if needed\n\nAlternatively, you can clone the repository and install it from source:\n\n.. 
code-block:: bash\n\n   git clone https://github.com/yzhao062/pytod.git\n   cd pytod\n   pip install .\n\n**Required Dependencies**\\ :\n\n\n* Python 3.6+\n* mpmath\n* numpy\u003e=1.13\n* torch\u003e=1.7 (**it is safer to install it yourself**)\n* scipy\u003e=0.19.1\n* scikit_learn\u003e=0.21\n* pyod\u003e=1.0.4 (**for comparison**)\n\n----\n\n\nImplemented Algorithms\n^^^^^^^^^^^^^^^^^^^^^^\n\nThe PyTOD toolkit consists of three major functional groups (to be cleaned up):\n\n**(i) Individual Detection Algorithms** :\n\n===================  ==================  ======================================================================================================  =====  ========================================\nType                 Abbr                Algorithm                                                                                               Year   Ref\n===================  ==================  ======================================================================================================  =====  ========================================\nLinear Model         PCA                 Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes)   2003   [#Shyu2003A]_\nProximity-Based      LOF                 Local Outlier Factor                                                                                    2000   [#Breunig2000LOF]_\nProximity-Based      COF                 Connectivity-Based Outlier Factor                                                                       2002   [#Tang2002Enhancing]_\nProximity-Based      HBOS                Histogram-based Outlier Score                                                                           2012   [#Goldstein2012Histogram]_\nProximity-Based      kNN                 k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score)                 2000   [#Ramaswamy2000Efficient]_\nProximity-Based      AvgKNN              Average kNN 
(use the average distance to k nearest neighbors as the outlier score)                      2002   [#Angiulli2002Fast]_\nProximity-Based      MedKNN              Median kNN (use the median distance to k nearest neighbors as the outlier score)                        2002   [#Angiulli2002Fast]_\nProbabilistic        ABOD                Angle-Based Outlier Detection                                                                           2008   [#Kriegel2008Angle]_\nProbabilistic        COPOD               COPOD: Copula-Based Outlier Detection                                                                   2020   [#Li2020COPOD]_\nProbabilistic        FastABOD            Fast Angle-Based Outlier Detection using approximation                                                  2008   [#Kriegel2008Angle]_\n===================  ==================  ======================================================================================================  =====  ========================================\n\n\n**Code is being released**. Watch and star for the latest news!\n\n\n----\n\n\nA Motivating Example PyOD vs. PyTOD!\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n`kNN example \u003chttps://github.com/yzhao062/pytod/blob/main/examples/knn_example.py\u003e`_\nshows how fast and easy to use PyTOD is. Take the well-known kNN outlier detector as an example:\n\n#. Initialize a kNN detector, fit the model, and make the prediction.\n\n   .. code-block:: python\n\n       from pytod.models.knn import KNN   # kNN detector\n\n       # train kNN detector\n       clf_name = 'KNN'\n       clf = KNN()\n       clf.fit(X_train)\n\n\n   .. code-block:: python\n\n       # if GPU is not available, use CPU instead\n       clf = KNN(device='cpu')\n       clf.fit(X_train)\n\n#. Get the prediction results.\n\n   .. 
code-block:: python\n\n       # get the predicted labels and outlier scores of the training data\n       y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)\n       y_train_scores = clf.decision_scores_  # raw outlier scores\n\n#. Compare the run time with PyOD on a simple laptop, for 30,000 samples with 20 features:\n\n   .. code-block:: python\n\n      KNN-PyOD ROC:1.0, precision @ rank n:1.0\n      Execution time 11.26 seconds\n\n   .. code-block:: python\n\n      KNN-PyTOD-GPU ROC:1.0, precision @ rank n:1.0\n      Execution time 2.82 seconds\n\n   .. code-block:: python\n\n      KNN-PyTOD-CPU ROC:1.0, precision @ rank n:1.0\n      Execution time 3.36 seconds\n\nAs the timings show, PyTOD is considerably faster than PyOD on both GPU and CPU.\n\n----\n\nPaper Reproducibility\n^^^^^^^^^^^^^^^^^^^^^\n\n**Datasets**: OD benchmark datasets are available in the `datasets folder \u003chttps://github.com/yzhao062/pytod/tree/main/reproducibility/datasets/ODDS\u003e`_.\n\n**Scripts for reproducibility are available in the** `reproducibility folder \u003chttps://github.com/yzhao062/pytod/tree/main/reproducibility\u003e`_.\n\nCleanup is on the way!\n\n----\n\nProgramming Model Interface\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nComplex OD algorithms can be abstracted into common tensor operators.\n\n.. image:: https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction.png\n   :target: https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction.png\n\n\nFor instance, ABOD and COPOD can be assembled from the basic tensor operators.\n\n.. image:: https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction_example.png\n   :target: https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction_example.png\n\n\n----\n\nEnd-to-end Performance Comparison with PyOD\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nOverall, PyTOD takes far less run time than PyOD: it is 11 times faster on average.\n\n.. 
image:: https://raw.githubusercontent.com/yzhao062/pytod/master/figs/run_time.png\n   :target: https://raw.githubusercontent.com/yzhao062/pytod/master/figs/run_time.png\n\n\n----\n\nReference\n^^^^^^^^^\n\n\n.. [#Aggarwal2015Outlier] Aggarwal, C.C., 2015. Outlier analysis. In Data mining (pp. 237-263). Springer, Cham.\n\n.. [#Aggarwal2015Theoretical] Aggarwal, C.C. and Sathe, S., 2015. Theoretical foundations and algorithms for outlier ensembles.\\ *ACM SIGKDD Explorations Newsletter*\\ , 17(1), pp.24-47.\n\n.. [#Aggarwal2017Outlier] Aggarwal, C.C. and Sathe, S., 2017. Outlier ensembles: An introduction. Springer.\n\n.. [#Almardeny2020A] Almardeny, Y., Boujnah, N. and Cleary, F., 2020. A Novel Outlier Detection Method for Multivariate Data. *IEEE Transactions on Knowledge and Data Engineering*.\n\n.. [#Angiulli2002Fast] Angiulli, F. and Pizzuti, C., 2002, August. Fast outlier detection in high dimensional spaces. In *European Conference on Principles of Data Mining and Knowledge Discovery* pp. 15-27.\n\n.. [#Arning1996A] Arning, A., Agrawal, R. and Raghavan, P., 1996, August. A Linear Method for Deviation Detection in Large Databases. In *KDD* (Vol. 1141, No. 50, pp. 972-981).\n\n.. [#Breunig2000LOF] Breunig, M.M., Kriegel, H.P., Ng, R.T. and Sander, J., 2000, May. LOF: identifying density-based local outliers. *ACM Sigmod Record*\\ , 29(2), pp. 93-104.\n\n.. [#Burgess2018Understanding] Burgess, Christopher P., et al. \"Understanding disentangling in beta-VAE.\" arXiv preprint arXiv:1804.03599 (2018).\n\n.. [#Goldstein2012Histogram] Goldstein, M. and Dengel, A., 2012. Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm. In *KI-2012: Poster and Demo Track*\\ , pp.59-63.\n\n.. [#Gopalan2019PIDForest] Gopalan, P., Sharan, V. and Wieder, U., 2019. PIDForest: Anomaly Detection via Partial Identification. In Advances in Neural Information Processing Systems, pp. 15783-15793.\n\n.. [#Hardin2004Outlier] Hardin, J. and Rocke, D.M., 2004. 
Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. *Computational Statistics \u0026 Data Analysis*\\ , 44(4), pp.625-638.\n\n.. [#He2003Discovering] He, Z., Xu, X. and Deng, S., 2003. Discovering cluster-based local outliers. *Pattern Recognition Letters*\\ , 24(9-10), pp.1641-1650.\n\n.. [#Iglewicz1993How] Iglewicz, B. and Hoaglin, D.C., 1993. How to detect and handle outliers (Vol. 16). Asq Press.\n\n.. [#Janssens2012Stochastic] Janssens, J.H.M., Huszár, F., Postma, E.O. and van den Herik, H.J., 2012. Stochastic outlier selection. Technical report TiCC TR 2012-001, Tilburg University, Tilburg Center for Cognition and Communication, Tilburg, The Netherlands.\n\n.. [#Kingma2013Auto] Kingma, D.P. and Welling, M., 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.\n\n.. [#Kriegel2008Angle] Kriegel, H.P. and Zimek, A., 2008, August. Angle-based outlier detection in high-dimensional data. In *KDD '08*\\ , pp. 444-452. ACM.\n\n.. [#Kriegel2009Outlier] Kriegel, H.P., Kröger, P., Schubert, E. and Zimek, A., 2009, April. Outlier detection in axis-parallel subspaces of high dimensional data. In *Pacific-Asia Conference on Knowledge Discovery and Data Mining*\\ , pp. 831-838. Springer, Berlin, Heidelberg.\n\n.. [#Lazarevic2005Feature] Lazarevic, A. and Kumar, V., 2005, August. Feature bagging for outlier detection. In *KDD '05*. 2005.\n\n.. [#Li2019MADGAN] Li, D., Chen, D., Jin, B., Shi, L., Goh, J. and Ng, S.K., 2019, September. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In *International Conference on Artificial Neural Networks* (pp. 703-716). Springer, Cham.\n\n.. [#Li2020COPOD] Li, Z., Zhao, Y., Botta, N., Ionescu, C. and Hu, X. COPOD: Copula-Based Outlier Detection. *IEEE International Conference on Data Mining (ICDM)*, 2020.\n\n.. [#Liu2008Isolation] Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. 
In *International Conference on Data Mining*\\ , pp. 413-422. IEEE.\n\n.. [#Liu2019Generative] Liu, Y., Li, Z., Zhou, C., Jiang, Y., Sun, J., Wang, M. and He, X., 2019. Generative adversarial active learning for unsupervised outlier detection. *IEEE Transactions on Knowledge and Data Engineering*.\n\n.. [#Papadimitriou2003LOCI] Papadimitriou, S., Kitagawa, H., Gibbons, P.B. and Faloutsos, C., 2003, March. LOCI: Fast outlier detection using the local correlation integral. In *ICDE '03*, pp. 315-326. IEEE.\n\n.. [#Pevny2016Loda] Pevný, T., 2016. Loda: Lightweight on-line detector of anomalies. *Machine Learning*, 102(2), pp.275-304.\n\n.. [#Ramaswamy2000Efficient] Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. *ACM Sigmod Record*\\ , 29(2), pp. 427-438.\n\n.. [#Rousseeuw1999A] Rousseeuw, P.J. and Driessen, K.V., 1999. A fast algorithm for the minimum covariance determinant estimator. *Technometrics*\\ , 41(3), pp.212-223.\n\n.. [#Ruff2018Deep] Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S.A., Binder, A., Müller, E. and Kloft, M., 2018, July. Deep one-class classification. In *International conference on machine learning* (pp. 4393-4402). PMLR.\n\n.. [#Scholkopf2001Estimating] Scholkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J. and Williamson, R.C., 2001. Estimating the support of a high-dimensional distribution. *Neural Computation*, 13(7), pp.1443-1471.\n\n.. [#Shyu2003A] Shyu, M.L., Chen, S.C., Sarinnapakorn, K. and Chang, L., 2003. A novel anomaly detection scheme based on principal component classifier. *MIAMI UNIV CORAL GABLES FL DEPT OF ELECTRICAL AND COMPUTER ENGINEERING*.\n\n.. [#Tang2002Enhancing] Tang, J., Chen, Z., Fu, A.W.C. and Cheung, D.W., 2002, May. Enhancing effectiveness of outlier detections for low density patterns. In *Pacific-Asia Conference on Knowledge Discovery and Data Mining*, pp. 535-548. Springer, Berlin, Heidelberg.\n\n.. 
[#Wang2020adVAE] Wang, X., Du, Y., Lin, S., Cui, P., Shen, Y. and Yang, Y., 2019. adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection. *Knowledge-Based Systems*.\n\n.. [#Zhao2018XGBOD] Zhao, Y. and Hryniewicki, M.K. XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning. *IEEE International Joint Conference on Neural Networks*\\ , 2018.\n\n.. [#Zhao2019LSCP] Zhao, Y., Nasrullah, Z., Hryniewicki, M.K. and Li, Z., 2019, May. LSCP: Locally selective combination in parallel outlier ensembles. In *Proceedings of the 2019 SIAM International Conference on Data Mining (SDM)*, pp. 585-593. Society for Industrial and Applied Mathematics.\n\n.. [#Zhao2021SUOD] Zhao, Y., Hu, X., Cheng, C., Wang, C., Wan, C., Wang, W., Yang, J., Bai, H., Li, Z., Xiao, C., Wang, Y., Qiao, Z., Sun, J. and Akoglu, L. (2021). SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection. *Conference on Machine Learning and Systems (MLSys)*.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyzhao062%2Fpytod","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyzhao062%2Fpytod","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyzhao062%2Fpytod/lists"}