{"id":13415311,"url":"https://github.com/yahoo/egads","last_synced_at":"2025-05-15T18:04:04.951Z","repository":{"id":31608974,"uuid":"35173936","full_name":"yahoo/egads","owner":"yahoo","description":"A Java package to automatically detect anomalies in large scale time-series data","archived":false,"fork":false,"pushed_at":"2023-11-14T22:41:19.000Z","size":1369,"stargazers_count":1178,"open_issues_count":25,"forks_count":330,"subscribers_count":112,"default_branch":"master","last_synced_at":"2025-04-07T22:11:14.605Z","etag":null,"topics":["anomaly-detection-models","big-data","java","time-series"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yahoo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-05-06T17:47:52.000Z","updated_at":"2025-03-24T17:06:39.000Z","dependencies_parsed_at":"2022-08-03T12:30:45.047Z","dependency_job_id":"2ff1740d-d401-4568-867f-e190e6a4e8f3","html_url":"https://github.com/yahoo/egads","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yahoo%2Fegads","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yahoo%2Fegads/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yahoo%2Fegads/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yahoo%2Fegads/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yahoo","download_url":"
https://codeload.github.com/yahoo/egads/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254394720,"owners_count":22063984,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection-models","big-data","java","time-series"],"created_at":"2024-07-30T21:00:46.935Z","updated_at":"2025-05-15T18:04:04.931Z","avatar_url":"https://github.com/yahoo.png","language":"Java","readme":"[![Build Status](https://travis-ci.org/yahoo/egads.svg?branch=master)](https://travis-ci.org/yahoo/egads)\n\nEGADS Java Library\n==========================================================\n\nEGADS (Extensible Generic Anomaly Detection System) is an open-source Java package to automatically detect anomalies in large scale time-series data.\nEGADS is meant to be a library that contains a number of anomaly detection techniques applicable to many use-cases in a single package with the only dependency being Java.\nEGADS works by first building a time-series model which is used to compute the expected value at time *t*. Then a number of errors *E* are computed by comparing the expected\nvalue with the actual value at time *t*. EGADS automatically determines thresholds on *E* and outputs the most probable anomalies. 
The EGADS library can be used in a wide\nvariety of contexts to detect outliers and change points in time-series that can have various seasonal, trend and noise components.\n\nHow to get started\n===========================\n\nEGADS was designed as a self-contained library that has a collection of time-series and anomaly detection models\nthat are applicable to a wide range of use cases. To compile the library into a single jar, clone the repo and type the following:\n\n```shell\nmvn clean compile assembly:single\n```\n\nYou may have to set your `JAVA_HOME` variable to the appropriate JVM. To do this, run:\n\n```shell\nexport JAVA_HOME=/usr/lib/jvm/{JVM directory for desired version}\n```\n\nUsage\n==========================\n\nTo run a simple example, type:\n\n```shell\njava -Dlog4j.configurationFile=src/test/resources/log4j2.xml -cp target/egads-*-jar-with-dependencies.jar com.yahoo.egads.Egads src/test/resources/sample_config.ini src/test/resources/sample_input.csv\n```\n\nwhich produces the following picture (note that you can enable this UI by setting the `OUTPUT` config key to `GUI` in `sample_config.ini`).\n\n![gui](doc/ui.png \"EGADS GUI\")\n\nYou can also specify config parameters on the command line. 
For example, to run anomaly detection using Olympic Scoring as the time-series model and a density-based method as the anomaly detection model, use the following:\n\n```shell\njava -Dlog4j.configurationFile=src/test/resources/log4j2.xml -cp target/egads-*-jar-with-dependencies.jar com.yahoo.egads.Egads \"MAX_ANOMALY_TIME_AGO:999999999;AGGREGATION:1;OP_TYPE:DETECT_ANOMALY;TS_MODEL:OlympicModel;AD_MODEL:ExtremeLowDensityModel;INPUT:CSV;OUTPUT:STD_OUT;BASE_WINDOWS:168;PERIOD:-1;NUM_WEEKS:3;NUM_TO_DROP:0;DYNAMIC_PARAMETERS:0;TIME_SHIFTS:0\" src/test/resources/sample_input.csv\n```\n\nTo run anomaly detection using no time-series model with an automatic static threshold for anomaly detection, use the following:\n\n```shell\njava -Dlog4j.configurationFile=src/test/resources/log4j2.xml -cp target/egads-*-jar-with-dependencies.jar com.yahoo.egads.Egads \"MAX_ANOMALY_TIME_AGO:999999999;AGGREGATION:1;OP_TYPE:DETECT_ANOMALY;TS_MODEL:NullModel;AD_MODEL:SimpleThresholdModel;SIMPLE_THRESHOLD_TYPE:AdaptiveMaxMinSigmaSensitivity;INPUT:CSV;OUTPUT:STD_OUT;AUTO_SENSITIVITY_ANOMALY_PCNT:0.2;AUTO_SENSITIVITY_SD:2.0\" src/test/resources/sample_input.csv\n```\n\nTo embed the EGADS library in an application, pull the compiled JAR from JCenter by adding the proper repository. 
For example, in a Maven POM file, add:\n\n```\n\u003crepositories\u003e\n  \u003crepository\u003e\n    \u003cid\u003ejcenter\u003c/id\u003e\n    \u003curl\u003ehttps://jcenter.bintray.com/\u003c/url\u003e\n  \u003c/repository\u003e\n\u003c/repositories\u003e\n```\nThen import the dependency, e.g.:\n\n```\n\u003cdependency\u003e\n  \u003cgroupId\u003ecom.yahoo.egads\u003c/groupId\u003e\n  \u003cartifactId\u003eegads\u003c/artifactId\u003e\n  \u003cversion\u003e0.4.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nOverview\n========\nWhile rapid advances in computing hardware and software have led to powerful applications,\nhundreds of software bugs and hardware failures still occur in large clusters,\ncompromising user experience and, subsequently, revenue. Non-stop systems have a strict uptime\nrequirement, and continuous monitoring of these systems is critical. From the data analysis point of view,\nthis means non-stop monitoring of large volumes of time-series data in order to detect potential faults or anomalies.\nDue to the large scale of the problem, human monitoring of this data is practically infeasible, which leads us to\nautomated anomaly detection. An anomaly, or an outlier, is a data point which is significantly different from the rest of\nthe data. Generally, the data in most applications is created by one or more generating processes that reflect the functionality of a system.\n\nWhen the underlying generating process behaves in an unusual way, it creates outliers. 
Fast and efficient identification of these outliers is useful\nfor many applications, including intrusion detection, credit card fraud, sensor events, medical diagnoses, law enforcement and others.\nCurrent approaches in automated anomaly detection suffer from a large number of false positives, which limits the usefulness of these systems in practice.\nUse-case-specific, or category-specific, anomaly detection models may enjoy a low false positive rate for a specific application, but when the characteristics of\nthe time-series change, these techniques perform poorly without proper retraining.\n\nEGADS (Extensible Generic Anomaly Detection System) enables the accurate and scalable detection of time-series\nanomalies. EGADS separates forecasting and anomaly detection into two separate components, which allows users to add their own models to either\nof the components.\n\nArchitecture\n===========\n\nThe EGADS framework consists of two main components: the time-series modeling module (TMM) and the anomaly detection module (ADM).\nGiven a time-series, the TMM component models the time-series, producing an expected value that is later consumed by the ADM, which computes anomaly scores.\nEGADS was built as a framework to be easily integrated into an existing monitoring infrastructure. At Yahoo,\nour internal Yahoo Monitoring Service (YMS) processes millions of data-points every second. Therefore, having scalable,\naccurate and automated anomaly detection for YMS is critical. For this reason, EGADS can be compiled into a single light-weight jar and deployed easily at scale.\n\nThe TMM and ADM can be found under `main/java/com/yahoo/egads/models`.\n\nExamples of the models supported by the TMM and ADM can be found in the two tables below. 
We expect this collection of models to grow\nas more contribution is put forward by the community.\n\n###### List of current TimeSeries Models\n\n![models](doc/egadstsmd.png \"Supported Time-Series Models\")\n\n###### List of current Anomaly Detection Models\n\n![admodels](doc/egadsadm.png \"Supported Anomaly Detection Models\")\n\nConfiguration\n=============\n\nBelow are the various configuration parameters supported by EGADS.\n\n```\n# Only show anomalies no older than this.\n# If this is set to 0, then only output an anomaly\n# if it occurs on the last time-stamp.\nMAX_ANOMALY_TIME_AGO  99999\n\n# Denotes how much should the time-series be aggregated by.\n# If set to 1 or less, this setting is ignored.\nAGGREGATION\t1\n\n# OP_TYPE specifies the operation type.\n# Options: DETECT_ANOMALY,\n#          UPDATE_MODEL,\n#\t   TRANSFORM_INPUT\nOP_TYPE\tDETECT_ANOMALY\n\n# TS_MODEL specifies the time-series\n# model type.\n# Options: AutoForecastModel\n#          DoubleExponentialSmoothingModel\n#          MovingAverageModel\n#          MultipleLinearRegressionModel\n#          NaiveForecastingModel\n#          OlympicModel\n#          PolynomialRegressionModel\n#          RegressionModel\n#          SimpleExponentialSmoothingModel\n#          TripleExponentialSmoothingModel\n#          WeightedMovingAverageModel\n# \t   SpectralSmoother\n# \t   NullModel\nTS_MODEL\tOlympicModel\n\n# AD_MODEL specifies the anomaly-detection\n# model type.\n# Options: ExtremeLowDensityModel\n#          AdaptiveKernelDensityChangePointDetector\n#          KSigmaModel\n#          NaiveModel\n#          DBScanModel\n#          SimpleThresholdModel\nAD_MODEL\tExtremeLowDensityModel\n\n# Type of the simple threshold model.\n# Options: AdaptiveMaxMinSigmaSensitivity\n#          AdaptiveKSigmaSensitivity\n# SIMPLE_THRESHOLD_TYPE\n\n# Specifies the input src.\n# Options: STDIN\n#          CSV\nINPUT\tCSV\n\n# Specifies the output src.\n# Options: STD_OUT,\n#          ANOMALY_DB\n#          
GUI\n#          PLOT\nOUTPUT  STD_OUT\n\n# THRESHOLD specifies the threshold for the\n# anomaly detection model.\n# Comment to auto-detect all thresholds.\n# Options: mapee,mae,smape,mape,mase.\n# THRESHOLD mape#10,mase#15\n\n#####################################\n### Olympic Forecast Model Config ###\n#####################################\n\n# The possible time-shifts for Olympic Scoring.\nTIME_SHIFTS 0,1\n\n# The possible base windows for Olympic Scoring.\nBASE_WINDOWS  24,168\n\n# Period specifies the periodicity of the\n# time-series (e.g., the difference between successive time-stamps).\n# Options: (numeric)\n#          0 - auto detect.\n#          -1 - disable.\nPERIOD\t-1\n\n\n# NUM_WEEKS specifies the number of weeks\n# to use in OlympicScoring.\nNUM_WEEKS 8\n\n# NUM_TO_DROP specifies the number of\n# highest and lowest points to drop.\nNUM_TO_DROP 0\n\n# If dynamic parameters is set to 1, then\n# EGADS will dynamically vary parameters (NUM_WEEKS)\n# to produce the best fit.\nDYNAMIC_PARAMETERS  0\n\n###################################################\n### ExtremeLowDensityModel \u0026 DBScanModel Config ###\n###################################################\n\n# Denotes the expected % of anomalies\n# in your data.\nAUTO_SENSITIVITY_ANOMALY_PCNT\t0.01\n\n# Refers to the cluster standard deviation.\nAUTO_SENSITIVITY_SD\t3.0\n\n############################\n### NaiveModel Config ###\n############################\n\n# Window size where the spike is to be found.\nWINDOW_SIZE\t0.1\n\n#######################################################\n### AdaptiveKernelDensityChangePointDetector Config ###\n#######################################################\n\n# Change point detection parameters\nPRE_WINDOW_SIZE\t48\nPOST_WINDOW_SIZE\t48\nCONFIDENCE\t0.8\n\n###############################\n### SpectralSmoother Config ###\n###############################\n\n# WINDOW_SIZE should be greater than the size of longest important seasonality.\n# By default it is set to 192 = 
8 * 24, which covers 8 days (\u003e 1 week) of an hourly time-series.\nWINDOW_SIZE 192\n\n# FILTERING_METHOD specifies the filtering method for Spectral Smoothing\n# Options:  \t\tGAP_RATIO\t\t(Recommended: FILTERING_PARAM = 0.01)\n#\t\t\tEIGEN_RATIO\t\t(Recommended: FILTERING_PARAM = 0.1)\n#\t\t\tEXPLICIT\t\t(Recommended: FILTERING_PARAM = 10)\n#\t\t\tK_GAP\t\t\t(Recommended: FILTERING_PARAM = 8)\n#\t\t\tVARIANCE\t\t(Recommended: FILTERING_PARAM = 0.99)\n#\t\t\tSMOOTHNESS\t\t(Recommended: FILTERING_PARAM = 0.97)\nFILTERING_METHOD GAP_RATIO\n\nFILTERING_PARAM 0.01\n```\n\nContributions\n================\n\n1. Clone your fork\n2. Hack away\n3. If you are adding new functionality, document it in the README\n4. Verify your code by running `mvn package` and adding additional tests.\n5. Push the branch up to GitHub\n6. Send a pull request to the yahoo/egads project.\n\nWe actively welcome contributions. If you don't know where to start, try\nchecking out the [issue list](https://github.com/yahoo/egads/issues) and\nfixing up the place. 
Or, you can add a model - a goal of this project\nis to have a robust, lightweight and dependency-free set of models to choose from that are ready to\nbe deployed in production.\n\nReferences\n============\n\u003ca href=\"https://s.yimg.com/ge/labs/v2/uploads/kdd2015.pdf\"\u003eGeneric and Scalable Framework for Automated Time-series Anomaly Detection\u003c/a\u003e by Nikolay Laptev, Saeed Amizadeh, Ian Flint, KDD 2015 (August 10, 2015)\n\nCitation\n============\nIf you use EGADS in your projects, please cite:\n\u003ca href=\"https://s.yimg.com/ge/labs/v2/uploads/kdd2015.pdf\"\u003eGeneric and Scalable Framework for Automated Time-series Anomaly Detection\u003c/a\u003e by Nikolay Laptev, Saeed Amizadeh, Ian Flint, KDD 2015\n\nBibTeX:\n\n```tex\n@inproceedings{laptev2015generic,\n\t\ttitle={Generic and Scalable Framework for Automated Time-series Anomaly Detection},\n\t\tauthor={Laptev, Nikolay and Amizadeh, Saeed and Flint, Ian},\n\t\tbooktitle={Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},\n\t\tpages={1939--1947},\n\t\tyear={2015},\n\t\torganization={ACM}\n}\n```\n\nLicense\n=======\n\nCode licensed under the GPL License. See LICENSE file for terms.\n","funding_links":[],"categories":["Anomaly Detection Software","Java","异常检测包","人工智能","Tools and Algorithms"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyahoo%2Fegads","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyahoo%2Fegads","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyahoo%2Fegads/lists"}