{"id":13473168,"url":"https://github.com/shaido987/riskloc","last_synced_at":"2025-04-05T12:04:20.969Z","repository":{"id":40644118,"uuid":"306542106","full_name":"shaido987/riskloc","owner":"shaido987","description":"Implementation of RiskLoc, a method for localizing multi-dimensional root causes.","archived":false,"fork":false,"pushed_at":"2024-10-16T01:49:05.000Z","size":2228,"stargazers_count":127,"open_issues_count":6,"forks_count":22,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-29T11:06:09.252Z","etag":null,"topics":["adtributor","autoroot","hotspot","multi-dimensional","rca","riskloc","root-cause","root-cause-analysis","root-cause-location","squeeze","time-series"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shaido987.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-23T05:54:55.000Z","updated_at":"2025-03-26T09:29:25.000Z","dependencies_parsed_at":"2024-01-13T18:23:55.284Z","dependency_job_id":"df0c0e19-d675-4ed1-928b-4694d4f32283","html_url":"https://github.com/shaido987/riskloc","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shaido987%2Friskloc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shaido987%2Friskloc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shaido987%2Friskloc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/shaido987%2Friskloc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shaido987","download_url":"https://codeload.github.com/shaido987/riskloc/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247332560,"owners_count":20921853,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adtributor","autoroot","hotspot","multi-dimensional","rca","riskloc","root-cause","root-cause-analysis","root-cause-location","squeeze","time-series"],"created_at":"2024-07-31T16:01:01.367Z","updated_at":"2025-04-05T12:04:20.939Z","avatar_url":"https://github.com/shaido987.png","language":"Python","funding_links":[],"categories":["Python","AI for *Ops"],"sub_categories":["Observability \u0026 Monitoring with AI"],"readme":"# RiskLoc\nThis repository contains code for the paper [RiskLoc: Localization of Multi-dimensional Root Causes by Weighted Risk](https://arxiv.org/abs/2205.10004). Both the implementation of RiskLoc itself and all baseline multi-dimensional root cause localization methods in the paper are included, as well as the code to generate synthetic datasets as described in the paper.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"736\" alt=\"architecture\" src=\"https://github.com/shaido987/riskloc/assets/1130029/c9b8d791-ac94-4edc-b70f-8555467b6c2a\"\u003e\n\u003c/p\u003e\n\n**Short problem description:**  \nRiskLoc solves the problem of identifying the root cause of an anomaly occurring in a time series with multi-dimensional attributes. 
These types of time series can be regarded as aggregations (the total sum in the simplest case) of numerous underlying, more fine-grained, time series.   \n\nFor example, a time series T with 2 dimensions (d1 and d2), each with 3 possible values: \n- d1: [a, b, c]\n- d2: [d, e, f]\n\nis built up of 9 fine-grained time series (two examples of these are the time series corresponding to {d1: a, d2: d} and {d1: b, d2: f}). \n\nThe goal is to find the specific dimensions and dimension values (the elements) of the root cause when an error occurs in the fully aggregated time series T. This is a search problem where any combination of dimensions and values is considered, and there can be multiple elements in the final root cause set. For the example time series above, one potential root cause set can be {{d1: a, d2: [d, e]}, {d1: b, d2: e}}. Since any combination and any number of elements need to be considered, the total search space is huge, which is the main challenge.\n\n## Requirements\n- pandas\n- numpy\n- scipy\n- kneed (for squeeze)\n- loguru (for squeeze)\n- pyyaml\n\n## How to run\n\nTo run, use the `run.py` file. 
There are two options: run a single file, or run all files in a directory (including all subdirectories).\n\nExample of running a single file using RiskLoc in debug mode:\n```\npython run.py riskloc --run-path /data/B0/B_cuboid_layer_1_n_ele_1/1450653900.csv --debug\n```\n\nExample of running all files in a particular setting for a dataset (with `--derived` set to True):\n```\npython run.py riskloc --run-path /data/D/B_cuboid_layer_3_n_ele_3 --derived\n```\n\nExample of running all files in a dataset:\n```\npython run.py riskloc --run-path /data/B0\n```\n\nExample of running all datasets with 20 threads:\n```\npython run.py riskloc --n-threads 20\n```\n\nChanging `riskloc` to any other supported algorithm runs that algorithm instead; see below.\n\n## Algorithms \nImplemented algorithms: RiskLoc, AutoRoot, RobustSpot, Squeeze, HotSpot, and Adtributor (normal and recursive).\n\nThey can be run by specifying the algorithm name as the first input parameter to the `run.py` file:\n```\n$ python run.py --help\nusage: run.py [-h] {riskloc,autoroot,squeeze,old squeeze,hotspot,r_adtributor,adtributor} ...\n\nRiskLoc\n\npositional arguments: {riskloc,autoroot,robustspot,squeeze,hotspot,r_adtributor,adtributor}\n\n                        algorithm specific help\n    riskloc             riskloc help\n    autoroot            autoroot help\n    robustspot          robustspot help\n    squeeze             squeeze help\n    hotspot             autoroot help\n    r_adtributor        r_adtributor help\n    adtributor          adtributor help\n\noptional arguments:\n  -h, --help            show this help message and exit\n```\nThe code for Squeeze is adapted from the code released with the original publication: https://github.com/NetManAIOps/Squeeze.\nThe code for RobustSpot is similarly adapted from its authors' released code: https://github.com/robustspotproject/RobustSpot.\n\nTo see the algorithm-specific arguments, run: `python run.py 'algorithm' --help`. 
For example, for RiskLoc: \n```\n$ python run.py riskloc --help\nusage: run.py riskloc [-h] [--data-root DATA_ROOT] [--run-path RUN_PATH] [--derived [DERIVED]] [--n-threads N_THREADS] [--output-suffix OUTPUT_SUFFIX] [--debug [DEBUG]] [--risk-threshold RISK_THRESHOLD] [--pep-threshold PEP_THRESHOLD] [--n-remove N_REMOVE] [--remove-relative [REMOVE_RELATIVE]] [--prune-elements [PRUNE_ELEMENTS]]\n\noptions:\n  -h, --help                           show this help message and exit\n  --data-root DATA_ROOT                root directory for all datasets (default ./data)\n  --run-path RUN_PATH                  directory or file to be run; if a directory, any subdirectories will be considered as well;\n                                       must contain data-path as a prefix\n  --derived [DERIVED]                  derived dataset (defaults to True for the D and RS datasets and False for others)\n  --n-threads N_THREADS                number of threads to run\n  --output-suffix OUTPUT_SUFFIX        suffix for output csv file\n  --debug [DEBUG]                      debug mode\n  --risk-threshold RISK_THRESHOLD      risk threshold\n  --pep-threshold PEP_THRESHOLD        proportional explanatory power threshold\n  --n-remove N_REMOVE                  number of elements to ignore when computing the cutoff point\n  --remove-relative [REMOVE_RELATIVE]  if true then n_remove is a percentage value\n  --prune-elements [PRUNE_ELEMENTS]    use element pruning (True/False)\n```\n\nThe arguments from `--risk-threshold` onward are specific to RiskLoc, while the rest are shared by all algorithms. 
To see the algorithm-specific arguments for other algorithms simply run them with the `--help` flag or check the code in `run.py`.\n\n## Datasets\nThe real-world dataset with derived measures from RobustSpot (RS) is already present in the data folder and can be used immediately.\n\nThe semi-synthetic datasets can be downloaded from: https://github.com/NetManAIOps/Squeeze.\nTo run these, place them within the data/ directory and name them: A, B0, B1, B2, B3, B4, and D, respectively.\n\nThe three synthetic datasets used in the paper can be generated using `generate_dataset.py` as follows.\n\nS dataset:\n```\npython generate_dataset.py --num 1000 --dataset-name S --seed 121\n```\nL dataset:\n```\npython generate_dataset.py --num 1000 --dataset-name L --seed 122 --dims 10 24 10 15 --noise-level 0.0 0.1 --anomaly-severity 0.5 1.0 --anomaly-deviation 0.0 0.0 --num-anomaly 1 5 --num-anomaly-elements 1 1 --only-last-layer\n```\nH dataset:\n```\npython generate_dataset.py --num 100 --dataset-name H --seed 123 --dims 10 5 250 20 8 12\n```\n\nIn addition, new, interesting datasets can be created using `generate_dataset.py` for extended empirical verification and research purposes. Supported input arguments can be found at the beginning of the `generate_dataset.py` file or using the `--help` flag. \n\n## Citation\nIf you find this code useful, please cite the following paper:\n\n```\n@article{riskloc,\n  title={RiskLoc: Localization of Multi-dimensional Root Causes by Weighted Risk},\n  author={Kalander, Marcus},\n  journal={arXiv preprint arXiv:2205.10004},\n  year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshaido987%2Friskloc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshaido987%2Friskloc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshaido987%2Friskloc/lists"}