{"id":46350259,"url":"https://github.com/microsoft/openrca","last_synced_at":"2026-03-04T23:00:54.161Z","repository":{"id":279262200,"uuid":"880717690","full_name":"microsoft/OpenRCA","owner":"microsoft","description":"[ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?","archived":false,"fork":false,"pushed_at":"2026-02-24T13:13:41.000Z","size":4412,"stargazers_count":278,"open_issues_count":9,"forks_count":36,"subscribers_count":5,"default_branch":"main","last_synced_at":"2026-03-04T15:58:58.038Z","etag":null,"topics":["benchmark","large-language-models","llm","llm-agent","rca","root-cause-analysis","software-engineering"],"latest_commit_sha":null,"homepage":"https://aka.ms/openrca","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":"SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-10-30T08:21:05.000Z","updated_at":"2026-03-02T16:15:05.000Z","dependencies_parsed_at":"2026-03-04T23:00:46.170Z","dependency_job_id":null,"html_url":"https://github.com/microsoft/OpenRCA","commit_stats":null,"previous_names":["microsoft/openrca"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/microsoft/OpenRCA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FOpenRCA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FOpenRCA/tags","releases_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FOpenRCA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FOpenRCA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/OpenRCA/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FOpenRCA/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30098078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-04T22:49:54.894Z","status":"ssl_error","status_checked_at":"2026-03-04T22:49:48.883Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","large-language-models","llm","llm-agent","rca","root-cause-analysis","software-engineering"],"created_at":"2026-03-04T23:00:29.755Z","updated_at":"2026-03-04T23:00:54.156Z","avatar_url":"https://github.com/microsoft.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenRCA\n\n![Python Version](https://img.shields.io/badge/Python-3776AB?\u0026logo=python\u0026logoColor=white-blue\u0026label=3.10%20%7C%203.11)\u0026ensp;\n[![License: 
MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\u0026ensp;\n![Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)\n\nOpenRCA is a benchmark for assessing LLMs' root cause analysis ability in software operation scenarios. When given a natural language query, LLMs need to analyze large volumes of telemetry data to identify the relevant root cause elements. This process requires the models to understand complex system dependencies and perform comprehensive reasoning across various types of telemetry data, including KPI time series, dependency trace graphs, and semi-structured log text.\n\n\u003cimg src=\"./.asset/openrca.png\"/\u003e \n\nWe also introduce RCA-agent as a baseline for OpenRCA. By using Python for data retrieval and analysis, the model avoids processing overly long contexts, which lets it focus on reasoning and scale to extensive telemetry.\n\n\u003cimg src=\"./.asset/rcaagent.png\"/\u003e \n\n## ✨ Quick Start\n\n\u003e ⚠️ Since the OpenRCA dataset includes a large amount of telemetry and RCA-agent requires extensive memory operations, we recommend using a device with at least 80GB of storage space and 32GB of memory.\n\n### 🛠️ Installation\n\nOpenRCA requires **Python \u003e= 3.10**. Install it by running the following commands:\n```bash\n# [optional] create a conda environment\n# conda create -n openrca python=3.10\n# conda activate openrca\n\n# clone the repository\ngit clone https://github.com/microsoft/OpenRCA.git\ncd OpenRCA\n# install the requirements\npip install -r requirements.txt\n```\n\nThe telemetry data can be downloaded from [Google Drive](https://drive.google.com/drive/folders/1wGiEnu4OkWrjPxfx5ZTROnU37-5UDoPM?usp=drive_link). 
Once you have downloaded the telemetry dataset, place it in the `dataset/` directory (which is initially empty).\n\nThe directory structure of the data is:\n\n```\n.\n├── {SYSTEM}\n│   ├── query.csv\n│   ├── record.csv\n│   └── telemetry\n│       ├── {DATE}\n│       │   ├── log\n│       │   ├── metric\n│       │   └── trace\n│       └── ... \n└── ...\n```\n\nwhere `{SYSTEM}` is `Telecom`, `Bank`, or `Market` (the Market data is further split into `Market/cloudbed-1` and `Market/cloudbed-2`, each with its own `query.csv`), and `{DATE}` has the format `{YYYY_MM_DD}`.\n\n### 🖊️ Evaluation\n\nUse the following command to evaluate predictions:\n\n```bash\npython -m main.evaluate \\\n    -p [prediction csv files to evaluate] \\\n    -q [groundtruth csv files to evaluate] \\\n    -r [report csv file to save]\n```\n\nNote that each prediction CSV file must include at least a \"prediction\" column for valid evaluation (extra columns are allowed). Each prediction should be a JSON-like string containing all required elements for each query (extra elements are allowed). If there are multiple failures, list them in chronological order (e.g., 1, 2, 3, ...):\n\n```json\n{\n    \"1\": {\n        \"root cause occurrence datetime\": \"[%Y-%m-%d %H:%M:%S]\",\n        \"root cause component\": \"[COMPONENT]\",\n        \"root cause reason\": \"[REASON]\"\n    }, \n    ...\n}\n```\n\nFor example, to evaluate the archived predictions of RCA-agent (Claude version), use the following command:\n\n```bash\npython -m main.evaluate \\\n    -p \\\n        rca/archive/agent-Bank.csv \\\n        rca/archive/agent-Market-cloudbed-1.csv \\\n        rca/archive/agent-Market-cloudbed-2.csv \\\n        rca/archive/agent-Telecom.csv \\\n    -q \\\n        dataset/Bank/query.csv \\\n        dataset/Market/cloudbed-1/query.csv \\\n        dataset/Market/cloudbed-2/query.csv \\\n        dataset/Telecom/query.csv \\\n    -r \\\n        test/agent_claude.csv\n```\n\n### 🚩 Reproduction\n\nTo reproduce the results in the paper, first set up your API configuration before running OpenRCA's baselines. 
Taking OpenAI as an example, configure the `rca/api_config.yaml` file as follows:\n\n```yaml\nSOURCE:   \"OpenAI\"\nMODEL:    \"gpt-4o-2024-05-13\"\nAPI_KEY:  \"sk-xxxxxxxxxxxxxx\"\n```\n\nThen run the following command to reproduce the results:\n\n```bash\npython -m rca.{TESTS} --dataset {DATASET_NAME}\n# Available tests: run_agent_standard, run_baseline_balanced, run_baseline_oracle\n# Available datasets: Telecom, Bank, Market/cloudbed-1, Market/cloudbed-2\n```\n\nFor example, to evaluate RCA-agent on the Bank dataset, use the following command:\n\n```bash\npython -m rca.run_agent_standard --dataset Bank\n```\n\nNote that the telemetry of the two Market cloudbed service groups is collected separately, so evaluating RCA-agent on the whole Market dataset requires one run per cloudbed:\n\n```bash\npython -m rca.run_agent_standard --dataset Market/cloudbed-1\npython -m rca.run_agent_standard --dataset Market/cloudbed-2\n```\n\nRunning any test script creates a new `test` directory that contains the generated results and monitor files.\n\n### 💽 Reconstruction\n\nYou can generate new tasks for the OpenRCA telemetry, or for your own private telemetry, by modifying `main/task_specification.json` and running the following command:\n\n```bash\npython -m main.generate \\\n    -s [your specification config file] \\\n    -r [record file to generate queries from] \\\n    -q [query file to save] \\\n    -t [timezone of telemetry]\n```\n\nNote that your record file must follow the same schema as OpenRCA's `record.csv`.\n\nYou can also regenerate random queries for OpenRCA with the following command:\n\n```bash\npython -m main.generate -d True\n```\n\n## 📚 Citation\n\nIf you use OpenRCA in your research, please cite our paper:\n\n```bibtex\n@inproceedings{\nxu2025openrca,\ntitle={OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?},\nauthor={Xu, Junjielong and Zhang, Qinan and Zhong, Zhiqing and He, Shilin and Zhang, 
Chaoyun and Lin, Qingwei and Pei, Dan and He, Pinjia and Zhang, Dongmei and Zhang, Qi},\nbooktitle={The Thirteenth International Conference on Learning Representations},\nyear={2025},\nurl={https://openreview.net/forum?id=M4qNIzQYpd}\n}\n```\n\n## Trademarks\n\nThis project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft \ntrademarks or logos is subject to and must follow \n[Microsoft's Trademark \u0026 Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).\nUse of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.\nAny use of third-party trademarks or logos is subject to those third parties' policies.\n\n## Disclaimer\nThe recommended models in this Repo are just examples, used to explore the potential of agent systems in the paper published at ICLR 2025. Users can replace the models in this Repo according to their needs. When using the recommended models in this Repo, you need to comply with the licenses of these models respectively. Microsoft shall not be held liable for any infringement of third-party rights resulting from your usage of this repo. Users agree to defend, indemnify and hold Microsoft harmless from and against all damages, costs, and attorneys' fees in connection with any claims arising from this Repo. If you believe that this Repo infringes on your rights, please notify the project owner by email.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Fopenrca","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2Fopenrca","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Fopenrca/lists"}