{"id":14296595,"url":"https://github.com/amazon-science/RAGChecker","last_synced_at":"2025-08-15T16:32:26.763Z","repository":{"id":246190830,"uuid":"819435778","full_name":"amazon-science/RAGChecker","owner":"amazon-science","description":"RAGChecker: A Fine-grained Framework For Diagnosing RAG","archived":false,"fork":false,"pushed_at":"2024-08-20T10:51:58.000Z","size":3138,"stargazers_count":186,"open_issues_count":5,"forks_count":11,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-08-21T13:11:05.545Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amazon-science.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-24T13:53:07.000Z","updated_at":"2024-08-21T13:05:32.000Z","dependencies_parsed_at":"2024-08-20T13:07:20.429Z","dependency_job_id":null,"html_url":"https://github.com/amazon-science/RAGChecker","commit_stats":null,"previous_names":["amazon-science/ragchecker"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2FRAGChecker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2FRAGChecker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2FRAGChecker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2FRAGChecker/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/amazon-science","download_url":"https://codeload.github.com/amazon-science/RAGChecker/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":216692944,"owners_count":16065570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-23T22:01:36.002Z","updated_at":"2024-12-16T08:30:20.864Z","avatar_url":"https://github.com/amazon-science.png","language":"Python","readme":"# RAGChecker: A Fine-grained Framework For Diagnosing RAG\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://arxiv.org/pdf/2408.08067\"\u003eRAGChecker Paper\u003c/a\u003e \u0026nbsp\u0026nbsp | \u0026nbsp\u0026nbsp \u003ca href=\"./tutorial/ragchecker_tutorial_en.md\"\u003eTutorial (English)\u003c/a\u003e \u0026nbsp\u0026nbsp ｜ \u0026nbsp\u0026nbsp \u003ca href=\"./tutorial/ragchecker_tutorial_zh.md\"\u003e中文教程\u003c/a\u003e\n\u003c/p\u003e\n\nRAGChecker is an advanced automatic evaluation framework designed to assess and diagnose Retrieval-Augmented Generation (RAG) systems. 
It provides a comprehensive suite of metrics and tools for in-depth analysis of RAG performance.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"imgs/ragchecker_metrics.png\" alt=\"RAGChecker Metrics\" \n  style=\"width:800px\"\u003e\n  \u003cbr\u003e\n  \u003cb\u003eFigure\u003c/b\u003e: RAGChecker Metrics\n\u003c/p\u003e\n\n## 🌟 Highlighted Features\n\n- **Holistic Evaluation**: RAGChecker offers `Overall Metrics` for an assessment of the entire RAG pipeline.\n\n- **Diagnostic Metrics**: RAGChecker provides `Diagnostic Retriever Metrics` for analyzing the retrieval component and `Diagnostic Generator Metrics` for evaluating the generation component. These metrics provide valuable insights for targeted improvements.\n\n- **Fine-grained Evaluation**: Utilizes `claim-level entailment` operations for fine-grained evaluation.\n\n- **Benchmark Dataset**: A comprehensive RAG benchmark dataset with 4k questions covering 10 domains (upcoming).\n\n- **Meta-Evaluation**: A human-annotated preference dataset for evaluating how well RAGChecker's results correlate with human judgments.\n\nRAGChecker empowers developers and researchers to thoroughly evaluate, diagnose, and enhance their RAG systems with precision and depth.\n\n\n## 🔥 News\n- [08/16/2024] RAGChecker paper is on arXiv: https://arxiv.org/pdf/2408.08067\n\n\n## ❤️ Citation\nRAGChecker paper: https://arxiv.org/pdf/2408.08067\n\nIf you use RAGChecker in your work, please cite us:\n```bibtex\n@misc{ru2024ragcheckerfinegrainedframeworkdiagnosing,\n      title={RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation}, \n      author={Dongyu Ru and Lin Qiu and Xiangkun Hu and Tianhang Zhang and Peng Shi and Shuaichen Chang and Jiayang Cheng and Cunxiang Wang and Shichao Sun and Huanyu Li and Zizhao Zhang and Binjie Wang and Jiarong Jiang and Tong He and Zhiguo Wang and Pengfei Liu and Yue Zhang and Zheng Zhang},\n      year={2024},\n      eprint={2408.08067},\n      archivePrefix={arXiv},\n      
primaryClass={cs.CL},\n      url={https://arxiv.org/abs/2408.08067}, \n}\n```\n\n## 🚀 Quick Start\n\n### Setup Environment\n\n```bash\npip install ragchecker\npython -m spacy download en_core_web_sm\n```\n\n\n### Run the Checking Pipeline with CLI\n\nPlease prepare your own data in the same format as [examples/checking_inputs.json](./examples/checking_inputs.json). The only required annotation for each query is the `ground truth answer (gt_answer)`.\n\n```json\n{\n  \"results\": [\n    {\n      \"query_id\": \"\u003cquery id\u003e\", # string\n      \"query\": \"\u003cinput query\u003e\", # string\n      \"gt_answer\": \"\u003cground truth answer\u003e\", # string\n      \"response\": \"\u003cresponse generated by the RAG generator\u003e\", # string\n      \"retrieved_context\": [ # a list of retrieved chunks by the retriever\n        {\n          \"doc_id\": \"\u003cdoc id\u003e\", # string, optional\n          \"text\": \"\u003ccontent of the chunk\u003e\" # string\n        },\n        ...\n      ]\n    },\n    ...\n  ]\n}\n```\n\nIf you are using the AWS Bedrock version of Llama3 70B for the claim extractor and checker, use the following command to run the checking pipeline. The checking results, as well as the intermediate results, will be saved to `--output_path`:\n\n\n```bash\nragchecker-cli \\\n    --input_path=examples/checking_inputs.json \\\n    --output_path=examples/checking_outputs.json \\\n    --extractor_name=bedrock/meta.llama3-1-70b-instruct-v1:0 \\\n    --checker_name=bedrock/meta.llama3-1-70b-instruct-v1:0 \\\n    --batch_size_extractor=64 \\\n    --batch_size_checker=64 \\\n    --metrics all_metrics \\\n    # --disable_joint_check  # uncomment this line for one-by-one checking, slower but slightly more accurate\n```\n\nPlease refer to [RefChecker's guidance](https://github.com/amazon-science/RefChecker/tree/main?tab=readme-ov-file#choose-models-for-the-extractor-and-checker) for setting up the extractor and checker models.\n\nIt will output the values 
for the metrics as follows:\n\n```json\nResults for examples/checking_outputs.json:\n{\n  \"overall_metrics\": {\n    \"precision\": 73.3,\n    \"recall\": 62.5,\n    \"f1\": 67.3\n  },\n  \"retriever_metrics\": {\n    \"claim_recall\": 61.4,\n    \"context_precision\": 87.5\n  },\n  \"generator_metrics\": {\n    \"context_utilization\": 87.5,\n    \"noise_sensitivity_in_relevant\": 22.5,\n    \"noise_sensitivity_in_irrelevant\": 0.0,\n    \"hallucination\": 4.2,\n    \"self_knowledge\": 25.0,\n    \"faithfulness\": 70.8\n  }\n}\n```\n\n### Run the Checking Pipeline with Python\n```python\nfrom ragchecker import RAGResults, RAGChecker\nfrom ragchecker.metrics import all_metrics\n\n\n# initialize ragresults from json/dict\nwith open(\"examples/checking_inputs.json\") as fp:\n    rag_results = RAGResults.from_json(fp.read())\n\n# set-up the evaluator\nevaluator = RAGChecker(\n    extractor_name=\"bedrock/meta.llama3-1-70b-instruct-v1:0\",\n    checker_name=\"bedrock/meta.llama3-1-70b-instruct-v1:0\",\n    batch_size_extractor=32,\n    batch_size_checker=32\n)\n\n# evaluate results with selected metrics or certain groups, e.g., retriever_metrics, generator_metrics, all_metrics\nevaluator.evaluate(rag_results, all_metrics)\nprint(rag_results)\n\n\"\"\"Output\nRAGResults(\n  2 RAG results,\n  Metrics:\n  {\n    \"overall_metrics\": {\n      \"precision\": 76.4,\n      \"recall\": 62.5,\n      \"f1\": 68.3\n    },\n    \"retriever_metrics\": {\n      \"claim_recall\": 61.4,\n      \"context_precision\": 87.5\n    },\n    \"generator_metrics\": {\n      \"context_utilization\": 87.5,\n      \"noise_sensitivity_in_relevant\": 19.1,\n      \"noise_sensitivity_in_irrelevant\": 0.0,\n      \"hallucination\": 4.5,\n      \"self_knowledge\": 27.3,\n      \"faithfulness\": 68.2\n    }\n  }\n)\n\"\"\"\n```\n\n## Meta-Evaluation\n\nPlease refer to [data/meta_evaluation](./data/meta_evaluation/README.md) for the meta-evaluation of RAGChecker's effectiveness.\n\n## Work with 
LlamaIndex\n\nRAGChecker now integrates with LlamaIndex, providing a powerful evaluation tool for RAG applications built with LlamaIndex. For detailed instructions on how to use RAGChecker with LlamaIndex, please refer to the [LlamaIndex documentation on RAGChecker integration](https://docs.llamaindex.ai/en/latest/examples/evaluation/RAGChecker/). This integration allows LlamaIndex users to leverage RAGChecker's comprehensive metrics to evaluate and improve their RAG systems.\n\n## Security\n\nSee [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.\n\n## License\n\nThis project is licensed under the Apache-2.0 License.\n\n","funding_links":[],"categories":["others","Evaluation and Monitoring","A01_文本生成_文本对话","2024.08"],"sub_categories":["大语言对话模型及数据","RAGChecker【质检员】"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2FRAGChecker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famazon-science%2FRAGChecker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2FRAGChecker/lists"}