{"id":27984399,"url":"https://github.com/iMoonLab/Hyper-RAG","last_synced_at":"2025-05-08T05:01:57.056Z","repository":{"id":285915403,"uuid":"959734860","full_name":"iMoonLab/Hyper-RAG","owner":"iMoonLab","description":"\"Hyper-RAG: Combating LLM Hallucinations using Hypergraph-Driven Retrieval-Augmented Generation\" by Yifan Feng, Hao Hu, Xingliang Hou, Shiquan Liu, Shihui Ying, Shaoyi Du, Han Hu, and Yue Gao.","archived":false,"fork":false,"pushed_at":"2025-04-27T06:25:29.000Z","size":4857,"stargazers_count":93,"open_issues_count":3,"forks_count":9,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-27T07:26:29.964Z","etag":null,"topics":["graph-rag","llms","rag"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2504.08758","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iMoonLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-03T09:10:33.000Z","updated_at":"2025-04-27T06:25:32.000Z","dependencies_parsed_at":"2025-04-27T07:23:43.713Z","dependency_job_id":"30a71fbd-52e3-44bd-8ae6-8d550f69bc54","html_url":"https://github.com/iMoonLab/Hyper-RAG","commit_stats":null,"previous_names":["imoonlab/hyper-rag"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iMoonLab%2FHyper-RAG","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iMoonLab%2FHyper-RAG/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iMoonLab%2FHyper-RAG/releas
es","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iMoonLab%2FHyper-RAG/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iMoonLab","download_url":"https://codeload.github.com/iMoonLab/Hyper-RAG/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253002856,"owners_count":21838640,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graph-rag","llms","rag"],"created_at":"2025-05-08T05:01:49.336Z","updated_at":"2025-05-08T05:01:57.035Z","avatar_url":"https://github.com/iMoonLab.png","language":"Python","readme":"\u003c!-- \u003cdiv align=\"center\" id=\"top\"\u003e \n  \u003cimg src=\"./assets/hg.svg\" alt=\"Hypergraph\" width=\"100%\" /\u003e\n\u003c/div\u003e --\u003e\n\n\u003ch1 align=\"center\"\u003eHyper-RAG\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"Github top language\" src=\"https://img.shields.io/github/languages/top/iMoonLab/Hyper-RAG?color=purple\"\u003e\n\n  \u003cimg alt=\"Github language count\" src=\"https://img.shields.io/github/languages/count/iMoonLab/Hyper-RAG?color=purple\"\u003e\n\n  \u003cimg alt=\"Repository size\" src=\"https://img.shields.io/github/repo-size/iMoonLab/Hyper-RAG?color=purple\"\u003e\n\n  \u003cimg alt=\"License\" src=\"https://img.shields.io/github/license/iMoonLab/Hyper-RAG?color=purple\"\u003e\n\n  \u003c!-- \u003cimg alt=\"Github issues\" src=\"https://img.shields.io/github/issues/iMoonLab/Hyper-RAG?color=purple\" /\u003e --\u003e\n\n  \u003c!-- \u003cimg alt=\"Github forks\" 
src=\"https://img.shields.io/github/forks/iMoonLab/Hyper-RAG?color=purple\" /\u003e --\u003e\n\n  \u003cimg alt=\"Github stars\" src=\"https://img.shields.io/github/stars/iMoonLab/Hyper-RAG?color=purple\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#dart-about\"\u003eAbout\u003c/a\u003e \u0026#xa0; | \u0026#xa0; \n  \u003ca href=\"#sparkles-why-hyper-rag-is-more-powerful\"\u003eFeatures\u003c/a\u003e \u0026#xa0; | \u0026#xa0;\n  \u003ca href=\"#rocket-installation\"\u003eInstallation\u003c/a\u003e \u0026#xa0; | \u0026#xa0;\n  \u003ca href=\"#white_check_mark-quick-start\"\u003eQuick Start\u003c/a\u003e \u0026#xa0; | \u0026#xa0;\n  \u003ca href=\"#checkered_flag-evaluation\"\u003eEvaluation\u003c/a\u003e \u0026#xa0; | \u0026#xa0;\n  \u003ca href=\"#memo-license\"\u003eLicense\u003c/a\u003e \u0026#xa0; | \u0026#xa0;\n  \u003ca href=\"https://github.com/yifanfeng97\" target=\"_blank\"\u003eAuthor\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cbr\u003e\n\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./assets/many_llms_all.svg\" alt=\"Overall Performance\" width=\"100%\" /\u003e\n\u003c/div\u003e\n\nWe show that Hyper-RAG is a powerful RAG method that enhances the performance of various LLMs and outperforms other SOTA RAG methods on the NeurologyCrop dataset. **Our paper is available \u003ca href=\"https://arxiv.org/abs/2504.08758\"\u003ehere\u003c/a\u003e**.\n\n## :dart: About\n\n\u003cdetails\u003e\n\u003csummary\u003e \u003cb\u003eAbstract\u003c/b\u003e \u003c/summary\u003e\nLarge language models (LLMs) have transformed various sectors, including education, finance, and medicine, by enhancing content generation and decision-making processes. However, their integration into the medical field has been cautious due to hallucinations, instances where generated content deviates from factual accuracy, potentially leading to adverse outcomes. 
To address this, we introduce Hyper-RAG, a hypergraph-driven Retrieval-Augmented Generation method that comprehensively captures both pairwise and beyond-pairwise correlations in domain-specific knowledge, thereby mitigating hallucinations. Experiments on the NeurologyCrop dataset with six prominent LLMs demonstrated that Hyper-RAG improves accuracy by an average of 12.3% over direct LLM use and outperforms Graph RAG and Light RAG by 6.3% and 6.0%, respectively. Additionally, Hyper-RAG maintained stable performance with increasing query complexity, unlike existing methods, which declined. Further validation across nine diverse datasets showed a 35.5% performance improvement over Light RAG using a selection-based assessment. The lightweight variant, Hyper-RAG-Lite, achieved twice the retrieval speed and a 3.3% performance boost compared with Light RAG. These results confirm Hyper-RAG's effectiveness in enhancing LLM reliability and reducing hallucinations, making it a robust solution for high-stakes applications like medical diagnostics.\n\u003c/details\u003e\n\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./assets/fw.svg\" alt=\"Framework\" width=\"100%\" /\u003e\n\u003c/div\u003e\nSchematic diagram of the proposed Hyper-RAG architecture. a, The patient poses a question. b, A knowledge base is constructed from relevant domain-specific corpora. c, Responses are generated directly using LLMs. d, Hyper-RAG generates responses by first retrieving relevant prior knowledge from the knowledge base and then inputting this knowledge, along with the patient’s question, into the LLMs to formulate the reply.\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e \u003cb\u003eMore details about hypergraph modeling\u003c/b\u003e \u003c/summary\u003e\n\u003cdiv align=\"center\"\u003e \n  \u003cimg src=\"./assets/hg.svg\" alt=\"Hypergraph\" width=\"100%\" /\u003e\nExample of hypergraph modeling for entity space. 
Hypergraphs can model beyond-pairwise relationships among entities, which are more expressive than the pairwise relationships of traditional graph modeling. With hypergraphs, we can avoid the information loss caused by modeling only pairwise relationships.\n\u003c/div\u003e\n\u003cbr\u003e\n\u003cdiv align=\"center\"\u003e \n  \u003cimg src=\"./assets/extract.svg\" alt=\"Extract Hypergraph\" width=\"100%\" /\u003e\n  Illustration of Entity and Correlation Extraction from Raw Corpus: Dark brown boxes represent entities, blue arrows denote low-order correlations between entities, and red arrows indicate high-order correlations. Yellow boxes contain the original descriptions of the respective entities or their correlations.\n\u003c/div\u003e\n\u003c/details\u003e\n\n\u003cbr\u003e\n\n## :sparkles: Why Hyper-RAG is More Powerful\n\n:heavy_check_mark: **Comprehensive Relationship Modeling with Hypergraphs**: Utilizes hypergraphs to thoroughly model the associations within the raw corpus data, capturing more complex relationships than traditional graph-based data organization.\\\n:heavy_check_mark: **Native Hypergraph-DB Integration**: Employs the native hypergraph database, \u003ca href=\"https://github.com/iMoonLab/Hypergraph-DB\"\u003eHypergraph-DB\u003c/a\u003e, as the foundation, supporting rapid retrieval of higher-order associations.\\\n:heavy_check_mark: **Superior Performance**: Hyper-RAG outperforms Graph RAG and Light RAG by 6.3% and 6.0%, respectively.\\\n:heavy_check_mark: **Broad Validation**: Across nine diverse datasets, Hyper-RAG shows a 35.5% performance improvement over Light RAG based on a selection-based assessment.\\\n:heavy_check_mark: **Efficiency**: The lightweight variant, Hyper-RAG-Lite, achieves twice the retrieval speed and a 3.3% performance boost compared to Light RAG.\n\n## :rocket: Installation\n\n\n```bash\n# Clone this project\ngit clone https://github.com/iMoonLab/Hyper-RAG.git\n\n# Access\ncd Hyper-RAG\n\n# Install dependencies\npip 
install -r requirements.txt\n```\n\n## :white_check_mark: Quick Start\n\n### Configure your LLM API\nCopy the `config_temp.py` file to `my_config.py` in the root folder and set your LLM `URL` and `KEY`.\n\n```python\nLLM_BASE_URL = \"Yours xxx\"\nLLM_API_KEY = \"Yours xxx\"\nLLM_MODEL = \"gpt-4o-mini\"\n\nEMB_BASE_URL = \"Yours xxx\"\nEMB_API_KEY = \"Yours xxx\"\nEMB_MODEL = \"text-embedding-3-small\"\nEMB_DIM = 1536\n```\n\n### Run the toy example\n\n```bash\npython examples/hyperrag_demo.py\n```\n\n### Or Run by Steps\n\n1. Prepare the data. You can download the dataset from \u003ca href=\"https://cloud.tsinghua.edu.cn/d/187386488d5c404a83a5/\"\u003ehere\u003c/a\u003e. Put the dataset in the root directory. Then run the following command to preprocess the data.\n\n```bash\npython reproduce/Step_0.py\n```\n\n2. Build the knowledge hypergraphs and the entity and relation vector databases with the following command.\n\n```bash\npython reproduce/Step_1.py\n```\n\n3. Extract questions from the original datasets with the following command.\n\n```bash\npython reproduce/Step_2_extract_question.py\n```\n\nThose questions are saved in the `cache/{{data_name}}/questions` folder. \n\n4. Run Hyper-RAG to answer those questions with the following command.\n\n```bash\npython reproduce/Step_3_response_question.py\n```\n\nThose responses are saved in the `cache/{{data_name}}/response` folder.\n\nYou can also set the `mode` parameter to `hyper` or `hyper-lite` to run Hyper-RAG or Hyper-RAG-Lite.\n\n\n### Hypergraph Visualization\nWe provide a web-based visualization tool for hypergraphs and a lightweight Hyper-RAG QA system. For more information, please refer to [Hyper-RAG Web-UI](./web-ui/README.md).\n\n*Note: The web UI is still under development and may not be fully functional. 
We welcome any contributions to improve it.*\n![vis-qa](./assets/vis-QA.jpg)\n![vis-hg](./assets/vis-hg.jpg)\n\n\n\n## :checkered_flag: Evaluation\nIn this work, we propose two evaluation strategies: **selection-based** and **scoring-based** evaluation. \n\n### Scoring-based evaluation\nScoring-Based Assessment is designed to facilitate the comparative evaluation of multiple model outputs by quantifying their performance across various dimensions. This approach allows for a nuanced assessment of model capabilities by providing scores on several key metrics. However, a notable limitation is its reliance on reference answers. In our preprocessing steps, we leverage the source chunks from which each question is derived as reference answers.\n\nYou can run this evaluation method with the following command.\n\n```bash\npython evaluate/evaluate_by_scoring.py\n```\nThe results of this evaluation are shown in the following figure.\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./assets/many_llms_sp.svg\" alt=\"Scoring-based evaluation\" width=\"90%\" /\u003e\n\u003c/div\u003e\n\n\n### Selection-based evaluation\nSelection-Based Assessment is tailored for scenarios where preliminary candidate models are available, enabling a comparative evaluation through a binary choice mechanism. This method does not require reference answers, making it suitable for diverse and open-ended questions. 
However, its limitation lies in its comparative nature, as it only allows for the evaluation of two models at a time.\n\nYou can run this evaluation method with the following command.\n\n```bash\npython evaluate/evaluate_by_selection.py\n```\nThe results of this evaluation are shown in the following figure.\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./assets/multi_domain.svg\" alt=\"Selection-based evaluation\" width=\"90%\" /\u003e\n\u003c/div\u003e\n\n\n### Efficiency Analysis\nWe conducted an efficiency analysis of our Hyper-RAG method using GPT-4o mini on the NeurologyCrop dataset, comparing it with standard RAG, Graph RAG, and Light RAG. To ensure fairness by excluding network latency, we measured only the local retrieval time for relevant knowledge and the construction of the prior knowledge prompt. While standard RAG focuses on the direct retrieval of chunk embeddings, Graph RAG, Light RAG, and Hyper-RAG also include retrieval from node and correlation vector databases and the time for one layer of graph or hypergraph information diffusion. We averaged the response times over 50 questions from the dataset for each method. The results are shown in the following figure.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./assets/speed_all.svg\" alt=\"Efficiency analysis\" width=\"60%\" /\u003e\n\u003c/div\u003e\n\n## :memo: License\n\nThis project is licensed under the Apache 2.0 License. For more details, see the [LICENSE](LICENSE.md) file.\n\nHyper-RAG is maintained by [iMoon-Lab](http://moon-lab.tech/), Tsinghua University. 
\nMade with :heart: by \u003ca href=\"https://github.com/yifanfeng97\" target=\"_blank\"\u003eYifan Feng\u003c/a\u003e, \u003ca href=\"https://github.com/haoohu\" target=\"_blank\"\u003eHao Hu\u003c/a\u003e, \u003ca href=\"https://github.com/yifanfeng97\" target=\"_blank\"\u003eXingliang Hou\u003c/a\u003e, \u003ca href=\"https://github.com/yifanfeng97\" target=\"_blank\"\u003eShiquan Liu\u003c/a\u003e, \u003ca href=\"https://github.com/FuYou0723\" target=\"_blank\"\u003eYifan Zhang\u003c/a\u003e, \u003ca href=\"https://github.com/yuxizhe\" target=\"_blank\"\u003eXizhe Yu\u003c/a\u003e. \n\nIf you have any questions, please feel free to contact us via email: [Yifan Feng](mailto:evanfeng97@gmail.com). \n\nThis repo benefits from [LightRAG](https://github.com/HKUDS/LightRAG) and [Hypergraph-DB](https://github.com/iMoonLab/Hypergraph-DB).  Thanks for their wonderful work.\n\n\u0026#xa0;\n\n## 🌟Citation\n```\n@misc{feng2025hyperrag,\n      title={Hyper-RAG: Combating LLM Hallucinations using Hypergraph-Driven Retrieval-Augmented Generation}, \n      author={Yifan Feng and Hao Hu and Xingliang Hou and Shiquan Liu and Shihui Ying and Shaoyi Du and Han Hu and Yue Gao},\n      year={2025},\n      eprint={2504.08758},\n      archivePrefix={arXiv},\n      primaryClass={cs.IR},\n      url={https://arxiv.org/abs/2504.08758}, \n}\n```\n\n\u003ca href=\"#top\"\u003eBack to top\u003c/a\u003e\n","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FiMoonLab%2FHyper-RAG","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FiMoonLab%2FHyper-RAG","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FiMoonLab%2FHyper-RAG/lists"}