<div align="center"><h2>
<img src="https://github.com/IAAR-Shanghai/SafeRAG/blob/main/assets/sfr_logo.png" alt="sfr_logo" width="23"> SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model</h2></div>

<p align="center">
<!-- arXiv badge with a more vibrant academic red -->
<a href="https://arxiv.org/abs/2501.18636">
<img src="https://img.shields.io/badge/arXiv-Paper-B31B1B?style=flat-square&logo=arxiv&logoColor=white">
</a>
<!-- Github badge with clean dark color -->
<a href="https://github.com/IAAR-Shanghai/SafeRAG">
<img src="https://img.shields.io/badge/Github-Code-181717?style=flat-square&logo=github&logoColor=white">
</a>
<!-- Huggingface Collection badge with more dynamic orange -->
<a href="https://huggingface.co/collections/zaown/saferag-67b422f4072adee12c784bde">
<img src="https://img.shields.io/badge/Huggingface-Collection-FF6F00?style=flat-square&logo=huggingface&logoColor=white">
</a>
</p>

<div align="center">
<p>
Xun Liang<sup>1,*</sup>,
<a href="https://github.com/siminniu">Simin Niu</a><sup>1,*</sup>,
Zhiyu Li<sup>2,†</sup>,
Sensen Zhang<sup>1</sup>, Hanyu Wang<sup>1</sup>, Feiyu Xiong<sup>2</sup>, Jason Zhaoxin Fan<sup>3</sup>,
Bo Tang<sup>2</sup>, Shichao Song<sup>1</sup>, Mengwei Wang<sup>1</sup>, Jiawei Yang<sup>1</sup>
</p>
<p>
<sup>1</sup><a href="https://en.ruc.edu.cn/">Renmin University of China</a>,
<sup>2</sup><a href="https://www.iaar.ac.cn/">Institute for Advanced Algorithms Research, Shanghai</a>,
<sup>3</sup><a href="https://www.buaa.edu.cn/">Beihang University</a>
</p>
</div>

<div align="center">
<h5>For business inquiries, please contact us at <a href="mailto:lizy@iaar.ac.cn">lizy@iaar.ac.cn</a>.</h5>
</div>

**🎯 Who Should Pay Attention to Our Work?**

- **Exploring attacks on RAG systems?** SafeRAG introduces a **Threat Framework** that executes **Noise, Conflict, Toxicity, and Denial-of-Service (DoS) attacks** at various stages of the **RAG Pipeline**, aiming to **bypass RAG security components as effectively as possible and exploit their vulnerabilities**.
- **Developing robust and trustworthy RAG systems?** Our benchmark provides a new **Security Evaluation Framework** for testing defenses, and it reveals systemic weaknesses in the **RAG Pipeline**.
- **Shaping RAG security policies?** SafeRAG provides **empirical evidence** of how **Data Injection** attacks can affect AI reliability.

> [!TIP]
> - Security evaluation datasets may become outdated or ineffective over time, which diminishes their evaluation value. We will therefore keep updating the datasets to preserve their effectiveness until the attack tasks proposed in this paper are adequately addressed.
> - Each attack task in the SafeRAG dataset contains only about 100 evaluation data points, but these cover a wide range of topics. This lightweight design reduces evaluation cost for users and makes testing fast and inexpensive.

## :loudspeaker: News
- **[2025/01]** We released SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model.

## Overview
<div align="center">
    <img src="https://github.com/IAAR-Shanghai/SafeRAG/blob/main/assets/ACL_8pgs_Page3.png" alt="SafeRAG" width="93%">
</div>

<details><summary>Abstract</summary>

Retrieval-Augmented Generation (RAG) integrates retrieval and generation techniques, which makes it well suited to high-stakes domains such as law, healthcare, and finance, where factual accuracy is paramount. This approach significantly enhances the professional applicability of large language models (LLMs).

But is RAG truly secure? **Attackers can manipulate the data flow at any stage of the RAG pipeline**, including **indexing, retrieval, and filtering**, by injecting malicious, low-quality, misleading, or incorrect texts into **knowledge bases, retrieved contexts, and filtered contexts**. These adversarial modifications **indirectly influence the LLM's outputs**, potentially compromising its reliability.

**SafeRAG systematically evaluates the security vulnerabilities of RAG components from both the retrieval and the generation perspective.** Experiments on **14 mainstream RAG components** reveal that **most RAG systems fail to effectively defend against data injection attacks**. Attackers can **manipulate the data flow within the RAG pipeline**, deceiving the model into generating **low-quality, inaccurate, or misleading content** and, in some cases, even **inducing a denial-of-service (DoS) response**.
</details>

We summarize our primary contributions as follows:

- We reveal four attack tasks capable of bypassing the **retriever**, **filter**, and **generator**. For each attack task, we develop a lightweight RAG security evaluation dataset, constructed primarily by humans with LLM assistance.
- We propose an economical, efficient, and accurate RAG security evaluation framework with attack-specific metrics that are highly consistent with human judgment.
- We introduce SafeRAG, the first Chinese RAG security benchmark, which analyzes the risks that the injection of **Noise**, **Conflict**, **Toxicity**, and **DoS** at various stages of the RAG pipeline poses to the **retriever** and **generator**.

## Quick Start
- Install the dependencies:
```bash
pip install -r requirements.txt
```

- Start the milvus-lite service (vector database):
```bash
milvus-server
```

- Download the bge-base-zh-v1.5 embedding model to a local directory such as `/path/to/your/bge-base-zh-v1.5/`.

- Edit `configs/config.py` (API keys, local model paths) according to your needs.

- Run `quick_start_nctd.py`:

```bash
python quick_start_nctd.py \
  --retriever_name 'bm25' \
  --retrieve_top_k 6 \
  --filter_module 'off' \
  --model_name 'gpt-3.5-turbo' \
  --quest_eval_model 'deepseek-chat' \
  --attack_task 'SN' \
  --attack_module 'indexing' \
  --attack_intensity 0.5 \
  --shuffle True \
  --bert_score_eval \
  --quest_eval \
  --num_threads 5 \
  --show_progress_bar True
```

> [!TIP]
> - You can change the RAG components under evaluation, the attack task, and other parameters to suit your evaluation needs.

## Results
The default retrieval window for the silver noise task is top K = 6, with a default attack injection ratio of 3/6. For the other tasks, the default retrieval window is top K = 2, and the attack injection ratio is fixed at 1/2.
We evaluate the security of 14 different types of RAG components against attack texts injected at different RAG stages (**indexing**, retrieval, and generation), including: (1) retrievers (**DPR**, BM25, Hybrid, Hybrid-Rerank); (2) filters (OFF, **filter NLI**, compressor SKR); and (3) generators (**DeepSeek**, GPT-3.5-turbo, GPT-4, GPT-4o, Qwen 7B, Qwen 14B, Baichuan 13B, ChatGLM 6B).
Bold values denote the default settings. In addition, we use a unified sentence-chunking strategy to segment the knowledge base during indexing. The embedding model is bge-base-zh-v1.5, and the reranker is bge-reranker-base.

### Results on Noise
We inject noise at different ratios into the texts accessible in the RAG pipeline: the **knowledge base**, the **retrieved context**, and the **filtered context**.

<div align="center">
    <img src="https://github.com/IAAR-Shanghai/SafeRAG/blob/main/assets/sfr_result_of_task_N.png" alt="SafeRAG" width="93%">
</div>

> - Regardless of the stage at which noise is injected, F1 (avg) decreases as the noise ratio increases, indicating a decline in the diversity of generated responses.
> - Different retrievers exhibit varying degrees of noise resistance. The overall ranking of retriever robustness against noise attacks is Hybrid-Rerank > Hybrid > BM25 > DPR. This suggests that hybrid retrievers and rerankers are more inclined to retrieve diverse golden contexts than homogeneous attack contexts.
> - As the noise ratio increases, the retrieval accuracy (RA) for noise injected into the retrieved or filtered context is significantly higher than for noise injected into the knowledge base. This is because noise injected into the knowledge base has approximately a 50% chance of not being retrieved.
> - The compressor SKR lacks sufficient security. Although it attempts to merge redundant information in silver noise, it severely compresses the detailed information in the retrieved context that is needed to answer questions, which lowers F1 (avg).

### Results on Conflict, Toxicity, and DoS
<div align="center">
    <img src="https://github.com/IAAR-Shanghai/SafeRAG/blob/main/assets/sfr_result_of_task_C.png" alt="SafeRAG" width="93%">
    <img src="https://github.com/IAAR-Shanghai/SafeRAG/blob/main/assets/sfr_result_of_task_T.png" alt="SafeRAG" width="93%">
    <img src="https://github.com/IAAR-Shanghai/SafeRAG/blob/main/assets/sfr_result_of_task_D.png" alt="SafeRAG" width="93%">
</div>

> - After different types of attacks are injected into the texts accessible at any stage of the RAG pipeline, both F1 (avg) and the attack failure rate (AFR) decline across all three tasks. Conflict attacks make it difficult for the RAG system to determine which information is true, so it may use fabricated facts from the attack context, which lowers the metrics. Toxicity attacks cause the RAG system to mistake disguised authoritative statements for facts, so soft ads propagate automatically into generated responses, which also lowers the metrics. DoS attacks make the RAG system more likely to refuse to answer even when relevant evidence is retrieved, further reducing the metrics. Overall, attack effectiveness across stages ranks as follows: filtered context > retrieved context > knowledge base.
> - Different retrievers are vulnerable to different types of attacks. For instance, Hybrid-Rerank is more susceptible to conflict attacks, while DPR is more prone to DoS attacks. Under toxicity attacks, all retrievers show broadly similar vulnerability.
> - Across attack tasks, the changes in RA remain largely consistent regardless of the retriever used.
> - In conflict tasks, the compressor SKR is less secure because it compresses conflict details, which lowers F1 (avg). In toxicity and DoS tasks, the filter NLI is generally ineffective, with an AFR close to that of disabling the filter; the SKR compressor, by contrast, proves secure in these two tasks because it effectively compresses soft ads and warning content.

## TODOs
<details><summary>Click me to show all TODOs</summary>

- [ ] feat: add SafeRAG PyPI package.
- [ ] feat: release SafeRAG dataset on Hugging Face.
- [ ] docs: extend dataset.
</details>

## Project Structure
```bash
├── configs # Scripts that initialize the loading parameters of the large language models (LLMs) in the RAG system.
│   └── config.py # Before running the project, fill in your own API keys or local model paths at the corresponding locations.
├── embeddings # The embedding model used to build vector databases.
│   └── base.py
├── knowledge_base # Path to the knowledge bases.
│   ├── SN # Knowledge base for silver noise.
│   │   ├── add_SN # Add attacks to the clean knowledge base.
│   │   └── db.txt # A clean knowledge base.
│   ├── ICC # Knowledge base for inter-context conflict.
│   │   ├── add_ICC # Add attacks to the clean knowledge base.
│   │   └── db.txt # A clean knowledge base.
│   ├── SA # Knowledge base for soft ads.
│   │   ├── add_SA # Add attacks to the clean knowledge base.
│   │   └── db.txt # A clean knowledge base.
│   └── WDoS # Knowledge base for white DoS.
│       ├── add_WDoS # Add attacks to the clean knowledge base.
│       └── db.txt # A clean knowledge base.
├── llms # Scripts used to load the LLMs.
│   ├── api_model.py # Calls GPT-series models.
│   ├── local_model.py # Calls a locally deployed model.
│   └── remote_model.py # Calls a remotely deployed model wrapped in an API.
├── metric # Evaluation metrics used in the experiments.
│   ├── common.py # BLEU, ROUGE, BERTScore.
│   └── quest_eval.py # Multiple-choice QuestEval. This metric requires calling an LLM such as GPT to answer questions; alternatively, modify the code and deploy your own question-answering model.
├── datasets # Scripts used to load the dataset.
├── output # Evaluation results are saved here.
├── prompts # Prompts used in the experiments.
├── retrievers # Retrievers used in the RAG system.
└── tasks # The evaluation attack tasks.
```

<p align="right"><a href="#top">🔝Back to top</a></p>
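To make the default attack settings under Results concrete (top K = 6 with 3 of 6 slots holding attack texts for silver noise; top K = 2 with 1 of 2 otherwise), here is a minimal sketch of injecting attack texts into a retrieved or filtered context window. It assumes, without confirmation from the SafeRAG code, that `--attack_intensity` is the share of the top-K window filled with attack texts and that `--shuffle` randomizes their positions. `mix_contexts` is a hypothetical helper, not part of the repository. Knowledge-base injection works differently: attack texts are added to the index and may not be retrieved at all.

```python
import random
from typing import List, Optional


def mix_contexts(
    golden: List[str],
    attack: List[str],
    top_k: int,
    ratio: float,
    shuffle: bool = True,
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: build a top_k context window in which a `ratio`
    share of the slots holds attack texts and the remaining slots hold
    golden (relevant) texts. Assumed semantics, not SafeRAG's actual code."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    n_attack = round(top_k * ratio)  # e.g. 6 * 0.5 -> 3 attack slots
    n_golden = top_k - n_attack
    if n_attack > len(attack) or n_golden > len(golden):
        raise ValueError("not enough golden or attack texts to fill the window")
    window = golden[:n_golden] + attack[:n_attack]
    if shuffle:
        # Shuffle so attack texts do not always sit at the end of the window.
        random.Random(seed).shuffle(window)
    return window


if __name__ == "__main__":
    golden = [f"golden-{i}" for i in range(6)]
    attack = [f"attack-{i}" for i in range(6)]
    # Silver-noise default: top K = 6, injection ratio 3/6.
    window = mix_contexts(golden, attack, top_k=6, ratio=0.5, seed=0)
    print(sum(text.startswith("attack") for text in window))  # 3
    # Other tasks: top K = 2, injection ratio 1/2, no shuffling.
    print(mix_contexts(golden, attack, top_k=2, ratio=0.5, shuffle=False))
```

Slicing the first N texts from each pool keeps the sketch deterministic apart from the optional seeded shuffle; a real pipeline would take the golden texts from the retriever's ranked output.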