{"id":23141095,"url":"https://github.com/nimad70/vulrag","last_synced_at":"2026-05-06T03:33:39.522Z","repository":{"id":265697238,"uuid":"849601220","full_name":"nimad70/VulRAG","owner":"nimad70","description":"Investigating the vulnerability of Large Language Models (LLMs) to misinformation in Retrieval-Augmented Generation (RAG) systems by poisoning vector databases and analyzing LLM responses to identify potential weaknesses and exploitation risks.","archived":false,"fork":false,"pushed_at":"2025-03-15T11:51:26.000Z","size":53329,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-04T18:38:22.802Z","etag":null,"topics":["deeplearning","huggingface","huggingface-transformers","llms","nlp","python3","rag","security-vulnerability","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-sa-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nimad70.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-08-29T22:26:47.000Z","updated_at":"2025-08-31T08:44:57.000Z","dependencies_parsed_at":null,"dependency_job_id":"ee1ac84f-97f1-4ec8-bdc0-2c6e3ace4aa4","html_url":"https://github.com/nimad70/VulRAG","commit_stats":null,"previous_names":["nimad70/vulrag"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nimad70/VulRAG","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nimad70%2FVulRAG","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nimad70%2F
VulRAG/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nimad70%2FVulRAG/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nimad70%2FVulRAG/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nimad70","download_url":"https://codeload.github.com/nimad70/VulRAG/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nimad70%2FVulRAG/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32677928,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-06T02:33:58.958Z","status":"ssl_error","status_checked_at":"2026-05-06T02:33:39.611Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deeplearning","huggingface","huggingface-transformers","llms","nlp","python3","rag","security-vulnerability","tensorflow"],"created_at":"2024-12-17T14:12:25.441Z","updated_at":"2026-05-06T03:33:39.491Z","avatar_url":"https://github.com/nimad70.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VulRAG: Investigating LLM Vulnerabilities in RAG Systems\n\nThis repository accompanies the Master's thesis titled:\n\n**\"Security in Machine Learning: Exposing LLM Vulnerabilities through Poisoned Vector Databases in 
RAG-based Systems\"**\n\nThe research focuses on the susceptibility of Large Language Models (LLMs) to misinformation within Retrieval-Augmented Generation (RAG) systems. It introduces poisoned data into vector databases and analyzes the resulting LLM responses to identify potential weaknesses and exploitation risks.\n\n## Overview\n\nLarge Language Models (LLMs) have significantly advanced natural language processing tasks. However, their integration with Retrieval-Augmented Generation (RAG) systems introduces vulnerabilities, especially when external knowledge sources are compromised. This project examines how poisoning vector databases, which are integral components of RAG systems, affects LLM outputs, aiming to uncover and address security flaws.\n\n## Objectives\n\n- **Assess LLM Vulnerabilities**: Determine the impact of corrupted data within vector databases on LLM performance.\n- **Analyze RAG System Security**: Investigate how RAG architectures can be exploited through data poisoning.\n- **Develop Mitigation Strategies**: Propose methods to enhance the resilience of RAG-based LLM systems against such attacks.\n\n## Methodology\n\n1. **Data Poisoning**: Introduce malicious entries into vector databases used by RAG systems.\n2. **LLM Evaluation**: Utilize state-of-the-art LLMs to process queries and observe the influence of poisoned data.\n3. **Security Analysis**: Identify vulnerabilities and potential exploitation avenues within the RAG framework.\n4. **Mitigation Development**: Formulate and test strategies to defend against identified threats.\n\n## Repository Contents\n\n- **Code**: Scripts and tools for poisoning vector databases and analyzing LLM responses.\n- **Data**: Sample datasets for experimentation, including clean and poisoned versions.\n- **Documentation**: Detailed explanations of methodologies, findings, and proposed countermeasures.\n\n## Usage\n\n1. **Clone the Repository**:\n   ```bash\n   git clone https://github.com/nimad70/VulRAG.git\n   ```\n\n2. 
Explore the repository for scripts and data relevant to your experiments.\n\n## License\n\n\u003cp xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:dct=\"http://purl.org/dc/terms/\"\u003e\u003ca property=\"dct:title\" rel=\"cc:attributionURL\" href=\"https://github.com/nimad70/VulRAG\"\u003eVulRAG: Investigating LLM Vulnerabilities in RAG Systems\u003c/a\u003e by \u003ca rel=\"cc:attributionURL dct:creator\" property=\"cc:attributionName\" href=\"https://github.com/nimad70\"\u003eNima Daryabar\u003c/a\u003e is licensed under \u003ca href=\"https://creativecommons.org/licenses/by-sa/4.0/?ref=chooser-v1\" target=\"_blank\" rel=\"license noopener noreferrer\" style=\"display:inline-block;\"\u003eCC BY-SA 4.0\u003cimg style=\"height:22px!important;margin-left:3px;vertical-align:text-bottom;\" src=\"https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1\" alt=\"\"\u003e\u003cimg style=\"height:22px!important;margin-left:3px;vertical-align:text-bottom;\" src=\"https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1\" alt=\"\"\u003e\u003cimg style=\"height:22px!important;margin-left:3px;vertical-align:text-bottom;\" src=\"https://mirrors.creativecommons.org/presskit/icons/sa.svg?ref=chooser-v1\" alt=\"\"\u003e\u003c/a\u003e\u003c/p\u003e\n\n## Citation\n\nIf you use this repository in your work, please cite it as follows:\n\n```BibTeX\n@thesis{daryabar2023security,\n  title={Security in Machine Learning: Exposing LLM Vulnerabilities through Poisoned Vector Databases in RAG-based Systems},\n  author={Nima Daryabar},\n  institution={University of Padova},\n  year={2023}\n}\n```\n\n## Contact\n\nFor inquiries or collaboration opportunities, please contact:\n\n- **Name**: Nima Daryabar\n- **Email**: [nima daryabar](mailto:nima.daryabar@studenti.unipd.it)\n- **GitHub**: 
[nimad70](https://github.com/nimad70)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnimad70%2Fvulrag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnimad70%2Fvulrag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnimad70%2Fvulrag/lists"}