{"id":29981386,"url":"https://github.com/datawhalechina/all-in-rag","last_synced_at":"2025-09-11T01:09:19.587Z","repository":{"id":304067407,"uuid":"996626040","full_name":"datawhalechina/all-in-rag","owner":"datawhalechina","description":"🔍大模型应用开发实战：RAG技术全栈指南，在线阅读地址：datawhalechina.github.io/all-in-rag/","archived":false,"fork":false,"pushed_at":"2025-08-02T16:54:04.000Z","size":47448,"stargazers_count":104,"open_issues_count":1,"forks_count":30,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-08-02T18:42:40.721Z","etag":null,"topics":["ai","embedding","kimi-k2","langchain","llama-index","llm","milvus","multimodal","neo4j","python","rag"],"latest_commit_sha":null,"homepage":"https://datawhalechina.github.io/all-in-rag/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datawhalechina.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-05T08:12:35.000Z","updated_at":"2025-08-02T17:35:05.000Z","dependencies_parsed_at":"2025-08-02T18:32:23.155Z","dependency_job_id":null,"html_url":"https://github.com/datawhalechina/all-in-rag","commit_stats":null,"previous_names":["futureunreal/all-in-rag","datawhalechina/all-in-rag"],"tags_count":0,"template":false,"template_full_name":"datawhalechina/repo-template","purl":"pkg:github/datawhalechina/all-in-rag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datawhalechina%2Fall-in-rag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datawhalechina%2Fall-in-rag/tags","releases_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/datawhalechina%2Fall-in-rag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datawhalechina%2Fall-in-rag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datawhalechina","download_url":"https://codeload.github.com/datawhalechina/all-in-rag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datawhalechina%2Fall-in-rag/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268721480,"owners_count":24296498,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-04T02:00:09.867Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","embedding","kimi-k2","langchain","llama-index","llm","milvus","multimodal","neo4j","python","rag"],"created_at":"2025-08-04T16:02:56.837Z","updated_at":"2025-09-11T01:09:19.578Z","avatar_url":"https://github.com/datawhalechina.png","language":"Python","funding_links":[],"categories":["Python","语言资源库"],"sub_categories":["books"],"readme":"# All-in-RAG | LLM Application Development in Practice, Part 1: A Full-Stack Guide to RAG\n\n\u003cdiv align='center'\u003e\n  \u003cimg src=\"./docs/logo.svg\" alt=\"All-in-RAG Logo\" width=\"70%\"\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003ch2\u003e🔍 A Full-Stack Guide to Retrieval-Augmented Generation (RAG)\u003c/h2\u003e\n  
\u003cp\u003e\u003cem\u003eFrom theory to practice, from fundamentals to advanced topics: build your own RAG skill set\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/github/stars/datawhalechina/all-in-rag?style=for-the-badge\u0026logo=github\u0026color=ff6b6b\" alt=\"GitHub stars\"/\u003e\n  \u003cimg src=\"https://img.shields.io/github/forks/datawhalechina/all-in-rag?style=for-the-badge\u0026logo=github\u0026color=4ecdc4\" alt=\"GitHub forks\"/\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Python-3.12.7-blue?style=for-the-badge\u0026logo=python\u0026logoColor=white\" alt=\"Python\"/\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://datawhalechina.github.io/all-in-rag/\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/📖_在线阅读-立即开始-success?style=for-the-badge\u0026logoColor=white\" alt=\"Read Online\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"README_en.md\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/🌍_English-Version-blue?style=for-the-badge\u0026logoColor=white\" alt=\"English Version\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/datawhalechina\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/💬_讨论交流-加入我们-purple?style=for-the-badge\u0026logoColor=white\" alt=\"Discussion\"/\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cbr\u003e\n  \u003ctable\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"center\"\u003e🎯 \u003cstrong\u003eSystematic Learning\u003c/strong\u003e\u003cbr\u003eA complete RAG technology stack\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e🛠️ \u003cstrong\u003eHands-On Practice\u003c/strong\u003e\u003cbr\u003eA rich set of project cases\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e🚀 \u003cstrong\u003eProduction Ready\u003c/strong\u003e\u003cbr\u003eEngineering best practices\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e📊 \u003cstrong\u003eMultimodal Support\u003c/strong\u003e\u003cbr\u003eText + image retrieval\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/table\u003e\n\u003c/div\u003e\n\n## Project Overview（中文 | 
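[English](README_en.md)）\n\nThis project is a full-stack RAG (Retrieval-Augmented Generation) tutorial for developers of LLM applications. Through a structured learning path and hands-on projects, it aims to help developers master RAG application development with large language models and build production-grade intelligent Q\u0026A and knowledge retrieval systems.\n\n**Main contents:**\n\n1. **RAG Fundamentals**: An accessible introduction to the core concepts, technical principles, and application scenarios of RAG\n2. **End-to-End Data Processing**: The complete data preparation pipeline, from data loading and cleaning to text chunking\n3. **Index Construction and Optimization**: Vector embeddings, multimodal embeddings, vector database construction, and index optimization techniques\n4. **Advanced Retrieval Techniques**: Hybrid search, query construction, Text2SQL, and other advanced retrieval techniques\n5. **Generation Integration and Evaluation**: Formatted generation, system evaluation, and optimization methods\n6. **Hands-On Projects**: Complete RAG application development practice, from basic to advanced\n\n## Why This Project\n\nWith the rapid development of large language models, RAG has become a core technology for building intelligent Q\u0026A systems and knowledge retrieval applications. However, existing RAG tutorials are often fragmented and unsystematic, which makes it hard for beginners to form a complete picture of the technology.\n\nStarting from practice and incorporating the latest trends in RAG, this project builds a complete RAG learning curriculum that helps developers:\n- Systematically master the theoretical foundations and practical skills of RAG\n- Understand the complete architecture of a RAG system and the role of each component\n- Develop RAG applications independently\n- Master methods for evaluating and optimizing RAG systems\n\n## Target Audience\n\n**This project is suitable for:**\n- Developers with basic Python programming skills who are interested in RAG\n- AI engineers who want to learn RAG systematically\n- Product developers who want to build intelligent Q\u0026A systems\n- Researchers who need to learn retrieval-augmented generation\n\n**Prerequisites:**\n- Familiarity with basic Python syntax and common libraries\n- Basic Docker usage\n- An understanding of basic LLM concepts (recommended but not required)\n- Basic Linux command-line skills\n\n## Highlights\n\n1. **Structured learning path**: From basic concepts to advanced applications, a complete curriculum for learning RAG\n2. **Theory and practice in balance**: Every chapter includes both theoretical explanation and code practice, so that what you learn can be applied\n3. **Multimodal support**: Covers not only text RAG but also multimodal embedding and retrieval techniques\n4. **Engineering focus**: Emphasizes engineering issues that arise in real applications, including performance optimization and system evaluation\n5. **Rich hands-on projects**: Provides multiple projects, from basic to advanced, to help consolidate what you have learned\n\n## Content Outline\n\n### Part 1: RAG Fundamentals\n\n**Chapter 1: Unlocking RAG** [📖 View chapter](./docs/chapter1)\n1. [x] [Introduction to RAG](./docs/chapter1/01_RAG_intro.md) - Overview of RAG and its application scenarios\n2. [x] [Preparation](./docs/chapter1/02_preparation.md) - Environment setup and preparation\n3. [x] [Building RAG in Four Steps](./docs/chapter1/03_get_start_rag.md) - Getting started quickly with RAG development\n4. [x] [Appendix: Environment Deployment](./docs/chapter1/virtualenv.md) - Supplementary guide to deploying a Python virtual environment (Contributor: [@anarchysaiko](https://github.com/anarchysaiko))\n\n**Chapter 2: Data Preparation** [📖 View chapter](./docs/chapter2)\n1. [x] [Data Loading](./docs/chapter2/04_data_load.md) - Processing and loading multi-format documents\n2. [x] [Text Chunking](./docs/chapter2/05_text_chunking.md) - Text splitting strategies and optimization\n\n### Part 2: Index Construction and Optimization\n\n**Chapter 3: Index Construction** [📖 View chapter](./docs/chapter3)\n1. [x] [Vector Embeddings](./docs/chapter3/06_vector_embedding.md) - Text vectorization techniques in detail\n2. [x] [Multimodal Embeddings](./docs/chapter3/07_multimodal_embedding.md) - Image-text multimodal vectorization\n3. [x] [Vector Databases](./docs/chapter3/08_vector_db.md) - Vector storage and retrieval systems\n4. [x] [Milvus in Practice](./docs/chapter3/09_milvus.md) - Hands-on multimodal retrieval with Milvus\n5. 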
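[x] [Index Optimization](./docs/chapter3/10_index_optimization.md) - Index performance tuning techniques\n\n### Part 3: Advanced Retrieval Techniques\n\n**Chapter 4: Retrieval Optimization** [📖 View chapter](./docs/chapter4)\n1. [x] [Hybrid Search](./docs/chapter4/11_hybrid_search.md) - Fusing dense and sparse retrieval\n2. [x] [Query Construction](./docs/chapter4/12_query_construction.md) - Intelligent query understanding and construction\n3. [x] [Text2SQL](./docs/chapter4/13_text2sql.md) - Natural language to SQL queries\n4. [x] [Query Rewriting and Routing](./docs/chapter4/14_query_rewriting.md) - Query optimization strategies\n5. [x] [Advanced Retrieval Techniques](./docs/chapter4/15_advanced_retrieval_techniques.md) - Advanced retrieval algorithms\n\n### Part 4: Generation and Evaluation\n\n**Chapter 5: Generation Integration** [📖 View chapter](./docs/chapter5)\n1. [x] [Formatted Generation](./docs/chapter5/16_formatted_generation.md) - Structured output and format control\n\n**Chapter 6: RAG System Evaluation** [📖 View chapter](./docs/chapter6)\n1. [x] [Introduction to Evaluation](./docs/chapter6/18_system_evaluation.md) - RAG system evaluation methodology\n2. [x] [Evaluation Tools](./docs/chapter6/19_common_tools.md) - Common evaluation tools and metrics\n\n### Part 5: Advanced Applications and Hands-On Projects\n\n**Chapter 7: Advanced RAG Architectures (Extension)** [📖 View chapter](./docs/chapter7)\n\n1. [x] [Knowledge Graph-Based RAG](./docs/chapter7/20_kg_rag.md)\n\n**Chapter 8: Project 1** [📖 View chapter](./docs/chapter8)\n1. [x] [Environment Setup and Project Architecture](./docs/chapter8/01_env_architecture.md)\n2. [x] [Implementing the Data Preparation Module](./docs/chapter8/02_data_preparation.md)\n3. [x] [Index Construction and Retrieval Optimization](./docs/chapter8/03_index_retrieval.md)\n4. [x] [Generation Integration and System Integration](./docs/chapter8/04_generation_sys.md)\n\n**Chapter 9: Optimizing Project 1 (Optional)** [📖 View chapter](./docs/chapter9)\n\n[🍽️ Project showcase](https://github.com/FutureUnreal/What-to-eat-today)\n1. [x] [Graph RAG Architecture Design](./docs/chapter9/01_graph_rag_architecture.md)\n2. [x] [Graph Data Modeling and Preparation](./docs/chapter9/02_graph_data_modeling.md)\n3. [x] [Milvus Index Construction](./docs/chapter9/03_index_construction.md)\n4. 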
[x] [Intelligent Query Routing and Retrieval Strategies](./docs/chapter9/04_intelligent_query_routing.md)\n\n**Chapter 10: Project 2 (Optional)** [📖 View chapter](./docs/chapter10) *Planned*\n\n## Directory Structure\n\n```\nall-in-rag/\n├── docs/           # Tutorial documentation\n├── code/           # Code examples\n├── data/           # Sample data\n├── models/         # Pretrained models\n└── README.md       # Project overview\n```\n\n## Project Showcase\n\n### Chapter 8: Project 1\n\n![Project 1](./project01.png)\n\n### Chapter 9: Project 1 (Graph RAG Optimization)\n\n![Project 1 (Graph RAG Optimization)](./project01_graph.png)\n\n### Chapter 10: Project 2\n\n## Acknowledgments\n\n**Core Contributors**\n- [尹大吕 (Project Lead)](https://github.com/FutureUnreal) (project initiator and main contributor)\n\n**Additional Chapter Contributors**\n- [孙超 (Content Creator)](https://github.com/anarchysaiko) (Datawhale member, Shanghai University of Engineering Science)\n\n### Special Thanks\n- Thanks to [@Sm1les](https://github.com/Sm1les) for their help and support with this project\n- Thanks to all the developers who have contributed to this project\n- Thanks to the open-source community for its excellent tools and frameworks\n- Special thanks to the following developers who contributed to the tutorial!\n\n[![Contributors](https://contrib.rocks/image?repo=datawhalechina/all-in-rag)](https://github.com/datawhalechina/all-in-rag/graphs/contributors)\n\n*Made with [contrib.rocks](https://contrib.rocks).*\n\n## Contributing\n\nWe welcome contributions of all kinds, including but not limited to:\n\n- 🚨 **Bug reports**: If you find a problem, please open an [Issue](https://github.com/datawhalechina/all-in-rag/issues)\n- 💭 **Tutorial suggestions**: Good ideas are welcome in [Discussions](https://github.com/datawhalechina/all-in-rag/discussions)\n- 📚 **Documentation improvements**: Help improve the documentation and example code (currently only high-quality PRs for Chapter 7 are accepted)\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=datawhalechina/all-in-rag\u0026type=Date)](https://star-history.com/#datawhalechina/all-in-rag\u0026Date)\n\n\u003cdiv align=\"center\"\u003e\n  \u003cp\u003eIf this project helps you, please give us a ⭐️\u003c/p\u003e\n  \u003cp\u003eHelp more people discover this project (Hoarding it? Hand it over!)\u003c/p\u003e\n\u003c/div\u003e\n\n![star](./emoji.png)\n\n## About Datawhale\n\n\u003cdiv align='center'\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/datawhalechina/pumpkin-book/master/res/qrcode.jpeg\" alt=\"Datawhale\" width=\"30%\"\u003e\n    \u003cp\u003eScan the QR code to follow the Datawhale WeChat official account for more high-quality open-source content\u003c/p\u003e\n\u003c/div\u003e\n\n---\n\n## License\n\n\u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc-sa/4.0/\"\u003e\u003cimg alt=\"Creative Commons License\" 
style=\"border-width:0\" src=\"https://img.shields.io/badge/license-CC%20BY--NC--SA%204.0-lightgrey\" /\u003e\u003c/a\u003e\n\nThis work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatawhalechina%2Fall-in-rag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatawhalechina%2Fall-in-rag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatawhalechina%2Fall-in-rag/lists"}