{"id":28962448,"url":"https://github.com/taskpyroer/redmansionrag","last_synced_at":"2025-09-08T23:33:19.460Z","repository":{"id":299975676,"uuid":"1004754818","full_name":"taskPyroer/RedMansionRAG","owner":"taskPyroer","description":"A complete implementation of a question-answering system for Dream of the Red Chamber (红楼梦) built on retrieval-augmented generation (RAG). The project shows how to combine a classic literary work with modern AI to give users an intelligent literary Q\u0026A experience.","archived":false,"fork":false,"pushed_at":"2025-06-19T07:31:34.000Z","size":3684,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-19T08:36:22.566Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://readrag.streamlit.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/taskPyroer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-19T06:10:36.000Z","updated_at":"2025-06-19T08:00:28.000Z","dependencies_parsed_at":"2025-06-19T08:47:40.556Z","dependency_job_id":null,"html_url":"https://github.com/taskPyroer/RedMansionRAG","commit_stats":null,"previous_names":["taskpyroer/redmansionrag"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/taskPyroer/RedMansionRAG","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taskPyroer%2FRedMansionRAG","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taskPyroer%2FRedMansionRAG/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taskPyroer%2FRedMansionRAG/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taskPyr
oer%2FRedMansionRAG/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/taskPyroer","download_url":"https://codeload.github.com/taskPyroer/RedMansionRAG/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taskPyroer%2FRedMansionRAG/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274231181,"owners_count":25245675,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-08T02:00:09.813Z","response_time":121,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-24T03:06:42.340Z","updated_at":"2025-09-08T23:33:19.427Z","avatar_url":"https://github.com/taskPyroer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Dream of the Red Chamber RAG Question-Answering System\n\nAn intelligent question-answering system for *Dream of the Red Chamber* (红楼梦) built on the DeepSeek API. It uses RAG (retrieval-augmented generation) to answer questions accurately from the text.\n\nNote: this project is for learning and reference only. The code is short and easy to extend; feel free to improve any parts that are less than elegant. Stars are welcome.\n\n## 🌟 Features\n\n- 📚 **Document retrieval**: efficient retrieval based on TF-IDF and cosine similarity\n- 🤖 **AI question answering**: integrates the DeepSeek API to give accurate, well-written answers\n- 🔍 **Chinese-language support**: uses jieba word segmentation, tuned for Chinese text processing\n- 💾 **Caching**: caches the vector index automatically to improve response time\n- 🎯 **Precise matching**: multi-level text chunking to keep retrieval accurate\n- 🎨 **Clean interface**: a clear interactive command-line interface\n- 🌐 **Web interface**: a modern interactive interface built on Streamlit\n- 💬 **Chat experience**: ChatGPT-style conversational question answering\n- 📱 **Responsive design**: works on desktop and mobile devices\n\n## 📋 Requirements\n\n- Python 3.9+ recommended (the author uses 3.11)\n- A DeepSeek API key\n- About 50 MB of disk space (for dependencies and the cache)\n\n## 🚀 Quick Start\n\n### 1. Install dependencies\n\n```bash\npip install -r requirements.txt\n```\n\n### 2. 
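Configure the API key\n\n#### Option 1: Environment variable (recommended)\n\n```bash\n# Windows\nset DEEPSEEK_API_KEY=your_deepseek_api_key_here\n\n# Linux/Mac\nexport DEEPSEEK_API_KEY=your_deepseek_api_key_here\n```\n\n#### Option 2: Configuration file\n\n1. Copy the configuration file template:\n```bash\ncopy .env.example .env  # Windows\ncp .env.example .env    # Linux/Mac\n```\n\n2. Edit the `.env` file and fill in your API key:\n```\nDEEPSEEK_API_KEY=your_deepseek_api_key_here\n```\n\n### 3. Get a DeepSeek API key\n\n1. Visit the [DeepSeek platform](https://platform.deepseek.com/)\n2. Register and log in\n3. Create a new API key on the API management page\n4. Copy the key and configure it using one of the options above\n\n### 4. Run the system\n\n#### Option 1: Streamlit web interface (recommended)\n```bash\nstreamlit run streamlit_app.py\n```\n\n#### Option 2: Command-line interface\n```bash\n# Full interactive mode\npython rag_system.py\n```\n\n## Usage\n\n### Streamlit web interface (recommended)\n\n1. **Start the app**:\n   ```bash\n   streamlit run streamlit_app.py\n   ```\n   \n2. **Features**:\n   - 🎨 **Modern interface**: a clean web interface with responsive design\n   - 💬 **Chat experience**: ChatGPT-style conversational interaction\n   - ⚙️ **Live configuration**: set the API key and system parameters in the sidebar at any time\n   - 📊 **System monitoring**: shows system status and document loading progress in real time\n   - 💡 **Example questions**: preset common questions that can be asked with one click\n   - 📖 **Source citations**: shows the document chunks each answer is based on, with their similarity scores\n   - 🗑️ **History management**: the chat history can be cleared\n\n3. **Steps**:\n   - Enter your DeepSeek API key in the sidebar\n   - Click the \"Initialize System\" (初始化系统) button\n   - Wait for the documents to finish loading\n   - Type a question in the chat box or click an example question\n   - Read the AI answer and the related document chunks\n\n### Command-line interface\n\n\n#### Interactive mode\n\nRun `rag_system.py` to enter fully interactive mode (prompts shown here in English translation):\n\n```\n=== Dream of the Red Chamber Question-Answering System ===\nEnter your question; enter 'quit' or 'exit' to leave\nExample questions:\n- Who is Zhen Shiyin (甄士隐)?\n- What is the story of Jia Yucun (贾雨村)?\n- What is the Precious Jade of Spiritual Understanding (通灵宝玉)?\n- What happened to Yinglian (英莲)?\n\nEnter your question: \n```\n\n### Example Q\u0026A\n\n**Question**: Who is Zhen Shiyin (甄士隐)?\n\n**Answer**: Zhen Shiyin is an important character at the opening of *Dream of the Red Chamber*. His surname is Zhen, his given name Fei, and his courtesy name Shiyin. He lives beside the Gourd Temple in Renqing Lane, Shili Street, outside the Chang Gate of Gusu. He is a member of the local gentry, and his wife, née Feng, is gentle, virtuous, and well versed in propriety. Although the family is not especially wealthy, it is regarded locally as a prominent household. Zhen Shiyin is placid by nature and has no ambition for rank or fame; he spends his days tending flowers and bamboo, drinking wine, and composing poetry, and can be called a man of almost immortal character...\n\n## 🏗️ System Architecture\n\n```\nDream of the Red Chamber RAG system\n├── Document loading module\n│   ├── Document reading\n│   └── Text preprocessing\n├── Vectorization module\n│   ├── Chinese word segmentation (jieba)\n│   ├── TF-IDF vectorization\n│   └── Vector index construction\n├── Retrieval module\n│   ├── Query vectorization\n│   ├── Similarity computation\n│   └── Ranking of relevant documents\n└── Generation module\n    ├── Context construction\n    ├── DeepSeek API call\n    └── Answer generation\n```\n\n## 📁 Project Structure\n\n```\nRagDemos/\n├── docs/                    # Document directory\n│   └── 1.txt               # Text of Dream of the Red Chamber\n├── cache/                   # Cache directory\n│   ├── doc_chunks.pkl      # Cached document chunks\n│   └── doc_vectors.pkl     # Cached vector index\n├── rag_system.py           # Core RAG system\n├── 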
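streamlit_app.py        # Streamlit web app\n├── requirements.txt        # Dependency list\n├── .env.example           # Environment variable template\n├── .env                   # Environment variable file (you create this)\n└── README.md              # Project documentation\n```\n\n## ⚙️ Configuration Options\n\nYou can adjust the system's behavior by changing these parameters in `rag_system.py`:\n\n```python\n# Document chunking parameters\nchunk_size = 300      # Size of each document chunk, in characters\noverlap = 50          # Number of characters shared by adjacent chunks\n\n# Retrieval parameters\ntop_k = 3             # Number of most relevant chunks to return\nmin_similarity = 0.01 # Minimum similarity threshold\n\n# TF-IDF parameters\nmax_features = 5000   # Maximum number of features\nngram_range = (1, 2)  # N-gram range\n```\n\n## 🔧 Advanced Features\n\n### Adding more documents\n\n1. Put the new text files in the `docs/` directory\n2. Delete the cache files in the `cache/` directory\n3. Run the system again; the index is rebuilt automatically\n\n### Custom word segmentation\n\nYou can customize jieba segmentation in `rag_system.py`:\n\n```python\n# Load a custom dictionary\njieba.load_userdict('custom_dict.txt')\n\n# Add individual keywords (here, character names)\njieba.add_word('贾宝玉')\njieba.add_word('林黛玉')\n```\n\n### Tuning API parameters\n\nYou can change the parameters of the DeepSeek API call:\n\n```python\nresponse = self.client.chat.completions.create(\n    model=\"deepseek-chat\",\n    messages=messages,    # Required; variable name assumed, holds the prompt and context\n    temperature=0.7,      # Degree of creativity\n    max_tokens=1000,      # Maximum reply length\n    top_p=0.9,           # Nucleus sampling parameter\n    frequency_penalty=0.1 # Repetition penalty\n)\n```\n\n## 🐛 Troubleshooting\n\n### Common problems\n\n1. **Invalid API key**\n   ```\n   Error: 401 Unauthorized\n   Fix: check that the API key is set correctly\n   ```\n\n2. **Dependency installation fails**\n   ```bash\n   # Upgrade pip\n   python -m pip install --upgrade pip\n   \n   # Use a mirror in mainland China\n   pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/\n   ```\n\n3. **Chinese word segmentation problems**\n   ```bash\n   # Reinstall jieba\n   pip uninstall jieba\n   pip install jieba\n   ```\n\n4. **Cache problems**\n   ```bash\n   # Clear the cache to rebuild the index\n   rmdir /s cache  # Windows\n   rm -rf cache    # Linux/Mac\n   ```\n\n### Performance tuning\n\n1. **Use the cache**: the system caches the vector index automatically, so the first run is slow and later runs are fast\n2. **Adjust the chunk size**: smaller chunks give more precise retrieval, larger chunks give more context\n3. 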
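**Tune the number of retrieved chunks**: increasing `top_k` gives fuller context but raises the cost of each API call\n\n## 📊 System Metrics\n\n- **Document processing speed**: ~1,000 characters/second\n- **Retrieval response time**: \u003c100 ms\n- **API call time**: 1-3 seconds (depending on the network)\n\n## 🙏 Acknowledgements\n\n- [DeepSeek](https://www.deepseek.com/) - for its powerful AI models\n- [jieba](https://github.com/fxsjy/jieba) - an excellent Chinese word segmentation tool\n- [scikit-learn](https://scikit-learn.org/) - machine learning toolkit\n- *Dream of the Red Chamber* - a treasure of Chinese literature\n\n---\n\n**Nobody understands Dream of the Red Chamber better than I do!** 🏮","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaskpyroer%2Fredmansionrag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftaskpyroer%2Fredmansionrag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaskpyroer%2Fredmansionrag/lists"}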