{"id":13584835,"url":"https://github.com/shibing624/ChatPDF","last_synced_at":"2025-04-07T05:35:47.491Z","repository":{"id":156888999,"uuid":"628971097","full_name":"shibing624/ChatPDF","owner":"shibing624","description":"RAG for Local LLM, chat with PDF/doc/txt files, ChatPDF. A pure native RAG implementation built on local LLM, embedding, and reranker models, with GraphRAG support and no third-party agent libraries required.","archived":false,"fork":false,"pushed_at":"2025-04-02T13:30:13.000Z","size":2241,"stargazers_count":725,"open_issues_count":3,"forks_count":123,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-07T03:13:25.051Z","etag":null,"topics":["chatdoc","chatpdf","graphrag","llm","local-rag","pdf","rag"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shibing624.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-17T11:06:18.000Z","updated_at":"2025-04-06T13:45:02.000Z","dependencies_parsed_at":"2024-04-11T21:13:13.427Z","dependency_job_id":"6a506f23-5195-44ca-a010-c662bd257160","html_url":"https://github.com/shibing624/ChatPDF","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibing624%2FChatPDF","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibing624%2FChatPDF/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibing624%2FChatPDF/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibing624%2FChatPDF/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shibing624","download_url":"https://codeload.github.com/shibing624/ChatPDF/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247601448,"owners_count":20964862,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatdoc","chatpdf","graphrag","llm","local-rag","pdf","rag"],"created_at":"2024-08-01T15:04:33.106Z","updated_at":"2025-04-07T05:35:47.486Z","avatar_url":"https://github.com/shibing624.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eChatPDF\u003c/h1\u003e\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://github.com/shibing624/ChatPDF\"\u003e\n  \u003c/a\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003ch3\u003eRetrieval-based knowledge QA (RAG) with local LLMs\u003c/h3\u003e\n    \u003cp align=\"center\"\u003e\n      \u003ca href=\"https://github.com/shibing624/ChatPDF/blob/main/LICENSE\"\u003e\n        \u003cimg alt=\"License\" src=\"https://img.shields.io/github/license/shibing624/ChatPDF\" /\u003e\n      \u003c/a\u003e\n      \u003ca href=\"https://gradio.app/\"\u003e\n        \u003cimg alt=\"Gradio\" src=\"https://img.shields.io/badge/Base-Gradio-fb7d1a?style=flat\" /\u003e\n      \u003c/a\u003e\n      \u003cp\u003e\n        Answers grounded in your files / Open-source models / Locally deployed LLMs\n      \u003c/p\u003e\n    \u003c/p\u003e\n    \u003cp align=\"center\"\u003e\n      \u003cimg alt=\"Demo Screenshot\" src=\"https://github.com/shibing624/ChatPDF/blob/main/docs/snap.png\" width=\"860\" /\u003e\n    \u003c/p\u003e\n  \u003c/p\u003e\n\u003c/div\u003e\n\n\n## Introduction\n- This project implements a lightweight GraphRAG:\n  - Document QA using relation-graph retrieval in `local` mode\n  - Supports the OpenAI API, DeepSeek API, Ollama API, and more; support for other LLMs can be added\n  - Supports OpenAI embeddings, local text2vec embeddings, Hugging Face embeddings, sentence-transformers embeddings, and more\n  - Asynchronous design that supports concurrent requests to multiple APIs\n- Supports many open-source LLMs, including Qwen and DeepSeek\n- Supports multiple file formats, including PDF, docx, Markdown, and txt\n- Improves RAG accuracy:\n  - Chunking: chunk splitting optimized for Chinese and for mixed Chinese-English documents\n  - Embeddings: uses text2vec sentence embeddings and supports both sentence-embedding and lexical-similarity matching\n  - Retrieval: adds rank_BM25 with jieba tokenization to strengthen lexical matching on query keywords, and builds the corpus candidate set from a weighted combination of lexical similarity and sentence-embedding vector similarity\n  - Reranking: a reranker module re-ranks the candidates returned by lexical + semantic retrieval, shrinking the candidate set and improving hit accuracy; set the rerank model with the `rerank_model_name_or_path` parameter\n  - Context expansion: each matched candidate chunk can be expanded with its neighboring context; set the expansion window size with the `num_expand_context_chunk` parameter\n  - Base model: can use an LLM fine-tuned for RAG with a 200k context window, or a custom RAG model; set the base model with the `generate_model_name_or_path` parameter\n- Provides a Gradio-based RAG chat page with streaming responses\n\n## How It Works\n\n\u003cimg src=\"https://github.com/shibing624/ChatPDF/blob/main/docs/chatpdf.jpg\" width=\"860\" /\u003e\n\n## Usage\n\n### Install Dependencies\n\nRun the following command in a terminal:\n```shell\npip install -r requirements.txt\n```\n\nOn Windows, we recommend installing under Linux via WSL. If CUDA is not installed and you do not want to run the LLM on CPU only, install CUDA first.\n\nIf downloads are slow, consider configuring the Douban PyPI mirror.\n\n### RAG Example\n\nRun the command below. Depending on your system, you may need to use `python` or `python3`. Make sure Python is installed.\n```shell\nCUDA_VISIBLE_DEVICES=0 python rag.py\n```\n\noutput:\n\n```\nprompt: 基于以下已知信息，用专业知识回答用户的问题。用简体中文回答。\n\n已知内容:\n[1]\t \"ReferencesPeter F Brown, John Cocke, Stephen A Della Pietra, Vincent J Della Pietra, Fredrick Jelinek, John DLafferty, Robert L Mercer, and Paul S Roossin. A statistical \n[2]\t \"Let be an encoder that infers the content zfor a given sentence xand a styley\n...\n\n问题:\n自然语言中的非平行迁移是指什么？\n\n---\n回答:\n自然语言中的非平行迁移是指在文本生成任务中，我们只能假设访问到非平行或单语的文本数据。这类任务包括翻译和摘要，其中所有问题都涉及这类任务。 \n['[1]\\t \"ReferencesPeter F Brown, John Cocke, Stephen A Della Pietra, Vincent J Della Pietra, Fredrick Jelinek, John DLafferty, Robert L Mercer, and Paul S Roossin. A statistical approach to machine translation.Computational linguistics \n'[2]\\t \"LetE:X\\x02Y!Z be an encoder that infers the content zfor a given sentence xand a styley, andG:Y\\x02Z!X be a generator that generates a sentence xfrom a given style yand contentz.EandGform an auto-encoder ', \n...\n]\n```\n\n### Launch the Gradio Web UI\n\n```shell\npython webui.py\n```\n\nYou can now open http://localhost:8082 in your browser to use ChatPDF.\n\n### GraphRAG Example\n\u003e [!TIP]\n\u003e\n\u003e  **Please set your OpenAI API key as an environment variable: `export OPENAI_API_KEY=\"sk-...\"`.** \n\u003e\n\u003e If you do not have an LLM API key, see [graphrag._model.py](https://github.com/shibing624/ChatPDF/blob/main/graphrag/_model.py#L120), which uses `ollama`.\n\n```shell\npython graphrag_demo.py\n```\n\n\n## Contact\n\n- Issues (suggestions): [![GitHub issues](https://img.shields.io/github/issues/shibing624/ChatPDF.svg)](https://github.com/shibing624/ChatPDF/issues)\n- Email: xuming, xuming624@qq.com\n- WeChat: add *WeChat ID: xuming624, with the note: Name-Company-NLP* to join the NLP discussion group.\n\n\u003cimg src=\"https://github.com/shibing624/ChatPDF/blob/main/docs/wechat.jpeg\" width=\"200\" /\u003e\n\n## License\n\n\nLicensed under [The Apache License 2.0](LICENSE) and free for commercial use. Please include a link to ChatPDF and its license in your product documentation.\n\n\n## Contribute\nThe code is still rough. If you improve it, contributions back to this project are welcome.\n\n### Related Projects\n- [shibing624/MedicalGPT](https://github.com/shibing624/MedicalGPT): train your own GPT-style LLM; implements continued pretraining, supervised fine-tuning, RLHF (reward modeling and reinforcement-learning training), and DPO (Direct Preference Optimization)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshibing624%2FChatPDF","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshibing624%2FChatPDF","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshibing624%2FChatPDF/lists"}