{"id":19978452,"url":"https://github.com/fiercex/document_qa","last_synced_at":"2025-06-27T03:06:37.077Z","repository":{"id":142934589,"uuid":"612125085","full_name":"fierceX/Document_QA","owner":"fierceX","description":"A simplified demo similar to chatpdf","archived":false,"fork":false,"pushed_at":"2023-03-10T09:11:08.000Z","size":3,"stargazers_count":191,"open_issues_count":6,"forks_count":30,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-07T08:24:12.963Z","etag":null,"topics":["chatgpt","chatpdf"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fierceX.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-10T08:53:35.000Z","updated_at":"2025-02-28T01:38:23.000Z","dependencies_parsed_at":"2023-06-07T08:45:49.337Z","dependency_job_id":null,"html_url":"https://github.com/fierceX/Document_QA","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/fierceX/Document_QA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fierceX%2FDocument_QA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fierceX%2FDocument_QA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fierceX%2FDocument_QA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fierceX%2FDocument_QA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fierceX","download_url":"https://codeload.github.com/fierceX/Document_QA/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fierceX%2FDocument_QA/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262180947,"owners_count":23271313,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatgpt","chatpdf"],"created_at":"2024-11-13T03:33:38.846Z","updated_at":"2025-06-27T03:06:37.045Z","avatar_url":"https://github.com/fierceX.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Document_QA\n\nAnswers your questions based on the text file you provide.\n\nThe core logic comes from chatPDF, automated customer-service AI, and [ChatWeb](https://github.com/SkywalkerDarren/chatWeb).\n\nThe original ChatWeb project uses PostgreSQL for vector storage and computation, which is relatively complex; this project switches to faiss, which is simpler and faster.\n\n\n# How it works\n\n1. Read the file and split it into segments\n2. For each text segment, generate an embedding vector with text-embedding-ada-002\n3. Store the mapping between vectors and text in a local pkl file\n4. Generate a vector for the user's input\n5. Use the vector database to run a nearest-neighbor search and return the list of most similar text segments\n6. Use the GPT-3.5 chat API with a prompt designed so that it answers based on the list of most similar text segments\n\nIn other words, the relevant content is first extracted from a large body of text and then used to answer the question, which in effect works around the token limit.  \nA possible next step is to replace the OpenAI text embeddings with a custom embedding generator.\n\n# Getting started\n\n- Dependencies\n\nMain dependencies:\n```\nfaiss\nnumpy\nopenai\n```\n\n- Environment variables\n\nSet `OPENAI_API_KEY` to your OpenAI API key:\n\n```shell\nexport OPENAI_API_KEY=\"sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\"\n```\n\n- Run\n\n```\npython Document_QA.py --input_file test.md --file_embeding test.pkl\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffiercex%2Fdocument_qa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffiercex%2Fdocument_qa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffiercex%2Fdocument_qa/lists"}