{"id":23575888,"url":"https://github.com/signitdoc/semantic-file-retrieval","last_synced_at":"2025-05-05T19:12:19.575Z","repository":{"id":268487020,"uuid":"894966871","full_name":"SignitDoc/semantic-file-retrieval","owner":"SignitDoc","description":"A semantic file retrieval application based on LLM (a lightweight multimodal semantic file retrieval tool powered by large-model parsing. Unlike traditional approaches that search by file name or metadata, it retrieves files based on the semantics of their content, and supports documents in all mainstream formats as well as images, audio, and video.)","archived":false,"fork":false,"pushed_at":"2024-12-26T03:23:36.000Z","size":511,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-12-26T04:18:57.211Z","etag":null,"topics":["chromadb","llm","multimodal","ollama","semantic-search"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SignitDoc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-27T10:21:31.000Z","updated_at":"2024-12-26T03:33:18.000Z","dependencies_parsed_at":null,"dependency_job_id":"dec5494e-94f7-4473-b424-2e2bbce4869f","html_url":"https://github.com/SignitDoc/semantic-file-retrieval","commit_stats":null,"previous_names":["signitdoc/semantic-file-retrieval"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SignitDoc%2Fsemantic-file-retrieval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SignitDoc%2Fsemantic-file-retrieval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SignitDoc%2Fsemantic-file-retrieval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SignitDoc%2Fsemantic-file-retrieval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SignitDoc","download_url":"https://codeload.github.com/SignitDoc/semantic-file-retrieval/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":231409559,"owners_count":18372472,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chromadb","llm","multimodal","ollama","semantic-search"],"created_at":"2024-12-26T21:10:19.080Z","updated_at":"2024-12-26T21:10:19.802Z","avatar_url":"https://github.com/SignitDoc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Semantic File Retrieval\nA lightweight multimodal semantic file retrieval tool powered by large-model parsing. Unlike traditional approaches that search by file name or metadata, it retrieves files based on the semantics of their content, and supports documents in all mainstream formats as well as images, audio, and video.\n\nRead this in [English](README_en.md)\n\n## Architecture\n![Architecture diagram](assets/architecture.png)\n\n## Demo\nhttps://github.com/user-attachments/assets/f1590c6f-5d5c-44c6-8370-591ed66e7452\n\n## Quick Start\n1. Install dependencies\n```bash\npip install -r requirements.txt\n```\n\n2. Create a .env configuration file in the project root and set either OLLAMA_BASE_URL (to use a local Ollama service) or GLM_API_KEY (to use the Zhipu AI Open Platform service).\n\n3. Run the project\n```bash\nstreamlit run main.py\n```\n\n## Docker Deployment\n1. Build the image with the Dockerfile included in the project\n```bash\ndocker build -t semantic-file-retrieval:latest .\n```\n\n2. Run the container\n```bash\ndocker run -d -e OLLAMA_BASE_URL=\"http://x.x.x.x:11434\" -p 8501:8501 semantic-file-retrieval:latest\n```\n\u003e _Every setting in the .env file can be overridden by passing environment variables to the docker run command._\n\n## TODO\n- [ ] Audio support\n- [ ] Video support\n- [ ] Scanned PDF support\n- [ ] Office document support (docx/xlsx/pptx)\n- [ ] Search by image (image-to-image retrieval)\n- [ ] Batch upload\n- [ ] RESTful API for integration with other systems\n- [ ] Offline processing of large files (100M+)\n- [ ] File type filtering\n- [ ] Support both traditional (keyword matching) and semantic retrieval\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsignitdoc%2Fsemantic-file-retrieval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsignitdoc%2Fsemantic-file-retrieval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsignitdoc%2Fsemantic-file-retrieval/lists"}