{"id":27394613,"url":"https://github.com/pig-mesh/office2md","last_synced_at":"2026-02-28T23:32:14.253Z","repository":{"id":270509680,"uuid":"904707116","full_name":"pig-mesh/office2md","owner":"pig-mesh","description":"[Required for large models] Office to Markdown service implementation, based on Microsoft Markitdown.","archived":false,"fork":false,"pushed_at":"2025-04-10T15:30:13.000Z","size":29158,"stargazers_count":32,"open_issues_count":1,"forks_count":5,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-19T12:40:18.617Z","etag":null,"topics":["markdown","markitdown","office"],"latest_commit_sha":null,"homepage":"https://ai.pig4cloud.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pig-mesh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-17T12:01:05.000Z","updated_at":"2025-09-26T19:46:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"aabbe0b9-ca2d-4384-8863-3eaa56e58139","html_url":"https://github.com/pig-mesh/office2md","commit_stats":null,"previous_names":["lltx/office2md","pig-mesh/office2md"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/pig-mesh/office2md","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pig-mesh%2Foffice2md","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pig-mesh%2Foffice2md/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pig-mesh%2Foffice2md/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/pig-mesh%2Foffice2md/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pig-mesh","download_url":"https://codeload.github.com/pig-mesh/office2md/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pig-mesh%2Foffice2md/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29954967,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-28T22:53:01.873Z","status":"ssl_error","status_checked_at":"2026-02-28T22:52:50.699Z","response_time":90,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["markdown","markitdown","office"],"created_at":"2025-04-13T22:50:32.744Z","updated_at":"2026-02-28T23:32:14.230Z","avatar_url":"https://github.com/pig-mesh.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# office2md\n\nA multi-purpose conversion service that turns PowerPoint, Word, Excel, image, audio, and HTML files into Markdown. It also integrates the GLM-4V model offered by Gitee AI and Zhipu AI, as well as the Qwen-VL-Max model from the Alibaba Cloud Bailian platform, for efficient text recognition on images and PDF files.\n\n## Docker Usage\n\n### 1. Quick Start\n\n```bash\n# Ships with the built-in GLM-4V-FLASH vision model, intended for testing only\ndocker run -p 8000:8000 registry.cn-hangzhou.aliyuncs.com/dockerhub_mirror/markitdown\n```\n\n### 2. 
Using Gitee AI\n\n```bash\ndocker run -d \\\n -p 8000:8000 \\\n -e API_KEY=gitee_ai_key \\\n -e MODEL=InternVL2_5-26B \\\n -e BASE_URL=https://ai.gitee.com/v1 \\\n registry.cn-hangzhou.aliyuncs.com/dockerhub_mirror/markitdown\n```\n\n### 3. Using the Alibaba Cloud Bailian Platform\n\n```bash\ndocker run -d \\\n  -p 8000:8000 \\\n  -e API_KEY=your_aliyun_api_key \\\n  -e MODEL=qwen-vl-max \\\n  -e BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1 \\\n  registry.cn-hangzhou.aliyuncs.com/dockerhub_mirror/markitdown\n```\n\n## Environment Variables\n\nThe service can be configured with the following environment variables:\n\n| Variable     | Description                                        | Default                              |\n| ------------ | -------------------------------------------------- | ------------------------------------ |\n| API_KEY      | API key for the AI platform                        | XXXX                                 |\n| BASE_URL     | Base URL of the AI platform API                    | https://open.bigmodel.cn/api/paas/v4 |\n| MODEL        | Name of the model to use                           | glm-4v-flash                         |\n| DELETE_DELAY | Delay before temporary files are deleted (seconds) | 300                                  |\n| PROMPT       | Prompt used for text extraction                    | 提取图片中全部的文本，不需要任何推理和总结，只需要原文 (i.e. extract all text from the image, with no reasoning or summarization, only the original text) |\n\n### Supported Model Configurations\n\n#### Zhipu AI\n\n- MODEL=glm-4v-flash\n- BASE_URL=https://open.bigmodel.cn/api/paas/v4\n\n#### Gitee AI\n\n- MODEL=InternVL2_5-26B\n- BASE_URL=https://ai.gitee.com/v1\n\n#### Alibaba Cloud Bailian\n\n- MODEL=qwen-vl-max\n- BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1\n\n## API Endpoints\n\n### 1. Upload an Image and Extract Text\n\n**Endpoint:** POST /upload/\n\n**Request format:** multipart/form-data\n\n**Parameters:**\n\n- file: the image file\n\n**Example response:**\n\n```json\n{\n  \"text\": \"extracted text content\"\n}\n```\n\n### 2. 
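Document Image Unwarping\n\n**Endpoint:** POST /uvdoc/unwarp\n\n**Request format:** multipart/form-data\n\n**Parameters:**\n\n- file: the document image file to be flattened\n\n**Response format:** image/png\n\n**Description:**\n\n- This endpoint processes curved or warped document images and returns the flattened image\n- Common image formats (PNG, JPEG, etc.) are supported\n- The response body is the flattened image as PNG data\n\n**Error response:**\n\n```json\n{\n  \"detail\": \"Error message\"\n}\n```\n\n## Running from Source\n\n```\ngit clone https://gitee.com/log4j/office2md.git\n\ncd office2md\n\npython3 -m venv venvdev\n\nsource venvdev/bin/activate\n\npip install -r requirements.txt\n\n# Start the service\nuvicorn main:app --reload\n```\n\n## Notes\n\n1. Make sure you have obtained an API key for the relevant platform before use\n2. The Zhipu AI and Alibaba Cloud Bailian APIs differ slightly; make sure you use the correct configuration\n3. Uploaded image files are deleted automatically after processing (after 5 minutes by default)\n4. The service listens on port 8000 by default\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpig-mesh%2Foffice2md","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpig-mesh%2Foffice2md","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpig-mesh%2Foffice2md/lists"}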