{"id":50451366,"url":"https://github.com/xusenlin/document-mcp","last_synced_at":"2026-06-01T00:03:12.980Z","repository":{"id":358396298,"uuid":"1239353570","full_name":"xusenlin/document-mcp","owner":"xusenlin","description":"Integrates markitdown, LibreOffice, and pandoc into MCP so AI can easily convert documents between formats (md, docx, pdf, html, ppt, xlsx...)","archived":false,"fork":false,"pushed_at":"2026-05-17T07:51:22.000Z","size":28,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-17T08:34:12.416Z","etag":null,"topics":["doc2md","doc2pdf","mcp","md2pdf","pdf2html","pdf2markdown"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xusenlin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-15T02:31:45.000Z","updated_at":"2026-05-17T07:51:26.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/xusenlin/document-mcp","commit_stats":null,"previous_names":["xusenlin/document-mcp"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/xusenlin/document-mcp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xusenlin%2Fdocument-mcp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xusenlin%2Fdocument-mcp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xusenlin%2Fdocument-mcp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xusenlin%2Fdocument-mcp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xusenlin","download_url":"https://codeload.github.com/xusenlin/document-mcp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xusenlin%2Fdocument-mcp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33753931,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["doc2md","doc2pdf","mcp","md2pdf","pdf2html","pdf2markdown"],"created_at":"2026-06-01T00:03:12.168Z","updated_at":"2026-06-01T00:03:12.963Z","avatar_url":"https://github.com/xusenlin.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# document-mcp\n\nA document conversion service built with Go and the MCP Go SDK. It supports two modes, **MCP** and **CLI + Skill**, and bundles pandoc, LibreOffice, markitdown, weasyprint, and headless-shell in the container to convert documents between a wide range of formats.\n\n---\n\n## Two Ways to Use It\n\n### Mode 1: MCP (standard protocol, works with any MCP client)\n\nConnect over the MCP protocol to any AI tool that supports MCP (OpenCode, Claude Desktop, Cursor, Windsurf, etc.).\n\nStart the container:\n```bash\ndocker run -d -p 8080:8080 -v /your/docs:/data ghcr.io/xusenlin/document-mcp:v1.3.1\n```\n\nMCP configuration (JSON-RPC over HTTP, Streamable HTTP transport):\n```json\n{\n  \"mcpServers\": {\n    \"document-mcp\": {\n      \"url\": 
\"http://localhost:8080/mcp\"\n    }\n  }\n}\n```\n\nWhere each tool stores this configuration:\n- **OpenCode**: the `mcp` field in `opencode.json`\n- **Claude Desktop**: the `mcpServers` field in `claude_desktop_config.json`\n- **Cursor**: `.cursor/mcp.json`\n- **Windsurf**: `mcp.json`\n\n### Mode 2: CLI + Skill (general-purpose AI tools)\n\nNo MCP configuration is needed. A Skill teaches the AI the `docker run` command, so the AI calls the container's CLI directly. Apart from Docker, nothing needs to be installed locally.\n\n```bash\n# Path rule: mount the source file's parent directory at /data; the file name stays the same\n# /Users/xx/project/report.md  →  -v /Users/xx/project:/data  →  /data/report.md\n\ndocker run --rm -v /your/project:/data \\\n  ghcr.io/xusenlin/document-mcp:v1.3.1 \\\n  cli pdf /data/report.md\n```\n\nThe Skill file is at `skills/document-convert/SKILL.md`. Copy it into your tool's skills directory:\n- Claude Code: `~/.claude/skills/document-convert/SKILL.md`\n- OpenCode: `.opencode/skills/document-convert/SKILL.md`\n- Generic: `.agents/skills/document-convert/SKILL.md`\n\n---\n\n## Tools\n\n### Single-file conversion\n\n| MCP Tool | CLI command | Result |\n|----------|----------|------|\n| `convert_to_pdf` | `docker run --rm -v \u003cdir\u003e:/data \u003cimage\u003e cli pdf /data/file.xxx` | Any supported format → PDF |\n| `convert_to_docx` | `docker run --rm -v \u003cdir\u003e:/data \u003cimage\u003e cli docx /data/file.xxx` | Any supported format → Word |\n| `convert_to_html` | `docker run --rm -v \u003cdir\u003e:/data \u003cimage\u003e cli html /data/file.xxx` | Any supported format → HTML |\n| `convert_to_markdown` | `docker run --rm -v \u003cdir\u003e:/data \u003cimage\u003e cli markdown /data/file.xxx [return_content]` | Any supported format → Markdown |\n\n### PDF operations\n\n| MCP Tool | CLI command | Result |\n|----------|----------|------|\n| `merge_pdf` | `docker run --rm -v \u003cdir\u003e:/data \u003cimage\u003e cli merge /data/a.pdf /data/b.pdf` | Merge multiple PDFs |\n| `split_pdf` | `docker run --rm -v \u003cdir\u003e:/data \u003cimage\u003e cli split /data/doc.pdf [page range]` | Split a PDF |\n\n\u003e `\u003cimage\u003e` = `ghcr.io/xusenlin/document-mcp:v1.3.1`, the image used by every command in this README.\n\n---\n\n## Conversion Engines\n\n### → PDF\n\n| Source format | Engine | Pipeline | Theme |\n|--------|------|------|:---:|\n| `.html` `.htm` | headless-shell (amd64) / pandoc + 
weasyprint (arm64) | `src → pdf` | ❌ |\n| `.docx` `.pptx` `.xlsx` `.odt` | LibreOffice | `src → pdf` | ❌ |\n| `.md` `.latex` `.tex` `.rst` `.org` `.txt` `.epub` | pandoc + weasyprint | `src → pdf` | ✅ |\n| `.pdf` | none | Skipped (same format) | n/a |\n\n### → Markdown\n\n| Source format | Engine | Pipeline |\n|--------|------|------|\n| `.docx` `.pptx` `.xlsx` `.pdf` | markitdown | `src → md` |\n| `.html` `.latex` `.tex` `.epub` `.odt` `.rst` `.org` `.txt` | pandoc | `src → md` |\n| `.md` | none | Skipped (same format) |\n\n### → Word\n\n| Source format | Engine | Pipeline |\n|--------|------|------|\n| `.md` `.html` `.latex` `.tex` `.odt` `.epub` `.rst` `.org` `.txt` | pandoc | `src → docx` |\n| `.pptx` `.xlsx` `.pdf` | markitdown + pandoc | `src → md → docx` |\n| `.docx` | none | Skipped (same format) |\n\n### → HTML\n\n| Source format | Engine | Pipeline |\n|--------|------|------|\n| `.md` `.latex` `.tex` `.docx` `.odt` `.epub` `.rst` `.org` `.txt` | pandoc | `src → html` |\n| `.pptx` `.xlsx` `.pdf` | markitdown + pandoc | `src → md → html` |\n| `.html` | none | Skipped (same format) |\n\n---\n\n## Themes\n\nPDF output supports 2 built-in CSS themes, selected with the `theme` parameter:\n\n| Value | Stylesheet | Style | Best for |\n|--------|------|------|----------|\n| `default` (the default) | [themes/default.css](themes/default.css) | GitHub style, sans-serif, bordered tables | Technical documentation |\n| `paper` | [themes/paper.css](themes/paper.css) | Academic report, serif, first-line indent, automatic numbering | Papers/reports |\n\n**Scope:** themes apply only to md / latex / tex / rst / org / txt / epub → pdf (the pandoc + weasyprint pipeline). html → pdf and Office → pdf ignore the theme.\n\nMCP call example (a `tools/call` request):\n```json\n{ \"jsonrpc\": \"2.0\", \"id\": 1, \"method\": \"tools/call\", \"params\": { \"name\": \"convert_to_pdf\", \"arguments\": { \"source_path\": \"/data/doc.md\", \"theme\": \"paper\" } } }\n```\n\nCLI call example:\n```bash\ndocker run --rm -v /path:/data ghcr.io/xusenlin/document-mcp:v1.3.1 cli pdf /data/doc.md --theme=paper\n```\n\n---\n\n## Output Rules\n\n- Output files are written to the **same directory as the source file**, with the same file name and only the extension changed\n- `merge_pdf` always names its output `merged.pdf`\n- `split_pdf` names its outputs `{name}_page_N.pdf` or `{name}_range_N.pdf`\n- If the target file already exists, the tool returns an error instead of overwriting it\n- Same-format conversions are skipped, and the source path is returned\n\n---\n\n## Build\n\n```bash\nmake build          # Compile locally\nmake docker-dev     # Build locally and start a test container\nmake docker-push    # Build multi-arch images and push them\nmake run            # Run locally in HTTP 
mode\n```\n\n## Tools in the Container\n\n| Tool | Version |\n|------|------|\n| pandoc | latest (debian bookworm) |\n| libreoffice-writer | latest (debian bookworm) |\n| markitdown | 0.1.5 (with docx/pdf/pptx extras) |\n| weasyprint | latest (pip) |\n| headless-shell | 138.0.7204.183 (Chromium) |\n| pdfunite / pdfseparate | poppler-utils |\n\n## Development\n\n```bash\ngo run ./cmd/cli/                        # Run locally in HTTP mode\nMCP_ADDR=:9090 go run ./cmd/cli/         # Use a different port\ngo run ./cmd/cli/ pdf /data/doc.md       # Test CLI mode\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxusenlin%2Fdocument-mcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxusenlin%2Fdocument-mcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxusenlin%2Fdocument-mcp/lists"}