{"id":44021969,"url":"https://github.com/xt765/mcp-document-converter","last_synced_at":"2026-03-01T08:01:00.974Z","repository":{"id":335835826,"uuid":"1147168415","full_name":"xt765/mcp-document-converter","owner":"xt765","description":"MCP Document Converter - A powerful MCP tool for converting documents between multiple formats, enabling AI agents to easily transform documents.","archived":false,"fork":false,"pushed_at":"2026-03-01T05:57:39.000Z","size":880,"stargazers_count":9,"open_issues_count":0,"forks_count":3,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-01T06:50:37.542Z","etag":null,"topics":["agent","agentic-ai","agents","ai","ai-agents","ai-assistant","ai-tools","mcp","mcp-client","mcp-server","mcp-servers","tool","tools"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xt765.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-01T10:09:19.000Z","updated_at":"2026-03-01T05:33:11.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/xt765/mcp-document-converter","commit_stats":null,"previous_names":["xt765/mcp-document-converter"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/xt765/mcp-document-converter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xt765%2Fmcp-document-converter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/xt765%2Fmcp-document-converter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xt765%2Fmcp-document-converter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xt765%2Fmcp-document-converter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xt765","download_url":"https://codeload.github.com/xt765/mcp-document-converter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xt765%2Fmcp-document-converter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29964203,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T06:55:38.174Z","status":"ssl_error","status_checked_at":"2026-03-01T06:53:04.810Z","response_time":124,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","agentic-ai","agents","ai","ai-agents","ai-assistant","ai-tools","mcp","mcp-client","mcp-server","mcp-servers","tool","tools"],"created_at":"2026-02-07T16:36:07.468Z","updated_at":"2026-03-01T08:01:00.963Z","avatar_url":"https://github.com/xt765.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eMCP Document Converter\u003c/h1\u003e\n\n\u003c!-- mcp-name: io.github.xt765/mcp-document-converter 
--\u003e\n\n\u003cp align=\"center\"\u003e\u003cstrong\u003eMCP (Model Context Protocol) Document Converter - A powerful MCP tool for converting documents between multiple formats, enabling AI agents to easily transform documents.\u003c/strong\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e🌐 \u003cstrong\u003eLanguage\u003c/strong\u003e: \u003ca href=\"README.md\"\u003eEnglish\u003c/a\u003e | \u003ca href=\"README.zh-CN.md\"\u003e中文\u003c/a\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://blog.csdn.net/Yunyi_Chi\"\u003e\u003cimg src=\"https://img.shields.io/badge/CSDN-玄同765-orange.svg?style=flat\u0026logo=csdn\" alt=\"CSDN\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/xt765/mcp-document-converter\"\u003e\u003cimg src=\"https://img.shields.io/badge/GitHub-mcp_document_converter-black.svg?style=flat\u0026logo=github\" alt=\"GitHub\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://gitee.com/xt765/mcp-document-converter\"\u003e\u003cimg src=\"https://img.shields.io/badge/Gitee-mcp_document_converter-red.svg?style=flat\u0026logo=gitee\" alt=\"Gitee\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-blue.svg?style=flat\u0026logo=opensourceinitiative\" alt=\"License\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://www.python.org/downloads/\"\u003e\u003cimg src=\"https://img.shields.io/badge/python-3.10+-blue.svg?style=flat\u0026logo=python\" alt=\"Python\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/mcp-document-converter/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/mcp-document-converter.svg?logo=pypi\" alt=\"PyPI Version\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pepy.tech/project/mcp-document-converter\"\u003e\u003cimg src=\"https://img.shields.io/pepy/dt/mcp-document-converter.svg?logo=pypi\u0026label=PyPI%20Downloads\" alt=\"PyPI Downloads\"\u003e\u003c/a\u003e\n  \u003ca 
href=\"https://registry.modelcontextprotocol.io/v0.1/servers?search=io.github.xt765/mcp-document-converter\"\u003e\u003cimg src=\"https://img.shields.io/badge/MCP-Registry-blue?logo=modelcontextprotocol\" alt=\"MCP Registry\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://mcp-marketplace.io/server/io-github-xt765-mcp-document-converter\"\u003e\u003cimg src=\"https://img.shields.io/badge/MCP-Marketplace-22c55e.svg?style=flat\u0026logo=shopify\u0026logoColor=white\" alt=\"MCP Marketplace\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n## Features\n\n- **Multi-format Support**: Supports five common document formats: Markdown, HTML, DOCX, PDF, and Text\n- **Bidirectional Conversion**: Each of the five formats can be converted to each of the five formats, including itself (5×5 = 25 conversion combinations)\n- **MCP Protocol**: Compliant with the MCP standard, so it can be used as a tool by MCP-capable AI clients such as Trae IDE\n- **Plugin Architecture**: Easy to extend with new parsers and renderers\n- **Syntax Highlighting**: HTML and PDF outputs support code syntax highlighting\n- **Style Customization**: Supports custom CSS styles\n- **Metadata Preservation**: Preserves document title, author, creation time, and other metadata during conversion\n\n---\n\n## 📚 Documentation\n\n[User Guide](docs/en/USER_GUIDE.md) · [API Reference](docs/en/API.md) · [Contributing](docs/en/CONTRIBUTING.md) · [Changelog](docs/en/CHANGELOG.md) · [License](LICENSE)\n\n---\n\n## Architecture\n\n```mermaid\nflowchart TB\n    subgraph Parsers[\"Parsers\"]\n        MD[Markdown]\n        DOCX1[DOCX]\n        HTML1[HTML]\n        PDF1[PDF]\n        TXT1[Text]\n    end\n\n    subgraph IR[\"Intermediate Representation (IR)\"]\n        DT[Document Tree]\n        META[Metadata]\n        ASSETS[Assets]\n    end\n\n    subgraph Renderers[\"Renderers\"]\n        HTML2[HTML]\n        PDF2[PDF]\n        MD2[Markdown]\n        DOCX2[DOCX]\n        TXT2[Text]\n    end\n\n    MD --\u003e IR\n    DOCX1 --\u003e IR\n    HTML1 --\u003e IR\n    PDF1 --\u003e IR\n    TXT1 
--\u003e IR\n    \n    IR --\u003e HTML2\n    IR --\u003e PDF2\n    IR --\u003e MD2\n    IR --\u003e DOCX2\n    IR --\u003e TXT2\n```\n\n### Core Components\n\n1. **DocumentIR (Intermediate Representation)**: Unified abstraction for all documents, containing document tree, metadata, assets, etc.\n2. **BaseParser (Parser Base Class)**: Defines the parser interface, parses various formats into DocumentIR\n3. **BaseRenderer (Renderer Base Class)**: Defines the renderer interface, renders DocumentIR into various formats\n4. **ConverterRegistry (Registry)**: Manages all parsers and renderers, provides format lookup and auto-matching\n5. **DocumentConverter (Conversion Engine)**: Coordinates parsers and renderers to complete document conversion\n\n## Supported Formats\n\n### Input Formats (Parsers)\n\n| Format | Extensions | MIME Type | Features |\n|--------|------------|-----------|----------|\n| Markdown | .md, .markdown, .mdown, .mkd | text/markdown | YAML Front Matter, GFM extensions |\n| HTML | .html, .htm | text/html | Semantic tag parsing |\n| DOCX | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Styles, tables, images |\n| PDF | .pdf | application/pdf | Text extraction and structure recognition |\n| Text | .txt, .text | text/plain | Auto encoding detection and structure recognition |\n\n### Output Formats (Renderers)\n\n| Format | Extension | MIME Type | Features |\n|--------|-----------|-----------|----------|\n| HTML | .html | text/html | Beautiful styling, code highlighting, responsive design |\n| Markdown | .md | text/markdown | Standard Markdown format, YAML Front Matter |\n| DOCX | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Word document format, style preservation |\n| PDF | .pdf | application/pdf | Generated with WeasyPrint, pagination support |\n| Text | .txt | text/plain | Plain text, basic formatting preserved |\n\n## Conversion Matrix\n\n```mermaid\nflowchart LR\n    subgraph 
Sources[\"Source Formats\"]\n        MD_S[Markdown]\n        HTML_S[HTML]\n        DOCX_S[DOCX]\n        PDF_S[PDF]\n        TXT_S[Text]\n    end\n\n    subgraph Targets[\"Target Formats\"]\n        MD_T[Markdown]\n        HTML_T[HTML]\n        DOCX_T[DOCX]\n        PDF_T[PDF]\n        TXT_T[Text]\n    end\n\n    MD_S --\u003e Targets\n    HTML_S --\u003e Targets\n    DOCX_S --\u003e Targets\n    PDF_S --\u003e Targets\n    TXT_S --\u003e Targets\n```\n\n## Installation\n\n### Using pip (Recommended)\n\n```bash\npip install mcp-document-converter\n```\n\n### From Source\n\n```bash\ngit clone https://github.com/xt765/mcp-document-converter.git\ncd mcp-document-converter\npip install -e .\n```\n\n## MCP Tools\n\nThe main tool provided by this server is described below; the remaining tools are listed under Usage.\n\n### `convert_document`\nConvert a document from one format to another.\n\n**Arguments:**\n- `source_path` (string, required): Path to the source document.\n- `target_format` (string, required): Target format (`html`, `pdf`, `markdown`, `docx`, `text`).\n- `output_path` (string, optional): Path for the output file.\n- `source_format` (string, optional): Format of the source file (auto-detected if not provided).\n- `options` (object, optional): Additional options such as `template`, `css`, `preserve_metadata`, and `extract_images`.\n\n## Configuration\n\n### Using in Trae IDE / Claude Desktop\n\nAdd the following to your MCP configuration file:\n\n**Option 1: Using PyPI (Recommended)**\n\n```json\n{\n  \"mcpServers\": {\n    \"mcp-document-converter\": {\n      \"command\": \"uvx\",\n      \"args\": [\n        \"mcp-document-converter\"\n      ]\n    }\n  }\n}\n```\n\n**Option 2: Using GitHub repository**\n\n```json\n{\n  \"mcpServers\": {\n    \"mcp-document-converter\": {\n      \"command\": \"uvx\",\n      \"args\": [\n        \"--from\",\n        \"git+https://github.com/xt765/mcp-document-converter\",\n        \"mcp-document-converter\"\n      ]\n    }\n  }\n}\n```\n\n**Option 3: Using Gitee repository (Faster access in 
China)**\n\n```json\n{\n  \"mcpServers\": {\n    \"mcp-document-converter\": {\n      \"command\": \"uvx\",\n      \"args\": [\n        \"--from\",\n        \"git+https://gitee.com/xt765/mcp-document-converter\",\n        \"mcp-document-converter\"\n      ]\n    }\n  }\n}\n```\n\n**Option 4: Using pip (Manual installation)**\n\nFirst install the package:\n```bash\npip install mcp-document-converter\n```\n\nThen add to configuration:\n```json\n{\n  \"mcpServers\": {\n    \"mcp-document-converter\": {\n      \"command\": \"mcp-document-converter\",\n      \"args\": []\n    }\n  }\n}\n```\n\n### Using in Cherry Studio\n\n*Cherry Studio is a powerful open-source desktop AI client assistant that supports integrating various tools through the MCP protocol*\n\n**Configuration Example:**\n\n![Cherry Studio Configuration](docs/images/1770102311686.png)\n\n**Usage Example:**\n\n![Cherry Studio Usage](docs/images/1770102446855.png)\n\n## Usage\n\n### As an MCP Tool\n\nAfter configuration, AI assistants can directly call the following tools:\n\n#### 1. convert_document (Recommended)\n\nUse a unified interface to convert any supported document type.\n\n```python\n# Markdown to HTML\nconvert_document(\n    source_path=\"document.md\",\n    target_format=\"html\"\n)\n\n# HTML to PDF\nconvert_document(\n    source_path=\"document.html\",\n    target_format=\"pdf\"\n)\n\n# DOCX to Markdown\nconvert_document(\n    source_path=\"document.docx\",\n    target_format=\"markdown\"\n)\n\n# Conversion with options\nconvert_document(\n    source_path=\"document.md\",\n    target_format=\"html\",\n    output_path=\"output.html\",\n    options={\n        \"css\": \"custom.css\",\n        \"preserve_metadata\": True\n    }\n)\n```\n\n#### 2. list_supported_formats\n\nList all supported document formats.\n\n```python\nlist_supported_formats()\n```\n\n#### 3. get_conversion_matrix\n\nGet the complete format conversion matrix.\n\n```python\nget_conversion_matrix()\n```\n\n#### 4. 
can_convert\n\nCheck if conversion from source format to target format is supported.\n\n```python\ncan_convert(source_format=\"markdown\", target_format=\"pdf\")\n```\n\n#### 5. get_format_info\n\nGet detailed information about a specific format.\n\n```python\nget_format_info(format=\"markdown\")\n```\n\n### As a Python Library\n\n```python\nfrom mcp_document_converter import DocumentConverter\nfrom mcp_document_converter.registry import get_registry\nfrom mcp_document_converter.parsers import MarkdownParser, HTMLParser\nfrom mcp_document_converter.renderers import HTMLRenderer, PDFRenderer\n\n# Register parsers and renderers\nregistry = get_registry()\nregistry.register_parser(MarkdownParser())\nregistry.register_parser(HTMLParser())\nregistry.register_renderer(HTMLRenderer())\nregistry.register_renderer(PDFRenderer())\n\n# Create converter\nconverter = DocumentConverter(registry)\n\n# Convert document\nresult = converter.convert(\n    source=\"input.md\",\n    target_format=\"html\",\n    output_path=\"output.html\"\n)\n\nif result.success:\n    print(f\"✅ Conversion successful: {result.output_path}\")\nelse:\n    print(f\"❌ Conversion failed: {result.error_message}\")\n```\n\n## Tool Interface Details\n\n### convert_document\n\nConvert a document from one format to another.\n\n**Parameters:**\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `source_path` | string | ✅ | Source file path, supports absolute or relative paths |\n| `target_format` | string | ✅ | Target format: `html`, `pdf`, `markdown`, `docx`, `text` |\n| `output_path` | string | ❌ | Output file path (optional, defaults to source filename) |\n| `source_format` | string | ❌ | Source format (optional, auto-detected from file extension) |\n| `options` | object | ❌ | Conversion options |\n\n**Options:**\n\n| Option | Type | Default | Description |\n|--------|------|---------|-------------|\n| `template` | string | - | Template name |\n| `css` | string | 
- | Custom CSS styles |\n| `preserve_metadata` | boolean | true | Whether to preserve metadata |\n| `extract_images` | boolean | true | Whether to extract images |\n\n**Example:**\n\n```json\n{\n  \"source_path\": \"/path/to/document.md\",\n  \"target_format\": \"html\",\n  \"output_path\": \"/path/to/output.html\",\n  \"options\": {\n    \"css\": \"body { font-family: Arial; }\",\n    \"preserve_metadata\": true\n  }\n}\n```\n\n## Extension Development\n\n### Adding a New Parser\n\n```python\nfrom typing import List, Union\nfrom pathlib import Path\nfrom mcp_document_converter.core.parser import BaseParser\nfrom mcp_document_converter.core.ir import DocumentIR, Node, NodeType\n\nclass MyParser(BaseParser):\n    @property\n    def supported_extensions(self) -\u003e List[str]:\n        return [\".myext\"]\n    \n    @property\n    def format_name(self) -\u003e str:\n        return \"myformat\"\n    \n    @property\n    def mime_types(self) -\u003e List[str]:\n        return [\"application/x-myformat\"]\n    \n    def parse(self, source: Union[str, Path, bytes], **options) -\u003e DocumentIR:\n        # Read source file\n        content = self._read_source(source)\n        \n        # Parse into DocumentIR\n        document = DocumentIR()\n        document.title = \"My Document\"\n        \n        # Add content nodes\n        document.add_node(Node(\n            type=NodeType.PARAGRAPH,\n            content=[Node(type=NodeType.TEXT, content=\"Hello World\")]\n        ))\n        \n        return document\n```\n\n### Adding a New Renderer\n\n```python\nfrom typing import Any\nfrom mcp_document_converter.core.renderer import BaseRenderer\nfrom mcp_document_converter.core.ir import DocumentIR\n\nclass MyRenderer(BaseRenderer):\n    @property\n    def output_extension(self) -\u003e str:\n        return \".myext\"\n    \n    @property\n    def format_name(self) -\u003e str:\n        return \"myformat\"\n    \n    @property\n    def mime_type(self) -\u003e str:\n        
return \"application/x-myformat\"\n    \n    def render(self, document: DocumentIR, **options: Any) -\u003e str:\n        # Render DocumentIR to target format\n        parts = []\n        \n        if document.title:\n            parts.append(f\"# {document.title}\")\n        \n        for node in document.content:\n            # Render each node\n            pass\n        \n        return \"\\n\".join(parts)\n```\n\n### Registering Extensions\n\n```python\nfrom mcp_document_converter.registry import get_registry\n\n# Register new parser and renderer\nregistry = get_registry()\nregistry.register_parser(MyParser())\nregistry.register_renderer(MyRenderer())\n```\n\n## Testing\n\n```bash\n# Run all tests\npytest tests/\n\n# Run a specific test (pytest node ID syntax)\npytest tests/test_conversion.py::test_markdown_to_html\n```\n\n## Environment Variables\n\n| Variable | Description | Default |\n|----------|-------------|---------|\n| `MCP_CONVERTER_LOG_LEVEL` | Log level | `INFO` |\n| `MCP_CONVERTER_TEMP_DIR` | Temporary files directory | System temp directory |\n\n## Dependencies\n\n### Core Dependencies\n- `mcp` \u003e= 1.0.0 - MCP protocol implementation\n- `pydantic` \u003e= 2.0.0 - Data validation\n\n### Parser Dependencies\n- `markdown` \u003e= 3.5.0 - Markdown parsing\n- `beautifulsoup4` \u003e= 4.12.0 - HTML parsing\n- `python-docx` \u003e= 1.1.0 - DOCX parsing\n- `PyPDF2` \u003e= 3.0.0 - PDF parsing\n- `chardet` \u003e= 5.0.0 - Encoding detection\n- `pyyaml` \u003e= 6.0.0 - YAML parsing\n\n### Renderer Dependencies\n- `weasyprint` \u003e= 60.0 - PDF rendering\n- `pygments` \u003e= 2.17.0 - Code highlighting\n- `jinja2` \u003e= 3.1.0 - Template engine\n\n### Development Dependencies\n- `pytest` \u003e= 7.0.0 - Testing framework\n- `pytest-asyncio` \u003e= 0.21.0 - Async testing support\n- `pytest-cov` \u003e= 4.0.0 - Coverage reporting\n- `basedpyright` \u003e= 1.0.0 - Type checking\n- `ruff` \u003e= 0.1.0 - Linting and formatting\n\n## License\n\nMIT License\n\n## 
Contributing\n\nIssues and Pull Requests are welcome!\n\n## Related Projects\n\n- [MCP Document Reader](https://github.com/xt765/mcp_documents_reader) - MCP document reader supporting multiple document formats\n- [Model Context Protocol](https://modelcontextprotocol.io/) - Official Model Context Protocol documentation\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxt765%2Fmcp-document-converter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxt765%2Fmcp-document-converter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxt765%2Fmcp-document-converter/lists"}