{"id":31327976,"url":"https://github.com/trsdn/markitdown-mcp","last_synced_at":"2025-09-25T23:40:01.309Z","repository":{"id":313195466,"uuid":"1050410446","full_name":"trsdn/markitdown-mcp","owner":"trsdn","description":"📄 Professional MCP server for converting 29+ file formats to Markdown - Perfect for Claude Desktop and AI workflows!","archived":false,"fork":false,"pushed_at":"2025-09-16T21:49:39.000Z","size":1567,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-16T23:55:46.975Z","etag":null,"topics":["ai-tools","claude-desktop","document-conversion","file-conversion","image-processing","markdown","markitdown","mcp","metadata-extraction","model-context-protocol","office-documents","pdf-converter","python","speech-to-text"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/trsdn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-04T11:50:46.000Z","updated_at":"2025-09-16T21:49:42.000Z","dependencies_parsed_at":"2025-09-04T14:19:50.553Z","dependency_job_id":"893ab1f1-0b1f-40e7-bd7f-2c1d4b4b05e8","html_url":"https://github.com/trsdn/markitdown-mcp","commit_stats":null,"previous_names":["trsdn/markitdown-mcp"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/trsdn/markitdown-mcp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/trsdn%2Fmarkitdown-mcp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trsdn%2Fmarkitdown-mcp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trsdn%2Fmarkitdown-mcp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trsdn%2Fmarkitdown-mcp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/trsdn","download_url":"https://codeload.github.com/trsdn/markitdown-mcp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trsdn%2Fmarkitdown-mcp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276997358,"owners_count":25742423,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-25T02:00:09.612Z","response_time":80,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-tools","claude-desktop","document-conversion","file-conversion","image-processing","markdown","markitdown","mcp","metadata-extraction","model-context-protocol","office-documents","pdf-converter","python","speech-to-text"],"created_at":"2025-09-25T23:39:59.576Z","updated_at":"2025-09-25T23:40:01.301Z","avatar_url":"https://github.com/trsdn.png","language":"Python","readme":"# 📄 MarkItDown MCP 
Server\n\n[![MCP](https://img.shields.io/badge/Model_Context_Protocol-MCP-blue)](https://modelcontextprotocol.io)\n[![Python](https://img.shields.io/badge/python-3.10+-blue.svg)](https://python.org)\n[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)\n[![CI](https://github.com/trsdn/markitdown-mcp/workflows/CI/badge.svg)](https://github.com/trsdn/markitdown-mcp/actions)\n[![Contributions Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](CONTRIBUTING.md)\n\nA powerful **Model Context Protocol (MCP) server** that converts 29+ file formats to clean, structured Markdown using Microsoft's MarkItDown library.\n\n🔥 **Perfect for Claude Desktop, MCP clients, and AI workflows!** \n\n## ✨ Features\n\n- 🔌 **MCP Protocol**: Seamless integration with Claude Desktop and MCP clients\n- 📁 **29+ File Formats**: PDFs, Office docs, images, audio, archives, and more\n- 📊 **Office Documents**: Word, PowerPoint, Excel files\n- 🌐 **Web Content**: HTML, XML, JSON, CSV\n- 📚 **E-books \u0026 Archives**: EPUB, ZIP files\n- ⚡ **Fast \u0026 Reliable**: Built on Microsoft's MarkItDown library\n- 🔍 **Image Metadata**: Extract EXIF metadata from images (JPG, PNG, GIF, etc.)\n- 🎵 **Speech Recognition**: Convert audio to text with speech transcription (MP3, WAV)*\n\n*_Requires `markitdown[all]` installation for full functionality_\n\n### 📦 Dependency Requirements by File Type\n\n| File Type | Required Dependencies | Install Command |\n|-----------|----------------------|-----------------|\n| **PDF** | `pypdf`, `pymupdf`, `pdfplumber` | `pipx inject markitdown-mcp 'markitdown[all]'` |\n| **Excel (.xlsx, .xls)** | `openpyxl`, `xlrd`, `pandas` | `pipx inject markitdown-mcp openpyxl xlrd pandas` |\n| **PowerPoint (.pptx)** | `python-pptx` | Included in base install |\n| **Images** | `PIL`, `exiftool` (optional) | Included in base install |\n| **Audio** | `pydub`, `speech_recognition` | `pipx inject markitdown-mcp 'markitdown[all]'` |\n| **Basic formats** | None | Base install only |\n\n**Note**: For the best experience, we recommend installing all dependencies using the **Complete Install** method below.
\n\n## 🚀 Quick Start for Claude Desktop\n\n1. **Install the server with ALL features:**\n   ```bash\n   # One command to install everything\n   pipx install git+https://github.com/trsdn/markitdown-mcp.git \u0026\u0026 \\\n   pipx inject markitdown-mcp 'markitdown[all]' openpyxl xlrd pandas pymupdf pdfplumber\n   ```\n\n2. **Add to your Claude Desktop config:**\n   ```json\n   {\n     \"mcpServers\": {\n       \"markitdown\": {\n         \"command\": \"markitdown-mcp\",\n         \"args\": []\n       }\n     }\n   }\n   ```\n\n3. **Restart Claude Desktop** and start converting files!\n\n## Features\n\n- Convert multiple file formats to Markdown\n- Batch processing of entire directories\n- Preserves directory structure in output\n- Environment variable support via .env file\n\n## 📋 Available MCP Tools\n\n### 🔧 `convert_file`\nConvert a single file to Markdown.\n```json\n{\n  \"name\": \"convert_file\",\n  \"arguments\": {\n    \"file_path\": \"/path/to/document.pdf\"\n  }\n}\n```\n\n### 📋 `list_supported_formats`\nGet a complete list of supported file formats.\n```json\n{\n  \"name\": \"list_supported_formats\",\n  \"arguments\": {}\n}\n```\n\n### 📁 `convert_directory`\nConvert all supported files in a directory.\n```json\n{\n  \"name\": \"convert_directory\", \n  \"arguments\": {\n    \"input_directory\": \"/path/to/files\",\n    \"output_directory\": \"/path/to/markdown\" \n  }\n}\n```\n\n## 📄 Supported File Formats (29+)\n\n| Category | Extensions | Features |\n|----------|------------|----------|\n| **📊 Office** | `.pdf`, `.docx`, `.pptx`, `.xlsx`, `.xls` | Full document structure |\n| **🖼️ Images** | `.jpg`, `.png`, `.gif`, `.bmp`, `.tiff`, `.webp` | EXIF metadata extraction |\n| **🎵 Audio** | `.mp3`, `.wav` | Speech-to-text transcription |\n| **🌐 Web** | `.html`, `.htm`, `.xml`, `.json`, `.csv` 
| Clean formatting |\n| **📚 Books** | `.epub` | Chapter extraction |\n| **📦 Archives** | `.zip` | Auto-extract and process |\n| **📝 Text** | `.txt`, `.md`, `.rst` | Direct conversion |\n\n## Installation\n\n### Option 1: Pip Install (Recommended)\n\n```bash\n# Install from a local clone of the repository\npip install -e /path/to/markitdown-mcp\n\n# Or navigate to the directory first\ncd /path/to/markitdown-mcp\npip install -e .\n```\n\n### Option 2: Direct Usage\n\n```bash\ncd /path/to/markitdown-mcp\nsource venv/bin/activate\npip install -r requirements.txt\n```\n\n## Quick Start\n\n### MCP Server Mode (Recommended)\n\nAfter pip installation:\n```bash\n# Start the MCP server (for use with MCP clients)\nmarkitdown-mcp\n```\n\nOr using the development script:\n```bash\npython run_server.py\n```\n\n## 🛠️ Installation Options\n\n### 🚀 One-Command Install (Recommended)\nInstall with ALL dependencies in one command:\n```bash\n# Using pipx (recommended)\npipx install git+https://github.com/trsdn/markitdown-mcp.git \u0026\u0026 \\\npipx inject markitdown-mcp 'markitdown[all]' openpyxl xlrd pandas pymupdf pdfplumber pytesseract pydub speechrecognition\n\n# Or download and run the install script\ncurl -sSL https://raw.githubusercontent.com/trsdn/markitdown-mcp/main/scripts/install-all-deps.sh | bash\n```\n\n### Quick Install (Basic Features Only)\n```bash\n# Non-editable install straight from GitHub (-e with a VCS URL needs an #egg= name)\npip install git+https://github.com/trsdn/markitdown-mcp.git\n```\n\n### Complete Install with All Dependencies (Step by Step)\n\nTo ensure all file formats are supported, use one of these methods:\n\n#### Method 1: Using pipx (Recommended)\n```bash\n# Install the MCP server\npipx install git+https://github.com/trsdn/markitdown-mcp.git\n\n# Install all required dependencies for full functionality\npipx inject markitdown-mcp 'markitdown[all]'         # PDF, OCR, Speech\npipx inject markitdown-mcp openpyxl xlrd pandas      # Excel support\npipx inject markitdown-mcp pymupdf pdfplumber    
    # Advanced PDF\n```\n\n#### Method 2: Using pip with virtual environment\n```bash\n# Create and activate virtual environment\npython -m venv markitdown-env\nsource markitdown-env/bin/activate  # On Windows: markitdown-env\\Scripts\\activate\n\n# Install with all dependencies in one command\ngit clone https://github.com/trsdn/markitdown-mcp.git\ncd markitdown-mcp\npip install -e \".[all]\"  # This installs everything!\n```\n\n#### Method 3: For Claude Desktop with existing installation\nIf you already have the MCP server installed but some formats aren't working:\n```bash\n# Find your installation\nwhich markitdown-mcp  # Shows path like /Users/you/.local/bin/markitdown-mcp\n\n# Inject missing dependencies\npipx inject markitdown-mcp 'markitdown[all]' openpyxl xlrd pandas pymupdf pdfplumber\n```\n\n### Verify Installation\nAfter installation, verify all dependencies are properly installed:\n```bash\n# Test the MCP server\nmarkitdown-mcp --help\n\n# For pipx installations, check injected packages\npipx list --include-injected\n```\n\n## 🔧 Claude Desktop Configuration\n\nAdd this to your Claude Desktop `claude_desktop_config.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"markitdown\": {\n      \"command\": \"markitdown-mcp\",\n      \"args\": []\n    }\n  }\n}\n```\n\n**Config file locations:**\n- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`\n- **Windows**: `%APPDATA%\\Claude\\claude_desktop_config.json`\n\n## 💡 Usage Examples\n\n### Convert a PDF\n```\nConvert the file ~/Documents/report.pdf to markdown\n```\n\n### Batch Process Directory\n```\nConvert all files in ~/Downloads/documents/ to markdown\n```\n\n### Check Supported Formats\n```\nWhat file formats can you convert to markdown?\n```\n\n## 🔍 Troubleshooting\n\n### Missing Dependencies Errors\nIf you see errors like:\n- `PdfConverter threw MissingDependencyException`\n- `XlsxConverter threw MissingDependencyException`\n- `PptxConverter threw BadZipFile`\n\nThis means some 
optional dependencies are missing. Follow the **Complete Install** instructions above.\n\n### Unicode Errors with .md Files\nSome Markdown files with special characters may fail with `UnicodeDecodeError`. This is a known limitation in the MarkItDown library.\n\n### Installation Issues\n- **\"externally-managed-environment\" error**: Use pipx instead of pip\n- **Permission denied**: Never use sudo with pip; use pipx or virtual environments\n- **Command not found**: Make sure `~/.local/bin` is in your PATH\n\nSee [KNOWN_ISSUES.md](KNOWN_ISSUES.md) for more details.\n\n## Configuration\n\nNo special configuration required. The tool uses the MarkItDown library for document conversion.\n\n## Usage\n\n### Basic Usage\n\n```bash\n# Convert all supported files from input/ to output/\npython mdconvert.py\n```\n\n### Custom Directories\n\nSpecify custom input and output directories:\n```bash\npython mdconvert.py --input /path/to/docs --output /path/to/markdown\n```\n\n### Single File Conversion\n\nConvert a single file:\n```bash\npython mdconvert.py --file document.pdf\n```\n\n## Command Line Options\n\n- `--input, -i`: Input directory (default: `input`)\n- `--output, -o`: Output directory (default: `output`)\n- `--file, -f`: Convert a single file instead of a directory\n\n## MCP Server Features\n\nThe MCP server provides three tools:\n\n### 1. convert_file\nConvert a single file to Markdown.\n- **Input**: File path or base64 encoded content with filename\n- **Output**: Converted Markdown content\n\n### 2. list_supported_formats\nList all supported file formats.\n- **Output**: Categorized list of supported file extensions\n\n### 3. 
convert_directory\nConvert all supported files in a directory.\n- **Input**: Input directory path, optional output directory\n- **Output**: Summary of conversion results\n\n## Directory Structure\n\n```\nmarkitdown-mcp/\n├── mcp_server.py        # MCP protocol server\n├── mdconvert.py         # CLI script\n├── run_server.py        # Server runner script\n├── mcp_config.json      # MCP configuration\n├── requirements.txt     # Python dependencies\n├── README.md           # This file\n├── input/              # Default input directory\n├── output/             # Default output directory\n└── venv/               # Virtual environment\n```\n\n## 🔍 How It Works\n\nThis MCP server leverages Microsoft's MarkItDown library to provide intelligent document conversion:\n\n- **📄 PDFs**: Extracts text, tables, and structure\n- **🖼️ Images**: Uses OCR to extract text content + EXIF metadata  \n- **🎵 Audio**: Converts speech to text transcription (MP3, WAV)\n- **📊 Office**: Preserves formatting from Word, Excel, PowerPoint\n- **🌐 HTML**: Converts to clean, readable Markdown\n- **📦 Archives**: Automatically extracts and processes contents\n\n## 🏷️ Tags\n\n`mcp` `model-context-protocol` `claude-desktop` `markdown` `document-conversion` `pdf` `ocr` `speech-to-text` `markitdown` `ai-tools`\n\n## 📋 Requirements\n\n- **Python**: 3.10+\n- **MCP Client**: Claude Desktop or compatible MCP client\n- **Dependencies**: Automatically installed via pip\n\n## 🤝 Contributing\n\nWe welcome contributions! Here's how you can help:\n\n### 🚀 Quick Start for Contributors\n```bash\n# Fork and clone the repository\ngit clone https://github.com/YOUR_USERNAME/markitdown-mcp.git\ncd markitdown-mcp\n\n# Set up development environment\npython -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\npip install -e \".[dev]\"\n\n# Test your changes\nmarkitdown-mcp  # Test the server works\n```\n\n### 📝 Ways to Contribute\n- 🐛 **Bug Reports**: Found an issue? 
[Report it](https://github.com/trsdn/markitdown-mcp/issues/new?template=bug_report.yml)\n- 💡 **Feature Requests**: Have an idea? [Suggest it](https://github.com/trsdn/markitdown-mcp/issues/new?template=feature_request.yml)  \n- 📄 **New File Formats**: Add support for more file types\n- 📚 **Documentation**: Improve guides and examples\n- 🧪 **Testing**: Add tests and improve reliability\n- 🎨 **Code Quality**: Refactor and optimize\n\n### 📋 Contribution Process\n1. Read our [Contributing Guide](docs/development/CONTRIBUTING.md)\n2. Check [existing issues](https://github.com/trsdn/markitdown-mcp/issues)\n3. Fork the repository\n4. Create a feature branch (`feat/amazing-feature`)\n5. Make your changes with tests\n6. Submit a pull request\n\n**Please read [docs/development/CONTRIBUTING.md](docs/development/CONTRIBUTING.md) for detailed guidelines.**\n\n## 📚 Documentation\n\n### For Users\n- **[Examples](examples/)** - MCP client configuration examples\n- **[Known Issues](docs/guides/KNOWN_ISSUES.md)** - Common problems and solutions\n- **[Changelog](CHANGELOG.md)** - Version history and updates\n\n### For AI Agents\n- **[AGENTS.md](AGENTS.md)** - Comprehensive guide for AI agent integration\n- **[API Documentation](docs/api/)** - Technical specifications and tool details\n\n### For Developers\n- **[Contributing Guide](docs/development/CONTRIBUTING.md)** - How to contribute\n- **[Testing Strategy](docs/development/TESTING_STRATEGY.md)** - Testing approach and guidelines\n- **[Documentation](docs/)** - Complete documentation index\n\n## 📄 License\n\nMIT License - see LICENSE file for details.\n\n## 🔗 Related\n\n- [Model Context Protocol](https://modelcontextprotocol.io)\n- [Claude Desktop](https://claude.ai/)  \n- [Microsoft MarkItDown](https://github.com/microsoft/markitdown)
\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrsdn%2Fmarkitdown-mcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftrsdn%2Fmarkitdown-mcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrsdn%2Fmarkitdown-mcp/lists"}