{"id":30800625,"url":"https://github.com/pspdfkit/nutrient-pdf-mcp-server","last_synced_at":"2025-09-05T20:11:15.132Z","repository":{"id":305200150,"uuid":"1011374343","full_name":"PSPDFKit/nutrient-pdf-mcp-server","owner":"PSPDFKit","description":"A powerful Model Context Protocol server for LLM-driven PDF document analysis and exploration","archived":false,"fork":false,"pushed_at":"2025-08-01T19:44:54.000Z","size":54,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-01T21:44:34.138Z","etag":null,"topics":["ai-integration","llm-tools","mcp","pdf","pdf-parser","pdf-tools","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PSPDFKit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-30T18:03:25.000Z","updated_at":"2025-08-01T19:44:56.000Z","dependencies_parsed_at":"2025-07-18T22:11:30.769Z","dependency_job_id":null,"html_url":"https://github.com/PSPDFKit/nutrient-pdf-mcp-server","commit_stats":null,"previous_names":["pspdfkit/nutrient-pdf-mcp-server"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/PSPDFKit/nutrient-pdf-mcp-server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PSPDFKit%2Fnutrient-pdf-mcp-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PSPDFKit%2Fnutrient-pdf-mcp-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PSPDFKit%2Fnutri
ent-pdf-mcp-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PSPDFKit%2Fnutrient-pdf-mcp-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PSPDFKit","download_url":"https://codeload.github.com/PSPDFKit/nutrient-pdf-mcp-server/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PSPDFKit%2Fnutrient-pdf-mcp-server/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273813835,"owners_count":25172892,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-05T02:00:09.113Z","response_time":402,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-integration","llm-tools","mcp","pdf","pdf-parser","pdf-tools","python"],"created_at":"2025-09-05T20:11:09.942Z","updated_at":"2025-09-05T20:11:15.119Z","avatar_url":"https://github.com/PSPDFKit.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Nutrient PDF MCP Server\n\n\u003e **A powerful Model Context Protocol server for LLM-driven PDF document analysis and exploration**\n\nA [Model Context Protocol (MCP)](https://modelcontextprotocol.io) server for investigating PDF object trees with lazy loading support. 
This tool allows LLMs to efficiently explore PDF document structure without overwhelming token limits.\n\n## Features\n\n- **Lazy Loading**: Explore PDF structure without loading entire object trees\n- **Path Navigation**: Navigate through PDF objects using dot notation (e.g., `Pages.Kids.0`)\n- **Selective Resolution**: Resolve specific indirect objects on demand\n- **Token Efficient**: Massive reduction in response sizes compared to full tree dumps\n- **Type Safe**: Comprehensive type hints and error handling\n\n## Installation\n\n### Optional `asdf` setup\n\nYou'll need `python` and `nodejs` installed on your machine. You can optionally use `asdf`.\n\n- [Install and configure `asdf` version manager](https://asdf-vm.com/guide/getting-started.html)\n- [Install `asdf` `nodejs` plugin](https://github.com/asdf-vm/asdf-nodejs)\n- [Install `asdf` `python` plugin](https://github.com/asdf-community/asdf-python)\n\nFinally install required tools with:\n\n```sh\ngit clone https://github.com/PSPDFKit/nutrient-pdf-mcp-server.git\ncd nutrient-pdf-mcp-server\nasdf install\n\n# Install pipx for Python\npython -m pip install --user pipx\n```\n\nProceed with the rest of the installation after that.\n\n### Quick Start\n\n```bash\ngit clone https://github.com/PSPDFKit/nutrient-pdf-mcp-server.git\ncd nutrient-pdf-mcp-server\nmake install-dev  # Sets up development environment\n```\n\n### For Claude Code CLI\n\n**Recommended: Build and Install**\n\n```bash\npip install build\nmake build\npipx install dist/nutrient_pdf_mcp-1.0.0-py3-none-any.whl\nclaude mcp add nutrient-pdf-mcp nutrient-pdf-mcp\n```\n\nIf using `asdf`, you might need to configure `pipx` with the following before running:\n\n```sh\nexport PIPX_DEFAULT_PYTHON=$(asdf which python)\npipx install dist/nutrient_pdf_mcp-1.0.0-py3-none-any.whl\n```\n\n**Development Mode**\n\n```bash\nmake install-dev\nclaude mcp add nutrient-pdf-mcp \"$(pwd)/venv/bin/python\" -m pdf_mcp.server\n```\n\n#### Manual Configuration\n\n```json\n{\n 
 \"mcpServers\": {\n    \"nutrient-pdf-mcp\": {\n      \"command\": \"python\",\n      \"args\": [\"-m\", \"pdf_mcp.server\"]\n    }\n  }\n}\n```\n\n### Available Tools\n\n#### `get_pdf_object_tree`\n\nNutrient PDF MCP Server - Get JSON representation of PDF object tree with lazy loading.\n\n**Parameters:**\n\n- `pdf_path` (required): Path to the PDF file\n- `object_id` (optional): Specific object ID to retrieve (e.g., '1 0')\n- `path` (optional): Object path to navigate (e.g., 'Pages.Kids.0')\n- `mode` (optional): Parsing mode - 'lazy' (default) or 'full'\n\n**Examples:**\n\n```json\n{\n  \"pdf_path\": \"document.pdf\",\n  \"mode\": \"lazy\"\n}\n```\n\n```json\n{\n  \"pdf_path\": \"document.pdf\",\n  \"path\": \"Pages.Kids.0\",\n  \"mode\": \"lazy\"\n}\n```\n\n#### `resolve_indirect_object`\n\nNutrient PDF MCP Server - Resolve a specific indirect object by its object and generation numbers.\n\n**Parameters:**\n\n- `pdf_path` (required): Path to the PDF file\n- `objnum` (required): PDF object number (e.g., 3)\n- `gennum` (optional): PDF generation number (defaults to 0)\n- `depth` (optional): Resolution depth - 'shallow' (default) or 'deep'\n\n**Examples:**\n\n```json\n{\n  \"pdf_path\": \"document.pdf\",\n  \"objnum\": 3,\n  \"gennum\": 0,\n  \"depth\": \"shallow\"\n}\n```\n\n### Command Line Usage\n\n```bash\n# Run the server\nmake serve\n\n# Or run with debug logging\nmake serve-debug\n```\n\n## Architecture\n\n### Core Components\n\n- **`parser.py`**: Main PDF parsing logic with lazy loading support\n- **`server.py`**: MCP server implementation\n- **`types.py`**: Type definitions for PDF objects and responses\n- **`exceptions.py`**: Custom exception classes\n\n### Response Types\n\nAll PDF objects are serialized into a consistent JSON format:\n\n```json\n{\n  \"type\": \"dict\",\n  \"value\": {\n    \"/Type\": { \"type\": \"name\", \"value\": \"/Pages\" },\n    \"/Kids\": {\n      \"type\": \"array\",\n      \"value\": [{ \"type\": \"indirect_ref\", \"objnum\": 
2, \"gennum\": 0 }]\n    }\n  }\n}\n```\n\n### Token Efficiency\n\nLazy loading keeps responses small. Approximate response sizes by mode:\n\n- **Lazy mode**: ~5-50 lines (minimal tokens)\n- **Shallow resolution**: ~50-100 lines (reasonable tokens)\n- **Deep resolution**: 500+ lines (use sparingly)\n\n## Examples\n\n### Exploring PDF Structure\n\n1. **Get overview**: `get_pdf_object_tree(pdf_path=\"document.pdf\", mode=\"lazy\")`\n2. **Navigate to pages**: `get_pdf_object_tree(pdf_path=\"document.pdf\", path=\"Pages\", mode=\"lazy\")`\n3. **Resolve specific page**: `resolve_indirect_object(pdf_path=\"document.pdf\", objnum=3, gennum=0, depth=\"shallow\")`\n4. **Deep dive when needed**: `resolve_indirect_object(pdf_path=\"document.pdf\", objnum=3, gennum=0, depth=\"deep\")`\n\n### Path Navigation Examples\n\n- `\"Pages\"` - Navigate to the Pages object\n- `\"Pages.Kids\"` - Get the Kids array from Pages\n- `\"Pages.Kids.0\"` - Get the first page\n- `\"Pages.Kids.0.MediaBox.2\"` - Get the third MediaBox entry (upper-right x, which equals the page width when the lower-left x is 0)\n\n## Development\n\n### Quick Start\n\n```bash\n# Set up development environment\nmake install-dev\n\n# Run all quality checks (format, lint, typecheck, test)\nmake quality\n\n# Or run individual commands\nmake test          # Run tests\nmake format        # Format code\nmake lint          # Run linter\nmake typecheck     # Type checking\n```\n\n### Project Structure\n\n```\nnutrient-pdf-mcp-server/\n├── pdf_mcp/\n│   ├── __init__.py\n│   ├── server.py          # MCP server\n│   ├── parser.py          # PDF parsing logic\n│   ├── types.py           # Type definitions\n│   └── exceptions.py      # Custom exceptions\n├── tests/                 # Test suite\n├── res/                   # Sample PDFs\n├── pyproject.toml         # Project configuration\n└── README.md\n```\n\n## Publishing to PyPI\n\n```bash\n# Build the package\nmake build\n\n# Upload to test PyPI first\ntwine upload --repository testpypi dist/*\n\n# Upload to production PyPI\ntwine upload dist/*\n```\n\nAfter publishing, users can install with:\n\n```bash\npipx install 
nutrient-pdf-mcp\n# or\npip install --user nutrient-pdf-mcp\n```\n\n## Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes with tests\n4. Ensure code quality checks pass\n5. Submit a pull request\n\n## License\n\nMIT License - see LICENSE file for details.\n\n## Related Projects\n\n- [Model Context Protocol](https://modelcontextprotocol.io)\n- [PyPDF](https://pypdf.readthedocs.io/)\n- [Claude Code](https://claude.ai/code)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpspdfkit%2Fnutrient-pdf-mcp-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpspdfkit%2Fnutrient-pdf-mcp-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpspdfkit%2Fnutrient-pdf-mcp-server/lists"}