{"id":27063738,"url":"https://github.com/shtse8/pdf-reader-mcp","last_synced_at":"2025-04-05T16:20:54.403Z","repository":{"id":286157353,"uuid":"960549454","full_name":"shtse8/pdf-reader-mcp","owner":"shtse8","description":"An MCP server built with Node.js/TypeScript that allows AI agents to securely read PDF files (local or URL) and extract text, metadata, or page counts. Uses pdf-parse.","archived":false,"fork":false,"pushed_at":"2025-04-04T17:30:06.000Z","size":154,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-04T17:32:18.935Z","etag":null,"topics":["ai-agent","llm-tool","mcp","model-content-protocol","nodejs","pdf","pdf-parse","pdf-parser","pdf-reader","stdio","typescript"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shtse8.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-04-04T16:17:42.000Z","updated_at":"2025-04-04T17:30:09.000Z","dependencies_parsed_at":"2025-04-04T17:42:24.661Z","dependency_job_id":null,"html_url":"https://github.com/shtse8/pdf-reader-mcp","commit_stats":null,"previous_names":["shtse8/pdf-reader-mcp"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shtse8%2Fpdf-reader-mcp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shtse8%2Fpdf-reader-mcp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shtse8%2Fpdf-reader-mcp/relea
ses","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shtse8%2Fpdf-reader-mcp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shtse8","download_url":"https://codeload.github.com/shtse8/pdf-reader-mcp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247362529,"owners_count":20926797,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agent","llm-tool","mcp","model-content-protocol","nodejs","pdf","pdf-parse","pdf-parser","pdf-reader","stdio","typescript"],"created_at":"2025-04-05T16:20:53.912Z","updated_at":"2025-04-05T16:20:54.396Z","avatar_url":"https://github.com/shtse8.png","language":"TypeScript","funding_links":[],"categories":["🌐 Web Development"],"sub_categories":[],"readme":"# PDF Reader MCP Server (@shtse8/pdf-reader-mcp)\n\n[![npm version](https://badge.fury.io/js/%40shtse8%2Fpdf-reader-mcp.svg)](https://badge.fury.io/js/%40shtse8%2Fpdf-reader-mcp)\n[![Docker Pulls](https://img.shields.io/docker/pulls/shtse8/pdf-reader-mcp.svg)](https://hub.docker.com/r/shtse8/pdf-reader-mcp)\n\n\u003c!-- Add other badges like License, Build Status if applicable --\u003e\n\n**Empower your AI agents (like Cline/Claude) with the ability to read and\nextract information from PDF files within your project, using a single, flexible\ntool.**\n\nThis Node.js server implements the\n[Model Context Protocol (MCP)](https://docs.modelcontextprotocol.com/) to\nprovide a consolidated `read_pdf` tool for interacting with PDF documents (local\nor URL) located within a defined 
project root directory.\n\n---\n\n## ⭐ Why Use This Server?\n\n- **🛡️ Secure Project Root Focus:**\n  - All local file operations are **strictly confined to the project root\n    directory** (determined by the server's launch context), preventing\n    unauthorized access.\n  - Uses **relative paths** for local files. **Important:** The server\n    determines its project root from its own Current Working Directory (`cwd`)\n    at launch. The process starting the server (e.g., your MCP host) **must**\n    set the `cwd` to your intended project directory.\n- **🌐 URL Support:** Can directly process PDFs from public URLs.\n- **⚡ Efficient PDF Processing:**\n  - Leverages the `pdf-parse` library for extracting text, metadata, and page\n    information.\n- **🔧 Flexible \u0026 Consolidated Tool:**\n  - A single `read_pdf` tool handles various extraction needs via parameters,\n    simplifying agent interaction.\n- **🚀 Easy Integration:** Get started quickly using `npx` with minimal\n  configuration.\n- **🐳 Containerized Option:** Also available as a Docker image for consistent\n  deployment environments.\n- **✅ Robust Validation:** Uses Zod schemas to validate all incoming tool\n  arguments.\n\n---\n\n## 🚀 Quick Start: Usage with MCP Host (Recommended: `npx`)\n\nThe simplest way is via `npx`, configured in your MCP host (e.g.,\n`mcp_settings.json`).\n\n```json\n{\n  \"mcpServers\": {\n    \"pdf-reader-mcp\": {\n      \"command\": \"npx\",\n      \"args\": [\n        \"@shtse8/pdf-reader-mcp\"\n      ],\n      \"name\": \"PDF Reader (npx)\"\n    }\n  }\n}\n```\n\n**(Alternative) Using `bunx`:**\n\n```json\n{\n  \"mcpServers\": {\n    \"pdf-reader-mcp\": {\n      \"command\": \"bunx\",\n      \"args\": [\n        \"@shtse8/pdf-reader-mcp\"\n      ],\n      \"name\": \"PDF Reader (bunx)\"\n    }\n  }\n}\n```\n\n**Important:** Ensure your MCP Host launches the command with the `cwd` set to\nyour project's root directory for local file access.\n\n---\n\n## ✨ The `read_pdf` 
Tool\n\nThis server provides a single, powerful tool: `read_pdf`.\n\n- **Description:** Reads content, metadata, or page count from a PDF file (local\n  or URL), controlled by parameters.\n- **Input:** An object containing:\n  - `sources` (array): **Required.** An array of source objects. Each object\n    must contain _either_ `path` (string, relative path to local PDF) _or_ `url`\n    (string, URL of PDF). Each source object can _optionally_ include:\n    - `pages` (string | number[], optional): Extract text only from specific\n      pages (1-based) or ranges (e.g., `[1, 3, 5]` or `'1,3-5,7'`) for _this\n      specific source_. If provided, the global `include_full_text` flag is\n      ignored for this source.\n  - `include_full_text` (boolean, optional, default `false`): Include the full\n    text content for each PDF. Ignored if `pages` is provided.\n  - `include_metadata` (boolean, optional, default `true`): Include metadata\n    (`info` and `metadata` objects) for each PDF.\n  - `include_page_count` (boolean, optional, default `true`): Include the total\n    number of pages (`num_pages`) for each PDF.\n  \u003c!-- Removed deprecated top-level pages parameter description --\u003e\n- **Output:** An object containing a `results` array. Each element corresponds\n  to a source in the input `sources` array. 
**Processing continues even if some\n  sources fail.** Each result object has the following structure:\n  - `source` (string): The original path or URL provided for identification.\n  - `success` (boolean): Indicates if processing _this specific source_ was\n    successful.\n  - `error` (string, optional): Provides an error message if `success` is false\n    for this source.\n  - `data` (object, optional): Contains the extracted data if `success` is true\n    for this source:\n    - `full_text` (string, optional)\n    - `page_texts` (array, optional): Array of `{ page: number, text: string }`.\n    - `missing_pages` (array, optional)\n    - `info` (object, optional)\n    - `metadata` (object, optional)\n    - `num_pages` (number, optional)\n    - `warnings` (array, optional): Non-critical warnings for this source (e.g.,\n      requested page out of bounds).\n\n1. **Get metadata and page count for multiple files:**\n   ```json\n   {\n     \"sources\": [\n       { \"path\": \"report.pdf\" },\n       { \"url\": \"http://example.com/another.pdf\" },\n       { \"path\": \"nonexistent.pdf\" }\n     ]\n   }\n   ```\n   _(Example Output:\n   `{ \"results\": [ { \"source\": \"report.pdf\", \"success\": true, \"data\": { \"info\": {...}, \"metadata\": {...}, \"num_pages\": 10 } }, { \"source\": \"http://example.com/another.pdf\", \"success\": true, \"data\": { \"info\": {...}, \"metadata\": {...}, \"num_pages\": 5 } }, { \"source\": \"nonexistent.pdf\", \"success\": false, \"error\": \"File not found...\" } ] }`)_\n\n2. **Get full text for one file:**\n   ```json\n   {\n     \"sources\": [{ \"url\": \"http://example.com/document.pdf\" }],\n     \"include_full_text\": true,\n     \"include_metadata\": false,\n     \"include_page_count\": false\n   }\n   ```\n   _(Example Output:\n   `{ \"results\": [ { \"source\": \"http://example.com/document.pdf\", \"success\": true, \"data\": { \"full_text\": \"...\" } } ] }`)_\n\n3. 
**Get text from different pages for different files:**\n   ```json\n   {\n     \"sources\": [\n       { \"path\": \"manual.pdf\", \"pages\": \"1-2\" },\n       { \"url\": \"http://example.com/report.pdf\", \"pages\": [5] }\n     ],\n     \"include_metadata\": false,\n     \"include_page_count\": false\n   }\n   ```\n   Both flags default to `true`, so they are explicitly set to `false` here.\n\n   _(Example Output:\n   `{ \"results\": [ { \"source\": \"manual.pdf\", \"success\": true, \"data\": { \"page_texts\": [...] } }, { \"source\": \"http://example.com/report.pdf\", \"success\": true, \"data\": { \"page_texts\": [...] } } ] }`)_\n\n---\n\n## 🐳 Alternative Usage: Docker\n\nConfigure your MCP Host to run the Docker container, mounting your project\ndirectory to `/app`.\n\n```json\n{\n  \"mcpServers\": {\n    \"pdf-reader-mcp\": {\n      \"command\": \"docker\",\n      \"args\": [\n        \"run\",\n        \"-i\",\n        \"--rm\",\n        \"-v\",\n        \"/path/to/your/project:/app\",\n        \"shtse8/pdf-reader-mcp:latest\"\n      ],\n      \"name\": \"PDF Reader (Docker)\"\n    }\n  }\n}\n```\n\n**Note on Volume Mount Path:** Instead of hardcoding `/path/to/your/project`,\nyou can often use shell variables to automatically use the current working\ndirectory:\n\n- **Linux/macOS:** `-v \"$PWD:/app\"`\n- **Windows Cmd:** `-v \"%CD%:/app\"`\n- **Windows PowerShell:** `-v \"${PWD}:/app\"`\n- **VS Code Tasks/Launch:** You might be able to use `${workspaceFolder}` if\n  supported by your MCP host integration.\n\n---\n\n## 🛠️ Other Usage Options\n\n### Local Build (For Development)\n\n1. Clone: `git clone https://github.com/shtse8/pdf-reader-mcp.git`\n2. Install: `cd pdf-reader-mcp \u0026\u0026 npm install`\n3. Build: `npm run build`\n4. 
Configure MCP Host:\n   ```json\n   {\n     \"mcpServers\": {\n       \"pdf-reader-mcp\": {\n         \"command\": \"node\",\n         \"args\": [\"/path/to/cloned/repo/pdf-reader-mcp/build/index.js\"],\n         \"name\": \"PDF Reader (Local Build)\"\n       }\n     }\n   }\n   ```\n\n---\n\n## 💻 Development\n\n1. Clone, `npm install`, `npm run build`.\n2. `npm run watch` for auto-recompile.\n\n---\n\n## 🚢 Publishing (via GitHub Actions)\n\nUses GitHub Actions (`.github/workflows/publish.yml`) to publish to npm and\nDocker Hub on pushes to `main`. Requires `NPM_TOKEN`, `DOCKERHUB_USERNAME`,\n`DOCKERHUB_TOKEN` secrets.\n\n---\n\n## 🙌 Contributing\n\nContributions welcome! Open an issue or PR.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshtse8%2Fpdf-reader-mcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshtse8%2Fpdf-reader-mcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshtse8%2Fpdf-reader-mcp/lists"}