{"id":26570940,"url":"https://github.com/jae-jae/fetcher-mcp","last_synced_at":"2025-04-12T07:02:58.241Z","repository":{"id":283295697,"uuid":"951224571","full_name":"jae-jae/fetcher-mcp","owner":"jae-jae","description":"MCP server for fetch web page content using Playwright headless browser.","archived":false,"fork":false,"pushed_at":"2025-04-01T07:20:08.000Z","size":66,"stargazers_count":497,"open_issues_count":3,"forks_count":34,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-05T04:01:53.368Z","etag":null,"topics":["ai","mcp","playwright"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jae-jae.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-19T10:54:49.000Z","updated_at":"2025-04-04T19:51:03.000Z","dependencies_parsed_at":"2025-03-20T07:22:30.455Z","dependency_job_id":"24d721ac-d30b-4c63-8ecf-a8e8a047191f","html_url":"https://github.com/jae-jae/fetcher-mcp","commit_stats":null,"previous_names":["jae-jae/fetch-mcp","jae-jae/fetcher-mcp"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jae-jae%2Ffetcher-mcp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jae-jae%2Ffetcher-mcp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jae-jae%2Ffetcher-mcp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jae-jae%2Ffetcher-mcp/manifests","owner_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/owners/jae-jae","download_url":"https://codeload.github.com/jae-jae/fetcher-mcp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248498037,"owners_count":21114026,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","mcp","playwright"],"created_at":"2025-03-22T22:19:23.441Z","updated_at":"2025-04-12T07:02:58.206Z","avatar_url":"https://github.com/jae-jae.png","language":"TypeScript","funding_links":[],"categories":["Web Scraping","Data Access \u0026 Integration MCP Servers","MCP 服务器精选列表","📚 Projects (1974 total)","Legend","پیاده‌سازی‌های سرور","🤖 AI/ML","TypeScript","Browser Automation","Servers","Search \u0026 Data Extraction","カテゴリ","Table of Contents","MCP Servers"],"sub_categories":["SIEM \u0026 SecOps","🌐 浏览器自动化与网页交互","MCP Servers","🔎 \u003ca name=\"search\"\u003e\u003c/a\u003eSearch","🔎 \u003ca name=\"search\"\u003e\u003c/a\u003eجستجو و استخراج داده","How to Submit","Web Browsing \u0026 Scraping","🕸️ \u003ca name=\"web-scraping--collection\"\u003e\u003c/a\u003eWebスクレイピング・収集","Browser Automation"],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/jae-jae/fetcher-mcp/refs/heads/main/icon.svg\" width=\"100\" height=\"100\" alt=\"Fetcher MCP Icon\" /\u003e\n\u003c/div\u003e\n\n# Fetcher MCP\n\nMCP server for fetch web page content using Playwright headless browser.\n\n## Advantages\n\n- **JavaScript Support**: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and 
modern web applications.\n\n- **Intelligent Content Extraction**: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.\n\n- **Flexible Output Format**: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.\n\n- **Parallel Processing**: The `fetch_urls` tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.\n\n- **Resource Optimization**: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.\n\n- **Robust Error Handling**: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.\n\n- **Configurable Parameters**: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.\n\n## Quick Start\n\nRun directly with npx:\n\n```bash\nnpx -y fetcher-mcp\n```\n\nFirst-time setup: install the required browser by running the following command in your terminal:\n\n```bash\nnpx playwright install chromium\n```\n\n### Debug Mode\n\nRun with the `--debug` option to show the browser window for debugging:\n\n```bash\nnpx -y fetcher-mcp --debug\n```\n\n## MCP Configuration\n\nConfigure this MCP server in Claude Desktop:\n\nOn macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`\n\nOn Windows: `%APPDATA%/Claude/claude_desktop_config.json`\n\n```json\n{\n  \"mcpServers\": {\n    \"fetcher\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"fetcher-mcp\"]\n    }\n  }\n}\n```\n\n## Features\n\n- `fetch_url` - Retrieve web page content from a specified URL\n  - Uses a Playwright headless browser to execute JavaScript\n  - Supports intelligent extraction of main content and conversion to Markdown\n  - Supports the following parameters:\n    - `url`: The URL of the web page to fetch 
(required parameter)\n    - `timeout`: Page loading timeout in milliseconds, default is 30000 (30 seconds)\n    - `waitUntil`: Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'\n    - `extractContent`: Whether to intelligently extract the main content, default is true\n    - `maxLength`: Maximum length of returned content (in characters), default is no limit\n    - `returnHtml`: Whether to return HTML content instead of Markdown, default is false\n    - `waitForNavigation`: Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false\n    - `navigationTimeout`: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)\n    - `disableMedia`: Whether to disable media resources (images, stylesheets, fonts, media), default is true\n    - `debug`: Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified\n\n- `fetch_urls` - Batch retrieve web page content from multiple URLs in parallel\n  - Uses multi-tab parallel fetching for improved performance\n  - Returns combined results with clear separation between webpages\n  - Supports the following parameters:\n    - `urls`: Array of URLs to fetch (required parameter)\n    - Other parameters are the same as `fetch_url`\n\n## Tips\n\n### Handling Special Website Scenarios\n\n#### Dealing with Anti-Crawler Mechanisms\n- **Wait for Complete Loading**: For websites using CAPTCHA, redirects, or other verification mechanisms, include in your prompt:\n  ```\n  Please wait for the page to fully load\n  ```\n  This will use the `waitForNavigation: true` parameter.\n\n- **Increase Timeout Duration**: For websites that load slowly:\n  ```\n  Please set the page loading timeout to 60 seconds\n  ```\n  This adjusts both `timeout` and `navigationTimeout` parameters accordingly.\n\n#### Content Retrieval 
Adjustments\n- **Preserve Original HTML Structure**: When content extraction might fail:\n  ```\n  Please preserve the original HTML content\n  ```\n  Sets `extractContent: false` and `returnHtml: true`.\n\n- **Fetch Complete Page Content**: When extracted content is too limited:\n  ```\n  Please fetch the complete webpage content instead of just the main content\n  ```\n  Sets `extractContent: false`.\n\n- **Return Content as HTML**: When HTML format is needed instead of the default Markdown:\n  ```\n  Please return the content in HTML format\n  ```\n  Sets `returnHtml: true`.\n\n### Debugging and Authentication\n\n#### Enabling Debug Mode\n- **Dynamic Debug Activation**: To display the browser window during a specific fetch operation:\n  ```\n  Please enable debug mode for this fetch operation\n  ```\n  This sets `debug: true` even if the server was started without the `--debug` flag.\n\n#### Using Custom Cookies for Authentication\n- **Manual Login**: To log in with your own credentials:\n  ```\n  Please run in debug mode so I can manually log in to the website\n  ```\n  Sets `debug: true` or uses the `--debug` flag, keeping the browser window open for manual login.\n\n- **Interacting with Debug Browser**: When debug mode is enabled:\n  1. The browser window remains open\n  2. You can manually log in to the website using your credentials\n  3. 
After login is complete, content will be fetched with your authenticated session\n\n- **Enable Debug for Specific Requests**: Even if the server is already running, you can enable debug mode for a specific request:\n  ```\n  Please enable debug mode for this authentication step\n  ```\n  Sets `debug: true` for this specific request only, opening the browser window for manual login.\n\n## Development\n\n### Install Dependencies\n\n```bash\nnpm install\n```\n\n### Install Playwright Browser\n\nInstall the browsers needed for Playwright:\n\n```bash\nnpm run install-browser\n```\n\n### Build the Server\n\n```bash\nnpm run build\n```\n\n## Debugging\n\nUse MCP Inspector for debugging:\n\n```bash\nnpm run inspector\n```\n\nYou can also enable visible browser mode for debugging:\n\n```bash\nnode build/index.js --debug\n```\n\n## Related Projects\n\n- [g-search-mcp](https://github.com/jae-jae/g-search-mcp): A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously. Perfect for batch search operations and data collection.\n\n## License\n\nLicensed under the [MIT License](https://choosealicense.com/licenses/mit/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjae-jae%2Ffetcher-mcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjae-jae%2Ffetcher-mcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjae-jae%2Ffetcher-mcp/lists"}