{"id":30106007,"url":"https://github.com/7a6163/browser-worker","last_synced_at":"2025-08-10T01:03:38.217Z","repository":{"id":303173897,"uuid":"1014294883","full_name":"7a6163/browser-worker","owner":"7a6163","description":"A Cloudflare Worker that uses browser rendering to extract Open Graph (OG) meta data from Single Page Applications (SPAs) for social media sharing on Facebook, LINE, and other platforms.","archived":false,"fork":false,"pushed_at":"2025-07-06T05:17:58.000Z","size":111,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-06T06:29:30.687Z","etag":null,"topics":["browser-rendering","cloudflare-workers","puppeteer","serverless"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/7a6163.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-05T12:36:55.000Z","updated_at":"2025-07-06T05:18:01.000Z","dependencies_parsed_at":"2025-07-06T06:30:14.438Z","dependency_job_id":"4e0a209c-acd4-4e72-b75e-48049fc59b6c","html_url":"https://github.com/7a6163/browser-worker","commit_stats":null,"previous_names":["7a6163/browser-worker"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/7a6163/browser-worker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/7a6163%2Fbrowser-worker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/7a6163%2Fbrowser-worker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/7a6163%2Fbrowser-worker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/7a6163%2Fbrowser-worker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/7a6163","download_url":"https://codeload.github.com/7a6163/browser-worker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/7a6163%2Fbrowser-worker/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269659414,"owners_count":24455110,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-09T02:00:10.424Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["browser-rendering","cloudflare-workers","puppeteer","serverless"],"created_at":"2025-08-10T01:01:16.572Z","updated_at":"2025-08-10T01:03:38.160Z","avatar_url":"https://github.com/7a6163.png","language":"TypeScript","readme":"# Browser Worker - Fast HTML Content Extractor\n\nA high-performance Cloudflare Worker that uses browser rendering to extract HTML content from Single Page Applications (SPAs) and dynamic websites. Optimized with session reuse and intelligent caching for maximum speed.\n\n## 🎯 Purpose\n\nMany modern web applications (React, Vue, Angular) generate their content dynamically using JavaScript. 
This Worker solves the problem by:\n- Using a real browser to render JavaScript-heavy pages completely\n- Extracting the fully rendered HTML content\n- Providing fast access through intelligent session reuse and caching\n- Supporting both social media crawlers and API consumers\n\n## 🚀 Features\n\n- **⚡ High Performance** - Optimized session reuse and connection pooling for 3-5x faster startup\n- **🧠 Smart Caching** - Intelligent session management with automatic cleanup\n- **🌐 Full Browser Rendering** - Uses Puppeteer to execute JavaScript and render SPAs completely\n- **📦 KV Caching** - Optional HTML content caching with configurable TTL\n- **🛡️ Resource Optimization** - Blocks unnecessary resources (CSS, fonts, images) for faster loading\n- **🔧 Error Handling** - Robust error handling with optimized timeouts\n- **🌍 CORS Support** - Can be called from frontend applications\n\n## 📋 Requirements\n\n- Cloudflare Workers account\n- Browser Rendering enabled (Puppeteer binding)\n- Node.js compatibility flag enabled\n\n## 🛠️ Installation\n\n1. Clone this repository:\n```bash\ngit clone https://github.com/7a6163/browser-worker.git\ncd browser-worker\n```\n\n2. Install dependencies:\n```bash\nnpm install\n```\n\n3. Configure your `wrangler.jsonc`:\n```json\n{\n  \"name\": \"browser-worker\",\n  \"main\": \"src/index.ts\",\n  \"compatibility_date\": \"2025-07-05\",\n  \"compatibility_flags\": [\"nodejs_compat\"],\n  \"browser\": {\n    \"binding\": \"MYBROWSER\"\n  }\n}\n```\n\n4. 
Deploy to Cloudflare Workers:\n```bash\nnpm run deploy\n```\n\n## 🔧 Usage\n\n### Basic Usage\n\nThe Worker uses a simple `/content/{url}` endpoint to extract HTML content:\n\n```bash\n# Extract HTML content from any URL\ncurl \"https://your-worker.your-subdomain.workers.dev/content/https://example.com\"\n\n# For URLs with special characters, URL-encode them\ncurl \"https://your-worker.your-subdomain.workers.dev/content/https%3A%2F%2Fexample.com%2Fpath%3Fquery%3Dvalue\"\n```\n\n### Response Format\n\nThe Worker returns the fully rendered HTML content with proper headers:\n\n```http\nContent-Type: text/html;charset=UTF-8\nAccess-Control-Allow-Origin: *\n```\n\n### Error Handling\n\nInvalid requests return JSON error responses:\n\n```json\n{\n  \"success\": false,\n  \"error\": \"Invalid URL format\",\n  \"url\": \"invalid-url\"\n}\n```\n\n### Local Development\n\n```bash\n# Start development server with remote browser support\nnpm run dev\n\n# Or use wrangler directly\nwrangler dev --remote\n```\n\n### Performance Optimizations\n\nThis Worker includes several performance optimizations:\n\n- **Session Reuse**: Browser sessions are kept alive and reused across requests\n- **Connection Pooling**: Maintains up to 5 concurrent sessions for optimal performance\n- **Resource Blocking**: Automatically blocks CSS, fonts, and images for faster loading\n- **Optimized Timeouts**: Reduced wait times while maintaining reliability\n- **Intelligent Caching**: Sessions are cached for 30 minutes with automatic cleanup\n\n## 📱 Integration Examples\n\n### Social Media Crawlers\n\nFor social media platforms that need to crawl your SPA:\n\n```bash\n# Facebook, LINE, Twitter, etc. 
can access:\nhttps://your-worker.your-subdomain.workers.dev/content/https://your-spa.com/article/123\n```\n\n### API Integration\n\nIntegrate with your applications:\n\n```javascript\n// Fetch rendered HTML content\nconst response = await fetch('https://your-worker.workers.dev/content/https://example.com');\nconst htmlContent = await response.text();\n\n// Use the HTML content in your application\ndocument.getElementById('content').innerHTML = htmlContent;\n```\n\n### Webhook/Automation\n\nUse in automation workflows:\n\n```bash\n# Get rendered content for processing\ncurl \"https://your-worker.workers.dev/content/https://news-site.com/article/123\" \\\n  | grep -o '\u003cmeta property=\"og:title\" content=\"[^\"]*\"' \\\n  | sed 's/.*content=\"\\([^\"]*\\)\".*/\\1/'\n```\n\n## 🧪 Testing\n\n### Local Testing\n\n```bash\n# Test HTML content extraction\ncurl \"http://localhost:8787/content/https://github.com\"\n\n# Test with complex URLs\ncurl \"http://localhost:8787/content/https://example.com/path?query=value\"\n\n# Test error handling\ncurl \"http://localhost:8787/content/invalid-url\"\n\n# Test CORS preflight\ncurl -X OPTIONS \"http://localhost:8787/content/https://github.com\"\n```\n\n### Production Testing\n\n```bash\n# Test your deployed Worker\ncurl \"https://your-worker.your-subdomain.workers.dev/content/https://github.com\"\n\n# Test with URL encoding\ncurl \"https://your-worker.your-subdomain.workers.dev/content/https%3A%2F%2Fexample.com%2Fpath%3Fquery%3Dvalue\"\n```\n\n### Performance Testing\n\n```bash\n# Test session reuse (run multiple times to see performance improvement)\ntime curl \"https://your-worker.workers.dev/content/https://example.com\"\ntime curl \"https://your-worker.workers.dev/content/https://example.com\"\ntime curl \"https://your-worker.workers.dev/content/https://example.com\"\n```\n\n## 📊 Response Format\n\n### Successful Response\n\nReturns the fully rendered HTML content:\n\n```http\nHTTP/1.1 200 OK\nContent-Type: 
text/html;charset=UTF-8\nAccess-Control-Allow-Origin: *\n\n\u003c!DOCTYPE html\u003e\n\u003chtml\u003e\n\u003chead\u003e\n  \u003cmeta property=\"og:title\" content=\"Page Title\"\u003e\n  \u003cmeta property=\"og:description\" content=\"Page Description\"\u003e\n  \u003c!-- All dynamically generated content --\u003e\n\u003c/head\u003e\n\u003cbody\u003e\n  \u003c!-- Fully rendered page content --\u003e\n\u003c/body\u003e\n\u003c/html\u003e\n```\n\n### Error Response\n\n```http\nHTTP/1.1 400 Bad Request\nContent-Type: application/json\nAccess-Control-Allow-Origin: *\n\n{\n  \"success\": false,\n  \"error\": \"Invalid URL format\",\n  \"url\": \"invalid-url\"\n}\n```\n\n### JSON Response (Debug Mode)\n```json\n{\n  \"success\": true,\n  \"url\": \"https://example.com\",\n  \"sessionInfo\": \"Connected to session-id\",\n  \"data\": {\n    \"title\": \"Page Title\",\n    \"description\": \"Page Description\",\n    \"image\": \"https://example.com/image.jpg\",\n    \"url\": \"https://example.com\",\n    \"type\": \"website\",\n    \"siteName\": \"Site Name\",\n    \"locale\": \"en_US\",\n    \"twitterCard\": \"summary_large_image\",\n    \"twitterImage\": \"https://example.com/twitter-image.jpg\",\n    \"twitterTitle\": \"Twitter Title\",\n    \"twitterDescription\": \"Twitter Description\"\n  }\n}\n```\n\n## ⚙️ Configuration\n\n### Environment Variables\n\nNo environment variables are required. The Worker uses Cloudflare's Browser Rendering binding.\n\n### Timeout Settings\n\n- Page load timeout: 10 seconds\n- Browser session reuse for better performance\n- Automatic session cleanup\n\n### Caching\n\n- HTML responses are cached for 5 minutes\n- Browser sessions are reused across requests\n- Efficient resource management\n\n## 🔍 Troubleshooting\n\n### Common Issues\n\n1. 
**\"Browser Rendering is not supported locally\"**\n   - Use `wrangler dev --remote` instead of `wrangler dev`\n\n2. **\"Failed to load page: 4xx/5xx\"**\n   - Check if the target URL is accessible\n   - Verify the URL format is correct\n\n3. **\"Evaluation failed: ReferenceError\"**\n   - This usually indicates a JavaScript execution error\n   - Check the browser console for more details\n\n### Debug Mode\n\nUse `?format=json` to get detailed error information:\n\n```bash\ncurl \"https://your-worker.your-subdomain.workers.dev/?url=https://problematic-site.com\u0026format=json\"\n```\n\n## 📈 Performance\n\n- **Cold Start**: ~2-3 seconds for new browser sessions\n- **Warm Requests**: ~500ms-1s when reusing sessions\n- **Memory Usage**: Optimized with automatic session cleanup\n- **Concurrent Requests**: Handles multiple requests efficiently\n\n## 🔒 Security\n\n- Input URL validation and normalization\n- Timeout protection against slow-loading pages\n- Automatic browser session cleanup\n- No sensitive data storage\n\n## 📄 License\n\nThis project is licensed under the MIT License.\n\n## 🤝 Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes\n4. Test thoroughly\n5. Submit a pull request\n\n## 📞 Support\n\nFor issues and questions:\n- Check the troubleshooting section\n- Review Cloudflare Workers documentation\n- Open an issue in this repository\n\n---\n\n**Version**: 1.0.0\n**Last Updated**: 2025-07-05\n**Cloudflare Workers**: Compatible\n**Browser Rendering**: Required\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F7a6163%2Fbrowser-worker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F7a6163%2Fbrowser-worker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F7a6163%2Fbrowser-worker/lists"}