https://github.com/7a6163/browser-worker
A Cloudflare Worker that uses browser rendering to extract Open Graph (OG) meta data from Single Page Applications (SPAs) for social media sharing on Facebook, LINE, and other platforms.
browser-rendering cloudflare-workers puppeteer serverless
- Host: GitHub
- URL: https://github.com/7a6163/browser-worker
- Owner: 7a6163
- License: mit
- Created: 2025-07-05T12:36:55.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-07-06T05:17:58.000Z (9 months ago)
- Last Synced: 2025-07-06T06:29:30.687Z (9 months ago)
- Topics: browser-rendering, cloudflare-workers, puppeteer, serverless
- Language: TypeScript
- Homepage:
- Size: 108 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Browser Worker - Fast HTML Content Extractor
A high-performance Cloudflare Worker that uses browser rendering to extract HTML content from Single Page Applications (SPAs) and dynamic websites. Optimized with session reuse and intelligent caching for maximum speed.
## 🎯 Purpose
Many modern web applications (React, Vue, Angular) generate their content dynamically using JavaScript, so crawlers and HTTP clients that do not execute JavaScript see only an empty page shell. This Worker solves the problem by:
- Using a real browser to render JavaScript-heavy pages completely
- Extracting the fully rendered HTML content
- Providing fast access through intelligent session reuse and caching
- Supporting both social media crawlers and API consumers
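The rendering step in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: `PageLike`, `BrowserLike`, and `renderHtml` are hypothetical names, the timeout is an assumed value, and the browser is passed in so the sketch runs without a Worker. In a deployed Worker the browser would typically come from `puppeteer.launch(env.MYBROWSER)` in `@cloudflare/puppeteer`, using the `MYBROWSER` binding from the installation section.

```typescript
// Minimal structural types covering only the Puppeteer calls this sketch uses.
// (Hypothetical names; real code would use the types shipped with Puppeteer.)
interface PageLike {
  goto(url: string, options?: { waitUntil?: string; timeout?: number }): Promise<unknown>;
  content(): Promise<string>;
  close(): Promise<void>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
}

// Render a URL in a real browser and return the HTML after JavaScript has run.
async function renderHtml(browser: BrowserLike, url: string): Promise<string> {
  const page = await browser.newPage();
  try {
    // "networkidle0" waits until network activity settles, which gives
    // client-side frameworks time to populate the DOM. The 15 s timeout is an
    // assumed value, not one taken from the project.
    await page.goto(url, { waitUntil: "networkidle0", timeout: 15000 });
    return await page.content();
  } finally {
    // Always release the page, even when navigation fails.
    await page.close();
  }
}
```

Closing the page in `finally` keeps a failed navigation from leaking a tab in a browser session that later requests may reuse.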
## 🚀 Features
- **⚡ High Performance** - Optimized session reuse and connection pooling for 3-5x faster startup
- **🧠 Smart Caching** - Intelligent session management with automatic cleanup
- **🌐 Full Browser Rendering** - Uses Puppeteer to execute JavaScript and render SPAs completely
- **📦 KV Caching** - Optional HTML content caching with configurable TTL
- **🛡️ Resource Optimization** - Blocks unnecessary resources (CSS, fonts, images) for faster loading
- **🔧 Error Handling** - Robust error handling with optimized timeouts
- **🌍 CORS Support** - Can be called from frontend applications
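The resource-blocking feature is commonly implemented with Puppeteer's request interception. The sketch below shows one way to do it; `shouldBlock`, `handleRequest`, and `InterceptedRequest` are hypothetical names, and the exact set of blocked types in this project may differ.

```typescript
// Resource types that are not needed to produce the final HTML.
// The feature list names CSS, fonts, and images; Puppeteer reports CSS as "stylesheet".
const BLOCKED_RESOURCE_TYPES: ReadonlySet<string> = new Set(["stylesheet", "font", "image"]);

// Only the parts of Puppeteer's HTTPRequest that this sketch touches.
interface InterceptedRequest {
  resourceType(): string;
  abort(): Promise<void>;
  continue(): Promise<void>;
}

function shouldBlock(resourceType: string): boolean {
  return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// Handler intended for page.on("request", handleRequest), registered after
// page.setRequestInterception(true) has been called.
async function handleRequest(request: InterceptedRequest): Promise<void> {
  if (shouldBlock(request.resourceType())) {
    await request.abort();
  } else {
    await request.continue();
  }
}
```

Scripts and XHR/fetch requests are deliberately left alone, since an SPA needs them to render its content.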
## 📋 Requirements
- Cloudflare Workers account
- Browser Rendering enabled (Puppeteer binding)
- Node.js compatibility flag (`nodejs_compat`) enabled
## 🛠️ Installation
1. Clone this repository:
```bash
git clone https://github.com/7a6163/browser-worker.git
cd browser-worker
```
2. Install dependencies:
```bash
npm install
```
3. Configure your `wrangler.jsonc`:
```json
{
  "name": "browser-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-07-05",
  "compatibility_flags": ["nodejs_compat"],
  "browser": {
    "binding": "MYBROWSER"
  }
}
```
4. Deploy to Cloudflare Workers:
```bash
npm run deploy
```
## 🔧 Usage
### Basic Usage
The Worker uses a simple `/content/{url}` endpoint to extract HTML content:
```bash
# Extract HTML content from any URL
curl "https://your-worker.your-subdomain.workers.dev/content/https://example.com"
# For URLs with special characters, URL-encode them
curl "https://your-worker.your-subdomain.workers.dev/content/https%3A%2F%2Fexample.com%2Fpath%3Fquery%3Dvalue"
```
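When calling the endpoint from code, encoding the target URL avoids ambiguity between the target's query string and the Worker's own. A small helper might look like this; `buildContentUrl` and `workerBase` are hypothetical names used only for illustration.

```typescript
// Build the request URL for the Worker's /content/{url} endpoint.
// `workerBase` is a placeholder for your deployed Worker's origin.
function buildContentUrl(workerBase: string, targetUrl: string): string {
  // Throws a TypeError if targetUrl is not an absolute URL.
  const normalized = new URL(targetUrl).toString();
  // Drop trailing slashes so the result never contains "//content/".
  const base = workerBase.replace(/\/+$/, "");
  // encodeURIComponent escapes ":", "/", "?", "&" and "=", so the target's own
  // query string cannot be mistaken for the Worker's query string.
  return `${base}/content/${encodeURIComponent(normalized)}`;
}
```

For `https://example.com/path?query=value` this produces the same encoded form as the second `curl` command above.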
### Response Format
The Worker returns the fully rendered HTML content with proper headers:
```http
Content-Type: text/html;charset=UTF-8
Access-Control-Allow-Origin: *
```
### Error Handling
Invalid requests return JSON error responses:
```json
{
  "success": false,
  "error": "Invalid URL format",
  "url": "invalid-url"
}
```
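One plausible way to produce this error shape is to validate the path segment before launching a browser. The following is a sketch under assumptions, not the project's implementation: `parseTarget`, `ErrorBody`, and `ParseResult` are hypothetical names, and restricting targets to `http:`/`https:` is an assumed policy.

```typescript
interface ErrorBody {
  success: false;
  error: string;
  url: string;
}

type ParseResult = { ok: true; target: URL } | { ok: false; body: ErrorBody };

// Extract and validate the target URL from a request path such as
// "/content/https%3A%2F%2Fexample.com".
function parseTarget(pathname: string): ParseResult {
  const prefix = "/content/";
  const raw = pathname.startsWith(prefix) ? pathname.slice(prefix.length) : "";
  let decoded = raw;
  try {
    decoded = decodeURIComponent(raw);
  } catch {
    // Malformed percent-encoding: keep the raw value and let URL parsing decide.
  }
  try {
    const target = new URL(decoded);
    // Assumed policy: only web URLs are rendered.
    if (target.protocol === "http:" || target.protocol === "https:") {
      return { ok: true, target };
    }
  } catch {
    // Not an absolute URL; fall through to the error result.
  }
  return { ok: false, body: { success: false, error: "Invalid URL format", url: decoded } };
}
```

Rejecting bad input up front means no browser session is spent on a request that can never succeed.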
### Local Development
```bash
# Start development server with remote browser support
npm run dev
# Or use wrangler directly
wrangler dev --remote
```
### Performance Optimizations
This Worker includes several performance optimizations:
- **Session Reuse**: Browser sessions are kept alive and reused across requests
- **Connection Pooling**: Maintains up to 5 concurrent sessions for optimal performance
- **Resource Blocking**: Automatically blocks CSS, fonts, and images for faster loading
- **Optimized Timeouts**: Reduced wait times while maintaining reliability
- **Intelligent Caching**: Sessions are cached for 30 minutes with automatic cleanup
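The session limits listed above (up to 5 sessions, 30-minute lifetime) can be illustrated with a small bookkeeping class. This is a sketch, not the project's code: `SessionPool` and its methods are hypothetical, and it tracks only session IDs. Cloudflare's Puppeteer fork documents `puppeteer.sessions()` and `puppeteer.connect()` for reconnecting to an existing session by ID; whether this project uses them is an assumption.

```typescript
// Limits taken from the list above.
const MAX_SESSIONS = 5;
const SESSION_TTL_MS = 30 * 60 * 1000;

// Tracks idle browser session IDs with a size cap and idle expiry.
class SessionPool {
  // sessionId -> timestamp (ms) at which the session was returned to the pool
  private readonly idle = new Map<string, number>();

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(private readonly now: () => number = Date.now) {}

  // Take a still-valid idle session, or undefined if a new one must be launched.
  acquire(): string | undefined {
    this.cleanup();
    const first = this.idle.keys().next();
    if (first.done) return undefined;
    this.idle.delete(first.value);
    return first.value;
  }

  // Return a session to the pool. Returns false when the pool is full,
  // in which case the caller should close the session instead.
  release(id: string): boolean {
    this.cleanup();
    if (this.idle.size >= MAX_SESSIONS) return false;
    this.idle.set(id, this.now());
    return true;
  }

  // Number of idle sessions currently tracked.
  size(): number {
    return this.idle.size;
  }

  // Drop sessions that have been idle longer than the TTL.
  private cleanup(): void {
    const cutoff = this.now() - SESSION_TTL_MS;
    this.idle.forEach((lastUsed, id) => {
      if (lastUsed < cutoff) this.idle.delete(id);
    });
  }
}
```

Running `cleanup()` on every `acquire` and `release` avoids a background timer, which suits the request-driven execution model of Workers.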
## 📱 Integration Examples
### Social Media Crawlers
For social media platforms that need to crawl your SPA:
```bash
# Facebook, LINE, Twitter, etc. can access:
https://your-worker.your-subdomain.workers.dev/content/https://your-spa.com/article/123
```
### API Integration
Integrate with your applications:
```javascript
// Fetch rendered HTML content
const response = await fetch('https://your-worker.workers.dev/content/https://example.com');
const htmlContent = await response.text();
// Use the HTML content in your application
document.getElementById('content').innerHTML = htmlContent;
```
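Since the project's stated goal is social sharing, an API consumer may only need the Open Graph tags rather than the full page. The sketch below pulls them out of the returned HTML; `extractOpenGraph` is a hypothetical helper, and the regex scan assumes quoted attributes, so it is not a substitute for a real HTML parser.

```typescript
// Collect Open Graph tags (<meta property="og:..." content="...">) from HTML.
function extractOpenGraph(html: string): Record<string, string> {
  const tags: Record<string, string> = {};
  // Find every <meta ...> element, then inspect its attributes individually
  // so that attribute order does not matter.
  const metas = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const meta of metas) {
    const property = /\bproperty\s*=\s*["'](og:[^"']+)["']/i.exec(meta);
    const content = /\bcontent\s*=\s*(?:"([^"]*)"|'([^']*)')/i.exec(meta);
    const key = property ? property[1] : undefined;
    if (key && content) {
      // Group 1 holds a double-quoted value, group 2 a single-quoted one.
      tags[key] = content[1] ?? content[2] ?? "";
    }
  }
  return tags;
}
```

Given the `htmlContent` string from the example above, `extractOpenGraph(htmlContent)["og:title"]` would yield the page's share title when the tag is present.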
### Webhook/Automation
Use in automation workflows:
```bash
# Get rendered content for processing
curl "https://your-worker.workers.dev/content/https://news-site.com/article/123" \
  | grep -o '<title>[^<]*</title>'  # illustrative pattern: print the page title
```
### Error Response
```http
HTTP/1.1 400 Bad Request
Content-Type: application/json
Access-Control-Allow-Origin: *

{
  "success": false,
  "error": "Invalid URL format",
  "url": "invalid-url"
}
```