https://github.com/7a6163/browser-worker

A Cloudflare Worker that uses browser rendering to extract Open Graph (OG) meta data from Single Page Applications (SPAs) for social media sharing on Facebook, LINE, and other platforms.

# Browser Worker - Fast HTML Content Extractor

A high-performance Cloudflare Worker that uses browser rendering to extract HTML content from Single Page Applications (SPAs) and dynamic websites. Optimized with session reuse and intelligent caching for maximum speed.

## 🎯 Purpose

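Many modern web applications (React, Vue, Angular) generate their content dynamically with JavaScript, so a client that does not execute JavaScript, such as a social media crawler, sees only an empty page shell. This Worker addresses that by: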
- Using a real browser to render JavaScript-heavy pages completely
- Extracting the fully rendered HTML content
- Providing fast access through intelligent session reuse and caching
- Supporting both social media crawlers and API consumers

## 🚀 Features

- **⚡ High Performance** - Optimized session reuse and connection pooling for 3-5x faster startup
- **🧠 Smart Caching** - Intelligent session management with automatic cleanup
- **🌐 Full Browser Rendering** - Uses Puppeteer to execute JavaScript and render SPAs completely
- **📦 KV Caching** - Optional HTML content caching with configurable TTL
- **🛡️ Resource Optimization** - Blocks unnecessary resources (CSS, fonts, images) for faster loading
- **🔧 Error Handling** - Robust error handling with optimized timeouts
- **🌍 CORS Support** - Can be called from frontend applications
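The resource-blocking feature can be illustrated with Puppeteer's request interception. This is a minimal sketch, not the Worker's actual source: `BLOCKED_TYPES` and `handleRequest` are hypothetical names, and the `InterceptedRequest` interface only mirrors the three request methods the sketch relies on (`resourceType()`, `abort()`, `continue()`).

```typescript
// Resource types skipped during rendering (assumed from the feature list:
// CSS, fonts, images). The strings follow Puppeteer's resourceType() values.
const BLOCKED_TYPES: ReadonlySet<string> = new Set(["stylesheet", "font", "image"]);

// Structural stand-in for the parts of Puppeteer's HTTPRequest used here.
interface InterceptedRequest {
  resourceType(): string;
  abort(): unknown;
  continue(): unknown;
}

// Abort blocked resource types and let every other request through.
function handleRequest(request: InterceptedRequest): "aborted" | "continued" {
  if (BLOCKED_TYPES.has(request.resourceType())) {
    request.abort();
    return "aborted";
  }
  request.continue();
  return "continued";
}
```

With a real page, a handler like this would typically be registered after `await page.setRequestInterception(true)` using `page.on("request", handleRequest)`.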

## 📋 Requirements

- Cloudflare Workers account
- Browser Rendering enabled (Puppeteer binding)
- Node.js compatibility flag enabled

## 🛠️ Installation

1. Clone this repository:
```bash
git clone https://github.com/7a6163/browser-worker.git
cd browser-worker
```

2. Install dependencies:
```bash
npm install
```

3. Configure your `wrangler.jsonc`:
```json
{
  "name": "browser-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-07-05",
  "compatibility_flags": ["nodejs_compat"],
  "browser": {
    "binding": "MYBROWSER"
  }
}
```

4. Deploy to Cloudflare Workers:
```bash
npm run deploy
```

## 🔧 Usage

### Basic Usage

The Worker uses a simple `/content/{url}` endpoint to extract HTML content:

```bash
# Extract HTML content from any URL
curl "https://your-worker.your-subdomain.workers.dev/content/https://example.com"

# For URLs with special characters, URL-encode them
curl "https://your-worker.your-subdomain.workers.dev/content/https%3A%2F%2Fexample.com%2Fpath%3Fquery%3Dvalue"
```
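When the request URL is built in code, encoding the target avoids ambiguity between the target's query string and the Worker's own. The helper below is a small sketch; `buildContentUrl` is a hypothetical name, not part of this project.

```typescript
// Build a request URL for the /content/{url} endpoint.
// `workerOrigin` is a placeholder for your deployed Worker's origin.
function buildContentUrl(workerOrigin: string, targetUrl: string): string {
  // Strip trailing slashes so the result never contains "//content".
  const origin = workerOrigin.replace(/\/+$/, "");
  // Percent-encode the whole target so its "?", "&" and "#" are not
  // interpreted as part of the Worker's own URL.
  return `${origin}/content/${encodeURIComponent(targetUrl)}`;
}

console.log(
  buildContentUrl(
    "https://your-worker.your-subdomain.workers.dev",
    "https://example.com/path?query=value",
  ),
);
// prints https://your-worker.your-subdomain.workers.dev/content/https%3A%2F%2Fexample.com%2Fpath%3Fquery%3Dvalue
```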

### Response Format

The Worker returns the fully rendered HTML with the following response headers:

```http
Content-Type: text/html;charset=UTF-8
Access-Control-Allow-Origin: *
```

### Error Handling

Invalid requests return JSON error responses:

```json
{
  "success": false,
  "error": "Invalid URL format",
  "url": "invalid-url"
}
```
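A validation step that produces this error shape might look like the sketch below. It is an assumption about how such a check could be written, not the Worker's actual code: `validateTarget` and `ValidationResult` are hypothetical names, and restricting targets to `http:` and `https:` is a choice made for the example.

```typescript
// Shape of the JSON error body shown above.
interface ErrorBody {
  success: false;
  error: string;
  url: string;
}

type ValidationResult =
  | { ok: true; target: URL }
  | { ok: false; status: 400; body: ErrorBody };

// Hypothetical helper: accept only absolute http(s) URLs.
function validateTarget(raw: string): ValidationResult {
  try {
    const target = new URL(raw); // throws on malformed or relative input
    if (target.protocol === "http:" || target.protocol === "https:") {
      return { ok: true, target };
    }
  } catch {
    // Fall through to the shared error result below.
  }
  return {
    ok: false,
    status: 400,
    body: { success: false, error: "Invalid URL format", url: raw },
  };
}
```

Returning a plain result object keeps the check independent of the runtime; a handler can turn the failure case into a JSON response with the given status.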

### Local Development

```bash
# Start development server with remote browser support
npm run dev

# Or use wrangler directly
wrangler dev --remote
```

### Performance Optimizations

This Worker includes several performance optimizations:

- **Session Reuse**: Browser sessions are kept alive and reused across requests
- **Connection Pooling**: Maintains up to 5 concurrent sessions for optimal performance
- **Resource Blocking**: Automatically blocks CSS, fonts, and images for faster loading
- **Optimized Timeouts**: Reduced wait times while maintaining reliability
- **Intelligent Caching**: Sessions are cached for 30 minutes with automatic cleanup
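
The session reuse and cleanup described above can be sketched as a small pool with a size cap and an idle timeout. This is an illustration under stated assumptions, not the Worker's implementation: `SessionPool` and its methods are hypothetical, the type parameter `T` stands for whatever session handle the browser binding exposes, and the cap here limits idle sessions kept for reuse. The defaults of 5 sessions and 30 minutes come from the list above.

```typescript
// Hypothetical in-memory pool of reusable browser sessions.
interface PooledSession<T> {
  session: T;
  lastUsed: number; // epoch milliseconds
}

class SessionPool<T> {
  private idle: PooledSession<T>[] = [];

  constructor(
    private readonly maxSessions: number = 5,
    private readonly ttlMs: number = 30 * 60 * 1000,
    private readonly now: () => number = () => Date.now(), // injectable clock
    private readonly onEvict: (session: T) => void = () => {}, // e.g. close it
  ) {}

  // Remove sessions idle for longer than the TTL and hand them to onEvict.
  evictExpired(): void {
    const cutoff = this.now() - this.ttlMs;
    const expired = this.idle.filter((entry) => entry.lastUsed < cutoff);
    this.idle = this.idle.filter((entry) => entry.lastUsed >= cutoff);
    expired.forEach((entry) => this.onEvict(entry.session));
  }

  // Take the most recently released live session, if any.
  acquire(): T | undefined {
    this.evictExpired();
    return this.idle.pop()?.session;
  }

  // Return a session to the pool; false means the pool is full and the
  // caller should dispose of the session itself.
  release(session: T): boolean {
    if (this.idle.length >= this.maxSessions) return false;
    this.idle.push({ session, lastUsed: this.now() });
    return true;
  }
}
```

Passing the clock and the eviction callback into the constructor keeps the pool free of timers, so cleanup happens lazily on the next `acquire()`.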

## 📱 Integration Examples

### Social Media Crawlers

For social media platforms that need to crawl your SPA:

```bash
# Facebook, LINE, Twitter, etc. can access:
https://your-worker.your-subdomain.workers.dev/content/https://your-spa.com/article/123
```
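
Crawlers request the original page URL, so something in front of the SPA usually has to route them to the Worker. The sketch below shows one way to make that decision from the `User-Agent` header. All names are hypothetical, and the marker list is illustrative and incomplete: check each platform's documentation for the values its crawler currently sends.

```typescript
// Illustrative, incomplete list of crawler User-Agent substrings (assumption).
const CRAWLER_MARKERS = ["facebookexternalhit", "twitterbot"];

// True when the User-Agent contains one of the known crawler markers.
function isSocialCrawler(userAgent: string | null): boolean {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();
  return CRAWLER_MARKERS.some((marker) => ua.includes(marker));
}

// Decide where a request for `pageUrl` should be served from:
// crawlers get the rendered version, everyone else gets the SPA itself.
function resolveUpstream(
  pageUrl: string,
  userAgent: string | null,
  workerOrigin: string, // placeholder for your deployed Worker's origin
): string {
  return isSocialCrawler(userAgent)
    ? `${workerOrigin}/content/${encodeURIComponent(pageUrl)}`
    : pageUrl;
}
```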

### API Integration

Integrate with your applications:

```javascript
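// Fetch rendered HTML content
const response = await fetch('https://your-worker.workers.dev/content/https://example.com');
if (!response.ok) {
  // Invalid requests return a JSON error body instead of HTML
  throw new Error(`Request failed with status ${response.status}`);
}
const htmlContent = await response.text();

// Use the HTML content in your application (only inject markup from pages you trust)
document.getElementById('content').innerHTML = htmlContent;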
```

### Webhook/Automation

Use in automation workflows:

```bash
# Get rendered content for processing
curl "https://your-worker.workers.dev/content/https://news-site.com/article/123" \
| grep -o '…'
```
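
For processing in code, the rendered HTML can be scanned for Open Graph tags. The function below is a deliberately simple sketch: `extractOpenGraph` is a hypothetical helper, and the regular expression assumes double-quoted attributes with `property` written before `content`, which real pages do not guarantee. An HTML parser would be more robust.

```typescript
// Collect <meta property="og:..." content="..."> pairs from rendered HTML.
function extractOpenGraph(html: string): Record<string, string> {
  const tags: Record<string, string> = {};
  const pattern = /<meta\s+property="(og:[^"]+)"\s+content="([^"]*)"\s*\/?>/gi;
  let match: RegExpExecArray | null;
  // exec() with the "g" flag advances through the string on each call.
  while ((match = pattern.exec(html)) !== null) {
    tags[match[1]] = match[2];
  }
  return tags;
}
```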

### Error Response

```http
HTTP/1.1 400 Bad Request
Content-Type: application/json
Access-Control-Allow-Origin: *

{
  "success": false,
  "error": "Invalid URL format",
  "url": "invalid-url"
}
```
