https://github.com/7a6163/browser-worker

A Cloudflare Worker that uses browser rendering to extract Open Graph (OG) meta data from Single Page Applications (SPAs) for social media sharing on Facebook, LINE, and other platforms.

browser-rendering cloudflare-workers puppeteer serverless

Host: GitHub
URL: https://github.com/7a6163/browser-worker
Owner: 7a6163
License: mit
Created: 2025-07-05T12:36:55.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-07-06T08:48:00.000Z (11 months ago)
Last Synced: 2025-08-10T01:01:49.878Z (10 months ago)
Topics: browser-rendering, cloudflare-workers, puppeteer, serverless
Language: TypeScript
Homepage:
Size: 111 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Browser Worker - Fast HTML Content Extractor

A high-performance Cloudflare Worker that uses browser rendering to extract HTML content from Single Page Applications (SPAs) and dynamic websites. Optimized with session reuse and intelligent caching for maximum speed.

## 🎯 Purpose

Many modern web applications (React, Vue, Angular) generate their content dynamically using JavaScript, so crawlers and HTTP clients that do not execute JavaScript see only an empty page shell. This Worker solves that problem by:
- Using a real browser to render JavaScript-heavy pages completely
- Extracting the fully rendered HTML content
- Providing fast access through intelligent session reuse and caching
- Supporting both social media crawlers and API consumers

## 🚀 Features

- **⚡ High Performance** - Optimized session reuse and connection pooling for 3-5x faster startup
- **🧠 Smart Caching** - Intelligent session management with automatic cleanup
- **🌐 Full Browser Rendering** - Uses Puppeteer to execute JavaScript and render SPAs completely
- **📦 KV Caching** - Optional HTML content caching with configurable TTL
- **🛡️ Resource Optimization** - Blocks unnecessary resources (CSS, fonts, images) for faster loading
- **🔧 Error Handling** - Robust error handling with optimized timeouts
- **🌍 CORS Support** - Can be called from frontend applications
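
The resource-blocking feature can be pictured with Puppeteer's request interception. The following is a minimal sketch, not the repository's actual code: `shouldBlock`, `handleRequest`, and the `InterceptedRequest` interface are hypothetical names, and the blocked set simply mirrors the "CSS, fonts, images" list above using the resource type strings that Puppeteer reports.

```typescript
// Resource types as reported by Puppeteer's HTTPRequest.resourceType().
// The choice of types mirrors the README's "CSS, fonts, images" (an assumption
// about how the Worker is configured).
const BLOCKED_RESOURCE_TYPES = new Set(["stylesheet", "font", "image"]);

// Hypothetical helper: decide whether a request should be dropped.
function shouldBlock(resourceType: string): boolean {
  return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// Minimal structural type covering only the request methods used here.
interface InterceptedRequest {
  resourceType(): string;
  abort(): Promise<void>;
  continue(): Promise<void>;
}

// Hypothetical handler: abort blocked resources, let everything else through.
function handleRequest(request: InterceptedRequest): Promise<void> {
  return shouldBlock(request.resourceType())
    ? request.abort()
    : request.continue();
}
```

With a Puppeteer `page`, a handler like this would typically be wired up by calling `await page.setRequestInterception(true)` and then `page.on("request", handleRequest)`.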

## 📋 Requirements

- Cloudflare Workers account
- Browser Rendering enabled (Puppeteer binding)
- Node.js compatibility flag enabled

## 🛠️ Installation

1. Clone this repository:
```bash
git clone https://github.com/7a6163/browser-worker.git
cd browser-worker
```

2. Install dependencies:
```bash
npm install
```

3. Configure your `wrangler.jsonc`:
```json
{
  "name": "browser-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-07-05",
  "compatibility_flags": ["nodejs_compat"],
  "browser": {
    "binding": "MYBROWSER"
  }
}
```

4. Deploy to Cloudflare Workers:
```bash
npm run deploy
```

## 🔧 Usage

### Basic Usage

The Worker uses a simple `/content/{url}` endpoint to extract HTML content:

```bash
# Extract HTML content from any URL
curl "https://your-worker.your-subdomain.workers.dev/content/https://example.com"

# For URLs with special characters, URL-encode them
curl "https://your-worker.your-subdomain.workers.dev/content/https%3A%2F%2Fexample.com%2Fpath%3Fquery%3Dvalue"
```
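
To illustrate how a `/content/{url}` path could be turned back into a target URL, here is a minimal sketch. It is an assumption about the approach, not the repository's implementation: `extractTargetUrl` and `PREFIX` are hypothetical names, and the rule "decode only when no `://` is present" is one simple way to accept both the raw and the URL-encoded forms shown above.

```typescript
// Hypothetical helper: names and behaviour are assumptions, not the repo's code.
const PREFIX = "/content/";

function extractTargetUrl(requestUrl: string): string | null {
  const { pathname, search } = new URL(requestUrl);
  if (!pathname.startsWith(PREFIX)) return null;

  // Everything after /content/ is the target. An unencoded target leaves its
  // own query string in `search`, so it is appended back here.
  let candidate = pathname.slice(PREFIX.length) + search;

  // A percent-encoded target contains no "://" until it is decoded.
  if (!candidate.includes("://")) {
    try {
      candidate = decodeURIComponent(candidate);
    } catch {
      return null; // malformed percent-encoding
    }
  }

  try {
    const target = new URL(candidate);
    const isHttp = target.protocol === "http:" || target.protocol === "https:";
    return isHttp ? target.href : null;
  } catch {
    return null; // not an absolute URL
  }
}
```

Under these assumptions, both example requests above resolve to a normalized absolute URL, while a path such as `/content/invalid-url` yields `null` and can be answered with an error response.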

### Response Format

The Worker returns the fully rendered HTML content with proper headers:

```http
Content-Type: text/html;charset=UTF-8
Access-Control-Allow-Origin: *
```

### Error Handling

Invalid requests return JSON error responses:

```json
{
  "success": false,
  "error": "Invalid URL format",
  "url": "invalid-url"
}
```
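
As a rough sketch of how such an error payload might be assembled, the helper below builds the status, headers, and JSON body as plain values. `invalidUrlResponse` and the `ErrorBody` type are hypothetical names; the field names, the 400 status, and the headers are taken from the examples in this README.

```typescript
// Shape of the JSON error body shown in this README.
type ErrorBody = { success: false; error: string; url: string };

// Hypothetical helper: returns the pieces needed to build an error response.
function invalidUrlResponse(url: string): {
  status: number;
  headers: Record<string, string>;
  body: string;
} {
  const payload: ErrorBody = { success: false, error: "Invalid URL format", url };
  return {
    status: 400,
    headers: {
      "Content-Type": "application/json",
      "Access-Control-Allow-Origin": "*",
    },
    body: JSON.stringify(payload),
  };
}
```

Inside a Worker, the result could be passed to the standard constructor as `new Response(r.body, { status: r.status, headers: r.headers })`.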

### Local Development

```bash
# Start development server with remote browser support
npm run dev

# Or use wrangler directly
wrangler dev --remote
```

### Performance Optimizations

This Worker includes several performance optimizations:

- **Session Reuse**: Browser sessions are kept alive and reused across requests
- **Connection Pooling**: Maintains up to 5 concurrent sessions for optimal performance
- **Resource Blocking**: Automatically blocks CSS, fonts, and images for faster loading
- **Optimized Timeouts**: Reduced wait times while maintaining reliability
- **Intelligent Caching**: Sessions are cached for 30 minutes with automatic cleanup
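
The session limits described above (up to 5 sessions, 30-minute lifetime, automatic cleanup) can be sketched as simple bookkeeping. This is an illustrative model only: `SessionCache` and its methods are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and the real Worker may track sessions quite differently.

```typescript
// Limits taken from the list above; everything else is an assumption.
const MAX_SESSIONS = 5;
const SESSION_TTL_MS = 30 * 60 * 1000;

class SessionCache {
  // Session ID -> timestamp (ms) of last use.
  private lastUsed = new Map<string, number>();

  // Drop sessions that have been idle for longer than the TTL.
  cleanup(now: number): void {
    for (const [id, ts] of this.lastUsed) {
      if (now - ts > SESSION_TTL_MS) this.lastUsed.delete(id);
    }
  }

  // Return a reusable session ID, or null if a new session must be launched.
  acquire(now: number): string | null {
    this.cleanup(now);
    const first = this.lastUsed.keys().next();
    if (first.done) return null;
    this.lastUsed.set(first.value, now); // refresh the idle timer
    return first.value;
  }

  // Record a newly launched session; returns false when the pool is full.
  add(id: string, now: number): boolean {
    this.cleanup(now);
    if (this.lastUsed.size >= MAX_SESSIONS && !this.lastUsed.has(id)) return false;
    this.lastUsed.set(id, now);
    return true;
  }

  get size(): number {
    return this.lastUsed.size;
  }
}
```

A request handler would call `acquire(Date.now())` first and only launch a new browser session, followed by `add(...)`, when it returns `null`.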

## 📱 Integration Examples

### Social Media Crawlers

For social media platforms that need to crawl your SPA:

```bash
# Facebook, LINE, Twitter, etc. can access:
https://your-worker.your-subdomain.workers.dev/content/https://your-spa.com/article/123
```

### API Integration

Integrate with your applications:

```javascript
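// Fetch rendered HTML content
const response = await fetch('https://your-worker.workers.dev/content/https://example.com');
if (!response.ok) {
  // Failed requests return a JSON error body instead of HTML
  throw new Error(`Render failed with status ${response.status}`);
}
const htmlContent = await response.text();

// Use the HTML content in your application (only inject HTML from sources you trust)
document.getElementById('content').innerHTML = htmlContent;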
```
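
Since the rendered HTML is often fetched for its Open Graph tags, the sketch below shows one way a consumer might pull `og:*` properties out of the returned markup. `extractOpenGraph` and `attr` are hypothetical helpers, not part of this Worker; the regex-based approach is an assumption chosen for brevity, it does not decode HTML entities, and a real HTML parser would be more robust.

```typescript
// Hypothetical helper: read one quoted attribute value from a single tag.
function attr(tag: string, name: string): string | undefined {
  const pattern = new RegExp(
    `(?:^|\\s)${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`,
    "i",
  );
  const match = pattern.exec(tag);
  return match ? (match[1] ?? match[2]) : undefined;
}

// Hypothetical helper: collect og:* meta tags into a property -> content map.
function extractOpenGraph(html: string): Record<string, string> {
  const tags: Record<string, string> = {};
  for (const [tag] of html.matchAll(/<meta\b[^>]*>/gi)) {
    const property = attr(tag, "property");
    const content = attr(tag, "content");
    if (property !== undefined && property.startsWith("og:") && content !== undefined) {
      tags[property] = content;
    }
  }
  return tags;
}
```

Applied to the `htmlContent` string from the example above, `extractOpenGraph(htmlContent)` would return an object keyed by property name, such as `og:title` or `og:image`, for whichever tags the page defines.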

### Webhook/Automation

Use in automation workflows:

```bash
# Get rendered content for processing
curl "https://your-worker.workers.dev/content/https://news-site.com/article/123" \
  | grep -o '...'
```

### Error Response

```http
HTTP/1.1 400 Bad Request
Content-Type: application/json
Access-Control-Allow-Origin: *

{
  "success": false,
  "error": "Invalid URL format",
  "url": "invalid-url"
}
```
