Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/luminati-io/scraping-browser
Scraping Browser is an automated headless browser for effortless web scraping with Puppeteer, Selenium, and Playwright.
https://github.com/luminati-io/scraping-browser
captcha-solving headless-browser headless-browsers javascript nodejs playwright proxy-server puppeteer python scraping-browser selenium web-scraping
Last synced: 5 days ago
JSON representation
Scraping Browser is an automated headless browser for effortless web scraping with Puppeteer, Selenium, and Playwright.
- Host: GitHub
- URL: https://github.com/luminati-io/scraping-browser
- Owner: luminati-io
- Created: 2025-02-05T07:55:32.000Z (8 days ago)
- Default Branch: main
- Last Pushed: 2025-02-05T08:27:27.000Z (8 days ago)
- Last Synced: 2025-02-05T09:29:41.311Z (8 days ago)
- Topics: captcha-solving, headless-browser, headless-browsers, javascript, nodejs, playwright, proxy-server, puppeteer, python, scraping-browser, selenium, web-scraping
- Homepage: https://brightdata.com/products/scraping-browser
- Size: 18.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# 🚀 Scraping Browser by Bright Data
*A fully automated headless browser solution for dynamic web scraping with Puppeteer, Selenium, and Playwright. The Scraping Browser is opened as a GUI on Bright Data's Infrastructure.*
[![Promo](https://github.com/luminati-io/LinkedIn-Scraper/raw/main/Proxies%20and%20scrapers%20GitHub%20bonus%20banner.png)](https://brightdata.com/products/scraping-browser)
🔗 **[Get Started for Free](https://brightdata.com/products/scraping-browser)** | 📖 **[Official Documentation](https://docs.brightdata.com/scraping-automation/scraping-browser/introduction)**
---
## 🔹 Overview
Scraping Browser is a fully-hosted **browser-based scraping solution** that automates **multi-step data collection** with **built-in proxy management**, **CAPTCHA solving**, and **advanced website unblocking**. It supports **Puppeteer, Playwright, and Selenium**, enabling effortless web automation at an **unlimited scale**.## ✅ Why Use Scraping Browser?
- **No Infrastructure Overhead** – Run and scale browser sessions via API without maintaining browser infrastructure.
- **Built-in Unlocking** – Auto-handles **CAPTCHAs, fingerprinting, retries, and JS rendering** under the hood.
- **Multi-Step Navigation** – Automate clicks, scrolling, form submissions, and hover interactions.
- **Unlimited Scaling** – Launch **thousands of concurrent browser sessions** without rate limits.
- **Global Geo-Access** – Unlock localized content with [**72M+ residential IPs across 195 countries**](https://brightdata.com/proxy-types/residential-proxies).
- **Seamless Debugging** – Monitor sessions in real-time with **Chrome DevTools integration**.---
## 🚀 Getting Started
### Install Dependencies
Ensure you have **Node.js**, **Python**, or **C#** installed along with your preferred web automation library:```sh
# Install Puppeteer
npm install puppeteer-core# Install Playwright
npm install playwright# Install Selenium
pip install selenium
```## 🔧 Usage Examples
### Puppeteer Example (JavaScript)
```js
const puppeteer = require('puppeteer-core');const SBR_WS_ENDPOINT = 'wss://brd-customer--zone-:@brd.superproxy.io:9222';
async function main() {
console.log('Connecting to Scraping Browser...');
const browser = await puppeteer.connect({ browserWSEndpoint: SBR_WS_ENDPOINT });
try {
const page = await browser.newPage();
console.log('Connected! Navigating to example.com...');
await page.goto('https://example.com');
console.log('Scraping page content...');
const html = await page.content();
console.log(html);
} finally {
await browser.close();
}
}main().catch(err => console.error(err.stack || err));
```> **💡 Learn more about [web scraping with Puppeteer](https://brightdata.com/blog/how-tos/web-scraping-puppeteer)**
### Playwright Example (Python)
```python
import asyncio
from playwright.async_api import async_playwrightSBR_WS_CDP = 'wss://brd-customer--zone-:@brd.superproxy.io:9222'
async def run(pw):
print('Connecting to Scraping Browser...')
browser = await pw.chromium.connect_over_cdp(SBR_WS_CDP)
try:
page = await browser.new_page()
print('Connected! Navigating to example.com...')
await page.goto('https://example.com')
print('Scraping page content...')
html = await page.content()
print(html)
finally:
await browser.close()async def main():
async with async_playwright() as playwright:
await run(playwright)if __name__ == '__main__':
asyncio.run(main())
```> **💡 Learn more about [web scraping with Playwright](https://brightdata.com/blog/how-tos/playwright-web-scraping)**
### Selenium Example (JavaScript)
```js
const { Builder, Browser } = require('selenium-webdriver');const SBR_WEBDRIVER = 'https://brd-customer--zone-:@brd.superproxy.io:9515';
async function main() {
console.log('Connecting to Scraping Browser...');
const driver = await new Builder().forBrowser(Browser.CHROME).usingServer(SBR_WEBDRIVER).build();
try {
console.log('Connected! Navigating to example.com...');
await driver.get('https://example.com');
console.log('Scraping page content...');
const html = await driver.getPageSource();
console.log(html);
} finally {
driver.quit();
}
}main().catch(err => console.error(err.stack || err));
```> **💡 Learn more about [web scraping with Selenium](https://brightdata.com/blog/how-tos/using-selenium-for-web-scraping)**
## 🔥 Advanced Features
### Debugging with Chrome DevTools
Monitor browser sessions in real-time:
```js
const { exec } = require('child_process');const chromeExecutable = 'google-chrome';
const openDevtools = async (page, client) => {
const frameId = page.mainFrame()._id;
const { url: inspectUrl } = await client.send('Page.inspect', { frameId });
exec(`"${chromeExecutable}" "${inspectUrl}"`, error => {
if (error) throw new Error('Unable to open devtools: ' + error);
});
};
```### CAPTCHA Solving
```js
const client = await page.target().createCDPSession();
const { status } = await client.send('Captcha.solve', { detectTimeout: 30000 });
```> **🤖 Learn more about our [CAPTCHA Solver](https://github.com/luminati-io/Captcha-solver).**
## 🔄 Automatic IP Rotation & Unlocking
Scraping Browser automatically rotates IPs thanks to the integrated [rotating proxies](https://brightdata.com/solutions/rotating-proxies) and handles retries for seamless data collection.## 💰 Pricing
### Flexible Plans
- **Pay-As-You-Go:** $8.40/GB – No commitment.
- **Growth Plan:** $7.14/GB – Ideal for teams.
- **Business Plan:** $6.30/GB – For scaling operations.
- **Enterprise:** Custom pricing for high-volume needs.**Sign up now and get your first deposit matched up to $500!**
[View Pricing](https://brightdata.com/pricing/scraping-browser)
## ❓ Frequently Asked Questions
### What makes Scraping Browser different from a standard headless browser?
Scraping Browser is a fully managed, GUI-based browser that runs on Bright Data's infrastructure and automatically unlocks even the most protected sites.### How does Scraping Browser handle bot detection?
It automates fingerprinting, CAPTCHA solving, retries, and mimics real user behavior to prevent detection.### Is Scraping Browser compatible with Puppeteer, Playwright, and Selenium?
Yes! It seamlessly integrates with all major web automation tools.### When should I use Scraping Browser instead of a proxy?
Use Scraping Browser when you need JavaScript rendering, interactive actions (clicks, scrolls), and multi-step navigation.