https://github.com/autoscrape-labs/pydoll
Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.
- Host: GitHub
- URL: https://github.com/autoscrape-labs/pydoll
- Owner: autoscrape-labs
- License: MIT
- Created: 2024-10-27T15:46:43.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-01T02:14:47.000Z (12 months ago)
- Last Synced: 2025-05-01T03:25:03.898Z (12 months ago)
- Topics: anti-detection, asynchronous, bot-detection, browser-automation, bypasscaptcha, captcha-breaking, cdp, chromium, playwright, puppeteer, python, recaptcha-v3, selenium, selenium-python, turnstile-bypass, webdriver, webscraping
- Language: Python
- Homepage: https://autoscrape-labs.github.io/pydoll/
- Size: 1.6 MB
- Stars: 3,474
- Watchers: 32
- Forks: 197
- Open Issues: 12
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - autoscrape-labs/pydoll
## README
Async-native, fully typed, built for evasion and performance.
Pydoll automates Chromium-based browsers (Chrome, Edge) by connecting directly to the Chrome DevTools Protocol over WebSocket. **No WebDriver binary, no `navigator.webdriver` flag, no compatibility issues.**
It combines a high-level API for stealthy automation with low-level CDP access for fine-grained control over network, fingerprinting, and browser behavior. And with its new **Pydantic-powered extraction engine**, it maps the DOM directly to structured Python objects, delivering an unmatched Developer Experience (DX).
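For context, the DevTools Protocol itself is just JSON messages over a WebSocket. A minimal sketch of that wire format without Pydoll, using the third-party `websockets` package against a Chrome launched with `--remote-debugging-port=9222` (illustrative only, not Pydoll's internals):
```python
# Illustrative only: raw CDP over WebSocket, the channel Pydoll automates.
import asyncio
import json
import urllib.request

import websockets  # pip install websockets

async def navigate_via_cdp(url: str):
    # Ask the browser for its debuggable targets
    targets = json.load(urllib.request.urlopen('http://localhost:9222/json'))
    ws_url = next(t['webSocketDebuggerUrl'] for t in targets if t['type'] == 'page')
    async with websockets.connect(ws_url) as ws:
        # Every CDP command is JSON: an id, a method, and params
        await ws.send(json.dumps(
            {'id': 1, 'method': 'Page.navigate', 'params': {'url': url}}))
        print(await ws.recv())  # the command's result, matched by id

asyncio.run(navigate_via_cdp('https://example.com'))
```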
### Sponsors
Read a full review of Pydoll on The Web Scraping Club, the #1 newsletter dedicated to web scraping.
[Learn more about our sponsors](SPONSORS.md) · [Become a sponsor](https://github.com/sponsors/thalissonvs)
### Why Pydoll
- **Structured extraction**: Define a [Pydantic](https://docs.pydantic.dev/) model, call `tab.extract()`, get typed and validated data back. No manual element-by-element querying.
- **Async and typed**: Built on `asyncio` from the ground up, 100% type-checked with `mypy`. Full IDE autocompletion and static error checking.
- **Stealth built in**: Human-like mouse movement, realistic typing, and granular [browser preference](https://pydoll.tech/docs/features/configuration/browser-preferences/) control for fingerprint management.
- **Network control**: [Intercept](https://pydoll.tech/docs/features/network/interception/) requests to block ads/trackers, [monitor](https://pydoll.tech/docs/features/network/monitoring/) traffic for API discovery, and make [authenticated HTTP requests](https://pydoll.tech/docs/features/network/http-requests/) that inherit the browser session.
- **Shadow DOM and iframes**: Full support for [shadow roots](https://pydoll.tech/docs/deep-dive/architecture/shadow-dom/) (including closed) and cross-origin iframes. Discover, query, and interact with elements inside them using the same API.
## Installation
```bash
pip install pydoll-python
```
No WebDriver binaries or external dependencies required.
## Getting Started
### 1. Stateful Automation & Evasion
When you need to navigate, bypass challenges, or interact with dynamic UI, Pydoll's imperative API handles it with humanized timing by default.
```python
import asyncio

from pydoll.browser import Chrome
from pydoll.constants import Key

async def google_search(query: str):
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.go_to('https://www.google.com')

        # Find elements and interact with human-like timing
        search_box = await tab.find(tag_name='textarea', name='q')
        await search_box.insert_text(query)
        await tab.keyboard.press(Key.ENTER)

        first_result = await tab.find(
            tag_name='h3',
            text='autoscrape-labs/pydoll',
            timeout=10,
        )
        await first_result.click()
        print(f"Page loaded: {await tab.title}")

asyncio.run(google_search('pydoll site:github.com'))
```
### 2. Structured Data Extraction
Once you reach the target page, switch to the declarative engine. Define what you want with a model, and Pydoll extracts it — typed, validated, and ready to use.
```python
import asyncio

from pydoll.browser.chromium import Chrome
from pydoll.extractor import ExtractionModel, Field

class Quote(ExtractionModel):
    text: str = Field(selector='.text', description='The quote text')
    author: str = Field(selector='.author', description='Who said it')
    tags: list[str] = Field(selector='.tag', description='Tags')
    year: int | None = Field(selector='.year', description='Year', default=None)

async def extract_quotes():
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.go_to('https://quotes.toscrape.com')
        quotes = await tab.extract_all(Quote, scope='.quote', timeout=5)
        for q in quotes:
            print(f'{q.author}: {q.text}')  # fully typed, IDE autocomplete works
            print(q.tags)                   # list[str], not a raw element
            print(q.model_dump_json())      # pydantic serialization built-in

asyncio.run(extract_quotes())
```
Models support CSS/XPath auto-detection, HTML attribute targeting, custom transforms, and nested models.
**Nested models, transforms, and attribute extraction:**
```python
# Fragment: assumes an open `tab` inside an async function, as in the
# examples above.
from datetime import datetime

from pydoll.extractor import ExtractionModel, Field

def parse_date(raw: str) -> datetime:
    return datetime.strptime(raw.strip(), '%B %d, %Y')

class Author(ExtractionModel):
    name: str = Field(selector='.author-title')
    born: datetime = Field(
        selector='.author-born-date',
        transform=parse_date,
    )

class Article(ExtractionModel):
    title: str = Field(selector='h1')
    url: str = Field(selector='.source-link', attribute='href')
    author: Author = Field(selector='.author-card', description='Nested model')

article = await tab.extract(Article, timeout=5)
article.author.born.year  # int — types are preserved all the way down
```
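On CSS/XPath auto-detection: the same `Field` accepts either selector flavor. A minimal sketch, assuming XPath strings (for example those starting with `//`) are recognized automatically:
```python
from pydoll.extractor import ExtractionModel, Field

class Heading(ExtractionModel):
    # Assumption: '//'-prefixed selectors are detected as XPath,
    # everything else is treated as a CSS selector.
    title: str = Field(selector='//h1[@class="title"]')  # XPath
    subtitle: str = Field(selector='.subtitle')          # CSS
```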
## Features
### Humanized Mouse Movement
Mouse operations produce human-like cursor movement by default:
- **Bezier curve paths** with asymmetric control points
- **Fitts's Law timing**: duration scales with distance (see the toy sketch after this list)
- **Minimum-jerk velocity**: bell-shaped speed profile
- **Physiological tremor**: Gaussian noise scaled with velocity
- **Overshoot correction**: ~70% chance on fast movements, then corrects back
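To make the timing model concrete, here is a toy sketch (illustrative only, not Pydoll's actual implementation) of Fitts's Law duration and the minimum-jerk position curve whose derivative gives the bell-shaped speed profile:
```python
# Toy sketch (not Pydoll source): Fitts's Law duration + minimum-jerk curve.
import math

def fitts_duration(distance: float, width: float = 20.0,
                   a: float = 0.1, b: float = 0.2) -> float:
    # Fitts's Law: T = a + b * log2(D / W + 1)
    return a + b * math.log2(distance / width + 1)

def minimum_jerk(t: float) -> float:
    # Fraction of the path covered at normalized time t in [0, 1];
    # its derivative is the bell-shaped velocity profile.
    return 10 * t**3 - 15 * t**4 + 6 * t**5

duration = fitts_duration(distance=600)  # seconds for a 600 px move
for step in range(5):
    t = step / 4
    print(f't={t:.2f}  progress={minimum_jerk(t):.3f}')
```
In Pydoll itself, humanization is on by default: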
```python
await tab.mouse.move(500, 300)
await tab.mouse.click(500, 300)
await tab.mouse.drag(100, 200, 500, 400)
button = await tab.find(id='submit')
await button.click()
# Opt out when speed matters
await tab.mouse.click(500, 300, humanize=False)
```
[Mouse Control Docs](https://pydoll.tech/docs/features/automation/mouse-control/)
### Shadow DOM Support
Full Shadow DOM support, including closed shadow roots. Because Pydoll operates at the CDP level (below JavaScript), the `closed` mode restriction doesn't apply.
```python
shadow = await element.get_shadow_root()
button = await shadow.query('.internal-btn')
await button.click()

# Discover all shadow roots on the page
shadow_roots = await tab.find_shadow_roots()
for sr in shadow_roots:
    checkbox = await sr.query('input[type="checkbox"]', raise_exc=False)
    if checkbox:
        await checkbox.click()
```
Highlights:
- Closed shadow roots work without workarounds
- `find_shadow_roots()` discovers every shadow root on the page
- `timeout` parameter for polling until shadow roots appear
- `deep=True` traverses cross-origin iframes (OOPIFs); see the sketch after this list
- Standard `find()`, `query()`, `click()` API inside shadow roots
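Combining those options, a hedged sketch (assuming `deep` and `timeout` are keyword arguments of `find_shadow_roots()`):
```python
# Assumption: `deep` and `timeout` are keyword arguments of find_shadow_roots().
shadow_roots = await tab.find_shadow_roots(deep=True, timeout=5)
print(f'Found {len(shadow_roots)} shadow roots (including inside OOPIFs)')
```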
[Shadow DOM Docs](https://pydoll.tech/docs/deep-dive/architecture/shadow-dom/)
### HAR Network Recording
Record network activity during a browser session and export as HAR 1.2. Replay recorded requests to reproduce exact API sequences.
```python
from pydoll.browser.chromium import Chrome

async with Chrome() as browser:
    tab = await browser.start()

    async with tab.request.record() as capture:
        await tab.go_to('https://example.com')

    capture.save('flow.har')
    print(f'Captured {len(capture.entries)} requests')

    responses = await tab.request.replay('flow.har')
```
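Because HAR 1.2 is plain JSON, a recording is easy to inspect outside the browser with nothing but the standard library:
```python
# HAR 1.2 is plain JSON; walk the entries with the stdlib alone.
import json

with open('flow.har') as f:
    har = json.load(f)

for entry in har['log']['entries']:
    print(entry['request']['method'],
          entry['request']['url'],
          entry['response']['status'])
```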
[HAR Recording Docs](https://pydoll.tech/docs/features/network/network-recording/)
### Page Bundles
Save the current page and all its assets (CSS, JS, images, fonts) as a `.zip` bundle for offline viewing. Optionally inline everything into a single HTML file.
```python
await tab.save_bundle('page.zip')
await tab.save_bundle('page-inline.zip', inline_assets=True)
```
[Screenshots, PDFs & Bundles Docs](https://pydoll.tech/docs/features/automation/screenshots-and-pdfs/)
### Hybrid Automation (UI + API)
Use UI automation to pass login flows (CAPTCHAs, JS challenges), then switch to `tab.request` for fast API calls that inherit the full browser session: cookies, headers, and all.
```python
# Log in via UI
await tab.go_to('https://my-site.com/login')
await (await tab.find(id='username')).type_text('user')
await (await tab.find(id='password')).type_text('pass123')
await (await tab.find(id='login-btn')).click()
# Make authenticated API calls using the browser session
response = await tab.request.get('https://my-site.com/api/user/profile')
user_data = response.json()
```
[Hybrid Automation Docs](https://pydoll.tech/docs/features/network/http-requests/)
### Network Interception and Monitoring
Monitor traffic for API discovery or intercept requests to block ads, trackers, and unnecessary resources.
```python
import asyncio

from pydoll.browser.chromium import Chrome
from pydoll.protocol.fetch.events import FetchEvent, RequestPausedEvent
from pydoll.protocol.network.types import ErrorReason

async def block_images():
    async with Chrome() as browser:
        tab = await browser.start()

        async def block_resource(event: RequestPausedEvent):
            request_id = event['params']['requestId']
            resource_type = event['params']['resourceType']
            if resource_type in ['Image', 'Stylesheet']:
                await tab.fail_request(request_id, ErrorReason.BLOCKED_BY_CLIENT)
            else:
                await tab.continue_request(request_id)

        await tab.enable_fetch_events()
        await tab.on(FetchEvent.REQUEST_PAUSED, block_resource)

        await tab.go_to('https://example.com')
        await asyncio.sleep(3)
        await tab.disable_fetch_events()

asyncio.run(block_images())
```
[Network Monitoring](https://pydoll.tech/docs/features/network/monitoring/) | [Request Interception](https://pydoll.tech/docs/features/network/interception/)
### Browser Fingerprint Control
Granular control over [browser preferences](https://pydoll.tech/docs/features/configuration/browser-preferences/): hundreds of internal Chrome settings for building consistent fingerprints.
```python
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

options = ChromiumOptions()
options.browser_preferences = {
    'profile': {
        'default_content_setting_values': {
            'notifications': 2,  # 2 = block
            'geolocation': 2,
        },
        'password_manager_enabled': False,
    },
    'intl': {
        'accept_languages': 'en-US,en',
    },
    'browser': {
        'check_default_browser': False,
    },
}

async with Chrome(options=options) as browser:
    tab = await browser.start()
```
[Browser Preferences Guide](https://pydoll.tech/docs/features/configuration/browser-preferences/)
### Concurrency, Contexts and Remote Connections
Manage [multiple tabs](https://pydoll.tech/docs/features/browser-management/tabs/) and [browser contexts](https://pydoll.tech/docs/features/browser-management/contexts/) (isolated sessions) concurrently. Connect to browsers running in Docker or remote servers.
```python
import asyncio

from pydoll.browser.chromium import Chrome

async def scrape_page(url, tab):
    await tab.go_to(url)
    return await tab.title

async def concurrent_scraping():
    async with Chrome() as browser:
        tab_google = await browser.start()
        tab_ddg = await browser.new_tab()
        results = await asyncio.gather(
            scrape_page('https://google.com/', tab_google),
            scrape_page('https://duckduckgo.com/', tab_ddg),
        )
        print(results)

asyncio.run(concurrent_scraping())
```
[Multi-Tab Management](https://pydoll.tech/docs/features/browser-management/tabs/) | [Remote Connections](https://pydoll.tech/docs/features/advanced/remote-connections/)
### Retry Decorator
The `@retry` decorator supports custom recovery logic between attempts (e.g., refreshing the page, rotating proxies) and exponential backoff.
```python
from pydoll.decorators import retry
from pydoll.exceptions import ElementNotFound, NetworkError

async def my_recovery_function():
    # Illustrative recovery hook: e.g. refresh the page or rotate a proxy
    # before the next attempt.
    ...

@retry(
    max_retries=3,
    exceptions=[ElementNotFound, NetworkError],
    on_retry=my_recovery_function,
    exponential_backoff=True,
)
async def scrape_product(self, url: str):
    # scraping logic
    ...
```
[Retry Decorator Docs](https://pydoll.tech/docs/features/advanced/decorators/)
---
## Contributing
Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## Support
If you find Pydoll useful, consider [sponsoring the project on GitHub](https://github.com/sponsors/thalissonvs).
## License
[MIT License](LICENSE)