
# Using Undetected ChromeDriver for Web Scraping

[![Promo](https://github.com/luminati-io/LinkedIn-Scraper/raw/main/Proxies%20and%20scrapers%20GitHub%20bonus%20banner.png)](https://brightdata.com/)

This guide explains how to use the Undetected ChromeDriver library for Python to bypass anti-bot systems for web scraping.

- [What Is Undetected ChromeDriver?](#what-is-undetected-chromedriver)
- [How It Works](#how-it-works)
- [Using Undetected ChromeDriver for Web Scraping: Step-by-Step Guide](#using-undetected-chromedriver-for-web-scraping-step-by-step-guide)
- [Advanced Usage of `undetected_chromedriver`](#advanced-usage-of-undetected_chromedriver)
- [Limitations of the `undetected_chromedriver` Library](#limitations-of-the-undetected_chromedriver-library)

## What Is Undetected ChromeDriver?

[Undetected ChromeDriver](https://github.com/ultrafunkamsterdam/undetected-chromedriver) is a Python library that offers a modified version of Selenium’s ChromeDriver. It minimizes browser "leaks" to reduce detection by anti-bot services like Imperva, DataDome, and Distil Networks, and can also help bypass some Cloudflare protections. This makes it especially useful for web scraping on sites with robust anti-scraping measures.

## How It Works

Undetected ChromeDriver minimizes detection by Cloudflare, Imperva, DataDome, and similar solutions through several techniques:

- **Variable Renaming**: It renames Selenium variables to mirror those used by genuine browsers.
- **Authentic User-Agent Strings**: It employs real-world User-Agent strings to avoid being flagged.
- **Simulated Human Interaction**: It allows for natural, human-like interactions.
- **Cookie & Session Management**: It properly manages cookies and sessions during browsing.
- **Proxy Support**: It enables the use of proxies to bypass IP blocking and rate limiting.

These strategies work together to help the browser controlled by the library effectively bypass anti-scraping defenses.
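To illustrate the kind of server-side fingerprinting these techniques are meant to defeat, here is a hypothetical, deliberately simplified check that flags requests whose User-Agent string contains a known automation marker. Real anti-bot vendors combine many more signals (TLS fingerprints, JavaScript challenges, mouse telemetry), and the marker list below is illustrative, not any vendor's actual logic:

```python
# Hypothetical server-side bot check. Real anti-bot solutions use far more
# signals; this only shows why realistic User-Agent strings matter.
AUTOMATION_MARKERS = ("HeadlessChrome", "PhantomJS", "python-requests")

def looks_automated(user_agent: str) -> bool:
    """Return True if the User-Agent contains a known automation marker."""
    return any(marker in user_agent for marker in AUTOMATION_MARKERS)

# A default headless Chrome UA exposes the "HeadlessChrome" token...
print(looks_automated("Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/120.0"))  # True

# ...while a realistic UA, like those undetected_chromedriver sets, does not
print(looks_automated("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0"))  # False
```

This is one reason swapping in an authentic User-Agent string, as the library does, reduces the chance of being flagged by simple header checks.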

## Using Undetected ChromeDriver for Web Scraping: Step-by-Step Guide

Many websites implement sophisticated anti-bot measures that are highly effective at detecting and blocking automated scripts, including web scraping bots.

Let's scrape the title and description from the following [GoDaddy product page](https://www.godaddy.com/hosting/wordpress-hosting):

![The GoDaddy target page](https://github.com/luminati-io/undetected-chromedriver-web-scraping/blob/main/Images/image-54-1024x494.png)

With plain Selenium in Python, your scraping script will look like this:

```python
# pip install selenium

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# configure a Chrome instance to start in headless mode
options = Options()
options.add_argument("--headless")

# create a Chrome web driver instance
driver = webdriver.Chrome(service=Service(), options=options)

# connect to the target page
driver.get("https://www.godaddy.com/hosting/wordpress-hosting")

# scraping logic...

# close the browser
driver.quit()
```

Running this script will fail because it will be blocked by an anti-bot solution (Akamai, in this case):

![An "Access Denied" page from GoDaddy](https://github.com/luminati-io/undetected-chromedriver-web-scraping/blob/main/Images/image-55.png)

To work around that, you need to use the `undetected_chromedriver` Python library.

### Step #1: Prerequisites and Project Setup

Undetected ChromeDriver has the following prerequisites:

- **Latest version of Chrome**
- **Python 3.6+**: If Python 3.6 or later is not installed on your machine, [download it from the official site](https://www.python.org/downloads/) and follow the installation instructions.

> **Note**:
>
> The library automatically downloads and patches the driver binary for you, so there is no need to manually download [`ChromeDriver`](https://developer.chrome.com/docs/chromedriver/downloads).

Now, use the following command to create a directory for your project:

```bash
mkdir undetected-chromedriver-scraper
```

The `undetected-chromedriver-scraper` directory will serve as the project folder for your Python scraper.

Navigate into it and initialize a [virtual environment](https://docs.python.org/3/library/venv.html):

```bash
cd undetected-chromedriver-scraper
python -m venv env
```

Open the project folder in your preferred Python IDE and create a `scraper.py` file inside it, following the structure shown below:

![scraper.py in the project folder](https://github.com/luminati-io/undetected-chromedriver-web-scraping/blob/main/Images/image-56.png)

Activate the virtual environment. On Linux or macOS, use:

```bash
source env/bin/activate
```

For Windows, run:

```bash
env\Scripts\activate
```

### Step #2: Install Undetected ChromeDriver

In an activated virtual environment, install Undetected ChromeDriver:

```bash
pip install undetected_chromedriver
```

### Step #3: Initial Setup

Import `undetected_chromedriver`:

```python
import undetected_chromedriver as uc
```

Initialize a Chrome WebDriver:

```python
driver = uc.Chrome()
```

Like Selenium, this tool launches a browser window that you can control using the Selenium API. The `driver` object supports all standard Selenium methods, plus some extra features.

> **Important**:
>
> The main distinction is that this patched Chrome driver is engineered to bypass certain anti-bot solutions.

Call the `quit()` method to close the driver:

```python
driver.quit()
```

Here is a basic Undetected ChromeDriver setup:

```python
import undetected_chromedriver as uc

# Initialize a Chrome instance
driver = uc.Chrome()

# Scraping logic...

# Close the browser and release its resources
driver.quit()
```

### Step #4: Use It for Web Scraping

Use the `get()` method to navigate the browser to your target page:

```python
driver.get("https://www.godaddy.com/hosting/wordpress-hosting")
```

Next, visit the page in incognito mode in your browser and inspect the element you want to scrape:

![The DevTools inspection of the HTML elements to scrape data with](https://github.com/luminati-io/undetected-chromedriver-web-scraping/blob/main/Images/image-57-1024x287.png)

Let's extract the product title, tagline, and description. Here is how you can scrape all of these:

```python
headline_element = driver.find_element(By.CSS_SELECTOR, "[data-cy=\"headline\"]")

title_element = headline_element.find_element(By.CSS_SELECTOR, "h1")
title = title_element.text

tagline_element = headline_element.find_element(By.CSS_SELECTOR, "h2")
tagline = tagline_element.text

description_element = headline_element.find_element(By.CSS_SELECTOR, "[data-cy=\"description\"]")
description = description_element.text
```

Import `By` from Selenium to make the above code work:

```python
from selenium.webdriver.common.by import By
```
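The selector logic above can be prototyped offline against a hypothetical HTML snippet using only the standard library. The markup below is a simplified stand-in for the real page, which is more complex:

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup mirroring the structure inspected above
SAMPLE_HTML = """
<div data-cy="headline">
  <h1>Managed WordPress Hosting</h1>
  <h2>Get WordPress hosting</h2>
  <p data-cy="description">We make it easier to create your site</p>
</div>
"""

class HeadlineParser(HTMLParser):
    """Collect text from h1, h2, and [data-cy="description"] elements
    found inside the [data-cy="headline"] container."""

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.current = None  # dictionary key currently being captured
        self.data = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("data-cy") == "headline":
            self.in_headline = True
        elif self.in_headline:
            if tag == "h1":
                self.current = "title"
            elif tag == "h2":
                self.current = "tagline"
            elif attrs.get("data-cy") == "description":
                self.current = "description"

    def handle_data(self, data):
        if self.current and data.strip():
            self.data[self.current] = data.strip()

    def handle_endtag(self, tag):
        self.current = None

parser = HeadlineParser()
parser.feed(SAMPLE_HTML)
print(parser.data["title"])  # Managed WordPress Hosting
```

In the real script, the browser handles parsing and CSS selection for you; this sketch only shows the extraction logic in isolation.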

Store the scraped data in a Python dictionary:

```python
product = {
    "title": title,
    "tagline": tagline,
    "description": description
}
```

Finally, export the data to a JSON file:

```python
with open("product.json", "w") as json_file:
    json.dump(product, json_file, indent=4)
```

Import `json` from the Python standard library:

```python
import json
```

### Step #5: Put It All Together

This is the final scraping script:

```python
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
import json

# Create a Chrome web driver instance
driver = uc.Chrome()

# Connect to the target page
driver.get("https://www.godaddy.com/hosting/wordpress-hosting")

# Scraping logic
headline_element = driver.find_element(By.CSS_SELECTOR, "[data-cy=\"headline\"]")

title_element = headline_element.find_element(By.CSS_SELECTOR, "h1")
title = title_element.text

tagline_element = headline_element.find_element(By.CSS_SELECTOR, "h2")
tagline = tagline_element.text

description_element = headline_element.find_element(By.CSS_SELECTOR, "[data-cy=\"description\"]")
description = description_element.text

# Populate a dictionary with the scraped data
product = {
    "title": title,
    "tagline": tagline,
    "description": description
}

# Export the scraped data to JSON
with open("product.json", "w") as json_file:
    json.dump(product, json_file, indent=4)

# Close the browser and release its resources
driver.quit()
```

Execute it:

```bash
python3 scraper.py
```

Or, on Windows:

```bash
python scraper.py
```

This will open a browser showing the target web page:

![a browser showing the target web page](https://github.com/luminati-io/undetected-chromedriver-web-scraping/blob/main/Images/image-58-1024x547.png)

The script will extract data from the page and produce the following `product.json` file:

```json
{
    "title": "Managed WordPress Hosting",
    "tagline": "Get WordPress hosting — simplified",
    "description": "We make it easier to create, launch, and manage your WordPress site"
}
```
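If you want to sanity-check the export logic without launching a browser, you can round-trip a sample dictionary through the `json` module (the values below are the ones scraped above):

```python
import json

product = {
    "title": "Managed WordPress Hosting",
    "tagline": "Get WordPress hosting — simplified",
    "description": "We make it easier to create, launch, and manage your WordPress site",
}

# Serialize the same way the scraper does, then parse it back
serialized = json.dumps(product, indent=4)
restored = json.loads(serialized)

assert restored == product
```

Note that by default `json.dump` escapes non-ASCII characters (the em dash in the tagline becomes `\u2014` in the file); pass `ensure_ascii=False` if you want the literal characters written out.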

## Advanced Usage of `undetected_chromedriver`

### Choosing a Specific Chrome Version

You can specify a particular version of Chrome for the library to use by setting the `version_main` argument:

```python
import undetected_chromedriver as uc

# Specify the target version of Chrome
driver = uc.Chrome(version_main=105)
```

The library also works with other Chromium-based browsers, but that requires some additional tweaking.

### The `with` Syntax

Use the [`with`](https://docs.python.org/3/reference/compound_stmts.html#with) syntax to avoid manually calling the `quit()` method when you no longer need the driver:

```python
import undetected_chromedriver as uc

with uc.Chrome() as driver:
    driver.get("https://www.godaddy.com/hosting/wordpress-hosting")
```

When the code inside the `with` block completes, Python will automatically close the browser for you.

> **Note**:
>
> This syntax is supported starting from version 3.1.0.
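The guarantee here is the standard Python context-manager protocol: `__exit__` runs even if the block raises. A minimal stand-in class (not the library's actual code) makes the behavior easy to verify without launching a browser:

```python
class FakeDriver:
    """Minimal stand-in for a driver that must be quit when done."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.quit()
        return False  # do not swallow exceptions raised in the block

    def quit(self):
        self.closed = True

with FakeDriver() as driver:
    pass  # scraping logic would go here

print(driver.closed)  # True: quit() ran automatically on block exit
```

The same cleanup happens on errors, which is why the `with` form is safer than remembering to call `quit()` manually.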

### Proxy Integration

The syntax for adding a proxy to Undetected ChromeDriver is similar to regular Selenium. Simply pass your proxy URL to the `--proxy-server` flag as shown below:

```python
import undetected_chromedriver as uc

proxy_url = ""  # your proxy URL

options = uc.ChromeOptions()
options.add_argument(f"--proxy-server={proxy_url}")

# Pass the configured options to the driver
driver = uc.Chrome(options=options)
```

> **Note**:
>
> Chrome does not support authenticated proxies through the `--proxy-server` flag.
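If your proxy URL embeds credentials (e.g. `http://user:pass@host:port`), you need to strip them before passing the URL to `--proxy-server`, since Chrome ignores the `user:pass` part. A small helper using only the standard library can split the URL; handling the credentials themselves then requires a separate mechanism, such as a browser extension or a proxy-aware tool:

```python
from urllib.parse import urlparse

def split_proxy_url(proxy_url: str):
    """Split an authenticated proxy URL into a Chrome-friendly
    --proxy-server value plus its username and password."""
    parsed = urlparse(proxy_url)
    server = f"{parsed.scheme}://{parsed.hostname}:{parsed.port}"
    return server, parsed.username, parsed.password

# Hypothetical proxy URL, for illustration only
server, user, password = split_proxy_url("http://myuser:mypass@proxy.example.com:8080")
print(server)  # http://proxy.example.com:8080
```

The `server` value is what you would pass to `--proxy-server`; `user` and `password` must be supplied through whatever authentication mechanism your setup supports.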

### Extended API

The `undetected_chromedriver` library has some extra methods that extend regular Selenium functionality:

- `WebElement.click_safe()`: Use it when clicking a link causes detection.
- `WebElement.children(tag=None, recursive=False)`: Use it to easily find child elements. For example:

```python
# Take the body child at index 6 (any tag), then collect all <img> elements inside it, recursively
images = body.children()[6].children("img", True)
```

## Limitations of the `undetected_chromedriver` Library

While `undetected_chromedriver` is a powerful Python library, it has some known limitations. Here are the most important ones to be aware of.

### IP Blocks

The GitHub page for the library clearly states: "This package does not hide your IP address". Running your script from a datacenter may still result in detection, and a poorly regarded home IP can also lead to blocks.

![IP Blocks Warning on GitHub](https://github.com/luminati-io/undetected-chromedriver-web-scraping/blob/main/Images/image-59.png)

To hide your IP, you must integrate the controlled browser with a proxy server, as demonstrated earlier.

### No Support for GUI Navigation

Due to how the module works, you need to navigate programmatically using the `get()` method. Avoid manual navigation through the browser GUI, as using your keyboard or mouse increases the risk of detection.

This rule also applies when managing new tabs. If you require multiple tabs, open a new one with a blank page by using the URL `data:,` (including the comma), which the driver accepts. Then, continue with your normal automation workflow.

Following these guidelines will help reduce detection and ensure smoother web scraping sessions.

### Limited Support for Headless Mode

Since version 3.4.5, the `undetected_chromedriver` library has included an experimental (read: not guaranteed) headless mode. Try it like this:

```python
driver = uc.Chrome(headless=True)
```

### Stability Issues

As noted on the package’s PyPI page, outcomes can vary due to many factors. While there's no guarantee of success, the developers continually work to understand and counter detection algorithms.

![Alert about unpredictable results on PyPI](https://github.com/luminati-io/undetected-chromedriver-web-scraping/blob/main/Images/image-60-1024x244.png)

This means a script that bypasses anti-bot measures like Distil, Cloudflare, Imperva, DataDome, or hCaptcha today might fail if these defenses are updated tomorrow:

![CAPTCHA triggered by Undetected ChromeDriver](https://github.com/luminati-io/undetected-chromedriver-web-scraping/blob/main/Images/image-61-1024x547.png)

The image above, taken from the official documentation, shows that even developer-provided scripts can sometimes trigger a CAPTCHA, potentially halting your automation.

## Conclusion

While Undetected ChromeDriver provides a patched ChromeDriver for web scraping, advanced anti-bot systems like Cloudflare can still block your scripts. The issue isn't with Selenium's API but with the browser's fingerprint. The more robust solution is a cloud-based, always-updated, scalable browser with built-in anti-bot capabilities, such as [Scraping Browser](https://brightdata.com/products/scraping-browser).

Create a free Bright Data account today to try out our scraping browser or test our proxies.