https://github.com/oxylabs/playwright-captcha

A guide on how to use Playwright to bypass CAPTCHA challenges using Python.
captcha playwright playwright-python scraping-websites

Host: GitHub
URL: https://github.com/oxylabs/playwright-captcha
Owner: oxylabs
Created: 2024-09-03T13:39:40.000Z (10 months ago)
Default Branch: main
Last Pushed: 2025-04-09T08:33:30.000Z (2 months ago)
Last Synced: 2025-04-09T09:36:02.258Z (2 months ago)
Topics: captcha, playwright, playwright-python, scraping-websites
Homepage:
Size: 13.7 KB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# How to Bypass CAPTCHA With Playwright

[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.go2cloud.org/aff_c?offer_id=7&aff_id=877&url_id=112)

[![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/Pds3gBmKMH)

This step-by-step tutorial demonstrates how to use Playwright to bypass CAPTCHA challenges using Python. The tutorial will also discuss the perks of using Oxylabs’ Web Unblocker instead of the `playwright-stealth` library. 

  * [1. Install dependencies](#1-install-dependencies)

  * [2. Import modules](#2-import-modules)

  * [3. Create a headless browser instance](#3-create-a-headless-browser-instance)

  * [4. Apply the stealth settings](#4-apply-the-stealth-settings)

  * [6. Take a screenshot](#6-take-a-screenshot)

  * [7. Execute and test](#7-execute-and-test)

- [Bypass CAPTCHA with Web Unblocker](#bypass-captcha-with-web-unblocker)

  * [1. Create an account](#1-create-an-account)

  * [2. Create API key](#2-create-api-key)

  * [3. Install the requests module](#3-install-the-requests-module)

  * [4. Import the required modules](#4-import-the-required-modules)

  * [6. Make a request](#6-make-a-request)

  * [7. Save the response](#7-save-the-response)

  * [8. Execute and check](#8-execute-and-check)

### 1. Install dependencies

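Install the Playwright library and the stealth package. Once they are installed, download the Chromium binary that Playwright controls by running `playwright install chromium`.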

```pip install playwright playwright-stealth```

### 2. Import modules 

Use the synchronous version of the Playwright library for a straightforward and linear program flow.

```
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync
```

### 3. Create a headless browser instance

Define the `capture_screenshot()` function, which wraps all the code needed to open a headless browser instance, visit the URL, and capture the screenshot. In this function, create a new `sync_playwright` instance and use it to launch the Chromium browser in headless mode.

```
# Define the function to capture the screenshot
def capture_screenshot():
    # Create a playwright instance
    with sync_playwright() as play_wright:
        browser = play_wright.chromium.launch(headless=True)

        # Create a new context and page
        context = browser.new_context()
        page = context.new_page()
```

### 4. Apply the stealth settings

After creating the browser context and page, apply the stealth settings to the page using the `playwright-stealth` package. Stealth settings reduce the chances of the browser being detected as automated by hiding the signals that reveal automation, which makes CAPTCHA challenges less likely.

```
        # Apply the stealth settings
        stealth_sync(page)
```
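One way to see what the stealth settings change is to read `navigator.webdriver`, a flag that automated browsers commonly expose and that detection scripts often check. The helper below is a minimal sketch and not part of the original tutorial: `reports_automation()` is a hypothetical name, and it assumes `page` is a Playwright sync-API `Page`, whose `evaluate()` method runs JavaScript inside the page.

```python
def reports_automation(page) -> bool:
    """Return True if the page exposes a truthy navigator.webdriver flag.

    `page` is assumed to be a Playwright sync-API Page; any object with a
    compatible evaluate() method also works, which keeps the helper testable.
    """
    # evaluate() runs the function inside the page and converts the result
    # to a Python value (JavaScript undefined and null both become None).
    flag = page.evaluate("() => navigator.webdriver")
    return bool(flag)
```

Since stealth patches are expected to take effect when a page loads, it makes sense to call `reports_automation(page)` only after `page.goto()` has run, and to compare the result with and without `stealth_sync(page)`.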

### 5. Navigate to the page

Next, specify the target URL and open it with the page's `goto()` method.

```
        # Navigate to the website
        url = "http://sandbox.oxylabs.io/products"
        page.goto(url)
```

### 6. Take a screenshot

Wait for the page to load completely, take the screenshot, and close the browser.

```
        # Wait for the webpage to load completely
        page.wait_for_load_state("load")

        # Take a screenshot
        screenshot_filename = "oxylabs_screenshot.png"
        page.screenshot(path=screenshot_filename)

        # Close the browser
        browser.close()
        print("Done! You can check the screenshot...")


capture_screenshot()
```

### 7. Execute and test

Here is what our complete code looks like:

```
# Import the required modules
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync


# Define the function to capture the screenshot
def capture_screenshot():
    # Create a playwright instance
    with sync_playwright() as play_wright:
        browser = play_wright.chromium.launch(headless=True)

        # Create a new context and page
        context = browser.new_context()
        page = context.new_page()

        # Apply the stealth settings
        stealth_sync(page)

        # Navigate to the website
        url = "http://sandbox.oxylabs.io/products"
        page.goto(url)

        # Wait for the webpage to load completely
        page.wait_for_load_state("load")

        # Take a screenshot
        screenshot_filename = "oxylabs_screenshot.png"
        page.screenshot(path=screenshot_filename)

        # Close the browser
        browser.close()
        print("Done! You can check the screenshot...")


capture_screenshot()
```

Note: Executing the code saves the screenshot as `oxylabs_screenshot.png` in the current working directory.
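The function above hardcodes the URL and the output file name. A reusable variant might take both as parameters. The sketch below assumes `page` is a Playwright sync-API `Page` and relies only on its `goto()`, `screenshot()`, and `title()` methods. The name `snapshot()`, the 30-second timeout, and the full-page option are illustrative choices, not part of the original script.

```python
def snapshot(page, url: str, path: str, timeout_ms: int = 30_000) -> str:
    """Open `url` in `page`, save a full-page screenshot to `path`, and return the page title."""
    # wait_until="load" makes goto() return only after the load event fires,
    # so a separate wait_for_load_state("load") call is not needed here.
    page.goto(url, wait_until="load", timeout=timeout_ms)
    # full_page=True captures the whole scrollable page, not just the viewport.
    page.screenshot(path=path, full_page=True)
    return page.title()
```

Inside the `with sync_playwright()` block, after `stealth_sync(page)`, this could be called as `snapshot(page, "http://sandbox.oxylabs.io/products", "oxylabs_screenshot.png")`.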

## Bypass CAPTCHA with Web Unblocker

Oxylabs’ [Web Unblocker](https://oxylabs.io/products/web-unblocker) employs AI techniques to help you access publicly available information that sits behind a CAPTCHA. You send a simple query, and Web Unblocker automatically chooses the fastest proxy, attaches all essential headers, and returns the response HTML, handling the target website's anti-bot measures for you.

### 1. Create an account

To use Web Unblocker, you'll need an active subscription. You can either get a paid plan or a 7-day free trial [here](https://dashboard.oxylabs.io/). 

### 2. Create API key

After successfully creating your account, set your API username and password in the dashboard. These credentials will be used later in the code.

### 3. Install the requests module

You need a library that can perform HTTP requests. We will use the `requests` library to send HTTP requests to the Web Unblocker API and capture the response.

```pip install requests```

### 4. Import the required modules

In your Python script file, import the module using the following statement:

```import requests```

### 5. Define the proxy and headers

Create the `proxies` dictionary to connect to Web Unblocker, and then define the `headers` dictionary that instructs Web Unblocker to use JavaScript rendering. See the [documentation](https://developers.oxylabs.io/advanced-proxy-solutions/web-unblocker) for more details.

```
# Define proxy dict. Don't forget to pass your Web Unblocker credentials (username and password)
proxies = {
    "http": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
    "https": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
}

# Ask Web Unblocker to render JavaScript and return the resulting HTML
headers = {
    "X-Oxylabs-Render": "html"
}
```
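Hardcoding credentials in a script makes them easy to leak. As a minimal sketch, the proxy dictionary could be built from environment variables instead. The variable names `OXYLABS_USERNAME` and `OXYLABS_PASSWORD`, the helper `build_proxies()`, and the default host `unblock.oxylabs.io` are assumptions made for this example, so check the host and port against the documentation.

```python
import os
from urllib.parse import quote


def build_proxies(username: str, password: str,
                  host: str = "unblock.oxylabs.io", port: int = 60000) -> dict:
    """Return a requests-style proxies dict with URL-safe credentials."""
    # Percent-encode the credentials so characters such as "@" or ":" in a
    # password cannot be mistaken for URL delimiters.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    proxy_url = f"http://{user}:{secret}@{host}:{port}"
    # The same proxy endpoint is used for both HTTP and HTTPS targets.
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    # Hypothetical variable names; placeholders are used when they are unset.
    proxies = build_proxies(
        os.environ.get("OXYLABS_USERNAME", "USERNAME"),
        os.environ.get("OXYLABS_PASSWORD", "PASSWORD"),
    )
    # Print only the keys so the credentials never reach the terminal.
    print(sorted(proxies))
```

The returned dictionary can be passed to `requests` through the `proxies` argument in the same way as the hardcoded one.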

### 6. Make a request

Perform your request by specifying the request type, URL, proxies, and headers using the following code.

```
response = requests.request(
    "GET",
    "http://sandbox.oxylabs.io/products",
    verify=False,  # Ignore the certificate
    proxies=proxies,
    headers=headers,  # Ask Web Unblocker to render JavaScript
)
```

### 7. Save the response

Write the following code to print the response and save it in an HTML file.

```
# Print result page to stdout
print(response.text)

# Save returned HTML to result.html file (explicit encoding avoids platform-dependent defaults)
with open("result.html", "w", encoding="utf-8") as f:
    f.write(response.text)
```

### 8. Execute and check

Execute the code and check the output. If the output HTML file contains the actual page content, the script successfully bypassed the CAPTCHA. Here is what our complete code looks like:

```
# Import the modules
import requests

# Define proxy dict. Don't forget to put your real username and password here as well.
proxies = {
    "http": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
    "https": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
}

# Ask Web Unblocker to render JavaScript and return the resulting HTML
headers = {
    "X-Oxylabs-Render": "html"
}

response = requests.request(
    "GET",
    "http://sandbox.oxylabs.io/products",
    verify=False,  # Ignore the certificate
    proxies=proxies,
    headers=headers,
)

# Print result page to stdout
print(response.text)

# Save returned HTML to result.html file (explicit encoding avoids platform-dependent defaults)
with open("result.html", "w", encoding="utf-8") as f:
    f.write(response.text)
```
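Checking the saved file by eye works for a single run. For repeated runs, a rough automated check could scan the HTML for markers that common CAPTCHA widgets leave in the markup. The sketch below is a heuristic under stated assumptions: the marker list (`g-recaptcha` for reCAPTCHA, `h-captcha` for hCaptcha, `cf-turnstile` for Cloudflare Turnstile) is illustrative and incomplete, and `looks_like_captcha()` is a hypothetical helper, not part of Web Unblocker.

```python
# Markers that common CAPTCHA widgets place in page markup (illustrative, not exhaustive).
CAPTCHA_MARKERS = ("g-recaptcha", "h-captcha", "cf-turnstile")


def looks_like_captcha(html: str) -> bool:
    """Return True if the HTML contains any known CAPTCHA widget marker."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


if __name__ == "__main__":
    blocked = '<div class="g-recaptcha" data-sitekey="..."></div>'
    normal = "<html><body><h1>Products</h1></body></html>"
    print(looks_like_captcha(blocked))
    print(looks_like_captcha(normal))
```

In the script above, the check could be applied as `looks_like_captcha(response.text)`. A page can legitimately embed a CAPTCHA widget, for example in a login form, so a positive result is a prompt to inspect the file rather than proof of blocking.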

And that's it! For a more detailed tutorial with images, you can check out this [article](https://oxylabs.io/blog/playwright-bypass-captcha).
