https://github.com/oxylabs/playwright-captcha
A guide on how to use Playwright to bypass CAPTCHA challenges using Python.
- Host: GitHub
- URL: https://github.com/oxylabs/playwright-captcha
- Owner: oxylabs
- Created: 2024-09-03T13:39:40.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-04-09T08:33:30.000Z (2 months ago)
- Last Synced: 2025-04-09T09:36:02.258Z (2 months ago)
- Topics: captcha, playwright, playwright-python, scraping-websites
- Homepage:
- Size: 13.7 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
# How to Bypass CAPTCHA With Playwright
This step-by-step tutorial demonstrates how to use Playwright to bypass CAPTCHA challenges using Python. The tutorial will also discuss the perks of using Oxylabs’ Web Unblocker instead of the `playwright-stealth` library.
* [1. Install dependencies](#1-install-dependencies)
* [2. Import modules](#2-import-modules)
* [3. Create a headless browser instance](#3-create-a-headless-browser-instance)
* [4. Apply the stealth settings](#4-apply-the-stealth-settings)
* [5. Navigate to the page](#5-navigate-to-the-page)
* [6. Take a screenshot](#6-take-a-screenshot)
* [7. Execute and test](#7-execute-and-test)
- [Bypass CAPTCHA with Web Unblocker](#bypass-captcha-with-web-unblocker)
* [1. Create an account](#1-create-an-account)
* [2. Create API key](#2-create-api-key)
* [3. Install the requests module](#3-install-the-requests-module)
* [4. Import the required modules](#4-import-the-required-modules)
* [6. Make a request](#6-make-a-request)
* [7. Save the response](#7-save-the-response)
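* [8. Execute and check](#8-execute-and-check)

### 1. Install dependencies
Install the Playwright library and the stealth package.
```
pip install playwright playwright-stealth
```
Playwright also needs a browser binary. If Chromium is not installed yet, download it with `playwright install chromium`.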
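Before moving on, it can help to confirm that both packages are importable from the interpreter you plan to use. The following is a minimal sketch; `missing_modules` is a hypothetical helper written for this check, not part of Playwright or `playwright-stealth`.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names differ from the pip names: playwright-stealth installs playwright_stealth.
    missing = missing_modules(["playwright", "playwright_stealth"])
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If the script lists a module as missing, rerun the `pip install` command in the same virtual environment that runs the script.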
### 2. Import modules
Use the synchronous version of the Playwright library for a straightforward, linear program flow.
```
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync
```
### 3. Create a headless browser instance
Define the `capture_screenshot()` function, which wraps all the code that opens a headless browser instance, visits the URL, and captures the screenshot. Inside this function, create a new `sync_playwright` instance and use it to launch the Chromium browser in headless mode.
```
# Define the function to capture the screenshot
def capture_screenshot():
    # Create a playwright instance
    with sync_playwright() as play_wright:
        browser = play_wright.chromium.launch(headless=True)

        # Create a new context and page
        context = browser.new_context()
        page = context.new_page()
```
### 4. Apply the stealth settings
After creating the browser context, apply the stealth settings to the page using the `playwright-stealth` package to help Playwright avoid CAPTCHA challenges. Stealth settings reduce the chances of automated access being detected by hiding the browser's automation traits.
```
        # Apply the stealth settings
        stealth_sync(page)
```
### 5. Navigate to the page
Next, specify the target URL and navigate to it using the page's `goto()` method.
```
        # Navigate to the website
        url = "http://sandbox.oxylabs.io/products"
        page.goto(url)
```
### 6. Take a screenshot
Wait for the page to load completely, take the screenshot, and close the browser.
```
        # Wait for the webpage to load completely
        page.wait_for_load_state("load")

        # Take a screenshot
        screenshot_filename = "oxylabs_screenshot.png"
        page.screenshot(path=screenshot_filename)

        # Close the browser
        browser.close()

    print("Done! You can check the screenshot...")

capture_screenshot()
```
### 7. Execute and test
Here is what our complete code looks like:
```
# Import the required modules
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

# Define the function to capture the screenshot
def capture_screenshot():
    # Create a playwright instance
    with sync_playwright() as play_wright:
        browser = play_wright.chromium.launch(headless=True)

        # Create a new context and page
        context = browser.new_context()
        page = context.new_page()

        # Apply the stealth settings
        stealth_sync(page)

        # Navigate to the website
        url = "http://sandbox.oxylabs.io/products"
        page.goto(url)

        # Wait for the webpage to load completely
        page.wait_for_load_state("load")

        # Take a screenshot
        screenshot_filename = "oxylabs_screenshot.png"
        page.screenshot(path=screenshot_filename)

        # Close the browser
        browser.close()

    print("Done! You can check the screenshot...")

capture_screenshot()
```
Note: Executing the code saves the screenshot as `oxylabs_screenshot.png` in the current working directory.
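The script above saves a screenshot but does not report whether a CAPTCHA actually appeared on the page. The following is a minimal sketch of one way to check, under stated assumptions: `looks_like_captcha` and `CAPTCHA_MARKERS` are hypothetical names, the marker strings are illustrative guesses rather than a reliable detector, and the code assumes a `playwright-stealth` release that exports `stealth_sync`, as the tutorial does.

```python
# Illustrative marker strings (an assumption); tune them for the site you target.
CAPTCHA_MARKERS = ("captcha", "verify you are human", "are you a robot")


def looks_like_captcha(html: str) -> bool:
    """Heuristically report whether the HTML appears to contain a CAPTCHA challenge."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def capture_screenshot(url: str, path: str = "oxylabs_screenshot.png") -> bool:
    """Save a screenshot of url and return True if no CAPTCHA marker was found."""
    # Imported here so the heuristic above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright
    from playwright_stealth import stealth_sync

    with sync_playwright() as play_wright:
        browser = play_wright.chromium.launch(headless=True)
        try:
            page = browser.new_context().new_page()
            stealth_sync(page)
            page.goto(url)
            page.wait_for_load_state("load")
            blocked = looks_like_captcha(page.content())
            page.screenshot(path=path)
        finally:
            # Close the browser even if navigation or the screenshot fails.
            browser.close()
    return not blocked
```

Calling `capture_screenshot("http://sandbox.oxylabs.io/products")` should behave like the tutorial script while also returning a rough success flag. The `try`/`finally` is a design choice so that a failed navigation does not leave a browser process running.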
## Bypass CAPTCHA with Web Unblocker
Oxylabs’ [Web Unblocker](https://oxylabs.io/products/web-unblocker) employs AI techniques to help you access publicly available information behind CAPTCHAs. You just need to send a simple query, and Web Unblocker will automatically choose the fastest CAPTCHA proxy, attach all essential headers, and return the response HTML, bypassing the anti-bot systems of the target website.
### 1. Create an account
To use Web Unblocker, you'll need an active subscription. You can either get a paid plan or a 7-day free trial [here](https://dashboard.oxylabs.io/).
### 2. Create API key
After successfully creating your account, you can set your API key username and password from the dashboard. These API key credentials will be used later in the code.
### 3. Install the requests module
You need a library that can perform HTTP requests. We will use `requests` to send HTTP requests to the Web Unblocker API and capture the response.
```
pip install requests
```
### 4. Import the required modules
In your Python script file, import the module using the following import statement:
```
import requests
```
Create the `proxies` dictionary to connect to Web Unblocker and then define the `headers` dictionary that’ll instruct Web Unblocker to use JavaScript rendering. See the [documentation](https://developers.oxylabs.io/advanced-proxy-solutions/web-unblocker) for more details.
```
# Define proxy dict. Don't forget to pass your Web Unblocker credentials (username and password)
proxies = {
    "http": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
    "https": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
}

headers = {
    "X-Oxylabs-Render": "html"
}
```
### 6. Make a request
Perform your request by specifying the request type, URL, proxy, and headers using the following code.
```
response = requests.request(
    "GET",
    "http://sandbox.oxylabs.io/products",
    verify=False,  # Ignore the certificate
    proxies=proxies,
    headers=headers,
)
```
### 7. Save the response
Write the following code to print the response and save it in an HTML file.
```
# Print result page to stdout
print(response.text)

# Save returned HTML to result.html file
with open("result.html", "w") as f:
    f.write(response.text)
```
### 8. Execute and check
Execute the code and test the output. If the output HTML file has actual page contents, the script successfully bypassed the CAPTCHA. Here is what our complete code looks like:
```
# Import the modules
import requests

# Define proxy dict. Don't forget to put your real user and pass here as well.
proxies = {
    "http": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
    "https": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
}

headers = {
    "X-Oxylabs-Render": "html"
}

response = requests.request(
    "GET",
    "http://sandbox.oxylabs.io/products",
    verify=False,  # Ignore the certificate
    proxies=proxies,
    headers=headers,
)

# Print result page to stdout
print(response.text)

# Save returned HTML to result.html file
with open("result.html", "w") as f:
    f.write(response.text)
```
And that's it! For a more detailed tutorial with images, you can check out this [article](https://oxylabs.io/blog/playwright-bypass-captcha).
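As an optional variation on the Web Unblocker script, the credentials can be kept out of the source file and the response checked before it is saved. The following is a minimal sketch under stated assumptions: `build_proxies`, `fetch_html`, and `main` are hypothetical helpers, the environment variable names `OXYLABS_USERNAME` and `OXYLABS_PASSWORD` are invented for illustration, the endpoint is taken from Oxylabs' public Web Unblocker documentation, and the 180-second timeout is an arbitrary choice for a rendered page.

```python
import os
from urllib.parse import quote

# Assumed endpoint, based on Oxylabs' public Web Unblocker documentation.
UNBLOCKER_ENDPOINT = "unblock.oxylabs.io:60000"


def build_proxies(username: str, password: str, endpoint: str = UNBLOCKER_ENDPOINT) -> dict:
    """Build a requests-style proxies dict, percent-encoding the credentials."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    proxy_url = f"http://{user}:{secret}@{endpoint}"
    return {"http": proxy_url, "https": proxy_url}


def fetch_html(url: str, proxies: dict, timeout: float = 180.0) -> str:
    """Fetch a rendered page through the proxy and raise on HTTP error statuses."""
    # Imported here so build_proxies stays usable without requests installed.
    import requests

    response = requests.get(
        url,
        proxies=proxies,
        headers={"X-Oxylabs-Render": "html"},
        verify=False,  # Ignore the certificate, as in the tutorial
        timeout=timeout,
    )
    response.raise_for_status()
    return response.text


def main() -> None:
    # Hypothetical variable names; set them in your shell before running.
    proxies = build_proxies(os.environ["OXYLABS_USERNAME"], os.environ["OXYLABS_PASSWORD"])
    html = fetch_html("http://sandbox.oxylabs.io/products", proxies)
    # An explicit encoding avoids platform-dependent defaults when writing HTML.
    with open("result.html", "w", encoding="utf-8") as f:
        f.write(html)
```

Call `main()` at the bottom of the script to run it. Percent-encoding matters when a password contains characters such as `@` or `:`, which would otherwise corrupt the proxy URL.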