https://github.com/brandon-braner/recaptcha_scraper
A Python-based web scraper that checks websites for the presence of Google reCAPTCHA
- Host: GitHub
- URL: https://github.com/brandon-braner/recaptcha_scraper
- Owner: brandon-braner
- Created: 2024-11-18T14:30:49.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-11-18T14:35:39.000Z (7 months ago)
- Last Synced: 2025-02-10T12:43:24.245Z (4 months ago)
- Topics: portfolio, upwork, webscraping
- Language: Python
- Homepage:
- Size: 56.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# reCAPTCHA Website Scraper
A Python-based web scraper that checks websites for the presence of Google reCAPTCHA (both Standard and Enterprise versions).
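One way such a check could work is to look for the script URLs that Google documents for loading each flavor: `recaptcha/api.js` for Standard and `recaptcha/enterprise.js` for Enterprise, served from `www.google.com` or `www.recaptcha.net`. The sketch below is an assumption about the approach, not the repository's actual detection logic, and `detect_recaptcha` is a hypothetical name. It works on an HTML string, which with Playwright could come from `page.content()`.

```python
import re
from typing import Optional

# Script paths Google documents for loading each reCAPTCHA flavor.
_ENTERPRISE = re.compile(
    r"(?:google\.com|recaptcha\.net)/recaptcha/enterprise\.js", re.IGNORECASE
)
_STANDARD = re.compile(
    r"(?:google\.com|recaptcha\.net)/recaptcha/api\.js", re.IGNORECASE
)


def detect_recaptcha(html: str) -> Optional[str]:
    """Return "enterprise", "standard", or None for the given page HTML.

    Hypothetical helper: it only inspects script URLs present in the markup.
    """
    # Check Enterprise first so a page that loads both is reported as Enterprise.
    if _ENTERPRISE.search(html):
        return "enterprise"
    if _STANDARD.search(html):
        return "standard"
    return None
```

A string match like this can miss pages that inject the script only after user interaction, so a real scan may also need to inspect network requests or iframes.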
## Features
- Multi-process scanning of URLs
- SQLite database storage
- Detects both Enterprise and Standard reCAPTCHA implementations
- Handles URL redirects
- Performance optimized with parallel processing
## Prerequisites
- Python 3.12+
- Playwright
- SQLite3
- Poetry (the installation commands use `poetry install`)
## Installation
- Clone the repository:
```bash
git clone https://github.com/brandon-braner/recaptcha_scraper.git
```
- Go into the directory:
```bash
cd recaptcha_scraper
```
- Install the requirements:
```bash
poetry install
```
- Install the Playwright browsers:
```bash
poetry run playwright install
```
## Running
- Prepare your input file: create a `urls.csv` file with one URL per row, in the first column.
- Run the Python script:
```bash
poetry run python main.py
```
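The input format described above (one URL in the first column of `urls.csv`) and the SQLite storage listed under Features could be handled along these lines. This is a minimal sketch under stated assumptions, not the repository's code: `read_urls`, `save_results`, and the `scan_results` table layout are hypothetical.

```python
import csv
import sqlite3
from typing import Iterable, Iterator, Tuple


def read_urls(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first column of each non-empty CSV row.

    Accepts any iterable of lines, such as an open file handle.
    """
    for row in csv.reader(lines):
        if row and row[0].strip():
            yield row[0].strip()


def save_results(
    conn: sqlite3.Connection, results: Iterable[Tuple[str, str]]
) -> None:
    """Store (url, recaptcha_type) pairs; the table layout is an assumption."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scan_results ("
        "url TEXT PRIMARY KEY, recaptcha_type TEXT NOT NULL)"
    )
    # INSERT OR REPLACE keeps one row per URL when a scan is re-run.
    conn.executemany(
        "INSERT OR REPLACE INTO scan_results (url, recaptcha_type) VALUES (?, ?)",
        results,
    )
    conn.commit()
```

To read a real file, open it with `open("urls.csv", newline="")` and pass the handle to `read_urls`. The connection would come from `sqlite3.connect(...)` with a database path of your choice.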