https://github.com/aydinnyunus/ai-captcha-bypass

CAPTCHA bypass using GPT-4o

Host: GitHub
URL: https://github.com/aydinnyunus/ai-captcha-bypass
Owner: aydinnyunus
License: other
Created: 2024-07-14T13:27:26.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-10-02T19:40:20.000Z (6 days ago)
Last Synced: 2025-10-02T21:14:31.026Z (6 days ago)
Language: Python
Size: 1.45 MB
Stars: 805
Watchers: 5
Forks: 65
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE

# AI-Powered CAPTCHA Solver

This project is a Python command-line tool that uses large multimodal models (LMMs) such as OpenAI's GPT-4o and Google's Gemini to solve several types of CAPTCHAs automatically. It uses Selenium for browser automation to interact with web pages and solve CAPTCHAs in real time.

A successful solve is recorded as a GIF in the `successful_solves` directory.

## Key Features

- **Multiple AI Providers**: Supports both OpenAI (e.g., GPT-4o) and Google Gemini (e.g., Gemini 2.5 Pro) models.
- **Multiple CAPTCHA Types**: Capable of solving a variety of CAPTCHA challenges.
- **Browser Automation**: Uses Selenium to simulate human interaction with web pages.
- **Extensible**: The modular design makes it easy to add support for new CAPTCHA types or AI models.
- **Benchmarking**: Includes a script to test the performance and success rate of the solvers.

## Supported CAPTCHA Types

The tool can solve the following CAPTCHA types found on the `2captcha.com/demo/` pages:

1. **Text Captcha**: Simple text recognition.
2. **Complicated Text Captcha**: Text with more distortion and noise.
3. **reCAPTCHA v2**: Google's "I'm not a robot" checkbox with image selection challenges.
4. **Puzzle Captcha**: Slider puzzles where a piece must be moved to the correct location.
5. **Audio Captcha**: Transcribing spoken letters or numbers from an audio file.

## Prerequisites

- Python 3.7+
- Mozilla Firefox

## Installation & Configuration

1. **Clone the repository:**
```bash
git clone https://github.com/aydinnyunus/ai-captcha-bypass
cd ai-captcha-bypass
```

2. **Install dependencies:**
```bash
pip install -r requirements.txt
```

3. **Set up your API keys:**
Create a `.env` file in the root directory by copying the example file:
```bash
cp .env.example .env
```
Open the `.env` file and add your API keys for OpenAI and/or Google Gemini:
```
OPENAI_API_KEY="sk-..."
GOOGLE_API_KEY="..."
```
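As a rough illustration of how the keys above might be read at startup, here is a minimal sketch that uses only the standard library. The helper names `load_env_file` and `require_key` are hypothetical and are not taken from the repository; the project may well load its `.env` file differently (for example with a dotenv package).

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY="value" lines (hypothetical helper, not the project's code)."""
    values: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def require_key(provider: str, env: Dict[str, str]) -> str:
    """Return the API key for the chosen provider, or raise a clear error."""
    name = {"openai": "OPENAI_API_KEY", "gemini": "GOOGLE_API_KEY"}[provider]
    key = env.get(name) or os.environ.get(name, "")
    # Reject the placeholder values copied from the example file.
    if not key or key in {"sk-...", "..."}:
        raise RuntimeError(f"{name} is missing or still a placeholder")
    return key
```

Failing early on a missing or placeholder key avoids launching a browser session that cannot complete.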

## Usage

The primary script for running the solver is `main.py`. The CAPTCHA type is required; the AI provider and model are optional.

### Command-Line Arguments

- `captcha_type`: (Required) The type of CAPTCHA to solve.
  - Choices: `puzzle`, `text`, `complicated_text`, `recaptcha_v2`, `audio`
- `--provider`: The AI provider to use.
  - Choices: `openai`, `gemini` (Default: `openai`)
- `--model`: The specific model to use (e.g., `gpt-4o`, `gemini-2.5-flash`).
- `--file`: Path to an audio file for the `audio` test. (Default: `files/audio.mp3`)
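The arguments listed above map naturally onto `argparse`. The following is a sketch of such a parser, not the actual contents of `main.py`; the function name `build_parser` and the `None` default for `--model` are assumptions, since the README does not state a default model.

```python
import argparse

# Choices as documented in the README.
CAPTCHA_TYPES = ["puzzle", "text", "complicated_text", "recaptcha_v2", "audio"]
PROVIDERS = ["openai", "gemini"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="AI-powered CAPTCHA solver (sketch)")
    parser.add_argument("captcha_type", choices=CAPTCHA_TYPES,
                        help="The type of CAPTCHA to solve")
    parser.add_argument("--provider", choices=PROVIDERS, default="openai",
                        help="The AI provider to use")
    # Assumption: no model given means the provider's default is used.
    parser.add_argument("--model", default=None,
                        help="The specific model to use, e.g. gpt-4o")
    parser.add_argument("--file", default="files/audio.mp3",
                        help="Path to an audio file for the audio test")
    return parser
```

Calling `build_parser().parse_args(["text"])` yields `provider="openai"` and `file="files/audio.mp3"`, while an unknown CAPTCHA type makes `argparse` exit with a usage error.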

### Examples

**Solve a simple text CAPTCHA using OpenAI (default):**
```bash
python main.py text
```

**Solve a complicated text CAPTCHA using Gemini:**
```bash
python main.py complicated_text --provider gemini
```

**Solve a reCAPTCHA v2 challenge using Gemini:**
```bash
python main.py recaptcha_v2 --provider gemini
```

**Transcribe an audio CAPTCHA:**
```bash
python main.py audio --file files/radio.wav --provider openai
```

**Solve a puzzle CAPTCHA using a specific OpenAI model:**
```bash
python main.py puzzle --provider openai --model gpt-4o
```

## How It Works

1. **Launch Browser**: The script starts a Firefox browser instance using Selenium.
2. **Navigate**: It goes to the demo page for the specified CAPTCHA type.
3. **Capture**: It takes screenshots of the CAPTCHA challenge (image, instructions, or puzzle).
4. **AI Analysis**: The captured images or audio files are sent to the selected AI provider (OpenAI or Gemini) with a specific prompt tailored to the CAPTCHA type.
5. **Get Action**: The AI returns the solution (text, coordinates, or image selections).
6. **Perform Action**: The script uses Selenium to enter the text, move the slider, or click the correct images.
7. **Verify**: The script checks for a success message to confirm the CAPTCHA was solved.
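The capture, analyze, act, and verify steps above can be sketched as a small pipeline whose stages are injected as callables. This is an illustrative structure only: `SolveSteps` and `solve_once` are hypothetical names, and the real project wires these steps directly to Selenium and the AI provider APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SolveSteps:
    """One callable per stage; all names here are hypothetical."""
    capture: Callable[[], bytes]       # e.g. screenshot or audio bytes of the challenge
    ask_model: Callable[[bytes], str]  # sends the challenge to the AI provider
    perform: Callable[[str], None]     # types text, moves a slider, or clicks images
    verify: Callable[[], bool]         # checks the page for a success message


def solve_once(steps: SolveSteps) -> bool:
    """Run a single solve attempt and report whether it succeeded."""
    challenge = steps.capture()
    answer = steps.ask_model(challenge).strip()
    if not answer:
        # The model returned nothing usable, so do not touch the page.
        return False
    steps.perform(answer)
    return steps.verify()
```

In a Selenium-backed version, `capture` could return a `WebElement`'s `screenshot_as_png` bytes and `perform` could call `send_keys` on the answer field. Keeping the stages separate also lets the flow be exercised with fakes, without a browser or API key.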

## Success Examples

Here are some examples of the solver successfully bypassing different CAPTCHA types.

| CAPTCHA Type | OpenAI (GPT-4o) | Gemini (2.5 Pro) |
| -------------------- | --------------- | ---------------- |
| **reCAPTCHA v2** | (demo GIF) | (demo GIF) |
| **Puzzle** | (demo GIF) | (demo GIF) |
| **Complicated Text** | (demo GIF) | (demo GIF) |
## Project Structure

- `main.py`: The main entry point to run the CAPTCHA solver tests. Handles command-line arguments and calls the appropriate test functions.
- `ai_utils.py`: Contains all the functions for interacting with the OpenAI and Gemini APIs. This is where prompts are defined and API calls are made.
- `puzzle_solver.py`: Implements the logic specifically for solving the multi-step slider puzzle CAPTCHA.
- `benchmark.py`: A script for running multiple tests to evaluate the performance and success rate of the different solvers.
- `requirements.txt`: A list of all the Python packages required for the project.
- `screenshots/`: Directory where screenshots of CAPTCHAs are temporarily saved.
- `successful_solves/`: Directory where GIFs of successful solutions are saved.
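To make the idea behind `benchmark.py` concrete, here is a minimal sketch of measuring a success rate over repeated attempts. It is not the repository's script: `run_benchmark` is a hypothetical name, and the solver is passed in as a plain callable that returns `True` on a successful solve.

```python
from typing import Callable, Dict


def run_benchmark(solve: Callable[[], bool], attempts: int) -> Dict[str, float]:
    """Call `solve` repeatedly and summarize the outcomes (illustrative only)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    successes = 0
    errors = 0
    for _ in range(attempts):
        try:
            if solve():
                successes += 1
        except Exception:
            # A crashed attempt counts as a failure but is tracked separately.
            errors += 1
    return {
        "attempts": attempts,
        "successes": successes,
        "errors": errors,
        "success_rate": successes / attempts,
    }
```

Counting exceptions separately from ordinary failures helps distinguish a wrong answer from a browser or API error when comparing providers.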

## Contact

[LinkedIn](https://linkedin.com/in/yunus-ayd%C4%B1n-b9b01a18a/) [GitHub](https://github.com/aydinnyunus/ai-captcha-bypass) [Instagram](https://instagram.com/aydinyunus_/) [Twitter](https://twitter.com/aydinnyunuss)
