https://github.com/systemvll/hcaptcha-dataset-scraper
A simple Node.js script that returns hCaptcha images and prompts for training AI.
- Host: GitHub
- URL: https://github.com/systemvll/hcaptcha-dataset-scraper
- Owner: SystemVll
- Created: 2023-03-22T01:34:19.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2025-06-03T10:26:42.000Z (7 months ago)
- Last Synced: 2025-08-02T09:31:16.390Z (5 months ago)
- Topics: ai, data-science, dataset, hcaptcha, hcaptcha-solver, machine-learning
- Language: JavaScript
- Homepage:
- Size: 33.2 KB
- Stars: 5
- Watchers: 1
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# hCaptcha Image Scraper
This is a Node.js script that uses Puppeteer and the hCaptcha library to collect images from a captcha challenge on the hCaptcha demo page. The purpose of this script is to generate a dataset of captcha images and prompts that can be used for training machine learning models.
## Installation
Clone this repository to your local machine and install the dependencies:
```bash
git clone https://github.com/systemvll/hcaptcha-dataset-scraper.git
# Enter the cloned repository before installing dependencies
cd hcaptcha-dataset-scraper
npm install
```
Run the script using Node.js, specifying the number of images to collect as a command-line argument:
```bash
node main.js <number-of-images>
```
For example, to collect 10 images, run:
```bash
node main.js 10
```
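The command-line argument could be read and validated roughly as follows. This is a minimal sketch, not the repository's actual `main.js`: the helper name `parseImageCount` and the fallback value of 10 when no argument is given are assumptions made for illustration.

```javascript
// Hypothetical helper: illustrates reading the image count from the command line.
// The default of 10 is an assumption, not documented behavior of the script.
function parseImageCount(argv, fallback = 10) {
  // argv[0] is the node binary, argv[1] is the script path, argv[2] is the first user argument.
  const raw = argv[2];
  if (raw === undefined) {
    return fallback;
  }
  // Accept only plain positive integers such as "10"; reject "abc", "-3", "1.5", "0".
  if (!/^\d+$/.test(raw) || Number(raw) === 0) {
    throw new Error(`Expected a positive integer image count, got "${raw}"`);
  }
  return Number(raw);
}

// Example: for `node main.js 10` this logs "Collecting 10 images".
console.log(`Collecting ${parseImageCount(process.argv)} images`);
```

Rejecting malformed input up front avoids starting a browser session only to fail later on an invalid count.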
The script will open the hCaptcha demo page in a headless Chrome browser and start collecting images from the captcha challenge.
The images and their prompts are saved to the `images` directory.
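The README does not describe how files are laid out inside that directory. As an illustration only, the sketch below assumes one subdirectory per prompt, PNG files named by index, and a `prompt.txt` holding the original prompt text. The names `slugify` and `saveSample` are hypothetical and do not come from the repository.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Turn a prompt such as "Please click each image containing a bicycle"
// into a filesystem-safe directory name.
function slugify(prompt) {
  const slug = prompt
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything non-alphanumeric into a single dash
    .replace(/^-+|-+$/g, '');    // trim leading and trailing dashes
  return slug || 'unknown';
}

// Hypothetical layout: <rootDir>/<prompt-slug>/<index>.png plus <rootDir>/<prompt-slug>/prompt.txt
function saveSample(rootDir, prompt, imageBuffer, index) {
  const dir = path.join(rootDir, slugify(prompt));
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${index}.png`);
  fs.writeFileSync(file, imageBuffer);
  // Keep the untouched prompt text next to the images it describes.
  fs.writeFileSync(path.join(dir, 'prompt.txt'), prompt);
  return file;
}
```

Grouping images by prompt keeps each label and its examples together, which makes the data simpler to inspect by hand.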
Once the script has finished running, you can use the images and prompts to train machine learning models.
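Before training, the collected files usually need to be gathered into a list of image and prompt pairs. The sketch below shows one way to do that, assuming a hypothetical layout of one subdirectory per prompt containing image files and an optional `prompt.txt`. The actual output layout of the script may differ, and `loadDataset` is an illustrative name.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Assumed (hypothetical) layout: <rootDir>/<label>/<file>.png|jpg and <rootDir>/<label>/prompt.txt
function loadDataset(rootDir) {
  const samples = [];
  for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const dir = path.join(rootDir, entry.name);
    const promptFile = path.join(dir, 'prompt.txt');
    // Fall back to the directory name when no prompt file is present.
    const prompt = fs.existsSync(promptFile)
      ? fs.readFileSync(promptFile, 'utf8').trim()
      : entry.name;
    for (const file of fs.readdirSync(dir)) {
      if (/\.(png|jpe?g)$/i.test(file)) {
        samples.push({ image: path.join(dir, file), prompt });
      }
    }
  }
  return samples;
}
```

Each returned object has an `image` path and a `prompt` string, a shape that can be serialized to JSON or passed to a training pipeline.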