https://github.com/prantadas/puppetter-web-scrapping
This is a Puppeteer script written in TypeScript for web scraping purposes. The script automates browser actions to interact with a website, solve reCAPTCHA challenges, and download a PDF file. It uses additional Puppeteer plugins for stealth and reCAPTCHA solving.
Last synced: 1 day ago
- Host: GitHub
- URL: https://github.com/prantadas/puppetter-web-scrapping
- Owner: PrantaDas
- License: MIT
- Created: 2024-01-08T07:25:38.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-01-08T07:28:19.000Z (12 months ago)
- Last Synced: 2024-11-05T14:15:14.042Z (about 2 months ago)
- Language: TypeScript
- Size: 27.3 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Puppeteer Web Scraping Script
This is a Puppeteer script written in TypeScript for web scraping purposes. The script automates browser actions to interact with a website, solve reCAPTCHA challenges, and download a PDF file. It uses additional Puppeteer plugins for stealth and reCAPTCHA solving.
## Prerequisites
Before running the script, ensure you have the following installed and configured:
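- **Node.js and npm:** [Download and install Node.js](https://nodejs.org/)
- **pnpm:** [Install pnpm](https://pnpm.io/installation) (the install and run commands below use `pnpm`)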
- **Git:** [Download and install Git](https://git-scm.com/)

## Installation
1. Clone the repository:
```bash
git clone https://github.com/PrantaDas/puppetter-web-scrapping.git
```
2. Navigate to the project directory:
```bash
cd puppetter-web-scrapping
```
3. Install dependencies:
```bash
pnpm install
```

## Configuration
1. Create a `.env` file in the root of the project.
2. Add the following environment variables to the `.env` file:
```env
# Target URL
URL=https://www.gob.mx/curp
# Sample CURP or other identifier to look up
IDENTIFIER=your_curp_or_identifier
# Your 2Captcha token
CAPTCHA_TOKEN=your_captcha_token
```

## Usage
Run the script using the following command:
```bash
pnpm dev
```
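
The script depends on the three variables listed under Configuration, so a missing or misspelled entry in `.env` is a likely cause of a failed run. The sketch below shows one way such values could be checked at startup. It is not taken from the repository: `ScraperConfig` and `loadConfig` are hypothetical names, and it assumes the `.env` values have already been loaded into an environment object (for example `process.env`, populated by a loader such as dotenv, which is an assumption about this project).

```typescript
// Hypothetical helper, not part of the repository: validates the three
// variables described in the Configuration section.
interface ScraperConfig {
  url: string;
  identifier: string;
  captchaToken: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): ScraperConfig {
  const required = ["URL", "IDENTIFIER", "CAPTCHA_TOKEN"] as const;

  // Collect every variable that is absent or blank so all problems
  // are reported in a single error.
  const missing = required.filter((key) => !env[key]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const url = env.URL!.trim();
  // Minimal sanity check only; this does not fully validate the URL.
  if (!/^https?:\/\//.test(url)) {
    throw new Error(`URL must start with http:// or https://, got: ${url}`);
  }

  return {
    url,
    identifier: env.IDENTIFIER!.trim(),
    captchaToken: env.CAPTCHA_TOKEN!.trim(),
  };
}
```

In a Node.js entry point this could be called as `loadConfig(process.env)` before the browser is launched, so a configuration problem is reported immediately rather than partway through the run.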