https://github.com/p3nnatr4tion/aliexpress-puppeteer
Dynamic Puppeteer Web Scraping for AliExpress Without API Key
aliexpress-scraper captcha-solving no-api-key puppeteer scraping-websites without-api
Last synced: 20 days ago
- Host: GitHub
- URL: https://github.com/p3nnatr4tion/aliexpress-puppeteer
- Owner: p3nnatr4tion
- Created: 2025-05-05T08:08:28.000Z (23 days ago)
- Default Branch: master
- Last Pushed: 2025-05-05T08:40:43.000Z (23 days ago)
- Last Synced: 2025-05-05T09:47:20.565Z (23 days ago)
- Topics: aliexpress-scraper, captcha-solving, no-api-key, puppeteer, scraping-websites, without-api
- Language: JavaScript
- Homepage:
- Size: 23.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# aliexpress-puppeteer
Dynamic Puppeteer Web Scraping for AliExpress Without API Key
Scrape product data from AliExpress based on a given keyword using Puppeteer.
## Prerequisites
* **Node.js** (version 22.13.1)
* **npm** (version 10.9.2)

Install them from the [Node.js official website](https://nodejs.org/).
## Setup and Installation
1. **Clone the repository**:
```bash
git clone https://github.com/p3nnatr4tion/aliexpress-puppeteer.git
cd aliexpress-puppeteer
```
2. **Install dependencies**:
```bash
npm install
```
3. **Install Puppeteer version 24.8.0**:
```bash
npm install puppeteer@24.8.0
```
4. **Configure the scraper** in `scraper-starter.js`:
Modify `keyword` and `maxPage` to your needs:
```javascript
const keyword = "laptop"; // Search keyword
const maxPage = 1; // Max number of pages to scrape
```
5. **Run the scraper**:
```bash
node scraper-starter.js
```

## Features
* **Stealth Plugin**: Reduces the risk of bot detection by using Puppeteer’s Stealth Plugin.
* **Tab Pooling**: Limits concurrent tabs to avoid overload, improving stability.
* **Retry Mechanism**: Retries failed operations up to 3 times.
* **Captcha Handling**: Automatically solves CAPTCHA challenges.
* **Random Delays**: Introduces random delays between requests to mimic human behavior.
* **Comprehensive Data**: Collects detailed product info such as title, price, specifications, images, reviews, shipping, and more.
* **Efficient Page Navigation**: Handles scrolling and pagination to collect data from multiple pages.

## Dependencies
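The tab pooling listed under Features should not need any dependency beyond the ones in this section. The sketch below shows one way to cap the number of concurrent tasks in plain JavaScript. The helper name `runWithLimit` and its parameters are hypothetical and are not taken from this repository's code.

```javascript
// Hypothetical helper (illustrative only): run worker(item, index) for every
// item while keeping at most `limit` calls in flight at the same time.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each lane claims one item at a time until none are left.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` lanes and wait for all of them to drain the queue.
  const laneCount = Math.min(limit, items.length);
  const lanes = Array.from({ length: laneCount }, () => lane());
  await Promise.all(lanes);
  return results; // same order as `items`
}
```

With Puppeteer, `worker` would typically open a tab with `browser.newPage()`, navigate with `page.goto(url)`, extract the data, and close the tab with `page.close()` in a `finally` block, so that no more than `limit` tabs are open at once.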
* **Puppeteer**: Browser automation library used for scraping (version 24.8.0)
* **Node.js** and **npm**

## Notes
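The retry mechanism and random delays listed under Features are the main tools for staying within rate limits. The following is a minimal sketch of how the two could be combined. The names `sleep`, `randomDelay`, and `withRetry` and the delay bounds are hypothetical, and the sketch reads "up to 3 times" as three attempts in total, which is an assumption.

```javascript
// Hypothetical helpers (illustrative only, not from the repository).

// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Random integer number of milliseconds in the range [minMs, maxMs].
function randomDelay(minMs, maxMs) {
  return Math.floor(minMs + Math.random() * (maxMs - minMs + 1));
}

// Run an async task, making at most `maxAttempts` attempts in total and
// pausing for a random delay between failed attempts.
async function withRetry(task, { maxAttempts = 3, minMs = 1000, maxMs = 3000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await sleep(randomDelay(minMs, maxMs));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

A page load could then be wrapped as `await withRetry(() => page.goto(url))`, with wider delay bounds if requests start being blocked.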
* Make sure to comply with [AliExpress Terms of Service](https://www.aliexpress.com).
* Be mindful of rate limiting to avoid IP blocks.

## License
MIT License