https://github.com/lablnet/alibaba_scraper

A multi-threaded web scraper that uses Playwright to extract data from the Alibaba website. The script can scrape the entire Alibaba site, a run that would take approximately 4-6 months to complete.



Host: GitHub
URL: https://github.com/lablnet/alibaba_scraper
Owner: lablnet
License: mit
Created: 2024-05-01T09:21:15.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-05-03T15:19:23.000Z (about 2 years ago)
Last Synced: 2025-01-21T16:45:12.077Z (over 1 year ago)
Topics: alibaba, data, ecom, mit-license, open-source, products, scraper
Language: JavaScript
Homepage: https://lablnet.com/project/alibabascraper
Size: 39.1 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Alibaba Scraper
[https://lablnet.com/project/alibabascraper](https://lablnet.com/project/alibabascraper)

### Installation
- Clone the repository.
- Run `npm install` to install the dependencies.
- Copy `.env.example` to `.env` and update the values.
- Run `node ./alibaba/categories.js` to get the categories and store them in the database.
- Run `node ./alibaba/processProducts.js` to start the scraper.
- Because the scraper runs for a long time and you cannot keep the terminal open, you can use `nohup` to run the script in the background:
- `nohup node ./alibaba/processProducts.js &`
- The script creates a `categories_queue1` queue file in the root directory and keeps running until the queue is empty.
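
The last step describes a queue file that is drained until it is empty. The repository's actual queue format and processing code are not shown here, so the following is only a minimal sketch of that idea. The one-entry-per-line file format and the helper names `readQueue`, `writeQueue`, and `drainQueue` are assumptions made for illustration, not the project's API.

```javascript
const fs = require('node:fs');

// Assumed format (hypothetical): one category identifier per line.
function readQueue(queuePath) {
  if (!fs.existsSync(queuePath)) return [];
  return fs
    .readFileSync(queuePath, 'utf8')
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Overwrite the queue file with the remaining entries.
function writeQueue(queuePath, items) {
  fs.writeFileSync(queuePath, items.join('\n'), 'utf8');
}

// Handle entries one at a time. The remaining queue is written back after
// each entry, so an interrupted run can resume where it stopped. If
// handleItem throws, the failing entry stays in the file.
function drainQueue(queuePath, handleItem) {
  const processed = [];
  let items = readQueue(queuePath);
  while (items.length > 0) {
    const [next, ...rest] = items;
    handleItem(next);
    writeQueue(queuePath, rest);
    processed.push(next);
    items = readQueue(queuePath);
  }
  return processed;
}
```

Under these assumptions, a call such as `drainQueue('./categories_queue1', (id) => console.log(id))` would print each queued entry and leave an empty file behind. Persisting after every entry trades some disk writes for the ability to restart a months-long run without starting over.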

### Features

* Scrape data from the Alibaba website
* Multi-threaded
* Save data to Amazon DynamoDB
* Proxy support
* Proper error handling and logging
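
To illustrate how concurrent processing might combine with error handling and logging, here is a rough sketch. It is not the repository's code: it uses promise-based concurrency on a single thread rather than real worker threads, and the names `withRetry` and `runPool`, the retry count, and the log format are all hypothetical.

```javascript
// Run `task`, retrying up to `retries` more times and logging each failure.
async function withRetry(task, retries, log = console.error) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      log(`attempt ${attempt + 1} failed: ${err.message}`);
      if (attempt >= retries) throw err;
    }
  }
}

// Process `items` with at most `limit` calls to `worker` in flight at once.
// A failure is recorded in the results instead of stopping the whole run.
async function runPool(items, limit, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  async function runWorker() {
    while (nextIndex < items.length) {
      const index = nextIndex;
      nextIndex += 1;
      try {
        results[index] = { ok: true, value: await worker(items[index]) };
      } catch (err) {
        results[index] = { ok: false, error: err.message };
      }
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, runWorker));
  return results;
}
```

A scraper could wrap each page fetch as `runPool(urls, 4, (url) => withRetry(() => fetchPage(url), 2))`, where `fetchPage` stands in for whatever function loads a page. Recording failures per item keeps one bad page from aborting a long run.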

### License

* MIT
