Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kemkartanya/product-urls-crawler
python webcrawler
Last synced: 12 days ago
- Host: GitHub
- URL: https://github.com/kemkartanya/product-urls-crawler
- Owner: kemkartanya
- Created: 2025-01-13T20:06:26.000Z (20 days ago)
- Default Branch: main
- Last Pushed: 2025-01-14T09:44:44.000Z (19 days ago)
- Last Synced: 2025-01-14T09:48:02.754Z (19 days ago)
- Topics: python, webcrawler
- Language: Python
- Homepage:
- Size: 11.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Product-URLs-Crawler
## How It Works
The spider starts with the URLs in start_urls. It extracts product URLs from these pages based on the defined pattern.
For each page, it yields the product URLs and follows all links (pagination) recursively, repeating the process on subsequent pages.
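The crawl logic described above can be illustrated with a minimal, standard-library-only sketch. It is not the repository's spider: the `/product/<id>` pattern, the `split_links` and `crawl` helpers, and the in-memory `pages` mapping are all hypothetical, and the traversal uses a queue instead of recursion. In a Scrapy spider, the equivalent of `split_links` would typically live in the `parse` callback.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical pattern: the README does not state the real one.
PRODUCT_PATTERN = re.compile(r"/product/\d+")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_links(html, base_url, pattern=PRODUCT_PATTERN):
    """Return (product_urls, follow_urls) for one page, as absolute URLs."""
    collector = LinkCollector()
    collector.feed(html)
    product_urls, follow_urls = [], []
    for href in collector.hrefs:
        url = urljoin(base_url, href)
        if pattern.search(url):
            product_urls.append(url)
        else:
            follow_urls.append(url)
    return product_urls, follow_urls


def crawl(pages, start_urls):
    """Walk an in-memory site (url -> html), collecting product URLs once each."""
    seen, queue, found = set(start_urls), list(start_urls), []
    while queue:
        url = queue.pop(0)
        products, follow = split_links(pages.get(url, ""), url)
        found.extend(p for p in products if p not in found)
        for nxt in follow:
            if nxt not in seen:  # avoid revisiting pages that link back
                seen.add(nxt)
                queue.append(nxt)
    return found
```

Tracking visited pages in `seen` matters because pagination links usually point both forward and backward; without it the walk would never terminate.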
## How to Run
pip install scrapy
pip install scrapy-playwright
playwright install
scrapy crawl nykaa_myntra_spider -o output/nykaa_myntra_urls.json
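For the scrapy-playwright package installed above to handle requests, a Scrapy project normally needs a few lines in its `settings.py`. The fragment below follows the setup documented by scrapy-playwright; whether this repository's settings look exactly like this is an assumption.

```python
# settings.py (sketch; the repository's actual settings may differ)

# Route HTTP and HTTPS downloads through Playwright's download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

Individual requests then opt in to browser rendering by setting `meta={"playwright": True}` on the `scrapy.Request`.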