https://github.com/zhanymkanov/marketplace_parser
Products and Reviews Crawler
crawler python scrapy
- Host: GitHub
- URL: https://github.com/zhanymkanov/marketplace_parser
- Owner: zhanymkanov
- License: cc0-1.0
- Created: 2019-11-24T06:30:07.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2020-07-27T08:12:21.000Z (over 4 years ago)
- Last Synced: 2024-10-11T18:57:45.950Z (3 months ago)
- Topics: crawler, python, scrapy
- Language: Python
- Homepage:
- Size: 2.64 MB
- Stars: 1
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# About
Marketplace Web-Crawler
# How it works
## Parser steps
1. Use the Products API to get a JSON list of products
2. Use the products JSON list to crawl product specifications from HTML pages
3. Use the products JSON list to request the Reviews API for every product
4. Clean the collected JSON files
5. Extract valuable information from the product specifications
6. Dump data into the database
### Comment on API access
Although the API is not private, it is not public either. I had to inspect my outgoing traffic to find its endpoints.
Therefore, I do not think it is ethical to publish them online.
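As a rough illustration of parser steps 4 and 6 above, here is a minimal sketch that is not taken from the project. It assumes each review is a dict with hypothetical `id`, `product_id`, and `text` fields, and it uses an in-memory SQLite database only as a stand-in, since the README does not name the actual database. No API endpoints are involved.

```python
import sqlite3


def clean_reviews(raw_reviews):
    """Step 4 (sketch): drop incomplete or duplicate reviews and trim the text.

    The field names "id", "product_id" and "text" are assumptions.
    """
    seen_ids = set()
    cleaned = []
    for review in raw_reviews:
        review_id = review.get("id")
        text = (review.get("text") or "").strip()
        # Skip reviews without an id, without text, or already seen.
        if review_id is None or not text or review_id in seen_ids:
            continue
        seen_ids.add(review_id)
        cleaned.append(
            {"id": review_id, "product_id": review.get("product_id"), "text": text}
        )
    return cleaned


def dump_reviews(conn, reviews):
    """Step 6 (sketch): store cleaned reviews in a SQLite table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews "
        "(id INTEGER PRIMARY KEY, product_id INTEGER, text TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO reviews (id, product_id, text) "
        "VALUES (:id, :product_id, :text)",
        reviews,
    )
    conn.commit()


if __name__ == "__main__":
    # Made-up sample data standing in for the collected JSON files.
    raw = [
        {"id": 1, "product_id": 10, "text": "  Works well  "},
        {"id": 1, "product_id": 10, "text": "Duplicate id"},
        {"id": 2, "product_id": 10, "text": ""},
    ]
    connection = sqlite3.connect(":memory:")
    dump_reviews(connection, clean_reviews(raw))
    print(connection.execute("SELECT id, text FROM reviews").fetchall())
```

Keeping cleaning separate from storage means the cleaning rules can be checked on plain lists before any database is involved.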
## Installation
### Prerequisites
1. Python 3.8+
2. Docker - optional
### Installation steps
1. Get the project
```
git clone https://github.com/zhanymkanov/reviews_parser
```
2a. Install the packages without Docker
```
pip install -r requirements/base.txt
```
2b. Install the packages with Docker
```
docker-compose up -d --build
```