https://github.com/zhanymkanov/marketplace_parser

Products and Reviews Crawler
crawler python scrapy

# About
Marketplace Web-Crawler

# How it works
## Parser steps
1. Use the Products API to get a JSON list of the products
2. Use the products JSON list to crawl product specifications from HTML pages
3. Use the products JSON list to request the Reviews API for every product
4. Clean the collected JSON files
5. Extract valuable information from the product specifications
6. Dump the data into the database
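
Steps 4 to 6 can be illustrated with a minimal sketch. It is not the project's actual code: the record fields (`id`, `title`, `specs`), the function names, the `wanted` keys, and the SQLite table are all assumptions made for illustration, since the real API response format is not documented here.

```python
import json
import sqlite3
from typing import Dict, Iterable, List, Tuple


def clean_products(raw: Iterable[Dict]) -> List[Dict]:
    """Step 4 (sketch): drop records without an id, trim titles, de-duplicate by id."""
    seen = set()
    cleaned = []
    for item in raw:
        product_id = item.get("id")  # "id" is a hypothetical field name
        if product_id is None or product_id in seen:
            continue
        seen.add(product_id)
        cleaned.append(
            {
                "id": product_id,
                "title": str(item.get("title", "")).strip(),
                "specs": item.get("specs") or {},
            }
        )
    return cleaned


def extract_specs(product: Dict, wanted: Tuple[str, ...] = ("brand", "color")) -> Dict:
    """Step 5 (sketch): keep only the specification keys considered valuable."""
    specs = product["specs"]
    return {key: specs[key] for key in wanted if key in specs}


def dump_products(conn: sqlite3.Connection, products: List[Dict]) -> None:
    """Step 6 (sketch): store each product with its extracted specs as a JSON string."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products "
        "(id INTEGER PRIMARY KEY, title TEXT, specs TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO products (id, title, specs) VALUES (?, ?, ?)",
        [
            (p["id"], p["title"], json.dumps(extract_specs(p), sort_keys=True))
            for p in products
        ],
    )
    conn.commit()
```

For example, `dump_products(sqlite3.connect(":memory:"), clean_products(raw_items))` would run the three steps against an in-memory database, where `raw_items` stands for the list of dictionaries collected in steps 1 to 3.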

### Comment on API access

Although the API is not private, it is not public either.

I had to inspect my outgoing traffic to find its endpoints.

Therefore, I do not think it is ethical to publish them online.

## Installation
### Prerequisites
1. Python 3.8+
2. Docker (optional)

### Installation steps
1. Get the project
```
git clone https://github.com/zhanymkanov/reviews_parser
```
2a. Install the packages without Docker
```
pip install -r requirements/base.txt
```
2b. Install the packages with Docker
```
docker-compose up -d --build
```