Parsed Reviews Processing
- Host: GitHub
- URL: https://github.com/zhanymkanov/reviews_tazalau
- Owner: zhanymkanov
- License: cc0-1.0
- Created: 2019-12-16T05:58:47.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T03:18:52.000Z (almost 2 years ago)
- Last Synced: 2023-03-05T11:17:50.540Z (over 1 year ago)
- Topics: data-cleaning, data-processing, python
- Language: Roff
- Homepage:
- Size: 2.51 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# About
This project cleans the review data collected by https://github.com/zhanymkanov/marketplace_parser.

## Parsed data can be downloaded here:
https://github.com/zhanymkanov/reviews_dataset
- Partially cleaned (both raw and cleaned versions are available)
- ~190k rows

# Processing steps
### Part I
1. Walks through all categories, parses each product JSON file, and detects the language of each review (either RU or KZ) using https://github.com/nlacslab/kaznlp
2. Combines all product data into one CSV file per category
3. Combines all per-category CSV files into one large `all.csv` file

### Part II
1. Removes all stopwords from the reviews in `all.csv`
2. Lemmatizes the reviews using https://github.com/nlpub/pymystem3
3. Writes the result to a new `cleaned_data.csv` file

# Installation
## Prerequisites
1. Python 3
2. Docker, docker-compose
3. Successfully parsed data from https://github.com/zhanymkanov/marketplace_parser
4. Make sure the data collected by the parser is stored locally in the expected directories

## Installation
1. Clone the project
```
git clone https://github.com/zhanymkanov/reviews_tazalau
```
2. Go to the project directory
```
cd reviews_tazalau
```
3. Build the Docker image
```
docker-compose build
```

## Usage
Run `main.py`.
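The two processing parts described under "Processing steps" could be sketched roughly as follows. This is a minimal, hypothetical sketch, not the project's actual code: the directory layout (`data/<category>/*.json`), the JSON schema (`{"reviews": [{"text": ...}]}`), the tiny stopword set, and every function name are assumptions. Language detection uses a crude stand-in heuristic (checking for Cyrillic letters that occur in Kazakh but not in Russian) instead of kaznlp, and `lemmatize` defaults to an identity function so the sketch runs without pymystem3, which relies on the external Mystem binary.

```python
import csv
import json
import re
from pathlib import Path
from typing import Callable

# Assumption: a letter-based heuristic stands in for kaznlp's language identifier.
# These Cyrillic letters are used in Kazakh but not in Russian.
KAZAKH_ONLY = set("әғқңөұүһіӘҒҚҢӨҰҮҺІ")

# Hypothetical, deliberately tiny stopword set; a real run would use full RU/KZ lists.
STOPWORDS = {"и", "в", "на", "это", "мен", "және"}

FIELDS = ["category", "product", "text", "lang"]
TOKEN_RE = re.compile(r"\w+")  # \w matches Unicode letters, including Cyrillic


def detect_language(text: str) -> str:
    """Return 'KZ' if the text contains any Kazakh-only letter, else 'RU'."""
    return "KZ" if any(ch in KAZAKH_ONLY for ch in text) else "RU"


def write_csv(rows: list[dict], path: Path, fields: list[str]) -> None:
    """Write rows to a UTF-8 CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)


def parse_category(category_dir: Path) -> list[dict]:
    """Part I, step 1: one row per review across all product JSON files in a category.

    Hypothetical schema: each file is {"reviews": [{"text": "..."}, ...]}.
    """
    rows = []
    for path in sorted(category_dir.glob("*.json")):
        product = json.loads(path.read_text(encoding="utf-8"))
        for review in product.get("reviews", []):
            text = review.get("text", "")
            rows.append({"category": category_dir.name, "product": path.stem,
                         "text": text, "lang": detect_language(text)})
    return rows


def build_all_csv(data_dir: Path, out_dir: Path) -> Path:
    """Part I, steps 2-3: write one CSV per category, then combine them into all.csv."""
    all_rows = []
    for category_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        rows = parse_category(category_dir)
        write_csv(rows, out_dir / f"{category_dir.name}.csv", FIELDS)
        all_rows.extend(rows)
    write_csv(all_rows, out_dir / "all.csv", FIELDS)
    return out_dir / "all.csv"


def clean_review(text: str, lemmatize: Callable[[str], str] = lambda w: w) -> str:
    """Part II, steps 1-2: lowercase and tokenize, drop stopwords, lemmatize the rest."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    return " ".join(lemmatize(t) for t in tokens if t not in STOPWORDS)


def build_cleaned_csv(all_csv: Path, out_path: Path,
                      lemmatize: Callable[[str], str] = lambda w: w) -> Path:
    """Part II, step 3: add a 'cleaned' column and write the cleaned CSV."""
    with open(all_csv, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        row["cleaned"] = clean_review(row["text"], lemmatize)
    write_csv(rows, out_path, FIELDS + ["cleaned"])
    return out_path
```

With the hypothetical layout above, `build_cleaned_csv(build_all_csv(Path("data"), Path("out")), Path("out/cleaned_data.csv"))` would produce the per-category CSVs, `all.csv`, and `cleaned_data.csv` (the `out` directory must already exist). Keeping `lemmatize` pluggable lets a pymystem3-based lemmatizer be swapped in without touching the tokenizing and stopword logic.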