https://github.com/generousman/zhihurecapp

A Flask App For Analyzing ZhihuRec Dataset.

# ZhihuRec Data-mining
A Flask app for analyzing the ZhihuRec dataset.

## Requirements

``` bash
pip install -r requirements.txt
```

## Usage
- [Dataset] Put the ZhihuRec dataset in the root directory.
- [Work Path] Set the working directory to the project root.
- [Preprocess] Run `tools/io.py` to convert `answer_infos.txt` into .csv files.
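`tools/io.py` itself is not shown in this README, so the following is only a hypothetical sketch of what the [Preprocess] conversion could look like. It assumes each line of `answer_infos.txt` is tab-separated with an integer answer index in the first field. It mirrors the layout described under Files: rows are sorted by index and split into chunks, and each chunk file is named after the smallest index it contains. `convert_answers` and `rows_per_file` are illustrative names, not part of the repository.

```python
import csv
import os


def convert_answers(txt_path, out_dir, rows_per_file=100000):
    """Hypothetical converter: split answer_infos.txt into sorted CSV chunks.

    Assumes tab-separated lines whose first field is an integer answer index.
    Each output file is named after the smallest answer index it contains.
    """
    with open(txt_path, encoding="utf-8") as f:
        # Skip blank lines; keep every field as a raw string.
        rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]
    rows.sort(key=lambda row: int(row[0]))  # sort by answer index
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        # The first row of a sorted chunk carries the chunk's minimum index.
        name = f"{int(chunk[0][0])}.csv"
        with open(os.path.join(out_dir, name), "w", newline="", encoding="utf-8") as out:
            csv.writer(out).writerows(chunk)
        written.append(name)
    return written
```

Under these assumptions, a call such as `convert_answers("zhihuRec/answer_infos.txt", "source/answer_csv")` would write the chunk files and return their names in index order.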

`1.` First, run this command to generate the answers' CSV files:

``` bash
python tools/io.py
```

Alternatively, download the CSV files here:
```
Baidu NetDisk
Link:https://pan.baidu.com/s/1Ey-R9yo6_HNuoZuhEJivjg
Code: 8rc7
```
Unzip the archive and put the `answer_csv` folder into `source/`.
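Because the files under Files are sorted and each `xxxx.csv` is named after the smallest answer index it holds, the file for a given answer can be located with a binary search over those start indices. The sketch below is an illustration under that assumption; `find_chunk` is a hypothetical helper, not a function from the repository, and it only identifies the candidate file without checking that the index actually occurs inside it.

```python
import bisect
import os


def find_chunk(file_names, answer_index):
    """Return the name of the CSV chunk that should hold answer_index.

    Assumes every *.csv name is an integer start index (the smallest
    answer index in that file). Non-CSV names are ignored.
    """
    chunks = sorted(
        (int(os.path.splitext(name)[0]), name)
        for name in file_names
        if name.endswith(".csv")
    )
    starts = [start for start, _ in chunks]
    # Rightmost chunk whose start index is <= answer_index.
    pos = bisect.bisect_right(starts, answer_index) - 1
    if pos < 0:
        raise KeyError(f"answer index {answer_index} is below the first chunk")
    return chunks[pos][1]
```

For example, `find_chunk(os.listdir("source/answer_csv"), 123456)` would return the name of the chunk to open. Returning the original file name, rather than rebuilding it from the integer, keeps the lookup correct even if the names are zero-padded.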

`2.` Then run the Flask app with this command:

``` bash
python app.py
```

The Flask app runs at `http://127.0.0.1:5000`.

## Files
- `[model]` The TF-IDF model is saved here.
- `[source]` Processed files.
  - `[answer_csv]` The answers' CSV files. All files are sorted.
    - `[xxxx.csv]` `xxxx` is the starting (smallest) answer index in this file.
- `[tools]` Tools that help you analyze the dataset.
  - `[io.py]` Reads, writes, and converts the dataset.
  - `[tfidf.py]` The TF-IDF algorithm. Its main functions are:
    - `train()`
    - `load_tfidf()`
    - `save_tfidf()`
    - `compare_similarity()`
- `[zhihuRec]` The dataset. Put the txt files here.
- `[app.py]` The entry point of the Flask app.
- `[preprocess.py]` Uses the code in `tools` to create the TF-IDF matrix and saves the result into `model`.
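The README lists the `tfidf.py` functions but not their code. The following is a self-contained, pure-Python illustration of the same idea and is not the repository's implementation. The function names follow the list above, but every signature is an assumption, and `vectorize` is a hypothetical helper. The sketch expects answers that are already segmented into token lists, since Chinese text needs a word segmenter first. It weights terms by tf × (log(N / df) + 1), compares documents by cosine similarity, and uses `pickle` for saving and loading. The actual format of the files in `model` is unknown.

```python
import math
import pickle
from collections import Counter


def vectorize(doc, idf):
    """Sparse TF-IDF vector (dict term -> weight) for one tokenized document."""
    counts = Counter(doc)
    total = len(doc)
    # Term frequency is normalized by document length; unseen terms are dropped.
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def train(docs):
    """Fit IDF weights on tokenized documents; return (idf, document vectors)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    # The +1 keeps terms that appear in every document from getting zero weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return idf, [vectorize(doc, idf) for doc in docs]


def compare_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def save_tfidf(path, idf, vectors):
    """Persist the fitted model with pickle (format chosen for this sketch only)."""
    with open(path, "wb") as f:
        pickle.dump({"idf": idf, "vectors": vectors}, f)


def load_tfidf(path):
    """Load a model written by save_tfidf; returns (idf, vectors)."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["idf"], data["vectors"]
```

Storing sparse vectors as dicts keeps the sketch dependency-free. A pipeline over the full dataset would more likely use a sparse-matrix library such as scikit-learn's `TfidfVectorizer`.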