https://github.com/generousman/zhihurecapp
A Flask App For Analyzing ZhihuRec Dataset.
- Host: GitHub
- URL: https://github.com/generousman/zhihurecapp
- Owner: GenerousMan
- Created: 2022-11-01T08:26:00.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2022-11-21T17:53:35.000Z (over 3 years ago)
- Last Synced: 2023-05-23T10:41:26.878Z (about 3 years ago)
- Language: Python
- Size: 11.9 MB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# ZhihuRec Data-mining
A Flask app for analyzing the ZhihuRec dataset.
## Requirements
``` bash
pip install -r requirements.txt
```
## Usage
- [Dataset] Put the ZhihuRec dataset folder (`zhihuRec`) in the root directory.
- [Work Path] Set the work path to the root directory.
- [Preprocess] Run `tools/io.py` to convert `answer_infos.txt` into `.csv` files.
`1.` First, run this command to generate the answers' CSV files:
``` bash
python tools/io.py
```
Alternatively, download the pre-generated CSV files from here:
```
Baidu NetDisk
Link:https://pan.baidu.com/s/1Ey-R9yo6_HNuoZuhEJivjg
Code: 8rc7
```
Unzip the archive and put the `answer_csv` folder into `source/`.
`2.` Then run the Flask app with this command:
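To illustrate the sharded layout that step 1 produces, here is a minimal sketch, not the code in `tools/io.py`. It assumes each answer is an `(index, text)` pair and that every CSV file is named after the smallest answer index it contains, as the Files section describes. The function names `write_shards` and `find_shard` and the `shard_size` parameter are hypothetical.

``` python
import bisect
import csv
from pathlib import Path


def write_shards(rows, out_dir, shard_size=1000):
    """Write (index, text) rows to CSV shards named after their first index."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = sorted(rows)  # keep every shard sorted by answer index
    starts = []
    for i in range(0, len(rows), shard_size):
        chunk = rows[i:i + shard_size]
        start = chunk[0][0]  # smallest index in this shard
        path = out_dir / f"{start}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(chunk)
        starts.append(start)
    return starts


def find_shard(csv_dir, answer_index):
    """Return the shard whose index range would cover answer_index, or None."""
    shards = sorted((int(p.stem), p) for p in Path(csv_dir).glob("*.csv"))
    starts = [start for start, _ in shards]
    # The right shard is the last one whose start index is <= answer_index.
    pos = bisect.bisect_right(starts, answer_index) - 1
    return shards[pos][1] if pos >= 0 else None
```

Because the file names encode the start index, a lookup only needs the directory listing and a binary search; no shard has to be opened until the right one is known.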
``` bash
python app.py
```
The Flask app will run at `127.0.0.1:5000`.
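If the app does not start, a quick check of the directory layout can help. The sketch below is not part of the repository. It only tests whether the folders named in this README exist under the root directory; the helper name `missing_dirs` is hypothetical, and whether `app.py` actually requires all three folders is an assumption.

``` python
from pathlib import Path

# Folders named in this README. Whether app.py needs all of them is an assumption.
EXPECTED_DIRS = ("zhihuRec", "source/answer_csv", "model")


def missing_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected folders that do not exist under root."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```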
## Files
- `[model]` The TF-IDF model will be saved here.
- `[source]` Processed files.
  - `[answer_csv]` Answers' CSV files. All files are sorted.
    - `[xxxx.csv]` `xxxx` is the starting (minimum) answer index in this file.
- `[tools]` Tools that help you analyze the dataset.
  - `[io.py]` Used to read, write, and convert the dataset.
  - `[tfidf.py]` TF-IDF algorithm. Its main functions are:
    - `train()`
    - `load_tfidf()`
    - `save_tfidf()`
    - `compare_similarity()`
- `[zhihuRec]` The dataset. You should put the txt files here.
- `[app.py]` The entry point of the Flask app.
- `[preprocess.py]` Uses the code in `tools` to create the TF-IDF matrix and saves the result into `model`.
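
For readers unfamiliar with TF-IDF, the following is a minimal standard-library sketch of the technique. It is not the code in `tools/tfidf.py`: the function names mirror those listed above, but their signatures, the smoothed IDF formula, the `vectorize` helper, and the use of `pickle` for persistence are all assumptions. Documents are assumed to be already tokenized into lists of terms (Chinese text would first need a word segmenter).

``` python
import math
import pickle
from collections import Counter


def vectorize(doc, idf):
    """Map a tokenized document to a sparse {term: tf * idf} vector."""
    tf = Counter(t for t in doc if t in idf)  # ignore terms unseen in training
    total = sum(tf.values()) or 1
    return {t: (c / total) * idf[t] for t, c in tf.items()}


def train(docs):
    """Fit IDF weights on tokenized docs; return (idf, document vectors)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so terms found in every document keep a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return idf, [vectorize(doc, idf) for doc in docs]


def compare_similarity(a, b):
    """Cosine similarity between two sparse TF-IDF vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def save_tfidf(model, path):
    """Persist a trained model (any picklable object) to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_tfidf(path):
    """Load a model previously written by save_tfidf."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

A typical flow under these assumptions would be `idf, vectors = train(docs)`, then `save_tfidf((idf, vectors), path)` during preprocessing, and `load_tfidf(path)` followed by `compare_similarity(vectorize(query, idf), vectors[i])` at query time.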