https://github.com/zlsh80826/appropriateresponseprediction
Predict Appropriate Response in Chinese
- Host: GitHub
- URL: https://github.com/zlsh80826/appropriateresponseprediction
- Owner: zlsh80826
- Created: 2018-10-27T16:15:27.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T01:16:23.000Z (almost 2 years ago)
- Last Synced: 2023-03-03T00:13:45.331Z (over 1 year ago)
- Topics: cntk, cs565600, deep-learning, nlp, prediction
- Language: Jupyter Notebook
- Homepage:
- Size: 23.9 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 16
Metadata Files:
- Readme: README.md
README
# Appropriate Answer Prediction
(科技大擂台 與AI對話, roughly "Tech Arena: Talk with AI")

----

## Features
* A CNTK (Microsoft Cognitive Toolkit) implementation for the CS565600 competition
* We use an LSTM with attention for this task
* For more model information, please refer to the [report](https://github.com/zlsh80826/AppropriateResponsePrediction/blob/master/script/Report.ipynb)
* If you run into any problems with this repo, feel free to contact the author by email.

## Requirements
The following libraries are required for training:
### General
* python3
* cuda-9.0 (required by CNTK)
* openmpi-1.10 (required by CNTK)
* gcc >= 6 (required by CNTK)

### Python
* Please refer to `requirements.txt`

## Usage
We recommend running all scripts from the `script` directory:
```Bash
cd AppropriateResponsePrediction/script
```

Each script provides a `--help` option that lists its configurable settings.
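How the scripts parse their options is not documented here. A minimal sketch, assuming they use Python's standard `argparse` module (which generates the `--help` output automatically): `--threads` is the flag mentioned under Preprocess, while its default of 4 and the `--output` flag are hypothetical, for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; argparse adds -h/--help automatically.
    parser = argparse.ArgumentParser(description="Preprocess raw text into npy files.")
    # --threads appears in this README; the default of 4 is an assumption.
    parser.add_argument("--threads", type=int, default=4,
                        help="number of worker threads")
    # Hypothetical output-directory flag, for illustration only.
    parser.add_argument("--output", default="data",
                        help="directory for processed files")
    return parser
```

With such a parser, `build_parser().parse_args(["--threads", "8"]).threads` evaluates to `8`, and `--help` prints both flags with their help strings.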
```Bash
python <script>.py --help
```

### Preprocess
This script converts the text-format program data into processed `npy` files.
Use `--threads` to set how many threads it uses.
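The README does not spell out the preprocessing steps. As a rough, hypothetical sketch of what such a step can involve: split each line into tokens (character-level here, which is an assumption; the actual script may use a word segmenter), build a vocabulary, and map tokens to integer IDs, with a thread count playing the role of `--threads`. The real script then stores the arrays as `npy` files, which NumPy's `numpy.save` writes; this stdlib-only sketch stops before that step.

```python
from concurrent.futures import ThreadPoolExecutor


def tokenize(line: str) -> list:
    # Assumption: character-level tokens, whitespace dropped.
    return [ch for ch in line.strip() if not ch.isspace()]


def build_vocab(token_lists) -> dict:
    # Reserve IDs 0 and 1 for padding and unknown tokens (hypothetical convention).
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def preprocess(lines, threads: int = 4):
    # pool.map preserves input order, so ids[i] matches lines[i].
    with ThreadPoolExecutor(max_workers=threads) as pool:
        token_lists = list(pool.map(tokenize, lines))
    vocab = build_vocab(token_lists)
    ids = [[vocab.get(tok, vocab["<unk>"]) for tok in tokens] for tokens in token_lists]
    return vocab, ids


vocab, ids = preprocess(["你好嗎", "我很好"], threads=2)
print(ids)  # [[2, 3, 4], [5, 6, 3]]
```

Threads mirror the `--threads` flag here; because CPython's GIL serializes pure-Python work, a `ProcessPoolExecutor` usually scales better for CPU-bound tokenization.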
```Bash
python preprocessing.py
```

### Train fastText
This script trains Traditional Chinese embeddings with fastText on the processed data.
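What distinguishes fastText from plain word2vec is that each word is represented by the bag of its character n-grams plus the word itself, with word boundaries marked by `<` and `>`; this lets it build vectors for rare or unseen words. A simplified sketch of that n-gram extraction (not the repo's code; `minn=3` and `maxn=6` mirror fastText's default n-gram range for unsupervised training):

```python
def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list:
    """Return fastText-style subword features for `word` (simplified)."""
    token = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    # The full bracketed word is also kept as its own feature.
    if token not in grams:
        grams.append(token)
    return grams


print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```

Chinese words are often only one or two characters long, so each yields just a few n-grams: `char_ngrams("你好")` gives `['<你好', '你好>', '<你好>']`.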
```Bash
python train_fasttext.py
```

### Generate the Training Data
With the default settings, this script generates 4 million training examples, which take about 8 GB of disk space.
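The README does not describe how the examples are built. One common approach for response selection, shown here as a hypothetical sketch rather than the repo's method, pairs each line with the line that follows it (label 1) and with randomly drawn other lines (label 0), then writes tab-separated rows. For scale, 8 GB over 4 million examples works out to about 2 KB per example.

```python
import random


def gen_pairs(dialogue, num_negatives=1, seed=0):
    """Build (context, response, label) rows from consecutive lines (hypothetical)."""
    rng = random.Random(seed)  # seeded for reproducible sampling
    rows = []
    for i in range(len(dialogue) - 1):
        context, response = dialogue[i], dialogue[i + 1]
        rows.append((context, response, 1))  # true next line
        for _ in range(num_negatives):
            # Draw uniformly from every index except the true response (i + 1).
            j = rng.randrange(len(dialogue) - 1)
            if j >= i + 1:
                j += 1
            rows.append((context, dialogue[j], 0))
    return rows


def to_tsv(rows):
    # Assumes the text itself contains no tab or newline characters.
    return "\n".join(f"{c}\t{r}\t{y}" for c, r, y in rows)


dialogue = ["你好嗎", "我很好", "今天天氣不錯", "要出去走走嗎"]
rows = gen_pairs(dialogue, num_negatives=1)
print(len(rows))  # 6
```

Each of the 3 consecutive pairs yields one positive and one negative row, hence 6 rows.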
```Bash
python gen_training.py
```

### Convert TSV to CTF
CNTK can train on large files, but the data must first be converted to CNTK Text Format (CTF).
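In CNTK Text Format, each line starts with an optional sequence ID followed by `|name value ...` streams; consecutive lines sharing an ID form one sequence, and sparse inputs such as token IDs can be written as `index:value`. A sketch of converting one TSV row of space-separated token IDs into CTF lines. The column layout and the stream names `context`, `response` and `label` are assumptions; they must match whatever names the reader configuration maps.

```python
def tsv_row_to_ctf(seq_id: int, row: str) -> list:
    """Convert 'ctx_ids<TAB>resp_ids<TAB>label' into CTF lines (hypothetical layout)."""
    ctx_field, resp_field, label = row.rstrip("\n").split("\t")
    ctx = [int(t) for t in ctx_field.split()]
    resp = [int(t) for t in resp_field.split()]
    lines = []
    # One line per time step; the longer sequence sets the number of lines.
    for t in range(max(len(ctx), len(resp), 1)):
        parts = [str(seq_id)]
        if t < len(ctx):
            parts.append(f"|context {ctx[t]}:1")  # sparse one-hot token
        if t < len(resp):
            parts.append(f"|response {resp[t]}:1")
        if t == 0:
            parts.append(f"|label {label}")  # dense scalar, written once per sequence
        lines.append(" ".join(parts))
    return lines


print("\n".join(tsv_row_to_ctf(0, "12 45 3\t7 9\t1")))
# 0 |context 12:1 |response 7:1 |label 1
# 0 |context 45:1 |response 9:1
# 0 |context 3:1
```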
```Bash
python tsv2ctf.py
```

### Train
With the default settings, training runs for 300 epochs and saves a checkpoint after every epoch.
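The model trained here is described in the Feature list as an LSTM with attention. As a framework-free illustration of the attention part only (dot-product scoring is an assumption; the report linked above describes the actual model), this sketch weights a sequence of hidden states by their similarity to a query vector and returns their weighted sum:

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """Dot-product attention: return weights and the weighted sum of states."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]
    return weights, context


# Two hypothetical LSTM hidden states; the query matches the first one.
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print([round(w, 3) for w in weights])  # [0.731, 0.269]
```

In a real model the states would come from the LSTM over the dialogue context, and the query from the candidate response or a learned vector.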
```Bash
python train.py
```

### Inference
The inference script loads a saved checkpoint and runs inference, so you can run it while training is still in progress. In my experience, checkpoints from epochs 4 to 10 already give good enough results.
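Because checkpoints accumulate while training runs, an inference run has to pick one. A hypothetical sketch, assuming checkpoint filenames contain `epoch_<n>` (the actual naming is not documented here): choose the highest epoch numerically rather than lexicographically, so that epoch 10 ranks above epoch 9, then pick the highest-scoring candidate response for each question.

```python
import re


def latest_checkpoint(filenames, pattern=r"epoch_(\d+)"):
    """Return the filename with the largest epoch number, or None."""
    best, best_epoch = None, -1
    for name in filenames:
        match = re.search(pattern, name)
        if match and int(match.group(1)) > best_epoch:
            best, best_epoch = name, int(match.group(1))
    return best


def predict(candidate_scores):
    """For each question, return the index of the highest-scoring candidate."""
    return [max(range(len(s)), key=s.__getitem__) for s in candidate_scores]


files = ["model_epoch_9.dnn", "model_epoch_10.dnn", "train.log"]
print(latest_checkpoint(files))  # model_epoch_10.dnn
print(predict([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]))  # [1, 0]
```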
```Bash
python inference.py
```

### Performance
On the [Kaggle Leaderboard](https://www.kaggle.com/c/datalabcup-predicting-appropriate-response/leaderboard), our implementation won second prize (the top two entries are fake).

| Model          | Public score | Private score |
|----------------|--------------|---------------|
| Single model   | 73.6         | 71.6          |
| Ensemble model | 76.4         | 72            |
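The README does not say how the ensemble combines its members. A common option, sketched here as an assumption rather than the method actually used, is to average each candidate's score across models and then take the argmax per question:

```python
def ensemble_predict(model_scores):
    """Average per-candidate scores over models, then take the argmax per question.

    model_scores[m][q][c] is model m's score for candidate c of question q.
    """
    num_models = len(model_scores)
    averaged = [
        [sum(model[q][c] for model in model_scores) / num_models
         for c in range(len(model_scores[0][q]))]
        for q in range(len(model_scores[0]))
    ]
    return [max(range(len(row)), key=row.__getitem__) for row in averaged]


# Two hypothetical models disagree on one question; the average decides.
print(ensemble_predict([[[0.6, 0.4]], [[0.2, 0.8]]]))  # [1]
```

Score averaging only makes sense when the models' outputs are on a comparable scale (for example, probabilities); otherwise majority voting over each model's argmax is a simple alternative.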