# Appropriate Answer Prediction

(科技大擂台 與AI對話, the "Tech Grand Challenge: Talk to AI" competition)

----

## Features

* A CNTK (Microsoft's deep learning toolkit) implementation for the CS565600 competition
* We use an LSTM with attention for this task
* For more details about the model, please refer to the [report](https://github.com/zlsh80826/AppropriateResponsePrediction/blob/master/script/Report.ipynb)
* If you run into any problem with this repo, feel free to contact [email protected]

## Requirements

The following libraries are required for training.

### General
* python3
* cuda-9.0 (CNTK required)
* openmpi-1.10 (CNTK required)
* gcc >= 6 (CNTK required)

### Python
* Please refer to `requirements.txt`; the dependencies can be installed with pip as shown below.
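
As a minimal example, assuming `pip3` points to the Python 3 environment you will use for training, the dependencies can be installed from the repository root:

```Bash
# Install the Python dependencies listed in requirements.txt
pip3 install -r requirements.txt
```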

## Usage

We recommend running all of the scripts from the `script` directory:

```Bash
cd AppropriateResponsePrediction/script
```

Each script provides a help message; check it for custom settings.

```Bash
python <script_name>.py --help
```
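
For example, to list the options of the preprocessing script described in the next section:

```Bash
python preprocessing.py --help
```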

### Preprocess

This script converts the raw text data into the processed `npy` format.

You can specify `--threads` to set how many threads to use (see the example after the basic command below).

```Bash
python preprocessing.py
```
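
For example, to run the preprocessing with 8 threads (the thread count here is only an illustration):

```Bash
python preprocessing.py --threads 8
```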

### Train Fasttext

This script trains the Traditional Chinese embeddings (fastText) on the processed data.

```Bash
python train_fasttext.py
```

### Generate The Training Data

The default settings generate 4 million training examples, which take about 8 GB of disk space.

```Bash
python gen_training.py
```

### Convert tsv to ctf

CNTK supports large training files, but the data first needs to be converted to CTF (CNTK Text Format).

```Bash
python tsv2ctf.py
```

### Train

The default settings run 300 epochs and save a checkpoint after each epoch.

```Bash
python train.py
```
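
If you want to keep training running while you do other things (for example, running inference as described below), one option on a Unix-like shell is to launch it in the background and follow the log:

```Bash
# Launch training in the background and write its output to a log file
nohup python train.py > train.log 2>&1 &

# Follow the training progress
tail -f train.log
```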

### Inference

The inference script reads a checkpoint and runs inference, so you can run inference while training is still in progress. In my experience, the results after 4 to 10 epochs are good enough.

```Bash
python inference.py
```
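
Putting the steps above together, a full run with the default settings looks like this (all commands are the ones described in the previous sections, executed from the `script` directory):

```Bash
cd AppropriateResponsePrediction/script
python preprocessing.py      # convert raw text to npy
python train_fasttext.py     # train Traditional Chinese embeddings
python gen_training.py       # generate ~4M training examples (~8 GB)
python tsv2ctf.py            # convert tsv files to CNTK Text Format
python train.py              # train the LSTM + attention model
python inference.py          # run inference from a saved checkpoint
```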

## Performance

Based on the [Kaggle Leaderboard](https://www.kaggle.com/c/datalabcup-predicting-appropriate-response/leaderboard), our implementation placed second (the top two entries on the leaderboard are fake).

| Model          | Public score | Private score |
|----------------|--------------|---------------|
| Single model   | 73.6         | 71.6          |
| Ensemble model | 76.4         | 72            |