https://github.com/HKUST-KnowComp/R-Net
Tensorflow Implementation of R-Net
machine-comprehension nlp r-net squad tensorflow
- Host: GitHub
- URL: https://github.com/HKUST-KnowComp/R-Net
- Owner: HKUST-KnowComp
- License: MIT
- Created: 2017-11-28T02:12:47.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-08-08T18:04:09.000Z (over 7 years ago)
- Last Synced: 2024-11-27T03:34:39.873Z (about 1 year ago)
- Topics: machine-comprehension, nlp, r-net, squad, tensorflow
- Language: Python
- Size: 188 KB
- Stars: 578
- Watchers: 34
- Forks: 210
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-qa - R-Net: an end-to-end neural-network model for reading-comprehension-style question answering, which aims to answer questions from a given passage.
README
# R-Net
* A Tensorflow implementation of [R-NET: MACHINE READING COMPREHENSION WITH SELF-MATCHING NETWORKS](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf). This project is designed specifically for the [SQuAD](https://arxiv.org/pdf/1606.05250.pdf) dataset.
* Should you have any questions, please contact Wenxuan Zhou (wzhouad@connect.ust.hk).
## Requirements
Many known problems are caused by mismatched software versions. Please check your versions before opening issues or emailing me.
#### General
* Python >= 3.4
* unzip, wget
#### Python Packages
* tensorflow-gpu >= 1.5.0
* spaCy >= 2.0.0
* tqdm
* ujson
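Before filing a version-related issue, it can help to compare installed versions against the minimums above. A minimal helper for that comparison is sketched below; it is not part of this repository, and it simply compares dotted version strings numerically (pass it the strings reported by `tf.__version__` and `spacy.__version__`).

```python
# Hypothetical helper (not in this repo): checks whether an installed
# dotted version string meets a required minimum, e.g. for the
# tensorflow-gpu >= 1.5.0 and spaCy >= 2.0.0 requirements above.
def meets_minimum(installed: str, required: str) -> bool:
    """Return True if `installed` >= `required`, comparing dotted versions numerically."""
    to_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return to_tuple(installed) >= to_tuple(required)

print(meets_minimum("1.12.0", "1.5.0"))  # True  (1.12 > 1.5 numerically)
print(meets_minimum("1.4.1", "1.5.0"))   # False
```

Note that a plain string comparison would get this wrong (`"1.12.0" < "1.5.0"` lexicographically), which is why the versions are compared as integer tuples.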
## Usage
To download and preprocess the data, run
```bash
# download SQuAD and Glove
sh download.sh
# preprocess the data
python config.py --mode prepro
```
Hyperparameters are stored in `config.py`. To debug/train/test the model, run
```bash
python config.py --mode debug/train/test
```
To get the official score, run
```bash
python evaluate-v1.1.py ~/data/squad/dev-v1.1.json log/answer/answer.json
```
The default directory for tensorboard log files is `log/event`.
See the releases page for a trained model.
## Detailed Implementation
* The original paper uses additive attention, which consumes lots of memory. This project adopts scaled multiplicative attention presented in [Attention Is All You Need](https://arxiv.org/abs/1706.03762).
* This project adopts variational dropout presented in [A Theoretically Grounded Application of Dropout in Recurrent Neural Networks](https://arxiv.org/abs/1512.05287).
* To solve the degradation problem in stacked RNN, outputs of each layer are concatenated to produce the final output.
* When the loss on dev set increases in a certain period, the learning rate is halved.
* During prediction, the project adopts search method presented in [Machine Comprehension Using Match-LSTM and Answer Pointer](https://arxiv.org/abs/1608.07905).
* To address efficiency issues, this implementation uses the bucketing method (contributed by xiongyifan) and CudnnGRU. Bucketing can speed up training, but lowers the F1 score by 0.3%.
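To make the first point concrete, here is a minimal NumPy sketch of the scaled multiplicative (dot-product) attention from "Attention Is All You Need"; unlike additive attention, it needs no extra learned projection per query-key pair, which is the memory saving mentioned above. This is an illustration only, not the repo's TF code.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Scaled multiplicative attention: softmax(Q K^T / sqrt(d)) V.
    Shapes: q (m, d), k (n, d), v (n, d_v); returns (m, d_v)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                     # (m, n) similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # convex combinations of v's rows

# Toy shapes: 3 queries attending over 5 keys/values.
q = np.random.randn(3, 4)
k = np.random.randn(5, 4)
v = np.random.randn(5, 2)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (3, 2)
```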
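The variational dropout of Gal & Ghahramani differs from ordinary dropout in one detail: a single mask is sampled per sequence and reused at every timestep, rather than a fresh mask per step. A small NumPy sketch of that idea (again illustrative, not the repo's TF code):

```python
import numpy as np

def variational_dropout(x, keep_prob, rng):
    """Variational (recurrent) dropout sketch: sample ONE mask of shape
    (batch, 1, hidden) and broadcast it over the time axis of
    x (batch, time, hidden), so every timestep drops the same units."""
    batch, _, hidden = x.shape
    # Inverted dropout: scale kept units by 1/keep_prob at train time.
    mask = (rng.random((batch, 1, hidden)) < keep_prob) / keep_prob
    return x * mask  # mask broadcasts across all timesteps

rng = np.random.default_rng(0)
x = np.ones((2, 5, 4))
y = variational_dropout(x, keep_prob=0.8, rng=rng)
# Every timestep of a sequence shares the same pattern of zeroed units:
print(np.array_equal(y[:, 0], y[:, 1]))  # True
```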
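The prediction-time search method from the Match-LSTM paper picks the span (i, j) with i <= j < i + max_len that maximizes p_start[i] * p_end[j], rather than taking the two argmaxes independently. A plain-Python sketch of that search (the actual implementation presumably vectorizes it; `max_len` here is an illustrative parameter, not necessarily the repo's default):

```python
def best_span(p_start, p_end, max_len=15):
    """Return (i, j) maximizing p_start[i] * p_end[j]
    subject to i <= j < i + max_len."""
    n = len(p_start)
    best_score, best_pair = -1.0, (0, 0)
    for i in range(n):                           # candidate start positions
        for j in range(i, min(i + max_len, n)):  # ends within max_len of start
            score = p_start[i] * p_end[j]
            if score > best_score:
                best_score, best_pair = score, (i, j)
    return best_pair

print(best_span([0.1, 0.6, 0.3], [0.2, 0.1, 0.7]))  # (1, 2)
```

Taking independent argmaxes could yield an end before the start; the joint search rules such spans out by construction.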
## Performance
#### Score
||EM|F1|
|---|---|---|
|original paper|71.1|79.5|
|this project|71.07|79.51|


#### Training Time (s/it)
||Native|Native + Bucket|Cudnn|Cudnn + Bucket|
|---|---|---|---|---|
|E5-2640|6.21|3.56|-|-|
|TITAN X|2.56|1.31|0.41|0.28|
## Extensions
These settings may increase the score but are not used by default. You can turn them on in `config.py`.
* [Pretrained GloVe character embedding](https://github.com/minimaxir/char-embeddings). Contributed by yanghanxy.
* [fastText embeddings](https://fasttext.cc/docs/en/english-vectors.html). Contributed by xiongyifan. May increase F1 by 1% (reported by xiongyifan).