https://github.com/thunlp/OpenQA

The source code of ACL 2018 paper "Denoising Distantly Supervised Open-Domain Question Answering".

Host: GitHub
URL: https://github.com/thunlp/OpenQA
Owner: thunlp
License: mit
Created: 2018-05-08T05:09:22.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2018-11-07T09:50:10.000Z (over 5 years ago)
Last Synced: 2024-02-06T13:53:18.394Z (5 months ago)
Topics: question-answering, reading-comprehension
Language: Python
Homepage:
Size: 71.3 KB
Stars: 206
Watchers: 24
Forks: 49
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

# Open-QA

Source code for the paper "Denoising Distantly Supervised Open-Domain Question Answering". It is adapted from the [code](https://github.com/facebookresearch/DrQA) of the paper "Reading Wikipedia to Answer Open-Domain Questions".

Requirements
==========

+ pytorch 0.3.0
+ numpy
+ scikit-learn
+ termcolor
+ regex
+ tqdm
+ prettytable
+ scipy
+ nltk
+ pexpect 4.2.1
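Before running anything, it can help to confirm that the packages listed above are importable. The snippet below is a minimal sketch, not part of the repository: the mapping from package names to import names (for example `pytorch` to `torch`, `scikit-learn` to `sklearn`) and the helper `find_missing` are assumptions made for illustration, and the check does not verify the pinned versions.

```python
import importlib.util

# Assumed import names for the packages listed in this README.
REQUIRED_MODULES = {
    "pytorch": "torch",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "termcolor": "termcolor",
    "regex": "regex",
    "tqdm": "tqdm",
    "prettytable": "prettytable",
    "scipy": "scipy",
    "nltk": "nltk",
    "pexpect": "pexpect",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be located."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```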

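Evaluation Results
==========

| Model | Quasar-T EM | Quasar-T F1 | SearchQA EM | SearchQA F1 | TriviaQA EM | TriviaQA F1 | SQuAD EM | SQuAD F1 |
|---|---|---|---|---|---|---|---|---|
| GA (Dhingra et al., 2017) | 26.4 | 26.4 | - | - | - | - | | |
| BiDAF (Seo et al., 2017) | 25.9 | 28.5 | 28.6 | 34.6 | - | - | - | - |
| AQA (Buck et al., 2017) | - | - | 40.5 | 47.4 | - | - | - | - |
| R^3 (Wang et al., 2018a) | 35.3 | 41.7 | 49 | 55.3 | 47.3 | 53.7 | 29.1 | 37.5 |
| Our Model | 42.2 | 49.3 | 58.8 | 64.5 | 48.7 | 56.3 | 28.7 | 36.6 |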
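EM and F1 in the table above stand for exact match and token-level F1. As a rough illustration only, the sketch below follows the common SQuAD-style definitions of these metrics. The normalization rules and the helper names (`normalize_answer`, `exact_match`, `f1_score`, `best_over_answers`) are assumptions and may differ from the evaluation code in this repository.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction, gold):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_answers(metric, prediction, answers):
    """Score a prediction against every gold answer and keep the best score."""
    return max(metric(prediction, answer) for answer in answers)
```

With several gold answers per question, `best_over_answers(exact_match, prediction, answers)` scores one question; averaging over all questions and multiplying by 100 gives a percentage comparable in form to the numbers above.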

 

Data
==========

We provide the Quasar-T, SearchQA and TriviaQA datasets used for the task in the data/ directory. The original data has been preprocessed to match the input format of our code, and the preprocessed version can be downloaded [here](https://thunlp.oss-cn-qingdao.aliyuncs.com/OpenQA_data.tar.gz).

To run our code, place the data in the data/ folder with the following layout:

datasets/

+ train.txt, dev.txt, test.txt: each line has the format {"question": question, "answers": [answer1, answer2, ...]}.

+ train.json, dev.json, test.json: format [{"question": question, "document": document1}, {"question": question, "document": document2}, ...].

embeddings/

+ glove.840B.300d.txt: word vectors obtained from [here](https://nlp.stanford.edu/projects/glove/).

corenlp/

+ all jar files from Stanford CoreNLP.
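The file formats described above can be sanity-checked with a short loader. This is a minimal sketch rather than the repository's own data pipeline: `load_questions` and `load_documents` are hypothetical helpers, and the sketch assumes each *.json file holds a single JSON array. If the files instead store one array per line, `load_documents` would need to parse line by line.

```python
import json


def load_questions(path):
    """Read a *.txt file with one {"question": ..., "answers": [...]} object per line."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if "question" not in record or "answers" not in record:
                raise ValueError(f"{path}:{line_no}: missing 'question' or 'answers'")
            examples.append(record)
    return examples


def load_documents(path):
    """Read a *.json file, assumed to be one array of {"question": ..., "document": ...}."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for index, record in enumerate(records):
        if "question" not in record or "document" not in record:
            raise ValueError(f"{path}: entry {index} is missing 'question' or 'document'")
    return records
```

A call such as `load_questions("data/datasets/train.txt")` would then return a list of dictionaries; the exact path is an assumption based on the layout above.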

Code
==========

The source code of our models is in the src/ folder.

Train and Test
==========

To train and test the model, run the following steps in order:

1. Pre-train the paragraph reader: `python main.py --batch-size 256 --model-name quasart_reader --num-epochs 10 --dataset quasart --mode reader`

2. Pre-train the paragraph selector: `python main.py --batch-size 64 --model-name quasart_selector --num-epochs 10 --dataset quasart --mode selector --pretrained models/quasart_reader.mdl`

3. Train the whole model: `python main.py --batch-size 32 --model-name quasart_all --num-epochs 10 --dataset quasart --mode all --pretrained models/quasart_selector.mdl`
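Each stage above starts from the checkpoint written by the previous one, so the three commands can be chained in a small driver script. The following is a sketch under stated assumptions: `build_stage_commands` and `run_all` are hypothetical helpers, the flags are copied from the commands above, and the working directory `src` is a guess at where `main.py` lives. Only the `quasart` dataset name appears in this README, so other values for `--dataset` are untested here.

```python
import subprocess

# (mode, batch size, mode of the stage whose checkpoint is loaded), from the steps above.
STAGES = [
    ("reader", 256, None),
    ("selector", 64, "reader"),
    ("all", 32, "selector"),
]


def build_stage_commands(dataset="quasart", num_epochs=10, python="python"):
    """Return the three training commands as argument lists, in execution order."""
    commands = []
    for mode, batch_size, pretrained_from in STAGES:
        cmd = [
            python, "main.py",
            "--batch-size", str(batch_size),
            "--model-name", f"{dataset}_{mode}",
            "--num-epochs", str(num_epochs),
            "--dataset", dataset,
            "--mode", mode,
        ]
        if pretrained_from is not None:
            # Later stages load the checkpoint saved by the previous stage.
            cmd += ["--pretrained", f"models/{dataset}_{pretrained_from}.mdl"]
        commands.append(cmd)
    return commands


def run_all(commands, cwd="src"):
    """Run each stage in turn and stop at the first failure (cwd is an assumption)."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    for command in build_stage_commands():
        print(" ".join(command))
```

Run as a script, this only prints the commands; calling `run_all(build_stage_commands())` would execute them.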

Cite
==========

If you use the code, please cite the following paper:

1. Yankai Lin, Haozhe Ji, Zhiyuan Liu, and Maosong Sun. 2018. Denoising Distantly Supervised Open-Domain Question Answering. In Proceedings of ACL. pages 1736--1745. [[pdf]](http://www.thunlp.org/~lyk/publications/acl2018_denoising.pdf)

Reference
=========

1. Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William Cohen, and Ruslan Salakhutdinov. 2017. Gated-attention readers for text comprehension. In Proceedings of ACL. pages 1832--1846.

2. Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Bidirectional attention flow for machine comprehension. In Proceedings of ICLR.

3. Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Andrea Gesmundo, Neil Houlsby, Wojciech Gajewski, and Wei Wang. 2017. Ask the right questions: Active question reformulation with reinforcement learning. arXiv preprint arXiv:1705.07830.

4. Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerald Tesauro, Bowen Zhou, and Jing Jiang. 2018. R3: Reinforced ranker-reader for open-domain question answering. In Proceedings of AAAI. pages 5981--5988.