https://github.com/google-deepmind/narrativeqa

This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers.

Host: GitHub
URL: https://github.com/google-deepmind/narrativeqa
Owner: google-deepmind
License: apache-2.0
Created: 2017-12-20T14:39:57.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2020-04-15T09:16:14.000Z (about 5 years ago)
Last Synced: 2024-12-06T22:20:35.016Z (5 months ago)
Language: Shell
Homepage:
Size: 4.76 MB
Stars: 461
Watchers: 25
Forks: 66
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

Reading-Comprehension-Question-Answering-Papers - paper
awesome-datasets - NarrativeQA - and-answer pairs. (Document Question Answering / English)

README

# The NarrativeQA Reading Comprehension Challenge Dataset

This repository contains the NarrativeQA dataset. It includes the list of
documents with Wikipedia summaries, links to full stories, and questions and
answers.

For a detailed description of the dataset, see the paper
[The NarrativeQA Reading Comprehension
Challenge](https://arxiv.org/abs/1712.07040). Please cite the paper if you use
this corpus in your work.

### Files

* documents.csv - contains document_id, set, kind, story_url, story_file_size,
wiki_url, wiki_title, story_word_count, story_start, story_end. The word count
is approximate and was computed after some basic cleanup and tokenization.
* third_party/wikipedia/summaries.csv - contains document_id, set, summary,
summary_tokenized. The summaries are from Wikipedia.
* qaps.csv - contains document_id, set, question, answer1, answer2,
question_tokenized, answer1_tokenized, answer2_tokenized.
* download_stories.sh - script to download the stories.
* compare.sh - compares each downloaded story's file size to the file size
recorded for that document. (At the time of publication, all stories except one
had a file size difference below 3.5%, likely due to punctuation encoding.)
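
As an illustration of how the files listed above fit together, here is a minimal Python sketch using only the standard library. The column names (`document_id`, `question`, `answer1`, `answer2`) come from the file descriptions above. The function names and the relative-difference formula are assumptions made for illustration and are not taken from `compare.sh`.

```python
import csv
from collections import defaultdict


def load_rows(csv_file):
    """Read an open CSV file (with a header row) into a list of dicts."""
    return list(csv.DictReader(csv_file))


def questions_by_document(qaps_rows):
    """Group (question, answer1, answer2) tuples by document_id.

    Expects rows shaped like those in qaps.csv.
    """
    grouped = defaultdict(list)
    for row in qaps_rows:
        grouped[row["document_id"]].append(
            (row["question"], row["answer1"], row["answer2"])
        )
    return dict(grouped)


def relative_size_difference(expected_size, actual_size):
    """Return |actual - expected| / expected as a fraction.

    Assumed formula for a file size check; the real compare.sh
    may compute the difference differently.
    """
    expected = int(expected_size)
    if expected <= 0:
        raise ValueError("expected_size must be positive")
    return abs(int(actual_size) - expected) / expected
```

For example, after opening `qaps.csv` with `open("qaps.csv", newline="", encoding="utf-8")`, passing the result of `load_rows` to `questions_by_document` gives a dictionary keyed by `document_id`. A downloaded story could then be flagged when `relative_size_difference(row["story_file_size"], actual_size)` exceeds `0.035`, the 3.5% figure mentioned above.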

### Bibtex

```
@article{narrativeqa,
author = {Tom\'a\v s Ko\v cisk\'y and Jonathan Schwarz and Phil Blunsom and
Chris Dyer and Karl Moritz Hermann and G\'abor Melis and
Edward Grefenstette},
title = {The {NarrativeQA} Reading Comprehension Challenge},
journal = {Transactions of the Association for Computational Linguistics},
url = {https://TBD},
volume = {TBD},
year = {2018},
pages = {TBD},
}
```

### Dataset Metadata
The following table is necessary for this dataset to be indexed by search
engines such as Google Dataset Search.

| property | value |
| --- | --- |
| name | The NarrativeQA Reading Comprehension Challenge Dataset |
| alternateName | NarrativeQA |
| url | https://github.com/deepmind/narrativeqa |
| sameAs | https://github.com/deepmind/narrativeqa |
| description | This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers. |
| provider: name | DeepMind |
| provider: sameAs | https://en.wikipedia.org/wiki/DeepMind |
| license: name | Apache License, Version 2.0 |
| license: url | https://www.apache.org/licenses/LICENSE-2.0.html |
| citation | https://identifiers.org/arxiv:1712.07040 |
