Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/google-deepmind/narrativeqa
This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers.
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/google-deepmind/narrativeqa
- Owner: google-deepmind
- License: apache-2.0
- Created: 2017-12-20T14:39:57.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2020-04-15T09:16:14.000Z (over 4 years ago)
- Last Synced: 2024-04-16T04:53:39.838Z (9 months ago)
- Language: Shell
- Homepage:
- Size: 4.76 MB
- Stars: 432
- Watchers: 23
- Forks: 65
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Reading-Comprehension-Question-Answering-Papers - paper
- awesome-datasets - NarrativeQA - and-answer pairs. (Document Question Answering / English)
README
# The NarrativeQA Reading Comprehension Challenge Dataset
This repository contains the NarrativeQA dataset. It includes the list of
documents with Wikipedia summaries, links to full stories, and questions and
answers.

For a detailed description of the dataset, see the paper
[The NarrativeQA Reading Comprehension
Challenge](https://arxiv.org/abs/1712.07040). Please cite the paper if you use
this corpus in your work.

### Files
* documents.csv - contains document_id, set, kind, story_url, story_file_size,
wiki_url, wiki_title, story_word_count, story_start, story_end. The word count
is approximate after some basic cleanup and tokenization.
* third_party/wikipedia/summaries.csv - contains document_id, set, summary,
summary_tokenized. The summaries are from Wikipedia.
* qaps.csv - contains document_id, set, question, answer1, answer2,
question_tokenized, answer1_tokenized, answer2_tokenized.
* download_stories.sh - script to download the stories.
* compare.sh - compares each downloaded story's file size to the document size we had.
(At the time of publication, all stories except one had a file-size difference
below 3.5%, likely due to punctuation encoding.)

### Bibtex
```
@article{narrativeqa,
  author = {Tom\'a\v s Ko\v cisk\'y and Jonathan Schwarz and Phil Blunsom and
            Chris Dyer and Karl Moritz Hermann and G\'abor Melis and
            Edward Grefenstette},
  title = {The {NarrativeQA} Reading Comprehension Challenge},
  journal = {Transactions of the Association for Computational Linguistics},
  url = {https://TBD},
  volume = {TBD},
  year = {2018},
  pages = {TBD},
}
```

### Dataset Metadata
The following table is necessary for this dataset to be indexed by search
engines such as Google Dataset Search.

| property | value |
| --- | --- |
| name | The NarrativeQA Reading Comprehension Challenge Dataset |
| alternateName | NarrativeQA |
| url | https://github.com/deepmind/narrativeqa |
| sameAs | https://github.com/deepmind/narrativeqa |
| description | This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers. |
| provider | name: DeepMind; sameAs: https://en.wikipedia.org/wiki/DeepMind |
| license | name: Apache License, Version 2.0; url: https://www.apache.org/licenses/LICENSE-2.0.html |
| citation | https://identifiers.org/arxiv:1712.07040 |
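
### Example: joining the CSV files

The files described under Files share a `document_id` column, so questions and summaries can be joined on it. The sketch below is a minimal illustration using only the Python standard library and the column names listed above. It is not part of the repository: the helper names (`read_csv_rows`, `group_questions`, `attach_summaries`), the sample rows, and the `set` values `train` and `test` are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict


def read_csv_rows(handle):
    """Return the rows of a CSV file object as a list of dicts keyed by header."""
    return list(csv.DictReader(handle))


def group_questions(qaps_rows, split=None):
    """Map document_id -> list of (question, [answer1, answer2]) tuples.

    If split is given, keep only rows whose `set` column equals it.
    """
    grouped = defaultdict(list)
    for row in qaps_rows:
        if split is not None and row["set"] != split:
            continue
        grouped[row["document_id"]].append(
            (row["question"], [row["answer1"], row["answer2"]])
        )
    return dict(grouped)


def attach_summaries(grouped, summary_rows):
    """Pair each document's summary with its questions, joining on document_id."""
    summaries = {row["document_id"]: row["summary"] for row in summary_rows}
    return {
        doc_id: {"summary": summaries.get(doc_id), "qaps": qaps}
        for doc_id, qaps in grouped.items()
    }


# Made-up sample rows standing in for qaps.csv and summaries.csv (assumption:
# the real files use the same headers as listed in the Files section).
SAMPLE_QAPS = """document_id,set,question,answer1,answer2,question_tokenized,answer1_tokenized,answer2_tokenized
doc1,train,Who is the hero?,Ann,Ann Lee,who is the hero ?,ann,ann lee
doc1,train,Where does it end?,Paris,In Paris,where does it end ?,paris,in paris
doc2,test,Who narrates?,Bob,Bob,who narrates ?,bob,bob
"""
SAMPLE_SUMMARIES = """document_id,set,summary,summary_tokenized
doc1,train,Ann travels to Paris.,ann travels to paris .
doc2,test,Bob tells a story.,bob tells a story .
"""

qaps = read_csv_rows(io.StringIO(SAMPLE_QAPS))
summaries = read_csv_rows(io.StringIO(SAMPLE_SUMMARIES))
train = attach_summaries(group_questions(qaps, split="train"), summaries)
print(train["doc1"]["summary"], len(train["doc1"]["qaps"]))
```

To run this against the real data, replace the `io.StringIO(...)` objects with files opened via `open("qaps.csv", newline="", encoding="utf-8")` and `open("third_party/wikipedia/summaries.csv", newline="", encoding="utf-8")`. The UTF-8 encoding is an assumption.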