https://github.com/andreaschandra/causalqa

# CausalQA: A Benchmark for Causal Question Answering

Forked from the [official repository](https://github.com/webis-de/coling22-benchmark-for-causal-question-answering).

### Data

Download the dataset and pretrained model [here](https://zenodo.org/record/7186761#.Y3DxncdBy5c).

You can [download](https://webis.de/data#webis-causalqa-22) the Webis-CausalQA-22 corpus. To recreate the ELI5 part, see the instructions below.

The 10 datasets used to construct the Webis-CausalQA-22 corpus:

| Dataset | Website | License | License type |
| :---------------- | :-------------------------------------------------------------------------------------- | :--------------------------------------------------- | :-------------------- |
| PAQ | [Page](https://github.com/facebookresearch/PAQ) | https://github.com/facebookresearch/PAQ#data-license | CC BY-SA 3.0 |
| GooAQ | [Page](https://github.com/allenai/gooaq/) | https://github.com/allenai/gooaq/blob/main/LICENSE | Apache License V. 2.0 |
| MS MARCO | [Page](https://microsoft.github.io/msmarco/) | same as source | Own Terms |
| Natural Questions | [Page](https://ai.google.com/research/NaturalQuestions/download) | same as source | CC BY-SA 3.0 |
| ELI5 | https://github.com/facebookresearch/ELI5 or https://huggingface.co/datasets/eli5 (used) | same as source | Hosting not allowed |
| SearchQA | https://github.com/nyu-dl/dl4ir-searchQA | same as source | No information |
| SQuAD 2.0 | https://rajpurkar.github.io/SQuAD-explorer/ | same as source | CC BY-SA 4.0 |
| NewsQA | https://github.com/Maluuba/newsqa | same as source | Own Terms |
| HotpotQA | https://hotpotqa.github.io/ | same as source | CC BY-SA 4.0 |
| TriviaQA | https://nlp.cs.washington.edu/triviaqa/index.html | same as source | No information |

`ELI5` is also available on Hugging Face (https://huggingface.co/datasets/eli5), which includes a script for downloading the data. This blog post also explains how to download the data: https://yjernite.github.io/lfqa.html (the guide that was used).

_Example to obtain the `ELI5` data_

```
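# Install the library first (shell command): pip install nlp
# Note: the `nlp` package was later renamed to `datasets`.
import nlp

# Load the ELI5 dataset via its Hugging Face loading script
eli5 = nlp.load_dataset('eli5')

# Select the ELI5 training and validation splits
train_set = eli5['train_eli5']
val_set = eli5['validation_eli5']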
```

Use the [regex rules](rules/causal-rules.ipynb) to identify causal questions.
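
As a rough illustration of how regex-based filtering of this kind can work, here is a minimal sketch. The patterns, `CAUSAL_PATTERNS`, `is_causal_question`, and `filter_causal` are hypothetical names and rules chosen for this example. They are not the rules from `rules/causal-rules.ipynb`, which remains the reference.

```python
import re

# Illustrative patterns only (an assumption for this sketch);
# the actual rules live in rules/causal-rules.ipynb.
CAUSAL_PATTERNS = [
    r"^why\b",                         # questions that start with "why"
    r"\bcauses?\b",                    # "cause", "causes"
    r"\bcaused\b",                     # "caused"
    r"\b(?:leads?|led) to\b",          # "lead to", "leads to", "led to"
    r"\bwhat happens (?:if|when)\b",   # hypothetical outcomes
    r"\b(?:effects?|consequences?) of\b",
]

# Combine all patterns into one case-insensitive alternation.
CAUSAL_RE = re.compile(
    "|".join(f"(?:{p})" for p in CAUSAL_PATTERNS),
    re.IGNORECASE,
)


def is_causal_question(question: str) -> bool:
    """Return True if any illustrative causal pattern matches the question."""
    return CAUSAL_RE.search(question.strip()) is not None


def filter_causal(questions):
    """Keep only the questions flagged as causal, preserving order."""
    return [q for q in questions if is_causal_question(q)]


if __name__ == "__main__":
    sample = [
        "Why do cats purr?",
        "When was Rome founded?",
        "What causes earthquakes?",
    ]
    for q in filter_causal(sample):
        print(q)
```

A keyword approach like this is cheap to run over millions of questions, but it will produce false positives and negatives, so the pattern list should be taken from the notebook rather than from this sketch.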