https://github.com/andreaschandra/causalqa
CausalQA: A Benchmark for Causal Question Answering
- Host: GitHub
- URL: https://github.com/andreaschandra/causalqa
- Owner: andreaschandra
- Created: 2022-11-13T12:28:21.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-11-20T06:19:16.000Z (over 3 years ago)
- Last Synced: 2025-10-14T07:06:22.742Z (8 months ago)
- Language: Jupyter Notebook
- Size: 186 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# CausalQA: A Benchmark for Causal Question Answering
Forked from the [official repository](https://github.com/webis-de/coling22-benchmark-for-causal-question-answering).
### Data
Download the dataset and the pretrained model from [Zenodo](https://zenodo.org/record/7186761#.Y3DxncdBy5c).
You can [download](https://webis.de/data#webis-causalqa-22) the Webis-CausalQA-22 corpus. To recreate the ELI5 part, follow the instructions below.
The 10 datasets used to construct the Webis-CausalQA-22 corpus:
| Dataset | Website | License | License type |
| :---------------- | :--------------------------------------------------------------------------------------------------------- | :--------------------------------------------------- | :-------------------- |
| PAQ | [Page](https://github.com/facebookresearch/PAQ) | https://github.com/facebookresearch/PAQ#data-license | CC BY-SA 3.0 |
| GooAQ | [Page](https://github.com/allenai/gooaq/) | https://github.com/allenai/gooaq/blob/main/LICENSE | Apache License V. 2.0 |
| MS MARCO | [Page](https://microsoft.github.io/msmarco/) | same as source | Own Terms |
| Natural Questions | [Page](https://ai.google.com/research/NaturalQuestions/download) | same as source | CC BY-SA 3.0 |
| ELI5 | [Page](https://github.com/facebookresearch/ELI5) or [Hugging Face](https://huggingface.co/datasets/eli5) (used) | same as source | Hosting not allowed |
| SearchQA | [Page](https://github.com/nyu-dl/dl4ir-searchQA) | same as source | No information |
| SQuAD 2.0 | [Page](https://rajpurkar.github.io/SQuAD-explorer/) | same as source | CC BY-SA 4.0 |
| NewsQA | [Page](https://github.com/Maluuba/newsqa) | same as source | Own Terms |
| HotpotQA | [Page](https://hotpotqa.github.io/) | same as source | CC BY-SA 4.0 |
| TriviaQA | [Page](https://nlp.cs.washington.edu/triviaqa/index.html) | same as source | No information |
`ELI5` is also available on the Hugging Face Hub (https://huggingface.co/datasets/eli5), which provides a script for downloading the data. This blog post, which was used here, also explains how to download the data: https://yjernite.github.io/lfqa.html.
_Example to obtain the `ELI5` data_
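```python
# Requires the Hugging Face `nlp` library (later renamed `datasets`):
#   pip install nlp
import nlp

# Download ELI5 and keep the splits built from the r/explainlikeimfive subreddit
eli5 = nlp.load_dataset('eli5')
train_set = eli5['train_eli5']
val_set = eli5['validation_eli5']
```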
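As a possible next step, the sketch below turns ELI5 records into question-answer pairs. It assumes each record has a `title` string and an `answers` dict with parallel `text` and `score` lists, as in the Hugging Face `eli5` schema. `eli5_qa_pairs` is a hypothetical helper, and keeping only the highest-scored answer is an illustrative choice, not necessarily how Webis-CausalQA-22 selects answers.

```python
from typing import Iterable, List, Tuple


def eli5_qa_pairs(records: Iterable[dict]) -> List[Tuple[str, str]]:
    """Hypothetical helper: map ELI5 records to (question, best answer) pairs.

    Assumes each record has a 'title' string and an 'answers' dict holding
    parallel 'text' and 'score' lists. Records without answers are skipped.
    """
    pairs = []
    for record in records:
        answers = record["answers"]
        if not answers["text"]:
            continue  # nothing to pair this question with
        # Index of the answer with the highest score (first one wins on ties)
        best = max(range(len(answers["text"])), key=lambda i: answers["score"][i])
        pairs.append((record["title"].strip(), answers["text"][best].strip()))
    return pairs
```

Since iterating a loaded split yields one dict per example, `eli5_qa_pairs(train_set)` should work on the splits loaded above, though this has not been checked against every version of the library.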
Use the [regex rules](rules/causal-rules.ipynb) to identify causal questions.
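The actual rules live in the notebook and are not reproduced here. As a rough, hypothetical illustration of regex-based filtering, the sketch below flags a question as causal when it matches any of a few hand-written patterns (a leading "why", a form of "cause", "effect/result of", "lead to"). The patterns and the names `CAUSAL_PATTERNS`, `is_causal`, and `filter_causal` are assumptions for illustration only.

```python
import re
from typing import List

# Hypothetical, illustrative patterns; the real rules are in rules/causal-rules.ipynb
CAUSAL_PATTERNS = [
    re.compile(r"^\s*why\b", re.IGNORECASE),                          # "Why is ...?"
    re.compile(r"\bcaus(e|es|ed|ing)\b", re.IGNORECASE),              # "What causes ...?"
    re.compile(r"\b(effect|effects|result)\s+of\b", re.IGNORECASE),   # "What is the effect of ...?"
    re.compile(r"\blead(s)?\s+to\b", re.IGNORECASE),                  # "Does X lead to Y?"
]


def is_causal(question: str) -> bool:
    """Return True if any illustrative causal pattern matches the question."""
    return any(pattern.search(question) for pattern in CAUSAL_PATTERNS)


def filter_causal(questions: List[str]) -> List[str]:
    """Keep only the questions flagged as causal, preserving their order."""
    return [q for q in questions if is_causal(q)]
```

Pattern lists like this trade recall for precision: "Explain why ice floats" is missed because the "why" pattern is anchored to the start of the question, which is one reason real rule sets tend to be much longer.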