https://github.com/pubmedqa/pubmedqa
PubMedQA: A Dataset for Biomedical Research Question Answering
- Host: GitHub
- URL: https://github.com/pubmedqa/pubmedqa
- Owner: pubmedqa
- License: mit
- Created: 2019-08-23T12:48:40.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-04-18T13:19:13.000Z (over 1 year ago)
- Last Synced: 2024-06-11T05:34:25.059Z (5 months ago)
- Language: Python
- Homepage: https://pubmedqa.github.io
- Size: 688 KB
- Stars: 216
- Watchers: 5
- Forks: 27
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - pubmedqa/pubmedqa
README
# PubMedQA
## Download
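PQA-L (the expert-labeled subset) is already included in `./data/`. Download PQA-U (unlabeled) and PQA-A (artificially generated) from Google Drive:

- [PQA-U](https://drive.google.com/open?id=1RsGLINVce-0GsDkCLDuLZmoLuzfmoCuQ)
- [PQA-A](https://drive.google.com/open?id=15v1x6aQDlZymaHGP7cZJZZYFfeJt2NdS)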
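The downloaded files are plain JSON, so they can be inspected with Python's standard `json` module. The sketch below assumes (this README does not confirm it) that each file's top-level value is an object keyed by PMID; the helper name `load_pqa` is hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def load_pqa(path):
    """Load one PubMedQA JSON file and return it as a dict.

    Assumption: the top-level JSON value is an object keyed by PMID.
    `load_pqa` is a hypothetical helper, not part of the repository.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object keyed by PMID")
    return data
```

For example, `pqau = load_pqa("data/ori_pqau.json")` followed by `len(pqau)` gives a quick sanity check that the download completed.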
## Split the dataset
After downloading PQA-A and PQA-U and saving them as `ori_pqaa.json` and `ori_pqau.json` in `./data/`, enter the `./preprocess/` directory and split the dataset:

```bash
cd preprocess
python split_dataset.py pqaa
python split_dataset.py pqal
```

Please be aware that there is no official code for splitting PQA-U.
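Since no official PQA-U split exists, any split is a local choice. As one possible approach, here is a minimal sketch of a reproducible random train/dev split of a PMID-keyed dictionary; the function name `split_by_pmid`, the 10% dev fraction, and the default seed are all assumptions, and the result will not match splits used by anyone else.

```python
import random


def split_by_pmid(data, dev_fraction=0.1, seed=0):
    """Randomly split a PMID-keyed dict into (train, dev) dicts.

    Hypothetical helper: not part of the PubMedQA repository.
    PMIDs are sorted before shuffling so that the same seed always
    yields the same split, regardless of the input dict's order.
    """
    if not 0.0 <= dev_fraction <= 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    pmids = sorted(data)
    rng = random.Random(seed)  # private RNG: does not touch global random state
    rng.shuffle(pmids)
    n_dev = round(len(pmids) * dev_fraction)
    dev_ids = set(pmids[:n_dev])
    train = {k: v for k, v in data.items() if k not in dev_ids}
    dev = {k: v for k, v in data.items() if k in dev_ids}
    return train, dev
```

Fixing the seed and recording it alongside any reported results keeps the split reproducible.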
## Evaluation and submission
To evaluate your model predictions, prepare the results as a JSON file in which each key is a PMID and each value is one of `"yes"`, `"no"`, or `"maybe"`. Then run the following script to get the performance:

```bash
python evaluation.py PREDICTIONS_PATH
```

To submit a system to the leaderboard, send an email containing the model predictions and a brief description of the system to Qiao Jin via [[email protected]](mailto:[email protected]).
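Before running `evaluation.py`, it can help to check that a predictions file has the expected PMID-to-label shape. The following sketch writes such a file after validating every label, and computes accuracy and macro-F1 (the two metrics reported in the PubMedQA paper) against a reference mapping. It is an illustrative re-implementation under stated assumptions, not the repository's `evaluation.py`: the function names are hypothetical, and macro-F1 here averages over all three labels even if one never occurs, which may differ from the official script in that edge case.

```python
import json

LABELS = ("yes", "no", "maybe")


def write_predictions(predictions, path):
    """Write a {PMID: label} dict as JSON after checking every label."""
    bad = {pmid: lab for pmid, lab in predictions.items() if lab not in LABELS}
    if bad:
        raise ValueError(f"invalid labels: {bad}")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, indent=2)


def accuracy_and_macro_f1(predictions, references):
    """Score predictions against reference labels over the reference PMIDs.

    A PMID missing from `predictions` counts as wrong (it can never match).
    Macro-F1 averages per-class F1 over the three fixed labels; a class with
    zero precision and zero recall contributes 0.
    """
    pmids = list(references)
    if not pmids:
        raise ValueError("references must not be empty")
    gold = [references[p] for p in pmids]
    pred = [predictions.get(p) for p in pmids]
    accuracy = sum(g == y for g, y in zip(gold, pred)) / len(pmids)

    f1_scores = []
    for label in LABELS:
        tp = sum(g == label and y == label for g, y in zip(gold, pred))
        fp = sum(g != label and y == label for g, y in zip(gold, pred))
        fn = sum(g == label and y != label for g, y in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / len(LABELS)
```

For example, `write_predictions(my_preds, "predictions.json")` produces a file that can then be passed to `python evaluation.py predictions.json`.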
## Human performance
Once PQA-L has been split and `./data/test_set.json` exists, run the following script to get human performance:

```bash
python get_human_performance.py
```

## Citation
If you use PubMedQA in your research, please cite our paper:
```
@inproceedings{jin2019pubmedqa,
title={PubMedQA: A Dataset for Biomedical Research Question Answering},
author={Jin, Qiao and Dhingra, Bhuwan and Liu, Zhengping and Cohen, William and Lu, Xinghua},
booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
pages={2567--2577},
year={2019}
}
```