{"id":13754163,"url":"https://github.com/pubmedqa/pubmedqa","last_synced_at":"2025-05-09T22:31:13.688Z","repository":{"id":59485180,"uuid":"203997402","full_name":"pubmedqa/pubmedqa","owner":"pubmedqa","description":"PubMedQA: A Dataset for Biomedical Research Question Answering","archived":false,"fork":false,"pushed_at":"2023-04-18T13:19:13.000Z","size":704,"stargazers_count":231,"open_issues_count":0,"forks_count":27,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-08-03T09:06:55.716Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://pubmedqa.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pubmedqa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2019-08-23T12:48:40.000Z","updated_at":"2024-08-02T00:47:10.000Z","dependencies_parsed_at":"2024-02-24T10:34:10.554Z","dependency_job_id":"6b3d64f5-35d2-4631-bbbb-64ba279cd388","html_url":"https://github.com/pubmedqa/pubmedqa","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pubmedqa%2Fpubmedqa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pubmedqa%2Fpubmedqa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pubmedqa%2Fpubmedqa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pubmedqa%2Fpubmedqa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pubmedqa","download_url":"https://codeload.github.com/pubmedqa/p
ubmedqa/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224884612,"owners_count":17386121,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T09:01:46.120Z","updated_at":"2024-11-16T06:31:48.294Z","avatar_url":"https://github.com/pubmedqa.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话","Information Extraction and NLP","🧪 Scientific Pretraining, SFT, Reasoning, and Agent Datasets","Datasets \u0026 Benchmarks"],"sub_categories":["大语言对话模型及数据","🧬 Life Sciences","Text + BioMulti"],"readme":"# PubMedQA\n\n## Download\nPQA-L is already in `./data/`.\n\n[PQA-U](https://drive.google.com/open?id=1RsGLINVce-0GsDkCLDuLZmoLuzfmoCuQ)\n\n[PQA-A](https://drive.google.com/open?id=15v1x6aQDlZymaHGP7cZJZZYFfeJt2NdS)\n\n## Split the dataset\nAfter downloading PQA-A and PQA-U into `./data/` as `ori_pqaa.json` and `ori_pqau.json`, enter the `./preprocess/` directory and split the dataset:\n\n```bash\ncd preprocess\npython split_dataset.py pqaa\npython split_dataset.py pqal\n```\n\nPlease be aware that there is no official code for splitting PQA-U.\n\n## Evaluation and submission\nTo evaluate your model predictions, please prepare the results in JSON format, where each key is a PMID and each value is one of \"yes\", \"no\", or \"maybe\". 
Run the following script to get the performance:\n\n```bash\npython evaluation.py PREDICTIONS_PATH\n```\n\nTo submit a system to the leaderboard, please send an email containing the model predictions and a brief description of the system to Qiao Jin at [qiaojin.andy@gmail.com](mailto:qiaojin.andy@gmail.com).\n\n\n## Human performance\nAfter splitting PQA-L and obtaining `./data/test_set.json`, run the following script to get human performance:\n\n```bash\npython get_human_performance.py\n```\n\n## Citation\nIf you use PubMedQA in your research, please cite our paper:\n```\n@inproceedings{jin2019pubmedqa,\n  title={PubMedQA: A Dataset for Biomedical Research Question Answering},\n  author={Jin, Qiao and Dhingra, Bhuwan and Liu, Zhengping and Cohen, William and Lu, Xinghua},\n  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},\n  pages={2567--2577},\n  year={2019}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpubmedqa%2Fpubmedqa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpubmedqa%2Fpubmedqa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpubmedqa%2Fpubmedqa/lists"}