# PubMedQA ### Paper Title: `PubMedQA: A Dataset for Biomedical Research Question Answering` Abstract: https://arxiv.org/abs/1909.06146 PubMedQA is a novel biomedical question answering (QA) dataset collected from PubMed abstracts. The task of PubMedQA is to answer research questions with yes/no/maybe (e.g.: Do preoperative statins reduce atrial fibrillation after coronary artery bypass grafting?) using the corresponding abstracts. PubMedQA has 1k expert-annotated, 61.2k unlabeled and 211.3k artificially generated QA instances. Each PubMedQA instance is composed of (1) a question which is either an existing research article title or derived from one, (2) a context which is the corresponding abstract without its conclusion, (3) a long answer, which is the conclusion of the abstract and, presumably, answers the research question, and (4) a yes/no/maybe answer which summarizes the conclusion. Homepage: https://pubmedqa.github.io/ ### Citation ``` @inproceedings{jin2019pubmedqa, title={PubMedQA: A Dataset for Biomedical Research Question Answering}, author={Jin, Qiao and Dhingra, Bhuwan and Liu, Zhengping and Cohen, William and Lu, Xinghua}, booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)}, pages={2567--2577}, year={2019} } ``` ### Groups and Tasks #### Groups * Not part of a group yet #### Tasks * `pubmed_qa` ### Checklist For adding novel benchmarks/datasets to the library: * [ ] Is the task an existing benchmark in the literature? * [ ] Have you referenced the original paper that introduced the task? * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test? If other tasks on this dataset are already supported: * [ ] Is the "Main" variant of this task clearly denoted? * [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates? * [ ] Have you noted which, if any, published evaluation setups are matched by this variant?