Commit 2b26690f authored by Ben Shoham Ofir, committed by GitHub

Added MedConceptsQA Benchmark (#2010)



* Added MedConceptsQA Benchmark

* pre-commit factor

* update group name

* update in naming

* changed name

* Changed mcqa to med_concepts_qa prefix

* Added med_concepts_qa to README.md

* Changed config files according the new format

* Updated README

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
parent a7a2923f
@@ -62,6 +62,7 @@
| [logiqa2](logiqa2/README.md) | Large-scale logical reasoning dataset adapted from the Chinese Civil Service Examination. | English, Chinese |
| [mathqa](mathqa/README.md) | Question answering tasks involving mathematical reasoning and problem-solving. | English |
| [mc_taco](mc_taco/README.md) | Question-answer pairs that require temporal commonsense comprehension. | English |
| [med_concepts_qa](med_concepts_qa/README.md) | Benchmark for evaluating LLMs on their abilities to interpret medical codes and distinguish between medical concepts. | English |
| medmcqa | Medical multiple choice questions assessing detailed medical knowledge. | English |
| medqa | Multiple choice question answering based on the United States Medical License Exams. | |
| [mgsm](mgsm/README.md) | Benchmark of multilingual grade-school math problems. | Spanish, French, German, Russian, Chinese, Japanese, Thai, Swahili, Bengali, Telugu |
......
# MedConceptsQA
### Paper
Title: `MedConceptsQA: Open Source Medical Concepts QA Benchmark`
Abstract: https://arxiv.org/abs/2405.07348
MedConceptsQA is a dedicated open source benchmark for medical concepts question answering. The benchmark comprises questions on various medical concepts across different vocabularies: diagnoses, procedures, and drugs.
The questions are categorized into three levels of difficulty: easy, medium, and hard.
Our benchmark serves as a valuable resource for evaluating the
abilities of Large Language Models to interpret medical codes and distinguish
between medical concepts.
### Citation
```
@article{shoham2024medconceptsqa,
title={MedConceptsQA--Open Source Medical Concepts QA Benchmark},
author={Shoham, Ofir Ben and Rappoport, Nadav},
journal={arXiv preprint arXiv:2405.07348},
year={2024}
}
```
### Groups and Tasks
#### Groups
* `med_concepts_qa`: Contains all the QA tasks (diagnoses, procedures, and drugs).
#### Tasks
* `med_concepts_qa_icd9cm` - ICD9-CM (diagnosis codes, ICD9 format) question-answering. This involves providing information, clarifications, and answering questions related to ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification) diagnosis codes.
* `med_concepts_qa_icd10cm` - ICD10-CM (diagnosis codes, ICD10 format) question-answering. This involves providing information, clarifications, and answering questions related to ICD-10-CM (International Classification of Diseases, 10th Revision, Clinical Modification) diagnosis codes.
* `med_concepts_qa_icd9proc` - ICD9-Proc (procedure codes, ICD9 format) question-answering. This involves providing information, clarifications, and answering questions related to ICD-9-CM Volume 3 procedure codes (International Classification of Diseases, 9th Revision, Clinical Modification).
* `med_concepts_qa_icd10proc` - ICD10-Proc (procedure codes, ICD10 format) question-answering. This involves providing information, clarifications, and answering questions related to ICD-10-PCS (International Classification of Diseases, 10th Revision, Procedure Coding System) procedure codes.
* `med_concepts_qa_atc` - ATC (Anatomical Therapeutic Chemical Classification System) question-answering. This involves providing information, clarifications, and answering questions related to the ATC classification system, which is used for the classification of drugs and other medical products according to the organ or system on which they act and their therapeutic, pharmacological, and chemical properties.
dataset_path: ofir408/MedConceptsQA
output_type: multiple_choice
description: "Answer A,B,C,D according to the answer to this multiple choice question.\n"
fewshot_split: dev
fewshot_config:
  sampler: first_n
num_fewshot: 4
test_split: test
doc_to_text: "{{question}}\nAnswer:"
doc_to_target: answer_id
doc_to_choice: ['A', 'B', 'C', 'D']
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
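
To make the template concrete, the sketch below shows roughly how a prompt could be assembled from `description`, a few `dev` examples, and `doc_to_text`. It is plain Python rather than the harness's own rendering code. The `render_prompt` helper, the placeholder records, the `" "` and blank-line delimiters, and the idea that `answer_id` holds the answer letter are all assumptions made for illustration. Two few-shot examples are used instead of four to keep the output short.

```python
from typing import Dict, List

DESCRIPTION = "Answer A,B,C,D according to the answer to this multiple choice question.\n"


def doc_to_text(doc: Dict[str, str]) -> str:
    # Mirrors the template "{{question}}\nAnswer:" from the config above.
    return f"{doc['question']}\nAnswer:"


def render_prompt(fewshot_docs: List[Dict[str, str]], doc: Dict[str, str]) -> str:
    # Assumption: each few-shot example is "<text> <target>" and examples are
    # separated by a blank line; the real harness may use other delimiters.
    shots = [f"{doc_to_text(d)} {d['answer_id']}" for d in fewshot_docs]
    return DESCRIPTION + "\n\n".join(shots + [doc_to_text(doc)])


# Hypothetical records; real questions come from ofir408/MedConceptsQA.
dev_docs = [
    {"question": "Placeholder question 1?\nA. w\nB. x\nC. y\nD. z", "answer_id": "B"},
    {"question": "Placeholder question 2?\nA. w\nB. x\nC. y\nD. z", "answer_id": "D"},
]
test_doc = {"question": "Placeholder question 3?\nA. w\nB. x\nC. y\nD. z", "answer_id": "A"}

prompt = render_prompt(dev_docs, test_doc)
print(prompt)
```

The prompt ends with an open `Answer:` line, so the model's preference among the four letters in `doc_to_choice` can be compared against the target.
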
from typing import List

import yaml


def generate_yaml_content(vocab_name: str, level: str):
    # Build the config for one vocabulary/difficulty pair; shared settings
    # come from the included default template.
    content = {
        "dataset_name": f"{vocab_name}_{level}",
        "tag": f"med_concepts_qa_{vocab_name}_tasks",
        "include": "_default_template_yaml",
        "task": f"med_concepts_qa_{vocab_name}_{level}",
        "task_alias": f"{vocab_name}_{level}",
    }
    return content


def generate_yaml_files(
    vocab_names: List[str], levels: List[str], file_name_prefix: str
):
    # Write one task YAML file per (vocabulary, level) combination.
    for vocab_name in vocab_names:
        for level in levels:
            yaml_content = generate_yaml_content(vocab_name, level)
            filename = f"{file_name_prefix}_{vocab_name}_{level}.yaml"
            with open(filename, "w") as yaml_file:
                yaml.dump(yaml_content, yaml_file, default_flow_style=False)
            print(f"Generated {filename}")


if __name__ == "__main__":
    generate_yaml_files(
        vocab_names=["icd9cm", "icd10cm", "icd9proc", "icd10proc", "atc"],
        levels=["easy", "medium", "hard"],
        file_name_prefix="med_concepts_qa",
    )
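
As a quick sanity check on the naming scheme, the following sketch enumerates the task names the script would produce, grouped by tag, without writing any files. The `expected_task_names` helper is hypothetical; it only reuses the f-string patterns from the script above.

```python
from itertools import product
from typing import Dict, List


def expected_task_names(vocab_names: List[str], levels: List[str]) -> Dict[str, List[str]]:
    """Group the generated task names by the tag they are registered under."""
    by_tag: Dict[str, List[str]] = {}
    for vocab_name, level in product(vocab_names, levels):
        tag = f"med_concepts_qa_{vocab_name}_tasks"
        by_tag.setdefault(tag, []).append(f"med_concepts_qa_{vocab_name}_{level}")
    return by_tag


tasks_by_tag = expected_task_names(
    ["icd9cm", "icd10cm", "icd9proc", "icd10proc", "atc"],
    ["easy", "medium", "hard"],
)
for tag, names in tasks_by_tag.items():
    print(tag, "->", ", ".join(names))
```

Five vocabularies times three difficulty levels gives fifteen task files, three under each `*_tasks` tag.
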
group: med_concepts_qa
task:
  - med_concepts_qa_icd9cm
  - med_concepts_qa_icd10cm
  - med_concepts_qa_icd9proc
  - med_concepts_qa_icd10proc
  - med_concepts_qa_atc
aggregate_metric_list:
  - metric: acc
    aggregation: mean

group: med_concepts_qa_atc
task:
  - med_concepts_qa_atc_tasks
aggregate_metric_list:
  - metric: acc
    aggregation: mean
\ No newline at end of file

group: med_concepts_qa_icd10cm
task:
  - med_concepts_qa_icd10cm_tasks
aggregate_metric_list:
  - metric: acc
    aggregation: mean

group: med_concepts_qa_icd10proc
task:
  - med_concepts_qa_icd10proc_tasks
aggregate_metric_list:
  - metric: acc
    aggregation: mean
\ No newline at end of file

group: med_concepts_qa_icd9cm
task:
  - med_concepts_qa_icd9cm_tasks
aggregate_metric_list:
  - metric: acc
    aggregation: mean
\ No newline at end of file

group: med_concepts_qa_icd9proc
task:
  - med_concepts_qa_icd9proc_tasks
aggregate_metric_list:
  - metric: acc
    aggregation: mean
\ No newline at end of file
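
Each group config asks for the mean of `acc` over its members. The diff does not show whether that mean is weighted by the number of examples in each subtask, so the sketch below computes both variants. The `aggregate_acc` helper and all numbers are hypothetical placeholders, not benchmark results.

```python
from typing import Dict, Tuple


def aggregate_acc(results: Dict[str, Tuple[float, int]], weight_by_size: bool) -> float:
    """Average per-task accuracy; `results` maps task name -> (acc, n_examples)."""
    if weight_by_size:
        total = sum(n for _, n in results.values())
        return sum(acc * n for acc, n in results.values()) / total
    return sum(acc for acc, _ in results.values()) / len(results)


# Made-up accuracies and sizes, for illustration only.
atc_results = {
    "med_concepts_qa_atc_easy": (0.50, 100),
    "med_concepts_qa_atc_medium": (0.40, 300),
    "med_concepts_qa_atc_hard": (0.30, 600),
}

macro = aggregate_acc(atc_results, weight_by_size=False)
micro = aggregate_acc(atc_results, weight_by_size=True)
print(f"unweighted mean: {macro:.3f}, size-weighted mean: {micro:.3f}")
```

The two variants diverge whenever subtasks differ in size, which is worth keeping in mind when comparing group-level scores across runs.
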
dataset_name: atc_easy
include: _default_template_yaml
tag: med_concepts_qa_atc_tasks
task: med_concepts_qa_atc_easy
task_alias: atc_easy

dataset_name: atc_hard
include: _default_template_yaml
tag: med_concepts_qa_atc_tasks
task: med_concepts_qa_atc_hard
task_alias: atc_hard

dataset_name: atc_medium
include: _default_template_yaml
tag: med_concepts_qa_atc_tasks
task: med_concepts_qa_atc_medium
task_alias: atc_medium

dataset_name: icd10cm_easy
include: _default_template_yaml
tag: med_concepts_qa_icd10cm_tasks
task: med_concepts_qa_icd10cm_easy
task_alias: icd10cm_easy

dataset_name: icd10cm_hard
include: _default_template_yaml
tag: med_concepts_qa_icd10cm_tasks
task: med_concepts_qa_icd10cm_hard
task_alias: icd10cm_hard

dataset_name: icd10cm_medium
include: _default_template_yaml
tag: med_concepts_qa_icd10cm_tasks
task: med_concepts_qa_icd10cm_medium
task_alias: icd10cm_medium

dataset_name: icd10proc_easy
include: _default_template_yaml
tag: med_concepts_qa_icd10proc_tasks
task: med_concepts_qa_icd10proc_easy
task_alias: icd10proc_easy

dataset_name: icd10proc_hard
include: _default_template_yaml
tag: med_concepts_qa_icd10proc_tasks
task: med_concepts_qa_icd10proc_hard
task_alias: icd10proc_hard

dataset_name: icd10proc_medium
include: _default_template_yaml
tag: med_concepts_qa_icd10proc_tasks
task: med_concepts_qa_icd10proc_medium
task_alias: icd10proc_medium

dataset_name: icd9cm_easy
include: _default_template_yaml
tag: med_concepts_qa_icd9cm_tasks
task: med_concepts_qa_icd9cm_easy
task_alias: icd9cm_easy