Unverified Commit 7257aa2e authored by Maxime's avatar Maxime Committed by GitHub
Browse files

Multiple Choice Questions and Large Languages Models: A Case Study with...

Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data (#1867)

* glianorex tasks

* Create README.md

* Update README.md

* Update README.md

* fix formatting

* fix internal formatting
parent 070d31df
# Glianorex
The goal of this benchmark is to isolate the test answering capabilities from the content knowledge.
### Paper
Title: Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data
Abstract: https://arxiv.org/abs/2406.02394
To test the relevance of MCQs to assess LLM performance without prior data exposure, we created a fictional medical benchmark and knowledge base on a non-existent gland, the Glianorex. Using GPT-4 we generated a comprehensive textbook on the Glianorex in both English and French, and created multiple-choice questions in both English and French.
### Tasks
All tasks are multiple choice questions with 4 options, only one correct option.
- `glianorex`: Evaluates all tasks listed below.
- `glianorex_en`: Evaluates the accuracy on 264 questions in English.
- `glianorex_fr`: Evaluates the accuracy on 264 questions in French.
task: glianorex
dataset_path: maximegmd/glianorex
output_type: multiple_choice
test_split: train
doc_to_text: !function preprocess_glianorex.doc_to_text
doc_to_target: !function preprocess_glianorex.doc_to_target
doc_to_choice: [ 'A', 'B', 'C', 'D' ]
metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
- metric: acc_norm
aggregation: mean
higher_is_better: true
task: glianorex_en
dataset_path: maximegmd/glianorex
output_type: multiple_choice
test_split: train
doc_to_text: !function preprocess_glianorex.doc_to_text
doc_to_target: !function preprocess_glianorex.doc_to_target
process_docs: !function preprocess_glianorex.filter_english
doc_to_choice: [ 'A', 'B', 'C', 'D' ]
metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
- metric: acc_norm
aggregation: mean
higher_is_better: true
task: glianorex_fr
dataset_path: maximegmd/glianorex
output_type: multiple_choice
test_split: train
doc_to_text: !function preprocess_glianorex.doc_to_text
doc_to_target: !function preprocess_glianorex.doc_to_target
process_docs: !function preprocess_glianorex.filter_french
doc_to_choice: [ 'A', 'B', 'C', 'D' ]
metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
- metric: acc_norm
aggregation: mean
higher_is_better: true
import datasets
def doc_to_text(doc) -> str:
option_choices = doc["options"]
answers = "".join((f"{k}. {v}\n") for k, v in option_choices.items())
return f"Question: {doc['question']}\n{answers}Answer:"
def doc_to_target(doc) -> int:
return doc["answer_idx"]
def filter_dataset(dataset: datasets.Dataset, lang: str) -> datasets.Dataset:
return dataset.filter(lambda example: example["language"].startswith(lang))
def filter_french(dataset: datasets.Dataset) -> datasets.Dataset:
return filter_dataset(dataset, "fr")
def filter_english(dataset: datasets.Dataset) -> datasets.Dataset:
return filter_dataset(dataset, "en")
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment