Commit 6fbebb4b authored by Angelika Romanou, committed by GitHub

Add INCLUDE tasks (#2769)



* Add INCLUDE tasks

* pacify pre-commit

---------
Co-authored-by: Baber <baber@hey.com>
parent bb4fa95e
# INCLUDE
### Paper
Title: `INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge`
Abstract: [https://arxiv.org/abs/2411.19799](https://arxiv.org/abs/2411.19799)
INCLUDE is a comprehensive knowledge- and reasoning-centric benchmark across 44 languages that evaluates multilingual LLMs for performance in the actual language environments where they would be deployed. It contains 22,637 four-option multiple-choice questions (MCQs) extracted from academic and professional exams, covering 57 topics, including regional knowledge.
> 🤗 [CohereForAI/include-base-44](https://huggingface.co/datasets/CohereForAI/include-base-44): Benchmark which supports 44 languages, each with 500 regional samples and 50 region-agnostic ones.
### Tasks
We add the following evaluations:
- prompting with instructions in English for the 0-shot setting (`default`)
- prompting with instructions in English for the 5-shot setting (`few_shot_en`)
- prompting with instructions in the in-sample language for the 5-shot setting (`few_shot_og`)
### Languages
Albanian, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Chinese, Croatian, Dutch, Estonian, Finnish, French, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Kazakh, Korean, Lithuanian, Malay, Malayalam, Nepali, North Macedonian, Persian, Polish, Portuguese, Russian, Serbian, Spanish, Tagalog, Tamil, Telugu, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese
### Domains
**Academic:** Accounting, Agriculture, Anthropology, Architecture and Design, Arts & Humanities, Biology, Business administration, Business ethics, Business, Chemistry, Computer Science, Culturology, Earth science, Economics, Education, Engineering, Environmental studies and forestry, Family and consumer science, Finance, Geography, Health, History, Human physical performance and recreation, Industrial and labor relations, International trade, Journalism, media studies, and communication, Language, Law, Library and museum studies, Literature, Logic, Management, Marketing, Math, Medicine, Military Sciences, Multiple exams, Performing arts, Philosophy, Physics, Political sciences, Psychology, Public Administration, Public Policy, Qualimetry, Religious studies, Risk management and insurance, Social Work, Social work, Sociology, STEM, Transportation, Visual Arts
**Licenses:** Driving License, Marine License, Medical License, Professional Certifications
### Citation
```bibtex
@article{romanou2024include,
title={INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge},
author={Angelika Romanou and Negar Foroutan and Anna Sotnikova and Zeming Chen and Sree Harsha Nelaturu and Shivalika Singh and Rishabh Maheshwary and Micol Altomare and Mohamed A Haggag and Imanol Schlag and Marzieh Fadaee and Sara Hooker and Antoine Bosselut and others},
journal={ICLR},
year={2024},
primaryClass={cs.CL},
eprint={2411.19799},
url={https://arxiv.org/abs/2411.19799},
}
```
dataset_path: CohereForAI/include-base-44
dataset_name: Albanian
test_split: test
output_type: multiple_choice
doc_to_text: "{{question.strip()}}\nA. {{option_a}}\nB. {{option_b}}\nC. {{option_c}}\nD. {{option_d}}\nAnswer:"
doc_to_choice:
  - A
  - B
  - C
  - D
doc_to_target: answer
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 0.0
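To make the template above concrete, here is a minimal sketch of the prompt it would produce for one document. It uses plain Python string formatting rather than the harness's own template rendering, and the record `sample_doc` is invented for illustration. The assumption that `answer` holds a zero-based index into `doc_to_choice` is ours and is not stated in this commit.

```python
# Hypothetical record; field names follow the template
# (question, option_a..option_d, answer).
sample_doc = {
    "question": "  Which planet is closest to the Sun?  ",
    "option_a": "Venus",
    "option_b": "Mercury",
    "option_c": "Earth",
    "option_d": "Mars",
    "answer": 1,  # assumed: zero-based index into CHOICES
}

CHOICES = ["A", "B", "C", "D"]  # mirrors doc_to_choice


def render_prompt(doc):
    """Mimic doc_to_text: stripped question, four lettered options, 'Answer:'."""
    return (
        f"{doc['question'].strip()}\n"
        f"A. {doc['option_a']}\n"
        f"B. {doc['option_b']}\n"
        f"C. {doc['option_c']}\n"
        f"D. {doc['option_d']}\n"
        "Answer:"
    )


def gold_choice(doc):
    """Return the letter the model is expected to pick (under the index assumption)."""
    return CHOICES[doc["answer"]]


print(render_prompt(sample_doc))
print("gold:", gold_choice(sample_doc))
```

With `output_type: multiple_choice`, the harness scores each entry of `doc_to_choice` as a continuation of this prompt, so the prompt deliberately ends at `Answer:`.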
group: include_base_44_albanian
task:
  - include_base_44_albanian_arts_humanities
  - include_base_44_albanian_stem
  - include_base_44_albanian_business_commerce
  - include_base_44_albanian_social_science
  - include_base_44_albanian_health_oriented_education
aggregate_metric_list:
  - metric: acc
    weight_by_size: true
metadata:
  version: 0.0
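The group config above asks for accuracy aggregated with `weight_by_size: true`. A rough sketch of what that implies, assuming the flag means each subtask contributes in proportion to its number of documents (the per-subtask numbers below are made up, and the function names are illustrative rather than harness APIs):

```python
def weighted_accuracy(results):
    """Size-weighted mean over (accuracy, n_samples) pairs, one pair per subtask."""
    total = sum(n for _, n in results)
    if total == 0:
        raise ValueError("no samples to aggregate")
    return sum(acc * n for acc, n in results) / total


def unweighted_accuracy(results):
    """Plain mean of subtask accuracies, ignoring subtask size."""
    return sum(acc for acc, _ in results) / len(results)


# Hypothetical results: a small subtask and a larger one.
subtask_results = [(0.50, 100), (0.80, 300)]

print(weighted_accuracy(subtask_results))
print(unweighted_accuracy(subtask_results))
```

Under this reading, the group score equals the accuracy over all documents pooled together, so small subjects do not dominate the language-level number.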
include: _albanian_template_yaml
description: The following is multiple-choice question about Arts & Humanities.
process_docs: !function 'utils.process_arts_humanities'
task: include_base_44_albanian_arts_humanities
include: _albanian_template_yaml
description: The following is multiple-choice question about Business & Commerce.
process_docs: !function 'utils.process_business_commerce'
task: include_base_44_albanian_business_commerce
include: _albanian_template_yaml
description: The following is multiple-choice question about Health oriented education.
process_docs: !function 'utils.process_health_oriented_education'
task: include_base_44_albanian_health_oriented_education
include: _albanian_template_yaml
description: The following is multiple-choice question about Social Science.
process_docs: !function 'utils.process_social_science'
task: include_base_44_albanian_social_science
include: _albanian_template_yaml
description: The following is multiple-choice question about STEM.
process_docs: !function 'utils.process_stem'
task: include_base_44_albanian_stem
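from functools import partial


# Domain labels used to split each language's data into per-subject tasks.
CATEGORIES = [
    "Applied Science",
    "Arts & Humanities",
    "Business & Commerce",
    "Driving License",
    "General knowledge",
    "Health oriented education",
    "Marine License",
    "Medical License",
    "Professional certification",
    "STEM",
    "Social Science",
]


def process_docs(dataset, category):
    # Keep only the rows whose "domain" field matches the requested category.
    return dataset.filter(lambda x: x["domain"] == category)


# Map names such as "process_arts_humanities" to category-specific filters.
process_functions = {
    f"process_{category.lower().replace(' & ', '_').replace(' ', '_')}": partial(
        process_docs, category=category
    )
    for category in CATEGORIES
}

# Expose each filter as a module-level name so task configs can reference
# it via `!function 'utils.process_<category>'`.
globals().update(process_functions)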
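The name derivation in `utils.py` is easy to misread, so here is a small standalone sketch that reproduces it and lists the resulting function names. The `function_name` and `filter_rows` helpers and the list-of-dicts stand-in for a dataset are illustrative only; the real code filters a Hugging Face dataset object.

```python
CATEGORIES = [
    "Applied Science",
    "Arts & Humanities",
    "Business & Commerce",
    "Driving License",
    "General knowledge",
    "Health oriented education",
    "Marine License",
    "Medical License",
    "Professional certification",
    "STEM",
    "Social Science",
]


def function_name(category):
    """Same transformation as utils.py: lowercase, ' & ' and ' ' become '_'."""
    return f"process_{category.lower().replace(' & ', '_').replace(' ', '_')}"


def filter_rows(rows, category):
    """Stand-in for dataset.filter: keep rows whose 'domain' equals category."""
    return [row for row in rows if row["domain"] == category]


names = {category: function_name(category) for category in CATEGORIES}
for category, name in names.items():
    print(f"{category!r} -> {name}")

# Hypothetical rows, only to show the filtering behaviour.
rows = [
    {"question": "q1", "domain": "STEM"},
    {"question": "q2", "domain": "Social Science"},
    {"question": "q3", "domain": "STEM"},
]
print(filter_rows(rows, "STEM"))
```

The generated names are the ones the per-subject configs refer to, for example `utils.process_arts_humanities` and `utils.process_health_oriented_education`.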
dataset_path: CohereForAI/include-base-44
dataset_name: Arabic
test_split: test
output_type: multiple_choice
doc_to_text: "{{question.strip()}}\nA. {{option_a}}\nB. {{option_b}}\nC. {{option_c}}\nD. {{option_d}}\nAnswer:"
doc_to_choice:
  - A
  - B
  - C
  - D
doc_to_target: answer
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 0.0
group: include_base_44_arabic
task:
  - include_base_44_arabic_arts_humanities
  - include_base_44_arabic_stem
  - include_base_44_arabic_social_science
  - include_base_44_arabic_driving_license
  - include_base_44_arabic_general_knowledge
  - include_base_44_arabic_business_commerce
aggregate_metric_list:
  - metric: acc
    weight_by_size: true
metadata:
  version: 0.0
include: _arabic_template_yaml
description: The following is multiple-choice question about Arts & Humanities.
process_docs: !function 'utils.process_arts_humanities'
task: include_base_44_arabic_arts_humanities
include: _arabic_template_yaml
description: The following is multiple-choice question about Business & Commerce.
process_docs: !function 'utils.process_business_commerce'
task: include_base_44_arabic_business_commerce
include: _arabic_template_yaml
description: The following is multiple-choice question about Driving License.
process_docs: !function 'utils.process_driving_license'
task: include_base_44_arabic_driving_license
include: _arabic_template_yaml
description: The following is multiple-choice question about General knowledge.
process_docs: !function 'utils.process_general_knowledge'
task: include_base_44_arabic_general_knowledge
include: _arabic_template_yaml
description: The following is multiple-choice question about Social Science.
process_docs: !function 'utils.process_social_science'
task: include_base_44_arabic_social_science
include: _arabic_template_yaml
description: The following is multiple-choice question about STEM.
process_docs: !function 'utils.process_stem'
task: include_base_44_arabic_stem
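The per-subject task files above all follow the same four-line pattern. As an illustration of that pattern only, a hypothetical helper (`task_config`, not part of this commit, and not necessarily how the files were produced) could emit one such config from a language and a category:

```python
def task_config(language, category):
    """Build the four-line per-subject config for one language/category pair."""
    lang = language.lower()
    # Same slug rule as utils.py: lowercase, ' & ' and ' ' become '_'.
    slug = category.lower().replace(" & ", "_").replace(" ", "_")
    return "\n".join(
        [
            f"include: _{lang}_template_yaml",
            # The description wording is copied verbatim from the configs.
            f"description: The following is multiple-choice question about {category}.",
            f"process_docs: !function 'utils.process_{slug}'",
            f"task: include_base_44_{lang}_{slug}",
        ]
    )


print(task_config("Arabic", "Driving License"))
```

Each config only overrides the description, the filter function, and the task name; everything else comes from the language template pulled in by `include`.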
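from functools import partial


# Domain labels used to split each language's data into per-subject tasks.
CATEGORIES = [
    "Applied Science",
    "Arts & Humanities",
    "Business & Commerce",
    "Driving License",
    "General knowledge",
    "Health oriented education",
    "Marine License",
    "Medical License",
    "Professional certification",
    "STEM",
    "Social Science",
]


def process_docs(dataset, category):
    # Keep only the rows whose "domain" field matches the requested category.
    return dataset.filter(lambda x: x["domain"] == category)


# Map names such as "process_arts_humanities" to category-specific filters.
process_functions = {
    f"process_{category.lower().replace(' & ', '_').replace(' ', '_')}": partial(
        process_docs, category=category
    )
    for category in CATEGORIES
}

# Expose each filter as a module-level name so task configs can reference
# it via `!function 'utils.process_<category>'`.
globals().update(process_functions)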
dataset_path: CohereForAI/include-base-44
dataset_name: Armenian
test_split: test
output_type: multiple_choice
doc_to_text: "{{question.strip()}}\nA. {{option_a}}\nB. {{option_b}}\nC. {{option_c}}\nD. {{option_d}}\nAnswer:"
doc_to_choice:
  - A
  - B
  - C
  - D
doc_to_target: answer
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 0.0
group: include_base_44_armenian
task:
  - include_base_44_armenian_driving_license
  - include_base_44_armenian_social_science
  - include_base_44_armenian_arts_humanities
  - include_base_44_armenian_stem
aggregate_metric_list:
  - metric: acc
    weight_by_size: true
metadata:
  version: 0.0