Unverified commit 4ab07597 authored by khalil, committed by GitHub

add Arabic EXAMS benchmark (#1498)



* add Arabic EXAMS benchmark

* fix the linter issue and add more information to the README

* Update README.md

---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
parent 282b9e76
# Arabic EXAMS
### Paper
EXAMS is a benchmark dataset for cross-lingual and multilingual question answering over high school examinations. It comprises more than 24,000 high-quality high school exam questions in 16 languages, covering 8 language families and 24 school subjects from the Natural Sciences, Social Sciences, and other areas, and it offers a fine-grained evaluation framework across languages and subjects.

Original paper: [EXAMS](https://aclanthology.org/2020.emnlp-main.438/)

Original dataset: [EXAMS-QA](https://github.com/mhardalov/exams-qa)

The Arabic EXAMS dataset covers five subjects:

- Islamic studies
- Biology
- Physics
- Science
- Social

Homepage for Arabic EXAMS: [EXAMS Arabic Homepage](https://github.com/FreedomIntelligence/AceGPT/tree/main/eval/benchmark_eval/benchmarks/EXAMS_Arabic)
### Citation
### Groups and Tasks
#### Groups
- `aexams`: includes the Islamic Studies, Biology, Science, Physics, and Social subjects.
#### Tasks
The following tasks evaluate individual subjects of the Arabic EXAMS dataset using loglikelihood-based multiple-choice scoring:
- `aexams_IslamicStudies`
- `aexams_Biology`
- `aexams_Science`
- `aexams_Physics`
- `aexams_Social`
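Loglikelihood-based multiple-choice scoring selects the choice whose continuation the model assigns the highest log-likelihood; `acc_norm` additionally divides each log-likelihood by the continuation's byte length to offset length bias. The sketch below is a minimal illustration of that scoring rule, not the harness's implementation, and the log-likelihood values are invented:

```python
# Sketch of loglikelihood-based multiple-choice scoring (`acc` / `acc_norm`).
# The log-likelihoods below are invented for illustration; a real run would
# obtain them from the model under evaluation.

def score_doc(continuations, loglikelihoods, gold_index):
    """Score one multiple-choice doc.

    acc:      argmax of the raw log-likelihood of each choice.
    acc_norm: argmax of log-likelihood divided by the byte length of the
              continuation, which corrects for length bias across choices.
    """
    pred = max(range(len(continuations)), key=lambda i: loglikelihoods[i])
    norm = [ll / len(c.encode("utf-8"))
            for ll, c in zip(loglikelihoods, continuations)]
    pred_norm = max(range(len(continuations)), key=lambda i: norm[i])
    return {"acc": float(pred == gold_index),
            "acc_norm": float(pred_norm == gold_index)}

choices = ["A", "B", "C", "D"]      # matches doc_to_choice in the config
lls = [-4.2, -3.1, -5.0, -3.9]      # hypothetical model log-likelihoods
result = score_doc(choices, lls, gold_index=1)
```

Since the choices here are the single letters A–D, all continuations have the same byte length and `acc` and `acc_norm` agree; they can differ when choices are full answer strings.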
### Checklist
* [x] Is the task an existing benchmark in the literature?
  * [x] Have you referenced the original paper that introduced the task?
  * [x] If yes, does the original paper provide a reference implementation?
    * [x] Yes, original implementation contributed by author of the benchmark
If other tasks on this dataset are already supported:
* [x] Is the "Main" variant of this task clearly denoted?
* [x] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [x] Have you noted which, if any, published evaluation setups are matched by this variant?
group: aexams
dataset_path: Hennara/aexams
test_split: test
fewshot_split: dev
fewshot_config:
  sampler: first_n
output_type: multiple_choice
doc_to_text: "{{question.strip()}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\nالجواب:"
doc_to_choice: ["A", "B", "C", "D"]
doc_to_target: "{{['A', 'B', 'C', 'D'].index(answer)}}"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 0.0
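The `doc_to_text` and `doc_to_target` fields above are Jinja templates applied to each dataset row. Assuming a row with fields `question`, `A`–`D`, and `answer` (the field names used by the templates), their effect can be approximated in plain Python like this; the example row is hypothetical, invented for illustration:

```python
# Plain-Python approximation of the `doc_to_text` / `doc_to_target`
# Jinja templates above (the harness itself renders them with Jinja2).

def doc_to_text(doc):
    # "{{question.strip()}}\nA. {{A}}\n...\nالجواب:"
    return (f"{doc['question'].strip()}\n"
            f"A. {doc['A']}\n"
            f"B. {doc['B']}\n"
            f"C. {doc['C']}\n"
            f"D. {doc['D']}\n"
            "الجواب:")  # "The answer:"

def doc_to_target(doc):
    # "{{['A', 'B', 'C', 'D'].index(answer)}}" -> index of the gold letter
    return ["A", "B", "C", "D"].index(doc["answer"])

# Hypothetical example row, for illustration only.
doc = {
    "question": "ما هي وحدة قياس القوة؟",  # "What is the unit of force?"
    "A": "نيوتن",   # Newton
    "B": "جول",     # Joule
    "C": "واط",     # Watt
    "D": "أمبير",   # Ampere
    "answer": "A",
}
prompt = doc_to_text(doc)
target = doc_to_target(doc)
```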
"dataset_name": "Biology"
"description": "قم بالإجابة على مايلي في مجال العلوم الحيوية\n\n"
"include": "_default_template_yaml"
"task": "aexams_Biology"
"dataset_name": "IslamicStudies"
"description": "قم بالإجابة على مايلي في مجال العلوم الإسلامية \n\n"
"include": "_default_template_yaml"
"task": "aexams_IslamicStudies"
"dataset_name": "Physics"
"description": "قم بالإجابة على مايلي في مجال الفيزياء \n\n"
"include": "_default_template_yaml"
"task": "aexams_Physics"
"dataset_name": "Science"
"description": "قم بالإجابة على مايلي في مجال العلوم \n\n"
"include": "_default_template_yaml"
"task": "aexams_Science"
"dataset_name": "Social"
"description": "قم بالإجابة على مايلي في مجال العلوم الإجتماعية \n\n"
"include": "_default_template_yaml"
"task": "aexams_Social"