Unverified commit 4ab07597 authored by khalil, committed by GitHub

add Arabic EXAMS benchmark (#1498)



* add Arabic EXAMS benchmark

* fix the linter issue and add more information to the README

* Update README.md

---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
parent 282b9e76
# Arabic EXAMS
### Paper
EXAMS is a benchmark dataset for cross-lingual and multilingual question answering over high school examinations. It comprises more than 24,000 high-quality high school exam questions in 16 languages, covering 8 language families and 24 school subjects from the Natural Sciences, Social Sciences, and other areas, and it offers a fine-grained evaluation framework across languages and subjects.

Original paper: [EXAMS](https://aclanthology.org/2020.emnlp-main.438/)

Original dataset: [EXAMS-QA](https://github.com/mhardalov/exams-qa)

The Arabic EXAMS dataset covers five subjects:

- Islamic studies
- Biology
- Physics
- Science
- Social

Homepage for Arabic EXAMS: [EXAMS Arabic Homepage](https://github.com/FreedomIntelligence/AceGPT/tree/main/eval/benchmark_eval/benchmarks/EXAMS_Arabic)
### Citation
### Groups and Tasks
#### Groups
- `aexams`: includes the Islamic Studies, Biology, Science, Physics, and Social subjects.
#### Tasks
The following tasks evaluate individual subjects of the Arabic EXAMS dataset using loglikelihood-based multiple-choice scoring:
- `aexams_IslamicStudies`
- `aexams_Biology`
- `aexams_Science`
- `aexams_Physics`
- `aexams_Social`
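Loglikelihood-based multiple-choice scoring selects the choice whose continuation the model assigns the highest log-likelihood; `acc_norm` additionally divides each log-likelihood by the continuation's byte length to offset length bias. The sketch below is a minimal illustration of that scoring rule, not the harness's implementation, and the log-likelihood values are invented:

```python
# Sketch of loglikelihood-based multiple-choice scoring (`acc` / `acc_norm`).
# The log-likelihoods below are invented for illustration; a real run would
# obtain them from the model under evaluation.

def score_doc(continuations, loglikelihoods, gold_index):
    """Score one multiple-choice doc.

    acc:      argmax of the raw log-likelihood of each choice.
    acc_norm: argmax of log-likelihood divided by the byte length of the
              continuation, which corrects for length bias across choices.
    """
    pred = max(range(len(continuations)), key=lambda i: loglikelihoods[i])
    norm = [ll / len(c.encode("utf-8"))
            for ll, c in zip(loglikelihoods, continuations)]
    pred_norm = max(range(len(continuations)), key=lambda i: norm[i])
    return {"acc": float(pred == gold_index),
            "acc_norm": float(pred_norm == gold_index)}

choices = ["A", "B", "C", "D"]      # matches doc_to_choice in the config
lls = [-4.2, -3.1, -5.0, -3.9]      # hypothetical model log-likelihoods
result = score_doc(choices, lls, gold_index=1)
```

Since the choices here are the single letters A–D, all continuations have the same byte length and `acc` and `acc_norm` agree; they can differ when choices are full answer strings.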
### Checklist
* [x] Is the task an existing benchmark in the literature?
  * [x] Have you referenced the original paper that introduced the task?
  * [x] If yes, does the original paper provide a reference implementation?
    * [x] Yes, original implementation contributed by author of the benchmark
If other tasks on this dataset are already supported:
* [x] Is the "Main" variant of this task clearly denoted?
* [x] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [x] Have you noted which, if any, published evaluation setups are matched by this variant?
group: aexams
dataset_path: Hennara/aexams
test_split: test
fewshot_split: dev
fewshot_config:
  sampler: first_n
output_type: multiple_choice
doc_to_text: "{{question.strip()}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\nالجواب:"
doc_to_choice: ["A", "B", "C", "D"]
doc_to_target: "{{['A', 'B', 'C', 'D'].index(answer)}}"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 0.0
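The `doc_to_text` and `doc_to_target` fields above are Jinja templates applied to each dataset row. Assuming a row with fields `question`, `A`–`D`, and `answer` (the field names used by the templates), their effect can be approximated in plain Python like this; the example row is hypothetical, invented for illustration:

```python
# Plain-Python approximation of the `doc_to_text` / `doc_to_target`
# Jinja templates above (the harness itself renders them with Jinja2).

def doc_to_text(doc):
    # "{{question.strip()}}\nA. {{A}}\n...\nالجواب:"
    return (f"{doc['question'].strip()}\n"
            f"A. {doc['A']}\n"
            f"B. {doc['B']}\n"
            f"C. {doc['C']}\n"
            f"D. {doc['D']}\n"
            "الجواب:")  # "The answer:"

def doc_to_target(doc):
    # "{{['A', 'B', 'C', 'D'].index(answer)}}" -> index of the gold letter
    return ["A", "B", "C", "D"].index(doc["answer"])

# Hypothetical example row, for illustration only.
doc = {
    "question": "ما هي وحدة قياس القوة؟",  # "What is the unit of force?"
    "A": "نيوتن",   # Newton
    "B": "جول",     # Joule
    "C": "واط",     # Watt
    "D": "أمبير",   # Ampere
    "answer": "A",
}
prompt = doc_to_text(doc)
target = doc_to_target(doc)
```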
"dataset_name": "Biology"
"description": "قم بالإجابة على مايلي في مجال العلوم الحيوية\n\n"
"include": "_default_template_yaml"
"task": "aexams_Biology"
"dataset_name": "IslamicStudies"
"description": "قم بالإجابة على مايلي في مجال العلوم الإسلامية \n\n"
"include": "_default_template_yaml"
"task": "aexams_IslamicStudies"
"dataset_name": "Physics"
"description": "قم بالإجابة على مايلي في مجال الفيزياء \n\n"
"include": "_default_template_yaml"
"task": "aexams_Physics"
"dataset_name": "Science"
"description": "قم بالإجابة على مايلي في مجال العلوم \n\n"
"include": "_default_template_yaml"
"task": "aexams_Science"
"dataset_name": "Social"
"description": "قم بالإجابة على مايلي في مجال العلوم الإجتماعية \n\n"
"include": "_default_template_yaml"
"task": "aexams_Social"