Add the Arabic version of COPA and refactor Arabic PIQA into the alghafa folder (#1940)

Unverified commit 305fb636, authored by khalil on Jun 10, 2024 and committed by GitHub on Jun 10, 2024
4 changed files
--- a/lm_eval/tasks/alghafa/copa_ar/README.md
+++ b/lm_eval/tasks/alghafa/copa_ar/README.md
+# Arabic COPA
+
+### Paper
+
+Original Title: `COPA`
+
+The Choice Of Plausible Alternatives (COPA) evaluation provides researchers with a tool for assessing progress in open-domain commonsense causal reasoning.
+
+[Homepage](https://people.ict.usc.edu/~gordon/copa.html)
+
+AlGhafa has translated this dataset into Arabic ([AlGhafa paper](https://aclanthology.org/2023.arabicnlp-1.21.pdf)).
+
+The Arabic version of the dataset is available in the [AlGhafa repository](https://gitlab.com/tiiuae/alghafa/-/tree/main/arabic-eval/copa_ar).
+
+### Citation
+
+### Groups and Tasks
+
+#### Groups
+
+* Not part of a group yet.
+
+#### Tasks
+
+* `copa_ar`
+
+### Checklist
+
+For adding novel benchmarks/datasets to the library:
+* [x] Is the task an existing benchmark in the literature?
+  * [x] Have you referenced the original paper that introduced the task?
+  * [x] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
+
+
+If other tasks on this dataset are already supported:
+* [x] Is the "Main" variant of this task clearly denoted?
+* [x] Have you provided a short sentence in a README on what each new variant adds / evaluates?
+* [x] Have you noted which, if any, published evaluation setups are matched by this variant?
--- a/lm_eval/tasks/alghafa/copa_ar/copa_ar.yaml
+++ b/lm_eval/tasks/alghafa/copa_ar/copa_ar.yaml
+task: copa_ar
+dataset_path: Hennara/copa_ar
+dataset_name: null
+output_type: multiple_choice
+training_split: null
+validation_split: null
+test_split: test
+# Prompt template, in Arabic: "Question: {{query}}\nAnswer:"
+doc_to_text: "السؤال: {{query}}\nالجواب:"
+doc_to_choice: "{{[sol1, sol2]}}"
+doc_to_target: label
+should_decontaminate: true
+doc_to_decontamination_query: query
+metric_list:
+  - metric: acc
+    aggregation: mean
+    higher_is_better: true
+  - metric: acc_norm
+    aggregation: mean
+    higher_is_better: true
+metadata:
+  version: 1.0
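To make the `copa_ar` config concrete, here is a minimal Python sketch of how one document might be turned into a prompt and scored. It assumes the dataset rows carry the `query`, `sol1`, `sol2` and `label` fields that the YAML reads, a single-space delimiter between prompt and answer, and character-length normalization for `acc_norm`. The function names are hypothetical, and this only mimics the general idea; it is not the lm-evaluation-harness code itself.

```python
# Hypothetical sketch: building and scoring one copa_ar multiple-choice item.
# Mimics the YAML fields above; NOT the lm-evaluation-harness implementation.
from typing import Dict, List, Tuple


def doc_to_text(doc: Dict) -> str:
    # Same template as the YAML ("Question: {query}\nAnswer:" in Arabic),
    # rendered with an f-string instead of Jinja2.
    return f"السؤال: {doc['query']}\nالجواب:"


def doc_to_choice(doc: Dict) -> List[str]:
    # Equivalent of the Jinja expression "{{[sol1, sol2]}}".
    return [doc["sol1"], doc["sol2"]]


def build_requests(doc: Dict, delimiter: str = " ") -> List[Tuple[str, str]]:
    # One (context, continuation) pair per choice; a model would score the
    # log-likelihood of each continuation given the shared context.
    # The single-space delimiter is an assumption.
    context = doc_to_text(doc)
    return [(context, delimiter + choice) for choice in doc_to_choice(doc)]


def score(loglikelihoods: List[float], choices: List[str], gold: int) -> Dict[str, float]:
    # acc: pick the choice with the highest raw log-likelihood.
    pred = max(range(len(choices)), key=lambda i: loglikelihoods[i])
    # acc_norm: divide each log-likelihood by the choice's character length
    # first, so longer answers are not penalized just for being longer.
    pred_norm = max(range(len(choices)), key=lambda i: loglikelihoods[i] / len(choices[i]))
    return {"acc": float(pred == gold), "acc_norm": float(pred_norm == gold)}


if __name__ == "__main__":
    # Made-up document and log-likelihoods, for illustration only.
    doc = {"query": "premise", "sol1": "short", "sol2": "a much longer answer", "label": 1}
    for context, continuation in build_requests(doc):
        print(repr(context), repr(continuation))
    print(score([-6.0, -10.0], doc_to_choice(doc), doc["label"]))
```

With the made-up log-likelihoods in the `__main__` block, the two metrics disagree. The raw score prefers the shorter choice, while the length-normalized score prefers the longer, correct one. This is why the config reports both `acc` and `acc_norm`.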
--- a/lm_eval/tasks/piqa_ar/README.md
+++ b/lm_eval/tasks/piqa_ar/README.md
--- a/lm_eval/tasks/piqa_ar/piqa_ar.yaml
+++ b/lm_eval/tasks/piqa_ar/piqa_ar.yaml