Add new benchmark: Basque bench (#2153)

* Add basque_bench
* Add flores_eu group
* Update _flores_common_yaml
* Run linters, updated flores, mgsm, copa, and readme
* Apply suggestions from code review

Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>

---------

Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

Commit c887796d authored Oct 04, 2024 by zxcvuser; committed by GitHub Oct 04, 2024
7 changed files
--- a/lm_eval/tasks/basque_bench/flores_eu/flores_it-eu.yaml
+++ b/lm_eval/tasks/basque_bench/flores_eu/flores_it-eu.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_it-eu
+doc_to_text: 'Italian sentence: {{sentence_ita_Latn}}
+
+  Basque sentence:'
+doc_to_target: '{{sentence_eus_Latn}}'
--- a/lm_eval/tasks/basque_bench/flores_eu/flores_pt-eu.yaml
+++ b/lm_eval/tasks/basque_bench/flores_eu/flores_pt-eu.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_pt-eu
+doc_to_text: 'Portuguese sentence: {{sentence_por_Latn}}
+
+  Basque sentence:'
+doc_to_target: '{{sentence_eus_Latn}}'
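Note on the two flores_*-eu prompts above: the blank line inside the single-quoted YAML scalar folds to a single newline, and the leading indentation of the continuation line is dropped, so the source sentence and the `Basque sentence:` cue end up on consecutive lines. A minimal sketch of the resulting prompt, assuming plain string substitution stands in for the Jinja rendering; `render_flores_prompt` and the sample record are hypothetical and not part of the commit:

```python
def render_flores_prompt(doc, src_lang_name, src_field):
    """Approximate the prompt that the doc_to_text template produces."""
    # The blank line in the YAML scalar becomes exactly one "\n".
    return f"{src_lang_name} sentence: {doc[src_field]}\nBasque sentence:"


# Hypothetical record; real records come from the FLORES dataset.
doc = {"sentence_por_Latn": "Bom dia.", "sentence_eus_Latn": "Egun on."}

prompt = render_flores_prompt(doc, "Portuguese", "sentence_por_Latn")
target = doc["sentence_eus_Latn"]  # mirrors doc_to_target
print(repr(prompt))
```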
--- a/lm_eval/tasks/basque_bench/mgsm_cot_native_eu.yaml
+++ b/lm_eval/tasks/basque_bench/mgsm_cot_native_eu.yaml
+task: mgsm_native_cot_eu
+dataset_path: HiTZ/MGSM-eu
+dataset_name: null
+doc_to_target: '{% if answer is not none %}{{answer[27:]}}{% else %}{{answer_number|string}}{% endif %}'
+doc_to_text: '{% if answer is not none %}{{question+"\nErantzuna urratsez urrats:"}}{% else %}{{"Galdera: "+question+"\nErantzuna urratsez urrats:"}}{% endif %}'
+output_type: generate_until
+training_split: train
+test_split: test
+target_delimiter: " "
+generation_kwargs:
+  until:
+    - "\n\n"
+    - "\n"
+    - "Galdera:"
+    - </s>
+    - <|im_end|>
+  do_sample: false
+  temperature: 0.0
+filter_list:
+  - name: "get-answer"
+    filter:
+      - function: "regex"
+        regex_pattern: "Erantzuna [$%]? ?(-?[0-9]+([ .,][0-9.,]+)?) ?[$%]? da"
+      - function: "take_first"
+metric_list:
+  - metric: exact_match
+    aggregation: mean
+    higher_is_better: true
+    ignore_case: true
+    ignore_punctuation: true
+    regexes_to_ignore:
+      - " "
+metadata:
+  version: 1.0
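Note on the `get-answer` filter above: the pattern only accepts a number wrapped in the fixed phrase `Erantzuna ... da`, optionally with a `$` or `%` on either side. The sketch below applies the same pattern with the standard `re` module to show what it captures. It is an approximation for illustration, not the harness's filter code; `extract_answer` is a hypothetical helper, and returning `None` on a miss is an assumption of this sketch.

```python
import re

# Pattern copied from the `get-answer` filter in mgsm_cot_native_eu.yaml.
PATTERN = re.compile(r"Erantzuna [$%]? ?(-?[0-9]+([ .,][0-9.,]+)?) ?[$%]? da")


def extract_answer(generation):
    """Return the first captured number, or None when the phrase is absent."""
    match = PATTERN.search(generation)
    return match.group(1) if match else None


# Hypothetical model outputs.
print(extract_answer("5 + 13 = 18. Erantzuna 18 da."))
print(extract_answer("Erantzuna 1.500 $ da"))
print(extract_answer("Ez dakit."))
```

A generation that states the number without the closing `da` is not matched, so the chain-of-thought variant depends on the model following the answer phrasing.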
--- a/lm_eval/tasks/basque_bench/mgsm_direct_eu.yaml
+++ b/lm_eval/tasks/basque_bench/mgsm_direct_eu.yaml
+task: mgsm_direct_eu
+dataset_path: HiTZ/MGSM-eu
+dataset_name: null
+doc_to_target: '{{answer_number|string}}'
+doc_to_text: '{% if answer is not none %}{{question+"\nErantzuna:"}}{% else %}{{"Galdera: "+question+"\nErantzuna:"}}{% endif %}'
+output_type: generate_until
+training_split: train
+test_split: test
+target_delimiter: " "
+generation_kwargs:
+  until:
+    - "\n\n"
+    - "\n"
+    - "Galdera:"
+    - </s>
+    - <|im_end|>
+  do_sample: false
+  temperature: 0.0
+filter_list:
+  - name: remove_whitespace
+    filter:
+      - function: remove_whitespace
+      - function: take_first
+  - name: flexible-extract
+    filter:
+    - function: regex
+      group_select: -1
+      regex_pattern: (-?[0-9]+([ .,][0-9.,]+)?)
+    - function: take_first
+metric_list:
+  - metric: exact_match
+    aggregation: mean
+    higher_is_better: true
+    ignore_case: true
+    ignore_punctuation: true
+    regexes_to_ignore:
+      - " "
+metadata:
+  version: 1.0
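Note on the `flexible-extract` filter above: with `group_select: -1` the intent is to keep the last number that appears in the generation. A rough sketch of that selection using `re.findall`, assuming the last match is chosen and the first non-empty capture group is kept; `flexible_extract` is a hypothetical helper, not the harness's implementation:

```python
import re

# Pattern copied from the `flexible-extract` filter in mgsm_direct_eu.yaml.
PATTERN = re.compile(r"(-?[0-9]+([ .,][0-9.,]+)?)")


def flexible_extract(generation):
    """Return the last number-like span in the text, or None if there is none."""
    matches = PATTERN.findall(generation)  # list of (outer, inner) group tuples
    if not matches:
        return None
    last = matches[-1]  # group_select: -1 picks the last match
    # Keep the first non-empty group, which is the full number.
    return next((group for group in last if group), None)


# Hypothetical model outputs.
print(flexible_extract("3 sagar eta 4 udare, guztira 7"))
print(flexible_extract("Erantzuna: 1,234"))
```

Because the pattern allows a space as a digit separator, two numbers separated only by a space (for example `3 4`) are captured as one span.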
--- a/lm_eval/tasks/basque_bench/utils.py
+++ b/lm_eval/tasks/basque_bench/utils.py
+from functools import partial
+
+
+# ~~~~~~~~~~~ XCOPA ~~~~~~~~~~~ #
+
+xcopa_connectors = {"cause": " Izan ere,", "effect": " Beraz,"}
+
+
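+def xcopa_doc_to_text(doc):
+    # Append the connector for the question type ("cause" or "effect") to the premise.
+    conn = xcopa_connectors[doc["question"]]
+    return doc["premise"].strip() + conn
+
+
+def xcopa_doc_to_choice(doc):
+    # Lowercase the first letter so each choice continues the sentence after the connector.
+    def convert_choice(choice):
+        return choice[0].lower() + choice[1:]
+
+    return [convert_choice(doc["choice1"]), convert_choice(doc["choice2"])]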
+
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
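Usage note for the XCOPA helpers above: the premise is joined to a connector chosen by question type, and each choice is lowercased at its first letter so that it reads as a continuation of the same sentence. The sketch below repeats the two helpers so that it runs on its own; the sample record is hypothetical and only mimics the field names the helpers read.

```python
xcopa_connectors = {"cause": " Izan ere,", "effect": " Beraz,"}


def xcopa_doc_to_text(doc):
    # Premise followed by the connector for this question type.
    return doc["premise"].strip() + xcopa_connectors[doc["question"]]


def xcopa_doc_to_choice(doc):
    def convert_choice(choice):
        # Lowercase only the first character; the rest is unchanged.
        return choice[0].lower() + choice[1:]

    return [convert_choice(doc["choice1"]), convert_choice(doc["choice2"])]


# Hypothetical record with the fields used by the helpers.
doc = {
    "premise": "Gizona nekatuta zegoen. ",
    "question": "effect",
    "choice1": "Ohera joan zen.",
    "choice2": "Korrika egin zuen.",
}

context = xcopa_doc_to_text(doc)
choices = xcopa_doc_to_choice(doc)
print(context)
print(choices)
```

Note that `convert_choice` indexes `choice[0]`, so an empty choice string would raise `IndexError`.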
--- a/lm_eval/tasks/basque_bench/wnli_eu.yaml
+++ b/lm_eval/tasks/basque_bench/wnli_eu.yaml
+task: wnli_eu
+dataset_path: HiTZ/wnli-eu
+dataset_name: null
+output_type: multiple_choice
+training_split: null
+validation_split: validation
+test_split: null
+doc_to_text: "{{sentence1}}\nGaldera: {{sentence2}} Egia edo Gezurra?\nErantzuna:"
+doc_to_target: label
+doc_to_choice: ["Gezurra", "Egia"]
+metric_list:
+  - metric: acc
+metadata:
+  version: 1.0
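Note on wnli_eu above: `doc_to_target: label` is an index into `doc_to_choice`, so label 0 selects "Gezurra" (false) and label 1 selects "Egia" (true). A small sketch of the prompt and gold choice for one record, assuming plain string formatting in place of the Jinja template; `render_wnli` and the sample record are hypothetical:

```python
CHOICES = ["Gezurra", "Egia"]  # same order as doc_to_choice


def render_wnli(doc):
    """Return (prompt, gold_choice) for one record."""
    s1, s2 = doc["sentence1"], doc["sentence2"]
    prompt = f"{s1}\nGaldera: {s2} Egia edo Gezurra?\nErantzuna:"
    return prompt, CHOICES[doc["label"]]


# Hypothetical record with the fields the template reads.
doc = {"sentence1": "Kopa ez da maletan sartzen.", "sentence2": "Kopa handiegia da.", "label": 1}

prompt, gold = render_wnli(doc)
print(prompt)
print(gold)
```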
--- a/lm_eval/tasks/basque_bench/xcopa_eu.yaml
+++ b/lm_eval/tasks/basque_bench/xcopa_eu.yaml
+task: xcopa_eu
+dataset_path: HiTZ/XCOPA-eu
+dataset_name: null
+output_type: multiple_choice
+training_split: null
+validation_split: validation
+test_split: test
+doc_to_text: !function utils.xcopa_doc_to_text
+doc_to_target: label
+doc_to_choice: !function utils.xcopa_doc_to_choice
+metric_list:
+  - metric: acc
+metadata:
+  version: 1.0