PR fixing the issue #1391 (wrong contexts in the mgsm task) (#1440)

* fix the issue #1391, wrong contexts in mgsm tasks
* fix yaml issue for having two target_delimiter lines. For COT tasks, keep the one with a space (default)
* regenerate all task yaml files
  - change naming so that file name will match with task name
  - task|file follows a consistent naming way, mgsm_(mode)_(lang) for three modes, i.e., direct, en_cot, and native_cot
* English CoTs should have a space as target_delimiter
* Update utils.py
* Apply suggestions from code review

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

Commit a72babbf, authored Feb 22, 2024 by Lei Chen, committed by GitHub Feb 22, 2024
8 changed files
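
The naming scheme from the commit message, mgsm_(mode)_(lang), can be sketched as below. This is only an illustration: the helper names `mgsm_task_name` and `mgsm_file_name` are hypothetical and do not appear in the commit, which builds the names inline in `gen_lang_yamls`. The hyphenated mode spellings are the ones `utils.py` compares against.

```python
# Hypothetical helpers illustrating the mgsm_(mode)_(lang) convention.
# The commit itself builds these strings inline inside gen_lang_yamls.

# Mode as spelled in utils.py (hyphenated) -> mode as spelled in task names.
MODE_TO_SLUG = {
    "direct": "direct",
    "en-cot": "en_cot",
    "native-cot": "native_cot",
}


def mgsm_task_name(mode: str, lang: str) -> str:
    """Return the task name for a given mode and language code."""
    if mode not in MODE_TO_SLUG:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"mgsm_{MODE_TO_SLUG[mode]}_{lang}"


def mgsm_file_name(mode: str, lang: str) -> str:
    """Return the YAML file name, which matches the task name."""
    return f"{mgsm_task_name(mode, lang)}.yaml"
```

With this convention the file `mgsm_native_cot_fr.yaml` declares `task: mgsm_native_cot_fr`, so a task can be located from its name alone.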
--- a/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_fr.yaml
+++ b/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_fr.yaml
+# Generated by utils.py
+dataset_name: fr
+doc_to_target: '{% if answer is not none %}{{answer[26:]}}{% else %}{{answer_number|string}}{% endif %}'
+doc_to_text: '{% if answer is not none %}{{question+"\nRéponse étape par étape :"}}{% else %}{{"Question : "+question+"\nRéponse étape par étape :"}}{% endif %}'
+filter_list:
+- filter:
+  - function: regex
+    regex_pattern: La réponse est (\-?[0-9\.\,]+)
+  - function: take_first
+  name: get-answer
+include: cot_yaml
+task: mgsm_native_cot_fr
--- a/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_ja.yaml
+++ b/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_ja.yaml
+# Generated by utils.py
+dataset_name: ja
+doc_to_target: '{% if answer is not none %}{{answer[11:]}}{% else %}{{answer_number|string}}{% endif %}'
+doc_to_text: '{% if answer is not none %}{{question+"\nステップごとの答え:"}}{% else %}{{"問題: "+question+"\nステップごとの答え:"}}{% endif %}'
+filter_list:
+- filter:
+  - function: regex
+    regex_pattern: 答えは(\-?[0-9\.\,]+)です。
+  - function: take_first
+  name: get-answer
+include: cot_yaml
+target_delimiter: ""
+task: mgsm_native_cot_ja
--- a/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_ru.yaml
+++ b/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_ru.yaml
+# Generated by utils.py
+dataset_name: ru
+doc_to_target: '{% if answer is not none %}{{answer[18:]}}{% else %}{{answer_number|string}}{% endif %}'
+doc_to_text: '{% if answer is not none %}{{question+"\nПошаговоерешение:"}}{% else %}{{"Задача: "+question+"\nПошаговоерешение:"}}{% endif %}'
+filter_list:
+- filter:
+  - function: regex
+    regex_pattern: Ответ — (\-?[0-9\.\,]+)
+  - function: take_first
+  name: get-answer
+include: cot_yaml
+task: mgsm_native_cot_ru
--- a/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_sw.yaml
+++ b/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_sw.yaml
+# Generated by utils.py
+dataset_name: sw
+doc_to_target: '{% if answer is not none %}{{answer[25:]}}{% else %}{{answer_number|string}}{% endif %}'
+doc_to_text: '{% if answer is not none %}{{question+"\nJibu la Hatua kwa Hatua:"}}{% else %}{{"Swali: "+question+"\nJibu la Hatua kwa Hatua:"}}{% endif %}'
+filter_list:
+- filter:
+  - function: regex
+    regex_pattern: Jibu ni (\-?[0-9\.\,]+)
+  - function: take_first
+  name: get-answer
+include: cot_yaml
+task: mgsm_native_cot_sw
--- a/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_te.yaml
+++ b/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_te.yaml
+# Generated by utils.py
+dataset_name: te
+doc_to_target: '{% if answer is not none %}{{answer[19:]}}{% else %}{{answer_number|string}}{% endif %}'
+doc_to_text: '{% if answer is not none %}{{question+"\nదశలవారీగా సమాధానం:"}}{% else %}{{"ప్రశ్న: "+question+"\nదశలవారీగా సమాధానం:"}}{% endif %}'
+filter_list:
+- filter:
+  - function: regex
+    regex_pattern: సమాధానం (\-?[0-9\.\,]+)
+  - function: take_first
+  name: get-answer
+include: cot_yaml
+task: mgsm_native_cot_te
--- a/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_th.yaml
+++ b/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_th.yaml
+# Generated by utils.py
+dataset_name: th
+doc_to_target: '{% if answer is not none %}{{answer[18:]}}{% else %}{{answer_number|string}}{% endif %}'
+doc_to_text: '{% if answer is not none %}{{question+"\nคำตอบทีละขั้นตอน:"}}{% else %}{{"โจทย์: "+question+"\nคำตอบทีละขั้นตอน:"}}{% endif %}'
+filter_list:
+- filter:
+  - function: regex
+    regex_pattern: คำตอบคือ (\-?[0-9\.\,]+)
+  - function: take_first
+  name: get-answer
+include: cot_yaml
+task: mgsm_native_cot_th
--- a/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_zh.yaml
+++ b/lm_eval/tasks/mgsm/native_cot/mgsm_native_cot_zh.yaml
+# Generated by utils.py
+dataset_name: zh
+doc_to_target: '{% if answer is not none %}{{answer[6:]}}{% else %}{{answer_number|string}}{% endif %}'
+doc_to_text: '{% if answer is not none %}{{question+"\n逐步解答:"}}{% else %}{{"问题: "+question+"\n逐步解答:"}}{% endif %}'
+filter_list:
+- filter:
+  - function: regex
+    regex_pattern: 答案是 (\-?[0-9\.\,]+)。
+  - function: take_first
+  name: get-answer
+include: cot_yaml
+target_delimiter: ""
+task: mgsm_native_cot_zh
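
To make the generated templates above easier to check, here is a small sketch of what `doc_to_target` and the `get-answer` filter do for the Swahili file. It assumes plain Python slicing behaves like the Jinja slice `answer[25:]`, and it uses `re.search` as a stand-in for the harness's regex filter, whose implementation is not shown in this diff. The `answer` string is invented for illustration and is not taken from the dataset.

```python
import re

# Prefix and pattern copied from mgsm_native_cot_sw.yaml.
PREFIX = "Jibu la Hatua kwa Hatua:"
REGEX = r"Jibu ni (\-?[0-9\.\,]+)"

# Same offset utils.py computes: prefix length plus one for the following space.
skip = len(PREFIX) + 1

# Invented few-shot answer (NOT from the dataset), shaped like a CoT answer.
answer = "Jibu la Hatua kwa Hatua: 2 + 3 = 5. Jibu ni 5"

# doc_to_target keeps everything after the prefix and its trailing space.
target = answer[skip:]

# The filter pulls the final number out of the text.
match = re.search(REGEX, target)
extracted = match.group(1) if match else None
```

Here `skip` is 25, which matches the `answer[25:]` slice in the generated file, and the slice removes the prefix so it is not repeated after `doc_to_text` has already supplied it.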
--- a/lm_eval/tasks/mgsm/utils.py
+++ b/lm_eval/tasks/mgsm/utils.py
@@ -128,23 +128,25 @@ def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
            yaml_template = "cot_yaml"
            filter_list = {}
+            DELIMITER = None
            if mode == "direct":
                ANSWER = LANGUAGES[lang]["DIRECT"]
                REGEX = None
-                task_name = f"mgsm_{lang}_direct"
+                task_name = f"mgsm_direct_{lang}"
                yaml_template = "direct_yaml"
            elif mode == "native-cot":
                ANSWER = LANGUAGES[lang]["ANSWER"]
                REGEX = LANGUAGES[lang]["REGEX"]
-                task_name = f"mgsm_{lang}_native-cot"
+                task_name = f"mgsm_native_cot_{lang}"
                filter_list = add_regex_pattern(REGEX)
+                DELIMITER = "" if lang in ["zh", "ja"] else None
            elif mode == "en-cot":
                ANSWER = LANGUAGES["en"]["ANSWER"]
                REGEX = LANGUAGES["en"]["REGEX"]
-                task_name = f"mgsm_{lang}_en-cot"
+                task_name = f"mgsm_en_cot_{lang}"
            file_name = f"{task_name}.yaml"
+            ANSWER_TO_SKIP = len(LANGUAGES[lang]["ANSWER"])+1
            with open(
                f"{output_dir}/{file_name}", "w" if overwrite else "x", encoding="utf8"
            ) as f:
@@ -153,18 +155,19 @@ def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
                    {
                        "include": yaml_template,
                        "dataset_name": lang,
-                        "task": f"mgsm_{lang}_direct",
+                        "task": f"{task_name}",
                        "doc_to_text": f"""{{% if answer is not none %}}"""
                        f"""{{{{question+"\\n{ANSWER}"}}}}"""
                        f"""{{% else %}}"""
                        f"""{{{{"{QUESTION} "+question+"\\n{ANSWER}"}}}}"""
                        f"""{{% endif %}}""",
                        "doc_to_target": f"""{{% if answer is not none %}}"""
-                        f"""{{{{answer[{len(ANSWER)}+1]}}}}"""
+                        f"""{{{{answer[{ANSWER_TO_SKIP}:]}}}}"""
                        f"""{{% else %}}"""
                        f"""{{{{answer_number|string}}}}"""
                        f"""{{% endif %}}""",
                        **filter_list,
+                        **({"target_delimiter": DELIMITER} if DELIMITER is not None else {}),
                    },
                    f,
                    allow_unicode=True,
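
A note on the optional `target_delimiter` key: for zh and ja the delimiter is the empty string, which is falsy in Python, so the conditional unpacking has to distinguish `""` from `None` for the key to reach the generated YAML. A minimal sketch of that pattern follows; the helper name `delimiter_entry` is hypothetical and not part of `utils.py`.

```python
from typing import Dict, Optional


def delimiter_entry(delimiter: Optional[str]) -> Dict[str, str]:
    """Return a one-key dict when a delimiter is set, including the empty string."""
    # "is not None" keeps "" (zh, ja); a plain truthiness test would drop it.
    return {"target_delimiter": delimiter} if delimiter is not None else {}


# ja: empty delimiter is emitted explicitly.
ja_config = {"task": "mgsm_native_cot_ja", **delimiter_entry("")}

# fr: no override, so the key is omitted and the template default applies.
fr_config = {"task": "mgsm_native_cot_fr", **delimiter_entry(None)}
```

This matches the generated files above, where only the ja and zh YAMLs carry `target_delimiter: ""`.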