Unverified Commit cf51e699 authored by Yoonsoo Kim, committed by GitHub

mmlu pro generation_kwargs until Q: -> Question: (#2945)



* mmlu pro generation_kwargs until Q: -> Question:

* pacify pre-commit

* change stop token

---------
Co-authored-by: Baber <baber@hey.com>
parent af8b87cc
@@ -4,7 +4,7 @@ import os
 import numpy as np
 from metrics import (
-    classification_score,
+    # classification_score,
     code_sim_score,
     count_score,
     qa_f1_score,
@@ -29,10 +29,10 @@ dataset2metric = {
     "qmsum": rouge_score,
     "multi_news": rouge_score,
     "vcsum": rouge_zh_score,
-    "trec": classification_score,
+    # "trec": classification_score,
     "triviaqa": qa_f1_score,
     "samsum": rouge_score,
-    "lsht": classification_score,
+    # "lsht": classification_score,
     "passage_retrieval_en": retrieval_score,
     "passage_count": count_score,
     "passage_retrieval_zh": retrieval_zh_score,
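For context, the hunks above disable `classification_score` both at import time and in the `dataset2metric` map (the `trec` and `lsht` entries). Below is a minimal sketch of how a dataset-to-metric map like this is typically dispatched; the scorer and entries are simplified stand-ins for illustration, not the real metrics module.

```python
# Sketch only: simplified stand-in scorer and dispatch, not the actual metrics code.

def qa_f1_score(prediction: str, ground_truth: str) -> float:
    """Stand-in token-overlap F1 between a prediction and a reference."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = set(pred_tokens) & set(gold_tokens)
    if not pred_tokens or not gold_tokens or not common:
        return 0.0
    precision = len(common) / len(pred_tokens)
    recall = len(common) / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Dataset name -> metric function, mirroring the structure in the diff;
# commented-out entries are simply not scored.
dataset2metric = {
    "triviaqa": qa_f1_score,
    # "trec": classification_score,  # disabled, as in the hunk above
}


def score(dataset: str, prediction: str, ground_truth: str) -> float:
    """Look up the metric registered for `dataset` and apply it."""
    metric = dataset2metric.get(dataset)
    if metric is None:
        raise KeyError(f"No metric registered for dataset {dataset!r}")
    return metric(prediction, ground_truth)


print(score("triviaqa", "Paris is the capital", "Paris"))  # 0.4
```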
@@ -64,3 +64,5 @@ If other tasks on this dataset are already supported:
   * Added one newline to task description(s) as per [reference implementation](https://github.com/TIGER-AI-Lab/MMLU-Pro/blob/47b9891aacb8bd7cda29d5c5ba17b9434dd333bc/evaluate_from_local.py#L93)
 * (tasks, group) 2025-03-20 -- (version 2.0 --> version 2.1)
   * Changed default max_length from 2048 to 8192 and max_gen_toks from 256 to 2048.
+* (tasks, group) 2025-05-20 -- (version 2.1 --> version 3)
+  * changed stop sequence from "Q:" to "Question:" PR #2945
@@ -17,9 +17,7 @@ filter_list:
       - function: "take_first"
 generation_kwargs:
   until:
-    - "</s>"
-    - "Q:"
-    - "<|im_end|>"
+    - "Question:"
   max_gen_toks: 2048
   do_sample: false
   temperature: 0.0
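With this change, generation stops at "Question:" rather than "Q:", presumably to match the "Question:" prefix used by the MMLU-Pro reference prompt format linked in the changelog above. Below is a minimal sketch (an assumption about how `until` stop strings behave, not the harness's actual internals) of how such a stop sequence truncates a completion.

```python
# Sketch only: cut a generated continuation at the first occurrence
# of any stop string, the way an `until` list is typically applied.
from typing import Iterable


def apply_until(generation: str, until: Iterable[str]) -> str:
    """Truncate `generation` at the earliest stop string, if any appears."""
    cut = len(generation)
    for stop in until:
        idx = generation.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return generation[:cut]


raw = "The answer is (B).\nQuestion: What is the capital of France?"
print(apply_until(raw, ["Question:"]))  # -> "The answer is (B).\n"
```

Stopping on "Question:" keeps the model's answer while discarding any follow-on question it begins to generate in the few-shot format.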