Commit bec2b500 authored by nikuya3

Preprocess answer choices of hellaswag task

parent 8c653b93
@@ -7,8 +7,8 @@ output_type: multiple_choice
 training_split: train
 validation_split: validation
 test_split: null
-template_aliases: "{% set answer_choices = endings %}{% set gold = label %}"
-doc_to_text: !function preprocess_hellaswag.doc_to_text
+template_aliases: "{% set gold = label %}{% set answer_choices = endings|map('regex_replace', '\\[.*?\\]', '')|map('replace', ' [title]', '. ')|map('replace', '  ', ' ')|list %}"
+doc_to_text: "{% set text = activity_label ~ ': ' ~ ctx_a ~ ' ' ~ ctx_b.capitalize() %}{{text|regex_replace('\\[.*?\\]', '')|replace(' [title]', '. ')|replace('  ', ' ')}}"
 doc_to_target: "{{gold}}"
 metric_list:
 - metric: acc
...
import re


def preprocess(text):
    text = text.strip()
    # NOTE: Brackets are artifacts of the WikiHow dataset portion of HellaSwag.
    text = text.replace(" [title]", ". ")
    text = re.sub("\\[.*?\\]", "", text)
    text = text.replace("  ", " ")
    return text


def doc_to_text(doc):
    ctx = doc["ctx_a"] + " " + doc["ctx_b"].capitalize()
    query = preprocess(doc["activity_label"] + ": " + ctx)
    return query
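
To make the cleanup concrete, here is a minimal sketch that runs the helper above next to a plain-Python rendering of the new filter chain on one made-up record. The sample text is invented, `template_chain` is a hypothetical name, and the sketch assumes the `regex_replace` filter behaves like `re.sub`; it does not reproduce how the harness actually renders the template.

```python
import re


def preprocess(text):
    """Helper from the diff: ' [title]' is rewritten before brackets are stripped."""
    text = text.strip()
    text = text.replace(" [title]", ". ")
    text = re.sub(r"\[.*?\]", "", text)
    text = text.replace("  ", " ")
    return text


def template_chain(text):
    """Hypothetical Python mirror of the new filter chain (regex first, no strip)."""
    text = re.sub(r"\[.*?\]", "", text)    # assumption: regex_replace acts like re.sub
    text = text.replace(" [title]", ". ")  # finds nothing here: brackets are already gone
    text = text.replace("  ", " ")
    return text


# Made-up record that only borrows the field names used in the diff.
doc = {
    "activity_label": "Baking",
    "ctx_a": "[header] How to bake bread [title] Mix the dough.",
    "ctx_b": "then",
    "endings": ["[step] knead it well.", "[substeps] throw it away."],
}

context = doc["activity_label"] + ": " + doc["ctx_a"] + " " + doc["ctx_b"].capitalize()
helper_query = preprocess(context)
template_query = template_chain(context)
template_choices = [template_chain(ending) for ending in doc["endings"]]

print(helper_query)
print(template_query)
print(template_choices)
```

On this sample the two orderings produce different queries: the helper turns ` [title]` into a sentence break, while the regex-first chain removes the tag before the ` [title]` replacement can see it. Whether that difference matters for real HellaSwag records is not something this sketch establishes.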