Commit 147e9d61 authored by Baber Abbasi, committed by GitHub

[longbench] fix metric calculation (#2983)

* use all answers

* use middle truncation

* maybe fix classification score

* strip classification preds

* [vllm] remove stop tokens post-hoc

* strip all preds

* pacify pre-commit

* start on truncation utility

* add to readme

* add a footgun doc

* fix newline in yaml templates

* do not strip code_sim preds!

* fix pre-commit config

* fix instruction warning

* add note to longbench readme
parent 9f152e0b
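
The "use middle truncation" item in the commit message is not spelled out on this page. A minimal sketch of the general idea, assuming it means that an over-long tokenized prompt keeps its head and tail and drops tokens from the middle; the function name `truncate_middle` and its signature are hypothetical and not taken from the commit:

```python
from typing import List


def truncate_middle(tokens: List[int], max_len: int) -> List[int]:
    """Hypothetical helper: keep the first and last tokens, drop the middle.

    The result never has more than `max_len` tokens.
    """
    if max_len <= 0:
        return []
    if len(tokens) <= max_len:
        # Already short enough: return an unchanged copy.
        return list(tokens)
    head = max_len // 2      # tokens kept from the start
    tail = max_len - head    # tokens kept from the end (always >= 1 here)
    return list(tokens[:head]) + list(tokens[len(tokens) - tail:])


if __name__ == "__main__":
    print(truncate_middle(list(range(10)), 4))
```

Dropping the middle rather than one end keeps both the instruction at the start of a long prompt and the question at its end, which matters when the context sits between the two.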
@@ -6,15 +6,16 @@ dataset_path: THUDM/LongBench
 test_split: test
 dataset_name: triviaqa_e
 doc_to_text: 'Answer the question based on the given passage. Only give me the answer and do not output any other words. The following are some examples.\n\n{{context}}\n\n{{input}}'
-doc_to_target: '{{answers[0]}}'
+doc_to_target: '{{answers}}'
+process_results: !function metrics.get_qa_f1_score
 generation_kwargs:
   max_gen_toks: 32
   temperature: 1
   do_sample: True
-  until: ['\n']
+  until: ["\n"]
 metric_list:
-  - metric: !function metrics.qa_f1_score
+  - metric: "qa_f1_score"
     aggregation: mean
     higher_is_better: True
 metadata:
-  version: 2.0
+  version: 3.0
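
The hunk above changes `doc_to_target` from the first answer to the full `answers` list and routes scoring through a `process_results` function. The body of `metrics.get_qa_f1_score` is not shown on this page, so the following is only a sketch of what scoring against every reference could look like. It assumes a `(doc, results)` signature that returns a dict keyed by metric name; the helper names `token_f1` and `best_f1_over_answers` are hypothetical, and the answer normalization is reduced to lowercasing and whitespace splitting:

```python
from collections import Counter
from typing import Dict, List


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two strings (simplified normalization)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    num_same = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1_over_answers(doc: Dict, results: List[str]) -> Dict[str, float]:
    """Hypothetical process_results-style function.

    Scores the prediction against every reference answer and keeps the best
    score, instead of comparing against answers[0] only.
    """
    prediction = results[0].strip()  # drop stray whitespace and newlines
    score = max((token_f1(prediction, ans) for ans in doc["answers"]), default=0.0)
    return {"qa_f1_score": score}


if __name__ == "__main__":
    doc = {"answers": ["the city of Paris", "Paris"]}
    print(best_f1_over_answers(doc, [" Paris\n"]))
```

With only `answers[0]` as the target, the example prediction would be penalized for not matching the longer first reference, even though it matches the second one exactly.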
@@ -6,15 +6,16 @@ dataset_path: THUDM/LongBench
 test_split: test
 dataset_name: vcsum
 doc_to_text: '下面有一段会议记录,请你阅读后,写一段总结,总结会议的内容。\n会议记录:\n{{context}}\n\n会议总结:'
-doc_to_target: '{{answers[0]}}'
+doc_to_target: '{{answers}}'
+process_results: !function metrics.get_rouge_zh_score
 generation_kwargs:
   max_gen_toks: 512
   temperature: 1
   do_sample: True
   until: []
 metric_list:
-  - metric: !function metrics.rouge_zh_score
+  - metric: "rouge_zh_score"
     aggregation: mean
     higher_is_better: True
 metadata:
-  version: 2.0
+  version: 3.0