gaoqiong / lm-evaluation-harness
Commit d684b9eb
Authored Dec 18, 2024 by Baber
fix do_sample
Parent: adbfcce1
Changes: 33
Showing 20 changed files with 22 additions and 22 deletions
lm_eval/tasks/longbench/2wikimqa.yaml  +1 -1
lm_eval/tasks/longbench/2wikimqa_e.yaml  +1 -1
lm_eval/tasks/longbench/dureader.yaml  +1 -1
lm_eval/tasks/longbench/gov_report.yaml  +1 -1
lm_eval/tasks/longbench/gov_report_e.yaml  +1 -1
lm_eval/tasks/longbench/hotpotqa.yaml  +1 -1
lm_eval/tasks/longbench/hotpotqa_e.yaml  +3 -3
lm_eval/tasks/longbench/lcc.yaml  +1 -1
lm_eval/tasks/longbench/lcc_e.yaml  +1 -1
lm_eval/tasks/longbench/lsht.yaml  +1 -1
lm_eval/tasks/longbench/multi_news.yaml  +1 -1
lm_eval/tasks/longbench/multi_news_e.yaml  +1 -1
lm_eval/tasks/longbench/multifieldqa_en.yaml  +1 -1
lm_eval/tasks/longbench/multifieldqa_en_e.yaml  +1 -1
lm_eval/tasks/longbench/multifieldqa_zh.yaml  +1 -1
lm_eval/tasks/longbench/musique.yaml  +1 -1
lm_eval/tasks/longbench/narrativeqa.yaml  +1 -1
lm_eval/tasks/longbench/passage_count.yaml  +1 -1
lm_eval/tasks/longbench/passage_count_e.yaml  +1 -1
lm_eval/tasks/longbench/passage_retrieval_en.yaml  +1 -1
lm_eval/tasks/longbench/2wikimqa.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 32
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.qa_f1_score
     aggregation: mean
lm_eval/tasks/longbench/2wikimqa_e.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 32
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.qa_f1_score
     aggregation: mean
lm_eval/tasks/longbench/dureader.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 128
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.rouge_zh_score
     aggregation: mean
lm_eval/tasks/longbench/gov_report.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 512
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.rouge_score
     aggregation: mean
lm_eval/tasks/longbench/gov_report_e.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 512
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.rouge_score
     aggregation: mean
lm_eval/tasks/longbench/hotpotqa.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 32
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.qa_f1_score
     aggregation: mean
lm_eval/tasks/longbench/hotpotqa_e.yaml
@@ -6,14 +6,14 @@ dataset_path: THUDM/LongBench
 test_split: test
 dataset_name: hotpotqa_e
 doc_to_text: 'Answer the question based on the given passages. Only give me the answer and do not output any other words.\n\nThe following are given passages.\n{{context}}\n\nAnswer the question based on the given passages. Only give me the answer and do not output any other words.\n\nQuestion: {{input}}\nAnswer:'
-doc_to_target: "{{answers}}"
+doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 32
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.qa_f1_score
     aggregation: mean
-    higher_is_better: true
+    higher_is_better: True
 metadata:
   version: 1.0
lm_eval/tasks/longbench/lcc.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 64
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.code_sim_score
     aggregation: mean
lm_eval/tasks/longbench/lcc_e.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 64
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.code_sim_score
     aggregation: mean
lm_eval/tasks/longbench/lsht.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 64
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.classification_score
     aggregation: mean
lm_eval/tasks/longbench/multi_news.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 512
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.rouge_score
     aggregation: mean
lm_eval/tasks/longbench/multi_news_e.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 512
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.rouge_score
     aggregation: mean
lm_eval/tasks/longbench/multifieldqa_en.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 64
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.qa_f1_score
     aggregation: mean
lm_eval/tasks/longbench/multifieldqa_en_e.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 64
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.qa_f1_score
     aggregation: mean
lm_eval/tasks/longbench/multifieldqa_zh.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 64
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.qa_f1_zh_score
     aggregation: mean
lm_eval/tasks/longbench/musique.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 32
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.qa_f1_score
     aggregation: mean
lm_eval/tasks/longbench/narrativeqa.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 128
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.qa_f1_score
     aggregation: mean
lm_eval/tasks/longbench/passage_count.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 32
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.count_score
     aggregation: mean
lm_eval/tasks/longbench/passage_count_e.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 32
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.count_score
     aggregation: mean
lm_eval/tasks/longbench/passage_retrieval_en.yaml
@@ -10,7 +10,7 @@ doc_to_target: '{{answers}}'
 generation_kwargs:
   max_gen_toks: 32
   temperature: 1
-  do_sample: False
+  do_sample: True
 metric_list:
   - metric: !function metrics.retrieval_score
     aggregation: mean
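Because the same one-line change repeats across every task file in this commit, a small consistency check may help confirm that none was missed. The sketch below is not part of the commit: the helper names `do_sample_value` and `files_not_sampling` are hypothetical, and it scans the YAML as plain text with a regular expression, on the assumption that the `!function` tags in these configs would need a custom constructor to load with a standard YAML parser.

```python
import re

# Matches a `do_sample:` key and captures its boolean value in any common casing.
DO_SAMPLE_RE = re.compile(
    r"^\s*do_sample\s*:\s*(true|false)\s*$", re.IGNORECASE | re.MULTILINE
)


def do_sample_value(yaml_text):
    """Return True/False for the first do_sample entry, or None if it is absent."""
    match = DO_SAMPLE_RE.search(yaml_text)
    if match is None:
        return None
    return match.group(1).lower() == "true"


def files_not_sampling(configs):
    """Given {path: yaml_text}, return the sorted paths whose do_sample is not True."""
    return sorted(
        path for path, text in configs.items() if do_sample_value(text) is not True
    )


if __name__ == "__main__":
    # Illustrative inputs mirroring the hunks above: one file before the fix, one after.
    before = (
        "generation_kwargs:\n"
        "  max_gen_toks: 32\n"
        "  temperature: 1\n"
        "  do_sample: False\n"
    )
    after = before.replace("do_sample: False", "do_sample: True")
    print(files_not_sampling({"2wikimqa.yaml": before, "hotpotqa.yaml": after}))
```

In practice the dictionary could be built by reading each `*.yaml` file under `lm_eval/tasks/longbench/`; an empty result would suggest every config now enables sampling.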