gaoqiong / lm-evaluation-harness
Commit 26bc3eab (Unverified)
Authored Oct 19, 2023 by Lintang Sutawika
Committed by GitHub, Oct 19, 2023
Merge branch 'big-refactor' into model-written-eval
Parents: 0d701496, cf617ab1
Changes: 381
Showing 20 changed files with 26 additions and 18 deletions
lm_eval/tasks/bigbench/generate_until/word_sorting.yaml  +4 -0
lm_eval/tasks/bigbench/generate_until/word_unscrambling.yaml  +4 -0
lm_eval/tasks/bigbench/greedy_until_template_yaml  +1 -1
lm_eval/tasks/code_x_glue/code-text/go.yaml  +1 -1
lm_eval/tasks/code_x_glue/code-text/java.yaml  +1 -1
lm_eval/tasks/code_x_glue/code-text/javascript.yaml  +1 -1
lm_eval/tasks/code_x_glue/code-text/php.yaml  +1 -1
lm_eval/tasks/code_x_glue/code-text/python.yaml  +1 -1
lm_eval/tasks/code_x_glue/code-text/ruby.yaml  +1 -1
lm_eval/tasks/coqa/default.yaml  +1 -1
lm_eval/tasks/drop/default.yaml  +1 -1
lm_eval/tasks/gsm8k/gsm8k-cot.yaml  +1 -1
lm_eval/tasks/gsm8k/gsm8k.yaml  +1 -1
lm_eval/tasks/logiqa2/logieval.yaml  +1 -1
lm_eval/tasks/mgsm/direct/direct_yaml  +1 -1
lm_eval/tasks/mgsm/en_cot/cot_yaml  +1 -1
lm_eval/tasks/mgsm/native_cot/cot_yaml  +1 -1
lm_eval/tasks/minerva_math/README.md  +1 -1
lm_eval/tasks/minerva_math/minerva_math_algebra.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu_flan_cot_fewshot_template_yaml  +1 -1
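Every file in this list changes the identifier `greedy_until` to `generate_until`, whether in an `output_type` value, an `include` path, or a task name. A minimal sketch of how such a mechanical rename could be scripted, assuming a plain text substitution is sufficient; the helper name `rename_greedy_until` is hypothetical and is not part of lm-evaluation-harness:

```python
import re

# Hypothetical helper (an assumption for illustration, not harness code):
# replaces the old identifier wherever it occurs in a config's text.
_OLD_NAME = re.compile(r"greedy_until")


def rename_greedy_until(text: str) -> str:
    """Return `text` with every `greedy_until` replaced by `generate_until`."""
    return _OLD_NAME.sub("generate_until", text)


if __name__ == "__main__":
    sample = (
        "include: ../greedy_until_template_yaml\n"
        "task: bigbench_word_sorting_greedy_until\n"
    )
    print(rename_greedy_until(sample), end="")
```

Because the substitution is textual, it also rewrites task names and include paths that embed the old identifier, which matches the kinds of lines touched in this commit.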
lm_eval/tasks/bigbench/greedy_until/word_sorting.yaml → lm_eval/tasks/bigbench/generate_until/word_sorting.yaml
 # Generated by utils.py
 dataset_name: word_sorting_zero_shot
-include: ../greedy_until_template_yaml
+include: ../generate_until_template_yaml
-task: bigbench_word_sorting_greedy_until
+task: bigbench_word_sorting_generate_until
lm_eval/tasks/bigbench/greedy_until/word_unscrambling.yaml → lm_eval/tasks/bigbench/generate_until/word_unscrambling.yaml
 # Generated by utils.py
 dataset_name: word_unscrambling_zero_shot
-include: ../greedy_until_template_yaml
+include: ../generate_until_template_yaml
-task: bigbench_word_unscrambling_greedy_until
+task: bigbench_word_unscrambling_generate_until
lm_eval/tasks/bigbench/greedy_until_template_yaml
 group: bigbench
 dataset_path: bigbench # will switch to `hails/bigbench` when all tasks are pushed
-output_type: greedy_until
+output_type: generate_until
 dataset_kwargs:
 # num_shots: 0 # TODO: num of shots for `bigbench` HF dataset should be controlled through this, not through the typical methods
 # subtask_name: null
 ...
lm_eval/tasks/code_x_glue/code-text/go.yaml
@@ -5,7 +5,7 @@ dataset_path: CM/codexglue_code2text_go
 training_split: train
 validation_split: validation
 test_split: test
-output_type: greedy_until
+output_type: generate_until
 generation_kwargs:
   num_beams: 10
   max_length: 128
 ...
lm_eval/tasks/code_x_glue/code-text/java.yaml
@@ -5,7 +5,7 @@ dataset_path: CM/codexglue_code2text_java
 training_split: train
 validation_split: validation
 test_split: test
-output_type: greedy_until
+output_type: generate_until
 generation_kwargs:
   num_beams: 10
   max_length: 128
 ...
lm_eval/tasks/code_x_glue/code-text/javascript.yaml
@@ -5,7 +5,7 @@ dataset_path: CM/codexglue_code2text_javascript
 training_split: train
 validation_split: validation
 test_split: test
-output_type: greedy_until
+output_type: generate_until
 generation_kwargs:
   num_beams: 10
   max_length: 128
 ...
lm_eval/tasks/code_x_glue/code-text/php.yaml
@@ -5,7 +5,7 @@ dataset_path: CM/codexglue_code2text_php
 training_split: train
 validation_split: validation
 test_split: test
-output_type: greedy_until
+output_type: generate_until
 generation_kwargs:
   num_beams: 10
   max_length: 128
 ...
lm_eval/tasks/code_x_glue/code-text/python.yaml
@@ -5,7 +5,7 @@ dataset_path: CM/codexglue_code2text_python
 training_split: train
 validation_split: validation
 test_split: test
-output_type: greedy_until
+output_type: generate_until
 generation_kwargs:
   num_beams: 10
   max_length: 128
 ...
lm_eval/tasks/code_x_glue/code-text/ruby.yaml
@@ -5,7 +5,7 @@ dataset_path: CM/codexglue_code2text_ruby
 training_split: train
 validation_split: validation
 test_split: test
-output_type: greedy_until
+output_type: generate_until
 generation_kwargs:
   num_beams: 10
   max_length: 128
 ...
lm_eval/tasks/coqa/default.yaml
 task: coqa
 dataset_path: EleutherAI/coqa
-output_type: greedy_until
+output_type: generate_until
 training_split: train
 validation_split: validation
 doc_to_text: !function utils.doc_to_text
 ...
lm_eval/tasks/drop/default.yaml
 task: drop
 dataset_path: EleutherAI/drop
-output_type: greedy_until
+output_type: generate_until
 training_split: train
 validation_split: validation
 process_docs: !function utils.process_docs
 ...
lm_eval/tasks/gsm8k/gsm8k-cot.yaml
@@ -3,7 +3,7 @@ group:
 task: gsm8k_cot
 dataset_path: gsm8k
 dataset_name: main
-output_type: greedy_until
+output_type: generate_until
 test_split: test
 doc_to_text: "Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?\n\nA: There are 15 trees originally. Then there were 21 trees after some more were planted. So there must have been 21 - 15 = 6. The answer is 6.\n\n\
   Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?\n\nA: There are originally 3 cars. 2 more cars arrive. 3 + 2 = 5. The answer is 5.\n\n\
 ...
lm_eval/tasks/gsm8k/gsm8k.yaml
@@ -3,7 +3,7 @@ group:
 task: gsm8k_yaml
 dataset_path: gsm8k
 dataset_name: main
-output_type: greedy_until
+output_type: generate_until
 training_split: train
 fewshot_split: train
 test_split: test
 ...
lm_eval/tasks/logiqa2/logieval.yaml
 task: logieval
 dataset_path: baber/logiqa2
 dataset_name: logieval
-output_type: greedy_until
+output_type: generate_until
 training_split: train
 test_split: test
 # Instructions + {content}
 ...
lm_eval/tasks/mgsm/direct/direct_yaml
@@ -4,7 +4,7 @@
 group: mgsm_direct
 dataset_path: juletxara/mgsm
 dataset_name: null # Overridden by language-specific config.
-output_type: greedy_until
+output_type: generate_until
 training_split: train
 test_split: test
 target_delimiter: ""
 ...
lm_eval/tasks/mgsm/en_cot/cot_yaml
@@ -4,7 +4,7 @@
 group: mgsm_cot_native
 dataset_path: juletxara/mgsm
 dataset_name: null # Overridden by language-specific config.
-output_type: greedy_until
+output_type: generate_until
 training_split: train
 test_split: test
 target_delimiter: ""
 ...
lm_eval/tasks/mgsm/native_cot/cot_yaml
@@ -4,7 +4,7 @@
 group: mgsm_cot_native
 dataset_path: juletxara/mgsm
 dataset_name: null # Overridden by language-specific config.
-output_type: greedy_until
+output_type: generate_until
 training_split: train
 test_split: test
 target_delimiter: ""
 ...
lm_eval/tasks/minerva_math/README.md
@@ -37,7 +37,7 @@ Eprint = {arXiv:2206.14858},
 #### Groups
 - `math_word_problems`
-- `greedy_until`
+- `generate_until`
 #### Tasks
 ...
lm_eval/tasks/minerva_math/minerva_math_algebra.yaml
@@ -4,7 +4,7 @@ task: minerva_math_algebra
 dataset_path: EleutherAI/hendrycks_math
 process_docs: !function utils.process_docs
 dataset_name: algebra
-output_type: greedy_until
+output_type: generate_until
 training_split: train
 test_split: test
 doc_to_text: !function utils.doc_to_text
 ...
lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu_flan_cot_fewshot_template_yaml
@@ -2,7 +2,7 @@ group: mmlu_flan_cot_fewshot
 dataset_path: cais/mmlu
 validation_split: validation
 fewshot_split: dev
-output_type: greedy_until
+output_type: generate_until
 doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: Let's think step by step."
 doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
 filter_list:
 ...
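After a rename spread over this many configs, it can help to confirm that no file still mentions the old identifier. The following is a rough sketch of such a check, assuming the configs are UTF-8 text under a single directory; `find_leftovers` is a hypothetical helper written for illustration, not a function from lm-evaluation-harness:

```python
from pathlib import Path

OLD_NAME = "greedy_until"


def find_leftovers(root: Path) -> list[Path]:
    """Return the files under `root` whose contents still contain OLD_NAME."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # errors="ignore" keeps the scan going if a non-text file is present.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if OLD_NAME in text:
            hits.append(path)
    return hits


if __name__ == "__main__":
    # Assumed location of the task configs, taken from the paths in this diff.
    for leftover in find_leftovers(Path("lm_eval/tasks")):
        print(leftover)
```

The sketch inspects file contents only. A file whose name still embeds the old identifier, such as `greedy_until_template_yaml`, would need a separate check on `path.name`.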