gaoqiong / lm-evaluation-harness

Commit 9fc24ab4, authored Jul 03, 2024 by haileyschoelkopf

    make explicit group configs for leaderboard and other newer tasks
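The pattern across this commit, in brief: task YAMLs that previously carried an implicit `group:` key now declare `tag:` instead (or drop the key entirely), and each leaderboard suite gains a dedicated group config file that lists its subtasks explicitly. A minimal sketch of the new convention, assembled from the GPQA files changed below:

```yaml
# lm_eval/tasks/leaderboard/gpqa/_template_yaml — per-task template;
# the implicit `group: leaderboard_gpqa` key is removed.
dataset_path: Idavidrein/gpqa
output_type: multiple_choice

# lm_eval/tasks/leaderboard/gpqa/_leaderboard_gpqa.yaml — new explicit
# group config that enumerates the subtasks it aggregates.
group: leaderboard_gpqa
task:
  - leaderboard_gpqa_diamond
  - leaderboard_gpqa_extended
  - leaderboard_gpqa_main
```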
Parent: b03c7636

Showing 20 of 22 changed files with 97 additions and 15 deletions (+97, -15)
lm_eval/tasks/arc_mt/arc_challenge_mt_fi.yaml (+1, -1)
lm_eval/tasks/basqueglue/bec.yaml (+1, -1)
lm_eval/tasks/basqueglue/bhtc.yaml (+1, -1)
lm_eval/tasks/basqueglue/coref.yaml (+1, -1)
lm_eval/tasks/basqueglue/qnli.yaml (+1, -1)
lm_eval/tasks/basqueglue/vaxx.yaml (+1, -1)
lm_eval/tasks/basqueglue/wic.yaml (+1, -1)
lm_eval/tasks/bbh/cot_fewshot/_bbh.yaml (+1, -1)
lm_eval/tasks/bertaqa/_bertaqa_template (+1, -1)
lm_eval/tasks/inverse_scaling/_inverse_scaling_mc_yaml (+1, -1)
lm_eval/tasks/inverse_scaling/_some_results (+39, -0)
lm_eval/tasks/leaderboard/bbh_mc/_fewshot_template_yaml (+0, -1)
lm_eval/tasks/leaderboard/bbh_mc/_leaderboard_bbh.yaml (+26, -0)
lm_eval/tasks/leaderboard/gpqa/_leaderboard_gpqa.yaml (+5, -0)
lm_eval/tasks/leaderboard/gpqa/_template_yaml (+0, -1)
lm_eval/tasks/leaderboard/ifeval/_leaderboard_instruction_following.yaml (+3, -0)
lm_eval/tasks/leaderboard/ifeval/ifeval.yaml (+0, -1)
lm_eval/tasks/leaderboard/math/_leaderboard_math.yaml (+9, -0)
lm_eval/tasks/leaderboard/math/_template_yaml (+0, -2)
lm_eval/tasks/leaderboard/musr/_musr.yaml (+5, -0)
lm_eval/tasks/arc_mt/arc_challenge_mt_fi.yaml

-group:
+tag:
   - arc_challenge_mt
 task: arc_challenge_mt_fi
 dataset_path: LumiOpen/arc_challenge_mt
 ...
lm_eval/tasks/basqueglue/bec.yaml

-group: basque-glue
+tag: basque-glue
 task: bec2016eu
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: bec
 ...
lm_eval/tasks/basqueglue/bhtc.yaml

-group: basque-glue
+tag: basque-glue
 task: bhtc_v2
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: bhtc
 ...
lm_eval/tasks/basqueglue/coref.yaml

-group: basque-glue
+tag: basque-glue
 task: epec_koref_bin
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: coref
 ...
lm_eval/tasks/basqueglue/qnli.yaml

-group: basque-glue
+tag: basque-glue
 task: qnlieu
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: qnli
 ...
lm_eval/tasks/basqueglue/vaxx.yaml

-group: basque-glue
+tag: basque-glue
 task: vaxx_stance
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: vaxx
 ...
lm_eval/tasks/basqueglue/wic.yaml

-group: basque-glue
+tag: basque-glue
 task: wiceu
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: wic
 ...
lm_eval/tasks/bbh/cot_fewshot/_bbh.yaml

@@ -5,7 +5,7 @@ task:
   - bbh_cot_fewshot_date_understanding
   - bbh_cot_fewshot_disambiguation_qa
   - bbh_cot_fewshot_dyck_languages
-  - bbh_cot_fewshot_formal_languages
+  - bbh_cot_fewshot_formal_fallacies
   - bbh_cot_fewshot_geometric_shapes
   - bbh_cot_fewshot_hyperbaton
   - bbh_cot_fewshot_logical_deduction_five_objects
 ...
lm_eval/tasks/bertaqa/_bertaqa_template

-group: bertaqa
+tag: bertaqa
 dataset_path: HiTZ/BertaQA
 dataset_name: null
 validation_split: null
 ...
lm_eval/tasks/inverse_scaling/_inverse_scaling_mc_yaml

-group:
+tag:
   - inverse_scaling_mc
 output_type: multiple_choice
 test_split: train
 ...
lm_eval/tasks/inverse_scaling/_some_results (new file, mode 100644)
# | Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
# |-------------------------------------------|-------|------|-----:|--------|-----:|---|-----:|
# | - inverse_scaling_hindsight_neglect_10shot| 0|none | 0|acc |0.4476|± |0.0281|
# | | |none | 0|acc_norm|0.4476|± |0.0281|
# |inverse_scaling_mc |N/A |none | 0|acc_norm|0.6273|± |0.0096|
# | | |none | 0|acc |0.6210|± |0.0095|
# | - inverse_scaling_neqa | 0|none | 0|acc |0.5300|± |0.0289|
# | | |none | 0|acc_norm|0.5300|± |0.0289|
# | - inverse_scaling_quote_repetition | 0|none | 0|acc |0.9367|± |0.0141|
# | | |none | 0|acc_norm|0.9367|± |0.0141|
# | - inverse_scaling_redefine_math | 0|none | 0|acc |0.7178|± |0.0150|
# | | |none | 0|acc_norm|0.7178|± |0.0150|
# | - inverse_scaling_winobias_antistereotype | 0|none | 0|acc |0.3786|± |0.0239|
# | | |none | 0|acc_norm|0.4126|± |0.0243|
# | Groups |Version|Filter|n-shot| Metric |Value | |Stderr|
# |------------------|-------|------|-----:|--------|-----:|---|-----:|
# |inverse_scaling_mc|N/A |none | 0|acc_norm|0.6273|± |0.0096|
# | | |none | 0|acc |0.6210|± |0.0095|
# hf (pretrained=facebook/opt-2.7b,add_bos_token=True,dtype=float32), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
# | Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
# |-------------------------------------------|-------|------|-----:|--------|-----:|---|-----:|
# | - inverse_scaling_hindsight_neglect_10shot| 0|none | 0|acc |0.4476|± |0.0281|
# | | |none | 0|acc_norm|0.4476|± |0.0281|
# |inverse_scaling_mc |N/A |none | 0|acc_norm|0.6291|± |0.0095|
# | | |none | 0|acc |0.6219|± |0.0095|
# | - inverse_scaling_neqa | 0|none | 0|acc |0.5267|± |0.0289|
# | | |none | 0|acc_norm|0.5267|± |0.0289|
# | - inverse_scaling_quote_repetition | 0|none | 0|acc |0.9433|± |0.0134|
# | | |none | 0|acc_norm|0.9433|± |0.0134|
# | - inverse_scaling_redefine_math | 0|none | 0|acc |0.7200|± |0.0150|
# | | |none | 0|acc_norm|0.7200|± |0.0150|
# | - inverse_scaling_winobias_antistereotype | 0|none | 0|acc |0.3762|± |0.0239|
# | | |none | 0|acc_norm|0.4150|± |0.0243|
# | Groups |Version|Filter|n-shot| Metric |Value | |Stderr|
# |------------------|-------|------|-----:|--------|-----:|---|-----:|
# |inverse_scaling_mc|N/A |none | 0|acc_norm|0.6291|± |0.0095|
# | | |none | 0|acc |0.6219|± |0.0095|
lm_eval/tasks/leaderboard/bbh_mc/_fewshot_template_yaml

-group: leaderboard_bbh
 dataset_path: SaylorTwift/bbh
 output_type: multiple_choice
 test_split: test
 ...
lm_eval/tasks/leaderboard/bbh_mc/_leaderboard_bbh.yaml (new file, mode 100644)

+group: leaderboard_bbh
+task:
+  - leaderboard_bbh_boolean_expressions
+  - leaderboard_bbh_causal_judgement
+  - leaderboard_bbh_date_understanding
+  - leaderboard_bbh_disambiguation_qa
+  - leaderboard_bbh_formal_fallacies
+  - leaderboard_bbh_geometric_shapes
+  - leaderboard_bbh_hyperbaton
+  - leaderboard_bbh_logical_deduction_five_objects
+  - leaderboard_bbh_logical_deduction_seven_objects
+  - leaderboard_bbh_logical_deduction_three_objects
+  - leaderboard_bbh_movie_recommendation
+  - leaderboard_bbh_navigate
+  - leaderboard_bbh_object_counting
+  - leaderboard_bbh_penguins_in_a_table
+  - leaderboard_bbh_reasoning_about_colored_objects
+  - leaderboard_bbh_ruin_names
+  - leaderboard_bbh_salient_translation_error_detection
+  - leaderboard_bbh_snarks
+  - leaderboard_bbh_sports_understanding
+  - leaderboard_bbh_temporal_sequences
+  - leaderboard_bbh_tracking_shuffled_objects_five_objects
+  - leaderboard_bbh_tracking_shuffled_objects_seven_objects
+  - leaderboard_bbh_tracking_shuffled_objects_three_objects
+  - leaderboard_bbh_web_of_lies
lm_eval/tasks/leaderboard/gpqa/_leaderboard_gpqa.yaml (new file, mode 100644)

+group: leaderboard_gpqa
+task:
+  - leaderboard_gpqa_diamond
+  - leaderboard_gpqa_extended
+  - leaderboard_gpqa_main
lm_eval/tasks/leaderboard/gpqa/_template_yaml

 dataset_path: Idavidrein/gpqa
-group: leaderboard_gpqa
 output_type: multiple_choice
 process_docs: !function utils.process_docs
 training_split: train
 ...
lm_eval/tasks/leaderboard/ifeval/_leaderboard_instruction_following.yaml (new file, mode 100644)

+group: leaderboard_instruction_following
+task:
+  - leaderboard_ifeval
lm_eval/tasks/leaderboard/ifeval/ifeval.yaml

 task: leaderboard_ifeval
-group: leaderboard_instruction_following
 dataset_path: wis-k/instruction-following-eval
 dataset_name: null
 output_type: generate_until
 ...
lm_eval/tasks/leaderboard/math/_leaderboard_math.yaml (new file, mode 100644)

+group: leaderboard_math_hard
+task:
+  - leaderboard_math_algebra_hard
+  - leaderboard_math_counting_and_prob_hard
+  - leaderboard_math_geometry_hard
+  - leaderboard_math_intermediate_algebra_hard
+  - leaderboard_math_num_theory_hard
+  - leaderboard_math_prealgebra_hard
+  - leaderboard_math_precalculus_hard
lm_eval/tasks/leaderboard/math/_template_yaml

-group:
-  - leaderboard_math_hard
 dataset_path: lighteval/MATH-Hard
 process_docs: !function utils.process_docs
 output_type: generate_until
 ...
lm_eval/tasks/leaderboard/musr/_musr.yaml (new file, mode 100644)

+group: leaderboard_musr
+task:
+  - leaderboard_musr_murder_mysteries
+  - leaderboard_musr_object_placements
+  - leaderboard_musr_team_allocation