gaoqiong / lm-evaluation-harness · Commits

Commit 06d3406e, authored Sep 04, 2023 by lintangsutawika

    update

parent f23ae748 · Changes: 129

Showing 20 changed files with 20 additions and 75 deletions (+20 / -75)
lm_eval/tasks/bbh/multistep_arithmetic_two.yaml                        +0 -4
lm_eval/tasks/bbh/navigate.yaml                                        +0 -4
lm_eval/tasks/bbh/object_counting.yaml                                 +0 -4
lm_eval/tasks/bbh/penguins_in_a_table.yaml                             +0 -4
lm_eval/tasks/bbh/reasoning_about_colored_objects.yaml                 +0 -4
lm_eval/tasks/bbh/ruin_names.yaml                                      +0 -4
lm_eval/tasks/bbh/salient_translation_error_detection.yaml             +0 -4
lm_eval/tasks/bbh/snarks.yaml                                          +0 -4
lm_eval/tasks/bbh/sports_understanding.yaml                            +0 -4
lm_eval/tasks/bbh/temporal_sequences.yaml                              +0 -4
lm_eval/tasks/bbh/tracking_shuffled_objects_five_objects.yaml          +0 -4
lm_eval/tasks/bbh/tracking_shuffled_objects_seven_objects.yaml         +0 -4
lm_eval/tasks/bbh/tracking_shuffled_objects_three_objects.yaml         +0 -4
lm_eval/tasks/bbh/web_of_lies.yaml                                     +0 -4
lm_eval/tasks/bbh/word_sorting.yaml                                    +0 -4
lm_eval/tasks/mmlu/_generate_configs.py                                +1 -1
lm_eval/tasks/mmlu/flan_n_shot/_mmlu_flan_generative_template_yaml     +10 -7
lm_eval/tasks/mmlu/flan_n_shot/_mmlu_flan_loglikelihood_template_yaml  +7 -3
lm_eval/tasks/mmlu/flan_n_shot/mmlu_abstract_algebra.yaml              +1 -2
lm_eval/tasks/mmlu/flan_n_shot/mmlu_business_ethics.yaml               +1 -2
lm_eval/tasks/bbh/multistep_arithmetic_two.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: multistep_arithmetic_two
    include: _template_yaml
    task: bbh_multistep_arithmetic_two

lm_eval/tasks/bbh/navigate.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: navigate
    include: _template_yaml
    task: bbh_navigate

lm_eval/tasks/bbh/object_counting.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: object_counting
    include: _template_yaml
    task: bbh_object_counting

lm_eval/tasks/bbh/penguins_in_a_table.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: penguins_in_a_table
    include: _template_yaml
    task: bbh_penguins_in_a_table

lm_eval/tasks/bbh/reasoning_about_colored_objects.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: reasoning_about_colored_objects
    include: _template_yaml
    task: bbh_reasoning_about_colored_objects

lm_eval/tasks/bbh/ruin_names.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: ruin_names
    include: _template_yaml
    task: bbh_ruin_names

lm_eval/tasks/bbh/salient_translation_error_detection.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: salient_translation_error_detection
    include: _template_yaml
    task: bbh_salient_translation_error_detection

lm_eval/tasks/bbh/snarks.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: snarks
    include: _template_yaml
    task: bbh_snarks

lm_eval/tasks/bbh/sports_understanding.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: sports_understanding
    include: _template_yaml
    task: bbh_sports_understanding

lm_eval/tasks/bbh/temporal_sequences.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: temporal_sequences
    include: _template_yaml
    task: bbh_temporal_sequences

lm_eval/tasks/bbh/tracking_shuffled_objects_five_objects.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: tracking_shuffled_objects_five_objects
    include: _template_yaml
    task: bbh_tracking_shuffled_objects_five_objects

lm_eval/tasks/bbh/tracking_shuffled_objects_seven_objects.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: tracking_shuffled_objects_seven_objects
    include: _template_yaml
    task: bbh_tracking_shuffled_objects_seven_objects

lm_eval/tasks/bbh/tracking_shuffled_objects_three_objects.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: tracking_shuffled_objects_three_objects
    include: _template_yaml
    task: bbh_tracking_shuffled_objects_three_objects

lm_eval/tasks/bbh/web_of_lies.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: web_of_lies
    include: _template_yaml
    task: bbh_web_of_lies

lm_eval/tasks/bbh/word_sorting.yaml (deleted, 100644 → 0)

    # Generated by _generate_configs.py
    dataset_name: word_sorting
    include: _template_yaml
    task: bbh_word_sorting
lm_eval/tasks/mmlu/_generate_configs.py

    @@ -115,4 +115,4 @@ if __name__ == "__main__":
             file_save_path = args.save_prefix_path + f"_{subject}.yaml"
             eval_logger.info(f"Saving yaml for subset {subject} to {file_save_path}")
             with open(file_save_path, "w") as yaml_file:
    -            yaml.dump(yaml_dict, yaml_file)
    +            yaml.dump(yaml_dict, yaml_file, width=float("inf"))
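The effect of that one-line change can be reproduced with PyYAML directly. This is a standalone sketch (the example document below is made up, not taken from the repo): by default `yaml.dump` folds long scalar values at roughly 80 columns, which is why the generated per-subject descriptions used to span two lines; `width=float("inf")` keeps each value on a single line.

```python
import yaml

# A made-up document mimicking a generated MMLU subject config value.
doc = {
    "description": "The following are multiple choice questions (with answers) about abstract algebra.\n"
}

# Default emitter width (~80 columns): the long value gets folded across lines.
wrapped = yaml.dump(doc)

# width=float("inf") disables folding: the value stays on one line.
unwrapped = yaml.dump(doc, width=float("inf"))

print(wrapped)
print(unwrapped)
```

The unwrapped form is what the per-subject yaml diffs further down this commit reflect.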
lm_eval/tasks/mmlu/flan_n_shot/_mmlu_flan_generative_template_yaml

     group: mmlu_flan
     dataset_path: cais/mmlu
    -validation_split: validation
    +# validation_split: validation
    +test_split: test
     fewshot_split: dev
    -doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA:"
    +# doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: "
    +doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:"
     output_type: greedy_until
    -doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
    +# doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
    +doc_to_target: "{{['A', 'B', 'C', 'D'][answer]}}"
     metric_list:
       - metric: exact_match
         aggregation: mean
         higher_is_better: true
    -    ignore_case: true
    -    ignore_punctuation: true
    +    # ignore_case: true
    +    # ignore_punctuation: true
     generation_kwargs:
       until:
         - "</s>"
    -  do_sample: false
    -  temperature: 0.0
    +  # do_sample: false
    +  # temperature: 0.0
    \ No newline at end of file
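lm-evaluation-harness renders `doc_to_text` with Jinja2, so the new prompt format introduced here can be previewed on a toy document. The question and choices below are invented for illustration; only the template string and the `doc_to_target` letter list come from this commit.

```python
from jinja2 import Template

# Made-up MMLU-style doc with the fields the template references.
doc = {
    "question": "  Which of the following is a group under addition?  ",
    "choices": ["The odd integers", "The even integers",
                "The positive reals", "The unit circle"],
    "answer": 1,
}

# New doc_to_text from this commit: lettered choices, one per line.
new_prompt = Template(
    "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}"
    "\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:"
).render(**doc)

# New doc_to_target: the answer index selects a bare letter.
target = ["A", "B", "C", "D"][doc["answer"]]

print(new_prompt)
print(target)  # → B
```

Compared with the old `Q: … (A) … (B) …\nA:` format, the new prompt puts each choice on its own line and asks for a bare letter rather than a parenthesized one, matching the updated `exact_match` target.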
lm_eval/tasks/mmlu/flan_n_shot/_mmlu_flan_loglikelihood_template_yaml

     group: mmlu_flan_loglikelihood
     dataset_path: cais/mmlu
    -validation_split: validation
    +# validation_split: validation
    +test_split: test
     fewshot_split: dev
    -doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA:"
    +doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:"
     output_type: multiple_choice
    -doc_to_choice: ['(A)', '(B)', '(C)', '(D)']
    +doc_to_choice: ["A", "B", "C", "D"]
     doc_to_target: answer
     metric_list:
       - metric: acc
         aggregation: mean
         higher_is_better: true
    +  - metric: acc_norm
    +    aggregation: mean
    +    higher_is_better: true
    \ No newline at end of file
lm_eval/tasks/mmlu/flan_n_shot/mmlu_abstract_algebra.yaml

     dataset_name: abstract_algebra
    -description: 'The following are multiple choice questions (with answers) about abstract
    -  algebra.
    +description: 'The following are multiple choice questions (with answers) about abstract algebra.

       '
     ...
lm_eval/tasks/mmlu/flan_n_shot/mmlu_business_ethics.yaml

     dataset_name: business_ethics
    -description: 'The following are multiple choice questions (with answers) about business
    -  ethics.
    +description: 'The following are multiple choice questions (with answers) about business ethics.

       '
     ...