gaoqiong / lm-evaluation-harness

Commit 835cc40e, authored Dec 06, 2023 by lintangsutawika

merged latest and added altworld files

Parents: 8da401e0, c9bbec6e
Changes: 430

Showing 20 changed files with 40 additions and 4 deletions (+40 -4)
lm_eval/tasks/hendrycks_ethics/utilitarianism_original_yaml  (+2 -0)
lm_eval/tasks/hendrycks_ethics/virtue.yaml  (+2 -0)
lm_eval/tasks/lambada/lambada_openai.yaml  (+2 -0)
lm_eval/tasks/lambada/lambada_standard.yaml  (+2 -0)
lm_eval/tasks/lambada_cloze/lambada_openai_cloze.yaml  (+2 -0)
lm_eval/tasks/lambada_cloze/lambada_standard_cloze.yaml  (+2 -0)
lm_eval/tasks/lambada_multilingual/lambada_mt_en.yaml  (+2 -0)
lm_eval/tasks/logiqa/logiqa.yaml  (+2 -0)
lm_eval/tasks/logiqa2/logieval.yaml  (+2 -0)
lm_eval/tasks/logiqa2/logiqa2.yaml  (+2 -0)
lm_eval/tasks/mathqa/alternative_worlds/_mathqa_alt_yaml  (+3 -0)
lm_eval/tasks/mathqa/mathqa.yaml  (+2 -0)
lm_eval/tasks/mc_taco/default.yaml  (+2 -0)
lm_eval/tasks/mgsm/direct/direct_yaml  (+2 -0)
lm_eval/tasks/mgsm/en_cot/cot_yaml  (+2 -0)
lm_eval/tasks/mgsm/native_cot/cot_yaml  (+2 -0)
lm_eval/tasks/minerva_math/minerva_math_algebra.yaml  (+2 -0)
lm_eval/tasks/mmlu/default/_default_template_yaml  (+2 -3)
lm_eval/tasks/mmlu/flan_cot_fewshot/_cot_prompts.json  (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu_flan_cot_fewshot_template_yaml  (+2 -0)
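Nearly every task config in this list receives the same two-line addition: a `metadata` block that pins the task's version. The recurring shape of the new block is shown below (the version number varies per task):

```yaml
metadata:
  - version: 1.0
```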
lm_eval/tasks/hendrycks_ethics/utilitarianism_original_yaml
@@ -12,3 +12,5 @@
# metric_list:
# - metric: acc
# TODO: we want this to be implemented as a winograd_schema task type, actually
# metadata:
# - version: 1.0
lm_eval/tasks/hendrycks_ethics/virtue.yaml
@@ -6,3 +6,5 @@ dataset_name: virtue
doc_to_text: "Sentence: {{scenario}}\nQuestion: Does the character in this sentence exhibit the trait \"{{trait}}\"?\nAnswer:"
doc_to_target: label
doc_to_choice: ['no', 'yes']
metadata:
  - version: 1.0
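To see how the three fields in this hunk combine at evaluation time, here is a rough sketch. Plain `str.format` stands in for the harness's Jinja templating, and the example document is made up:

```python
# Hypothetical document; real dataset rows supply these fields.
doc = {"scenario": "Anna returned the wallet she found.",
       "trait": "honesty",
       "label": 1}

# doc_to_text from virtue.yaml, with Jinja {{...}} rewritten as {...}
# so stdlib str.format can render it in this sketch.
doc_to_text = ('Sentence: {scenario}\nQuestion: Does the character in this '
               'sentence exhibit the trait "{trait}"?\nAnswer:')
choices = ["no", "yes"]          # doc_to_choice

prompt = doc_to_text.format(scenario=doc["scenario"], trait=doc["trait"])
target = choices[doc["label"]]   # doc_to_target: label indexes into choices

print(prompt)
print("target:", target)         # target: yes
```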
lm_eval/tasks/lambada/lambada_openai.yaml
@@ -16,3 +16,5 @@ metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  - version: 1.0
lm_eval/tasks/lambada/lambada_standard.yaml
@@ -17,3 +17,5 @@ metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  - version: 1.0
lm_eval/tasks/lambada_cloze/lambada_openai_cloze.yaml
@@ -16,3 +16,5 @@ metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  - version: 1.0
lm_eval/tasks/lambada_cloze/lambada_standard_cloze.yaml
@@ -17,3 +17,5 @@ metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  - version: 1.0
lm_eval/tasks/lambada_multilingual/lambada_mt_en.yaml
@@ -16,3 +16,5 @@ metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  - version: 1.0
lm_eval/tasks/logiqa/logiqa.yaml
@@ -17,3 +17,5 @@ metric_list:
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  - version: 1.0
lm_eval/tasks/logiqa2/logieval.yaml
@@ -23,3 +23,5 @@ filter_list:
      # https://github.com/openai/evals/blob/305b237cdb3884c7ddb6a5d12cb184a83551fcba/evals/api.py#L84
      regex_pattern: "^\\s*([A-D])"
    - function: "take_first"
metadata:
  - version: 0.0
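The `regex_pattern` in this filter pulls the leading multiple-choice letter out of a model's response. A stdlib sketch of how that pattern behaves (illustrative only, not the harness's actual filter code):

```python
import re

# regex_pattern from logiqa2/logieval.yaml: capture the first A-D
# letter, allowing leading whitespace.
pattern = re.compile(r"^\s*([A-D])")

def take_first_letter(response: str):
    """Illustrative stand-in for the 'regex' + 'take_first' filter
    pair: return the captured letter, or None if the response does
    not begin with one."""
    m = pattern.match(response)
    return m.group(1) if m else None

print(take_first_letter("  B. The second option follows."))  # -> B
print(take_first_letter("I think the answer is C"))          # -> None
```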
lm_eval/tasks/logiqa2/logiqa2.yaml
@@ -17,3 +17,5 @@ metric_list:
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  - version: 0.0
lm_eval/tasks/mathqa/alternative_worlds/_mathqa_alt_yaml
@@ -8,5 +8,8 @@ metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
- metric: acc_norm
aggregation: mean
higher_is_better: true
- metric: brier_score
higher_is_better: false
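This hunk registers `brier_score` alongside the accuracy metrics. The Brier score measures squared error between predicted probabilities and the true choice, so lower is better, hence `higher_is_better: false`. A minimal sketch of the multi-class form under the sum-of-squares convention (the harness's exact convention may differ):

```python
def brier_score(probs, target_idx):
    """Squared error between a predicted probability vector and the
    one-hot encoding of the true choice. 0.0 is a perfect prediction."""
    return sum(
        (p - (1.0 if i == target_idx else 0.0)) ** 2
        for i, p in enumerate(probs)
    )

print(brier_score([1.0, 0.0, 0.0], 0))          # perfect -> 0.0
print(brier_score([0.25, 0.25, 0.25, 0.25], 2)) # uniform over 4 choices
```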
lm_eval/tasks/mathqa/mathqa.yaml
@@ -18,3 +18,5 @@ metric_list:
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  - version: 1.0
lm_eval/tasks/mc_taco/default.yaml
@@ -11,3 +11,5 @@ doc_to_decontamination_query: "{{question}} {{sentence}}"
metric_list:
  - metric: acc
  - metric: f1
metadata:
  - version: 1.0
lm_eval/tasks/mgsm/direct/direct_yaml
@@ -25,3 +25,5 @@ metric_list:
higher_is_better: true
ignore_case: true
ignore_punctuation: true
metadata:
- version: 0.0
lm_eval/tasks/mgsm/en_cot/cot_yaml
@@ -27,3 +27,5 @@ filter_list:
- function: "regex"
regex_pattern: "The answer is (\\-?[0-9\\.\\,]+)"
- function: "take_first"
metadata:
- version: 0.0
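The filter list above first applies `regex_pattern` to pull the final number out of a chain-of-thought response, then keeps the first match. A stdlib sketch of that pattern's behavior (illustrative, not the harness's filter implementation):

```python
import re

# regex_pattern from mgsm/en_cot/cot_yaml: a possibly negative run of
# digits, dots, and commas following the phrase "The answer is".
pattern = re.compile(r"The answer is (\-?[0-9\.\,]+)")

def extract_answer(response: str):
    """Return the first captured number string, or None. Stands in
    for the 'regex' + 'take_first' filter pair."""
    m = pattern.search(response)
    return m.group(1) if m else None

print(extract_answer("She has 3 + 4 = 7 apples. The answer is 7"))  # -> 7
print(extract_answer("The answer is -1,250"))                       # -> -1,250
```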
lm_eval/tasks/mgsm/native_cot/cot_yaml
@@ -27,3 +27,5 @@ filter_list:
- function: "regex"
regex_pattern: "The answer is (\\-?[0-9\\.\\,]+)"
- function: "take_first"
metadata:
- version: 1.0
lm_eval/tasks/minerva_math/minerva_math_algebra.yaml
@@ -19,3 +19,5 @@ metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
metadata:
  - version: 0.0
lm_eval/tasks/mmlu/default/_default_template_yaml
@@ -11,6 +11,5 @@ metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
- metric: brier_score
aggregation: mean
higher_is_better: false
metadata:
- version: 0.0
lm_eval/tasks/mmlu/flan_cot_fewshot/_cot_prompts.json
This source diff could not be displayed because it is too large.
lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu_flan_cot_fewshot_template_yaml
@@ -21,3 +21,5 @@ metric_list:
higher_is_better: true
ignore_case: true
ignore_punctuation: true
metadata:
- version: 0.0