gaoqiong / lm-evaluation-harness
Commit 574e565a (Unverified)
Authored Nov 10, 2023 by Lintang Sutawika
Committed by GitHub, Nov 10, 2023
Merge branch 'big-refactor' into verbosity-rework
Parents: 73f3029c, b7a4ea06
Changes: 498
Showing 20 changed files with 117 additions and 6 deletions
lm_eval/tasks/mmlu/flan_n_shot/loglikelihood/mmlu_security_studies.yaml (+6 -0)
lm_eval/tasks/mmlu/flan_n_shot/loglikelihood/mmlu_sociology.yaml (+4 -2)
lm_eval/tasks/mmlu/flan_n_shot/loglikelihood/mmlu_us_foreign_policy.yaml (+6 -0)
lm_eval/tasks/mmlu/flan_n_shot/loglikelihood/mmlu_virology.yaml (+4 -2)
lm_eval/tasks/mmlu/flan_n_shot/loglikelihood/mmlu_world_religions.yaml (+4 -2)
lm_eval/tasks/model_written_evals/advanced_ai_risk/_generate_configs.py (+29 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/_template_yaml (+12 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-coordinate-itself.yaml (+4 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-coordinate-other-ais.yaml (+4 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-coordinate-other-versions.yaml (+4 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-corrigible-less-HHH.yaml (+4 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-corrigible-more-HHH.yaml (+4 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-corrigible-neutral-HHH.yaml (+4 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-myopic-reward.yaml (+4 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-one-box-tendency.yaml (+4 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-power-seeking-inclination.yaml (+4 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-self-awareness-general-ai.yaml (+4 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-self-awareness-good-text-model.yaml (+4 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-self-awareness-text-model.yaml (+4 -0)
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-self-awareness-training-architecture.yaml (+4 -0)
lm_eval/tasks/mmlu/flan_n_shot/mmlu_log_security_studies.yaml → lm_eval/tasks/mmlu/flan_n_shot/loglikelihood/mmlu_security_studies.yaml
"dataset_name": "security_studies"
"description": "The following are multiple choice questions (with answers) about security studies.\n\n"
"description": "The following are multiple choice questions (with answers) about security\
  \ studies.\n\n"
"group": "mmlu_flan_n_shot_generative_social_sciences"
"include": "_mmlu_flan_loglikelihood_template_yaml"
"task": "mmlu_flan_n_shot_loglikelihood_security_studies"
"task": "mmlu_flan_n_shot_generative_security_studies"
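The two `description` lines in this file appear to differ only in line wrapping. In a YAML double-quoted scalar, a backslash at the end of a line continues the string and `\ ` is an escaped space, so both forms should load to the same value. A minimal check, assuming PyYAML is installed (the YAML text is copied from the diff; the variable names are illustrative):

```python
import yaml  # PyYAML, assumed to be installed

# Single-line form of the description, as it appears in the diff.
unwrapped = r'''"description": "The following are multiple choice questions (with answers) about security studies.\n\n"'''

# Wrapped form: the trailing backslash escapes the line break,
# and "\ " on the continuation line is an escaped space.
wrapped = (
    '"description": "The following are multiple choice questions (with answers) about security\\\n'
    '  \\ studies.\\n\\n"'
)

a = yaml.safe_load(unwrapped)["description"]
b = yaml.safe_load(wrapped)["description"]
print(a == b)
```

If the two values compare equal, the description change is cosmetic and does not alter the prompt text seen by the model.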
lm_eval/tasks/mmlu/flan_n_shot/mmlu_log_sociology.yaml → lm_eval/tasks/mmlu/flan_n_shot/loglikelihood/mmlu_sociology.yaml
"dataset_name": "sociology"
"description": "The following are multiple choice questions (with answers) about sociology.\n\n"
"description": "The following are multiple choice questions (with answers) about sociology.\n\
  \n"
"group": "mmlu_flan_n_shot_generative_social_sciences"
"include": "_mmlu_flan_loglikelihood_template_yaml"
"task": "mmlu_flan_n_shot_loglikelihood_sociology"
"task": "mmlu_flan_n_shot_generative_sociology"
lm_eval/tasks/mmlu/flan_n_shot/mmlu_log_us_foreign_policy.yaml → lm_eval/tasks/mmlu/flan_n_shot/loglikelihood/mmlu_us_foreign_policy.yaml
"dataset_name": "us_foreign_policy"
"description": "The following are multiple choice questions (with answers) about us foreign policy.\n\n"
"description": "The following are multiple choice questions (with answers) about us\
  \ foreign policy.\n\n"
"group": "mmlu_flan_n_shot_generative_social_sciences"
"include": "_mmlu_flan_loglikelihood_template_yaml"
"task": "mmlu_flan_n_shot_loglikelihood_us_foreign_policy"
"task": "mmlu_flan_n_shot_generative_us_foreign_policy"
lm_eval/tasks/mmlu/flan_n_shot/mmlu_log_virology.yaml → lm_eval/tasks/mmlu/flan_n_shot/loglikelihood/mmlu_virology.yaml
"dataset_name": "virology"
"description": "The following are multiple choice questions (with answers) about virology.\n\n"
"description": "The following are multiple choice questions (with answers) about virology.\n\
  \n"
"group": "mmlu_flan_n_shot_generative_other"
"include": "_mmlu_flan_loglikelihood_template_yaml"
"task": "mmlu_flan_n_shot_loglikelihood_virology"
"task": "mmlu_flan_n_shot_generative_virology"
lm_eval/tasks/mmlu/flan_n_shot/mmlu_log_world_religions.yaml → lm_eval/tasks/mmlu/flan_n_shot/loglikelihood/mmlu_world_religions.yaml
"dataset_name": "world_religions"
"description": "The following are multiple choice questions (with answers) about world religions.\n\n"
"description": "The following are multiple choice questions (with answers) about world\
  \ religions.\n\n"
"group": "mmlu_flan_n_shot_generative_humanities"
"include": "_mmlu_flan_loglikelihood_template_yaml"
"task": "mmlu_flan_n_shot_loglikelihood_world_religions"
"task": "mmlu_flan_n_shot_generative_world_religions"
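The five MMLU entries above follow the same rename pattern: a `mmlu_log_<subject>.yaml` file directly under `flan_n_shot/` moves to `flan_n_shot/loglikelihood/mmlu_<subject>.yaml`. A minimal sketch of that mapping, assuming the old paths are as read from the diff headers (the helper `move_to_loglikelihood` is hypothetical and not part of the commit):

```python
from pathlib import PurePosixPath


def move_to_loglikelihood(old_path: str) -> str:
    """Map .../mmlu_log_<subject>.yaml to .../loglikelihood/mmlu_<subject>.yaml."""
    p = PurePosixPath(old_path)
    prefix = "mmlu_log_"
    if not p.name.startswith(prefix):
        raise ValueError(f"unexpected file name: {p.name}")
    subject_file = p.name[len(prefix):]  # e.g. "virology.yaml"
    return str(p.parent / "loglikelihood" / f"mmlu_{subject_file}")


print(move_to_loglikelihood("lm_eval/tasks/mmlu/flan_n_shot/mmlu_log_virology.yaml"))
```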
lm_eval/tasks/model_written_evals/advanced_ai_risk/_generate_configs.py
0 → 100644
import yaml
import inspect
import datasets

from tqdm import tqdm


def main() -> None:
    dataset_path = "EleutherAI/advanced_ai_risk"
    for task in tqdm(datasets.get_dataset_infos(dataset_path).keys()):
        file_name = f"{task}.yaml"
        try:
            with open(f"{file_name}", "w") as f:
                f.write("# Generated by _generate_configs.py\n")
                yaml.dump(
                    {
                        "include": "_template_yaml",
                        "task": f"{dataset_path.split('/')[-1]}_{task}",
                        "dataset_name": task,
                    },
                    f,
                )
        except FileExistsError:
            pass


if __name__ == "__main__":
    main()
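One detail worth noting in the script above: `open(..., "w")` truncates an existing file rather than raising, so the `except FileExistsError` branch is not expected to trigger. If the intent were to leave existing configs untouched, exclusive-creation mode `"x"` is the mode that raises `FileExistsError`. A small standard-library sketch of the difference (the helper `write_config` and the file name are illustrative, not part of the commit):

```python
import os
import tempfile


def write_config(path: str, text: str, mode: str) -> bool:
    """Write text to path; return False if open() refused because the file exists."""
    try:
        with open(path, mode) as fh:
            fh.write(text)
        return True
    except FileExistsError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.yaml")
    first = write_config(path, "task: a\n", "x")   # file does not exist yet: created
    second = write_config(path, "task: b\n", "x")  # file exists: refused
    third = write_config(path, "task: c\n", "w")   # file exists: overwritten
    with open(path) as fh:
        final_text = fh.read()

print(first, second, third, final_text.strip())
```

Whether overwriting is desirable here depends on whether the generated files are ever edited by hand; the commit itself uses `"w"`.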
lm_eval/tasks/model_written_evals/advanced_ai_risk/_template_yaml
0 → 100644
group: advanced_ai_risk
dataset_path: EleutherAI/advanced_ai_risk
output_type: multiple_choice
validation_split: validation
target_delimiter: ""
doc_to_text: "Human: {question}\n\nAssistant:"
doc_to_target: 0
doc_to_choice: "{{[answer_matching_behavior, answer_not_matching_behavior]}}"
should_decontaminate: true
doc_to_decontamination_query: "{{sentence_good}} {{sentence_bad}}"
metric_list:
- metric: acc
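With `doc_to_target: 0` and `doc_to_choice` listing `answer_matching_behavior` first, the `acc` metric in this template effectively measures how often the model prefers the behavior-matching answer. A simplified sketch of that scoring rule, assuming each choice has already been given a log-likelihood by the model (the function and the numbers are illustrative and are not the harness's actual code):

```python
from typing import Sequence


def multiple_choice_acc(loglikelihoods: Sequence[Sequence[float]], target: int = 0) -> float:
    """Fraction of documents whose highest-scoring choice is the target index."""
    hits = 0
    for scores in loglikelihoods:
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        hits += int(predicted == target)
    return hits / len(loglikelihoods)


# Each row: [score(answer_matching_behavior), score(answer_not_matching_behavior)]
example_scores = [[-1.2, -3.4], [-2.0, -0.5], [-0.7, -0.9], [-4.1, -4.0]]
print(multiple_choice_acc(example_scores))
```

Under this reading, a higher `acc` means the model more often picks the answer that matches the probed behavior, so it is a measure of behavior frequency and not of correctness.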
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-coordinate-itself.yaml
0 → 100644
# Generated by _generate_configs.py
dataset_name: fewshot-coordinate-itself
include: _template_yaml
task: advanced_ai_risk_fewshot-coordinate-itself
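The `task` value in each generated file is the dataset repository name (the text after the last `/` in `dataset_path`) joined to the subset name with an underscore, following the f-string in `_generate_configs.py`. The keys appear as `dataset_name`, `include`, `task` even though the dict literal lists them in a different order, because `yaml.dump` sorts keys by default. A minimal sketch of the naming step (the helper `task_name` is illustrative):

```python
def task_name(dataset_path: str, subset: str) -> str:
    """Join the repository name (text after the last '/') and the subset name."""
    return f"{dataset_path.split('/')[-1]}_{subset}"


print(task_name("EleutherAI/advanced_ai_risk", "fewshot-coordinate-itself"))
# advanced_ai_risk_fewshot-coordinate-itself
```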
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-coordinate-other-ais.yaml
0 → 100644
# Generated by _generate_configs.py
dataset_name: fewshot-coordinate-other-ais
include: _template_yaml
task: advanced_ai_risk_fewshot-coordinate-other-ais
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-coordinate-other-versions.yaml
0 → 100644
# Generated by _generate_configs.py
dataset_name: fewshot-coordinate-other-versions
include: _template_yaml
task: advanced_ai_risk_fewshot-coordinate-other-versions
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-corrigible-less-HHH.yaml
0 → 100644
# Generated by _generate_configs.py
dataset_name: fewshot-corrigible-less-HHH
include: _template_yaml
task: advanced_ai_risk_fewshot-corrigible-less-HHH
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-corrigible-more-HHH.yaml
0 → 100644
# Generated by _generate_configs.py
dataset_name: fewshot-corrigible-more-HHH
include: _template_yaml
task: advanced_ai_risk_fewshot-corrigible-more-HHH
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-corrigible-neutral-HHH.yaml
0 → 100644
# Generated by _generate_configs.py
dataset_name: fewshot-corrigible-neutral-HHH
include: _template_yaml
task: advanced_ai_risk_fewshot-corrigible-neutral-HHH
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-myopic-reward.yaml
0 → 100644
# Generated by _generate_configs.py
dataset_name: fewshot-myopic-reward
include: _template_yaml
task: advanced_ai_risk_fewshot-myopic-reward
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-one-box-tendency.yaml
0 → 100644
# Generated by _generate_configs.py
dataset_name: fewshot-one-box-tendency
include: _template_yaml
task: advanced_ai_risk_fewshot-one-box-tendency
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-power-seeking-inclination.yaml
0 → 100644
# Generated by _generate_configs.py
dataset_name: fewshot-power-seeking-inclination
include: _template_yaml
task: advanced_ai_risk_fewshot-power-seeking-inclination
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-self-awareness-general-ai.yaml
0 → 100644
# Generated by _generate_configs.py
dataset_name: fewshot-self-awareness-general-ai
include: _template_yaml
task: advanced_ai_risk_fewshot-self-awareness-general-ai
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-self-awareness-good-text-model.yaml
0 → 100644
# Generated by _generate_configs.py
dataset_name: fewshot-self-awareness-good-text-model
include: _template_yaml
task: advanced_ai_risk_fewshot-self-awareness-good-text-model
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-self-awareness-text-model.yaml
0 → 100644
# Generated by _generate_configs.py
dataset_name: fewshot-self-awareness-text-model
include: _template_yaml
task: advanced_ai_risk_fewshot-self-awareness-text-model
lm_eval/tasks/model_written_evals/advanced_ai_risk/fewshot-self-awareness-training-architecture.yaml
0 → 100644
# Generated by _generate_configs.py
dataset_name: fewshot-self-awareness-training-architecture
include: _template_yaml
task: advanced_ai_risk_fewshot-self-awareness-training-architecture