gaoqiong / lm-evaluation-harness

Commit 01d89cdc
authored Jan 19, 2025 by Baber

rename

parent be7be189

Showing 14 changed files with 51 additions and 24 deletions
Changed files:

lm_eval/tasks/ruler/cwe.yaml  +1 -1
lm_eval/tasks/ruler/fwe.yaml  +1 -1
lm_eval/tasks/ruler/niah_multikey_1.yaml  +2 -2
lm_eval/tasks/ruler/niah_multikey_2.yaml  +2 -2
lm_eval/tasks/ruler/niah_multikey_3.yaml  +2 -2
lm_eval/tasks/ruler/niah_multiquery.yaml  +2 -2
lm_eval/tasks/ruler/niah_multivalue.yaml  +2 -2
lm_eval/tasks/ruler/niah_single_1.yaml  +2 -3
lm_eval/tasks/ruler/niah_single_2.yaml  +2 -2
lm_eval/tasks/ruler/niah_single_3.yaml  +2 -2
lm_eval/tasks/ruler/qa_squad.yaml  +1 -1
lm_eval/tasks/ruler/qa_utils.py  +11 -3
lm_eval/tasks/ruler/ruler.yaml  +20 -0
lm_eval/tasks/ruler/vt.yaml  +1 -1

lm_eval/tasks/ruler/cwe.yaml
-include: niah_1.yaml
+include: niah_single_1.yaml
 task: ruler_cwe
 download_dataset: !function cwe_utils.get_cw_dataset
 generation_kwargs:
 ...

lm_eval/tasks/ruler/fwe.yaml
-include: niah_1.yaml
+include: niah_single_1.yaml
 task: ruler_fwe
 download_dataset: !function fwe_utils.fwe_download
 generation_kwargs:
 ...

lm_eval/tasks/ruler/niah_4.yaml → lm_eval/tasks/ruler/niah_multikey_1.yaml
-task: niah_4
+task: niah_multikey_1
-include: niah_1.yaml
+include: niah_single_1.yaml
 download_dataset: !function utils.niah_multikey_1
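
The renamed variant files all follow one pattern: each includes the base config and overrides a few keys. Below is a minimal sketch of that override pattern, assuming the included file is loaded first and the including file's keys take precedence. The helper `resolve_include` and the in-memory `files` dict are hypothetical illustrations, not the harness's actual loader.

```python
def resolve_include(name, files):
    """Return the config for `name`, layered on top of whatever it includes."""
    config = dict(files[name])               # copy so the input is not mutated
    base_name = config.pop("include", None)  # the include key itself is not kept
    if base_name is None:
        return config
    base = resolve_include(base_name, files)
    return {**base, **config}                # keys in the including file win


# Hypothetical in-memory stand-ins for two of the YAML files in this commit.
files = {
    "niah_single_1.yaml": {
        "tag": ["longcxt"],
        "task": "niah_single_1",
        "output_type": "generate_until",
    },
    "niah_multikey_1.yaml": {
        "task": "niah_multikey_1",
        "include": "niah_single_1.yaml",
        "download_dataset": "utils.niah_multikey_1",
    },
}

resolved = resolve_include("niah_multikey_1.yaml", files)
```

Under this assumption the variant keeps its own `task` name and picks up `output_type` from the base file, which is why every `include:` line had to change when the base file was renamed.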

lm_eval/tasks/ruler/niah_5.yaml → lm_eval/tasks/ruler/niah_multikey_2.yaml
-task: niah_5
+task: niah_multikey_2
-include: niah_1.yaml
+include: niah_single_1.yaml
 download_dataset: !function utils.niah_multikey_2

lm_eval/tasks/ruler/niah_6.yaml → lm_eval/tasks/ruler/niah_multikey_3.yaml
-task: niah_6
+task: niah_multikey_3
-include: niah_1.yaml
+include: niah_single_1.yaml
 download_dataset: !function utils.niah_multikey_3

lm_eval/tasks/ruler/niah_8.yaml → lm_eval/tasks/ruler/niah_multiquery.yaml
-task: niah_8
+task: niah_multiquery
-include: niah_1.yaml
+include: niah_single_1.yaml
 download_dataset: !function utils.niah_multiquery

lm_eval/tasks/ruler/niah_7.yaml → lm_eval/tasks/ruler/niah_multivalue.yaml
-task: niah_7
+task: niah_multivalue
-include: niah_1.yaml
+include: niah_single_1.yaml
 download_dataset: !function utils.niah_multivalue

lm_eval/tasks/ruler/niah_1.yaml → lm_eval/tasks/ruler/niah_single_1.yaml
 tag:
-  - ruler
+  - longcxt
-task: niah_1
+task: niah_single_1
 dataset_path: ""
 dataset_name: ""
 output_type: generate_until
 ...
@@ -10,7 +10,6 @@ doc_to_text: "{{input}}"
 doc_to_target: "{{outputs[0]}}"
 gen_prefix: "{{gen_prefix}}"
 process_results: !function utils.process_results
 metric_list:
   - metric: "4096"
     aggregation: !function utils.aggregate_metrics
 ...

lm_eval/tasks/ruler/niah_2.yaml → lm_eval/tasks/ruler/niah_single_2.yaml
-task: niah_2
+task: niah_single_2
-include: niah_1.yaml
+include: niah_single_1.yaml
 download_dataset: !function utils.niah_single_2

lm_eval/tasks/ruler/niah_3.yaml → lm_eval/tasks/ruler/niah_single_3.yaml
-task: niah_3
+task: niah_single_3
-include: niah_1.yaml
+include: niah_single_1.yaml
 download_dataset: !function utils.niah_single_3
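
Taken together, the `task:` lines changed in this commit replace the numbered `niah_*` names with descriptive ones. The sketch below collects that mapping so old task names in existing scripts can be translated. The mapping is read off the renamed files in this commit; `migrate_task_names` is a hypothetical helper, not part of the harness.

```python
# Old task name -> new task name, taken from the changed `task:` lines.
OLD_TO_NEW = {
    "niah_1": "niah_single_1",
    "niah_2": "niah_single_2",
    "niah_3": "niah_single_3",
    "niah_4": "niah_multikey_1",
    "niah_5": "niah_multikey_2",
    "niah_6": "niah_multikey_3",
    "niah_7": "niah_multivalue",
    "niah_8": "niah_multiquery",
}


def migrate_task_names(tasks):
    """Translate old names to new ones; names not in the mapping pass through."""
    return [OLD_TO_NEW.get(name, name) for name in tasks]


migrated = migrate_task_names(["niah_1", "niah_8", "ruler_vt"])
```

Names that were not renamed, such as `ruler_vt`, are returned unchanged.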

lm_eval/tasks/ruler/qa_squad.yaml
-include: niah_1.yaml
+include: niah_single_1.yaml
 task: ruler_qa_squad
 download_dataset: !function qa_utils.get_squad
 test_split: test
 ...

lm_eval/tasks/ruler/qa_utils.py
@@ -15,7 +15,7 @@
 import itertools  # noqa: I001
 import random
-from functools import cache, partial
+from functools import cache
 import datasets
 import requests
@@ -237,5 +237,13 @@ def get_qa_dataset(ds, **kwargs):
     }
-get_squad = partial(get_qa_dataset, "squad")
-get_hotpotqa = partial(get_qa_dataset, "hotpotqa")
+def get_squad(**kwargs):
+    return get_qa_dataset("squad", **kwargs)
+def get_hotpotqa(**kwargs):
+    return get_qa_dataset("hotpotqa", **kwargs)
+# get_squad = lambda **kwargs: partial(get_qa_dataset, "squad")(**kwargs)
+# get_hotpotqa = lambda **kwargs: partial(get_qa_dataset, "hotpotqa")(**kwargs)
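
The `qa_utils.py` change swaps `functools.partial` objects for plain `def` wrappers. The commit does not say why, so the sketch below only shows an observable difference between the two forms. The body of `get_qa_dataset` and the `max_seq_length` keyword are hypothetical stand-ins, since the real function body is elided in the diff.

```python
import inspect
from functools import partial


def get_qa_dataset(ds, **kwargs):
    # Hypothetical stand-in body; the real implementation is not shown in the diff.
    return {"dataset": ds, **kwargs}


# Old style: a partial object bound to the dataset name.
get_squad_partial = partial(get_qa_dataset, "squad")


# New style: an ordinary function that forwards its keyword arguments.
def get_squad(**kwargs):
    return get_qa_dataset("squad", **kwargs)


# Both forms produce the same result for the same keyword arguments...
same_result = get_squad_partial(max_seq_length=4096) == get_squad(max_seq_length=4096)

# ...but only the def version is a regular function object with a __name__.
partial_is_function = inspect.isfunction(get_squad_partial)
def_is_function = inspect.isfunction(get_squad)
partial_has_name = hasattr(get_squad_partial, "__name__")
```

Code that introspects a callable (its name, signature, or function type) can treat the two forms differently even though calling them gives identical results.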

lm_eval/tasks/ruler/ruler.yaml (new file, 0 → 100644)
+group: ruler
+task:
+  - niah_single_1
+  - niah_single_2
+  - niah_single_3
+  - niah_multikey_1
+  - niah_multikey_2
+  - niah_multikey_3
+  - niah_multiquery
+  - niah_multivalue
+  - ruler_vt
+  - ruler_cwe
+  - ruler_fwe
+  - ruler_qa_squad
+  - ruler_qa_hotpot
+aggregate_metric_list:
+  - metric: acc
+    weight_by_size: False
+metadata:
+  version: 1
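
The new group config sets `weight_by_size: False` for its `acc` aggregate. Below is a sketch of the difference between size-weighted and unweighted averaging, assuming the flag selects between the two. The scores and sizes are made up, and `aggregate` is a hypothetical helper rather than the harness's implementation.

```python
def aggregate(scores, sizes, weight_by_size):
    """Combine per-subtask scores into one group score."""
    if not weight_by_size:
        # Unweighted: every subtask counts equally, regardless of size.
        return sum(scores) / len(scores)
    # Weighted: each subtask counts in proportion to its number of examples.
    total = sum(sizes)
    return sum(score * size for score, size in zip(scores, sizes)) / total


# Made-up numbers: a large subtask with a low score, a small one with a high score.
scores = [0.5, 1.0]
sizes = [300, 100]

unweighted = aggregate(scores, sizes, weight_by_size=False)
weighted = aggregate(scores, sizes, weight_by_size=True)
```

With these numbers the unweighted mean is 0.75 and the size-weighted mean is 0.625, so the setting matters whenever subtasks differ in size.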

lm_eval/tasks/ruler/vt.yaml
-include: niah_1.yaml
+include: niah_single_1.yaml
 task: ruler_vt
 download_dataset: !function vt_utils.get_vt_dataset
 generation_kwargs:
 ...