gaoqiong / lm-evaluation-harness
Commit ad506a13 (unverified)
Authored Oct 14, 2025 by Baber Abbasi
Committed by GitHub, Oct 14, 2025

remove duplicate tags/groups (#3343)

Parent: d5ddccd9
Changes: 21
Showing 20 changed files with 42 additions and 42 deletions
lm_eval/tasks/longbench2/_longbench2.yaml  +5 -5
lm_eval/tasks/longbench2/academic_multi_doc.yaml  +2 -2
lm_eval/tasks/longbench2/academic_single.yaml  +2 -2
lm_eval/tasks/longbench2/agent_history.yaml  +2 -2
lm_eval/tasks/longbench2/detective.yaml  +2 -2
lm_eval/tasks/longbench2/dialogue_history.yaml  +2 -2
lm_eval/tasks/longbench2/event_order.yaml  +2 -2
lm_eval/tasks/longbench2/fin_multi_doc.yaml  +2 -2
lm_eval/tasks/longbench2/fin_single_doc.yaml  +2 -2
lm_eval/tasks/longbench2/govt_multi_doc.yaml  +2 -2
lm_eval/tasks/longbench2/govt_single_doc.yaml  +2 -2
lm_eval/tasks/longbench2/graph.yaml  +2 -2
lm_eval/tasks/longbench2/legal_multi.yaml  +2 -2
lm_eval/tasks/longbench2/legal_single.yaml  +2 -2
lm_eval/tasks/longbench2/lit_single_doc.yaml  +2 -2
lm_eval/tasks/longbench2/longbench2_code.yaml  +1 -1
lm_eval/tasks/longbench2/many_shot.yaml  +2 -2
lm_eval/tasks/longbench2/news_multi.yaml  +2 -2
lm_eval/tasks/longbench2/table.yaml  +2 -2
lm_eval/tasks/longbench2/translate.yaml  +2 -2

lm_eval/tasks/longbench2/_longbench2.yaml
 group: longbench2
 task:
-  - longbench2_history
-  - longbench2_incontext
-  - longbench2_multi
-  - longbench2_single
-  - longbench2_structured
+  - longbench2_history_tasks
+  - longbench2_incontext_tasks
+  - longbench2_multi_tasks
+  - longbench2_single_tasks
+  - longbench2_structured_tasks
   - longbench2_code
 aggregate_metric_list:
   - metric: acc
 ...

lm_eval/tasks/longbench2/academic_multi_doc.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_multi
+  - longbench2_tasks
+  - longbench2_multi_tasks
 task: longbench2_academic_multi
 dataset_name: academic_multi
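
The two diffs above show the overlap this commit removes: before the change, the tag `longbench2` in the task file had the same name as the group `longbench2` in `_longbench2.yaml`. A minimal sketch of a check for that kind of overlap follows. It works on plain dictionaries that mirror the YAML shown here; the function name `find_name_collisions` is hypothetical and is not part of lm-evaluation-harness.

```python
def find_name_collisions(group_names, task_configs):
    """Return, sorted, every name used both as a group name and as a tag.

    Hypothetical helper for illustration; not the harness's own logic.
    """
    tags = set()
    for cfg in task_configs:
        tag_field = cfg.get("tag", [])
        # A tag may be written as a single string or as a list of strings.
        if isinstance(tag_field, str):
            tag_field = [tag_field]
        tags.update(tag_field)
    return sorted(set(group_names) & tags)


# Values copied from the diffs above.
groups = ["longbench2"]
before = [{"task": "longbench2_academic_multi",
           "tag": ["longbench2", "longbench2_multi"]}]
after = [{"task": "longbench2_academic_multi",
          "tag": ["longbench2_tasks", "longbench2_multi_tasks"]}]

print(find_name_collisions(groups, before))  # ['longbench2']
print(find_name_collisions(groups, after))   # []
```

With the `_tasks` suffix applied, the tag and group names no longer intersect.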

lm_eval/tasks/longbench2/academic_single.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_single
+  - longbench2_tasks
+  - longbench2_single_tasks
 task: longbench2_academic_single
 dataset_name: academic_single

lm_eval/tasks/longbench2/agent_history.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_history
+  - longbench2_tasks
+  - longbench2_history_tasks
 task: longbench2_agent_history
 dataset_name: agent_history_qa

lm_eval/tasks/longbench2/detective.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_single
+  - longbench2_tasks
+  - longbench2_single_tasks
 task: longbench2_detective
 dataset_name: detective

lm_eval/tasks/longbench2/dialogue_history.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_history
+  - longbench2_tasks
+  - longbench2_history_tasks
 task: longbench2_dialogue_history
 dataset_name: dialogue_history_qa

lm_eval/tasks/longbench2/event_order.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_single
+  - longbench2_tasks
+  - longbench2_single_tasks
 task: longbench2_event_order
 dataset_name: event_ordering

lm_eval/tasks/longbench2/fin_multi_doc.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_multi
+  - longbench2_tasks
+  - longbench2_multi_tasks
 task: longbench2_fin_multi
 dataset_name: financial_multi

lm_eval/tasks/longbench2/fin_single_doc.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_single
+  - longbench2_tasks
+  - longbench2_single_tasks
 task: longbench2_fin_single
 dataset_name: financial_single

lm_eval/tasks/longbench2/govt_multi_doc.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_multi
+  - longbench2_tasks
+  - longbench2_multi_tasks
 task: longbench2_govt_multi
 dataset_name: government_multi

lm_eval/tasks/longbench2/govt_single_doc.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_single
+  - longbench2_tasks
+  - longbench2_single_tasks
 task: longbench2_govt_single
 dataset_name: government_single

lm_eval/tasks/longbench2/graph.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_structured
+  - longbench2_tasks
+  - longbench2_structured_tasks
 task: longbench2_graph
 dataset_name: graph_reasoning

lm_eval/tasks/longbench2/legal_multi.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_multi
+  - longbench2_tasks
+  - longbench2_multi_tasks
 task: longbench2_legal_multi
 dataset_name: legal_multi

lm_eval/tasks/longbench2/legal_single.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_single
+  - longbench2_tasks
+  - longbench2_single_tasks
 task: longbench2_legal_single
 dataset_name: legal_single

lm_eval/tasks/longbench2/lit_single_doc.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_single
+  - longbench2_tasks
+  - longbench2_single_tasks
 task: longbench2_lit_single
 dataset_name: literary

lm_eval/tasks/longbench2/longbench2_code.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
+  - longbench2_tasks
 task: longbench2_code
 dataset_name: code_repo_qa

lm_eval/tasks/longbench2/many_shot.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_incontext
+  - longbench2_tasks
+  - longbench2_incontext_tasks
 task: longbench2_many_shot
 dataset_name: manyshot_learning

lm_eval/tasks/longbench2/news_multi.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_multi
+  - longbench2_tasks
+  - longbench2_multi_tasks
 task: longbench2_news_multi
 dataset_name: multinews

lm_eval/tasks/longbench2/table.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_structured
+  - longbench2_tasks
+  - longbench2_structured_tasks
 task: longbench2_table
 dataset_name: table_qa

lm_eval/tasks/longbench2/translate.yaml
 include: _longbench_common_yaml
 tag:
-  - longbench2
-  - longbench2_incontext
+  - longbench2_tasks
+  - longbench2_incontext_tasks
 task: longbench2_translate
 dataset_name: new_language_translation
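
After this commit, the `task:` list in `_longbench2.yaml` names the renamed tags (for example `longbench2_multi_tasks`) plus the single task `longbench2_code`. The sketch below illustrates how such a list could expand into concrete task names. It assumes a simplified model in which an entry that matches a tag stands for every task carrying that tag, and any other entry is taken as a task name. The helpers `build_tag_index` and `expand_entries` are hypothetical and do not reproduce the harness's actual resolution code. Only a subset of the task files in this commit is included.

```python
from collections import defaultdict

# Tag lists as they read after this commit (subset of the files in the diff).
TASK_TAGS = {
    "longbench2_academic_multi": ["longbench2_tasks", "longbench2_multi_tasks"],
    "longbench2_fin_multi": ["longbench2_tasks", "longbench2_multi_tasks"],
    "longbench2_graph": ["longbench2_tasks", "longbench2_structured_tasks"],
    "longbench2_table": ["longbench2_tasks", "longbench2_structured_tasks"],
    "longbench2_code": ["longbench2_tasks"],
}


def build_tag_index(task_tags):
    """Map each tag to the sorted list of tasks that carry it."""
    index = defaultdict(list)
    for task, tags in task_tags.items():
        for tag in tags:
            index[tag].append(task)
    return {tag: sorted(tasks) for tag, tasks in index.items()}


def expand_entries(entries, tag_index):
    """Expand tags into their tasks; keep any other entry as a task name."""
    expanded = []
    for entry in entries:
        expanded.extend(tag_index.get(entry, [entry]))
    return expanded


tag_index = build_tag_index(TASK_TAGS)
group_entries = [
    "longbench2_multi_tasks",
    "longbench2_structured_tasks",
    "longbench2_code",
]
print(expand_entries(group_entries, tag_index))
```

Under this model, `longbench2_code` passes through unchanged because it is not a tag name, which matches its unchanged line in the group file.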