Commit c1e63555 authored by Yu Shi Jie

Merge branch 'upstream' into 'mmlu-pro'

add tokenizer logs info (#1731)

See merge request shijie.yu/lm-evaluation-harness!4
parents e361687c 42dc2448
# arc mt
arc mt implements tasks for machine-translated ARC Challenge evals, extending
eval support to a number of additional languages.
The main page for the effort is
[here](https://huggingface.co/datasets/LumiOpen/arc_challenge_mt), and we will
include more data and analysis there.
The initial datasets cover a number of European languages, and we plan to add
more in the future.
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_da
dataset_name: da
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_de
dataset_name: de
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_el
dataset_name: el
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_es
dataset_name: es
tag:
- arc_challenge_mt
task: arc_challenge_mt_fi
dataset_path: LumiOpen/arc_challenge_mt
dataset_name: fi
output_type: multiple_choice
training_split: train
validation_split: validation
test_split: test
doc_to_text: "Question: {{question}}\nAnswer:"
doc_to_target: "{{choices.label.index(answerKey)}}"
doc_to_choice: "{{choices.text}}"
should_decontaminate: true
doc_to_decontamination_query: "Question: {{question}}\nAnswer:"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
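To make the three templates in this config concrete, here is a minimal pure-Python sketch of what they evaluate to for one document. The sample document is invented for illustration and only mirrors the ARC field layout (`question`, `choices.text`, `choices.label`, `answerKey`). The helper names are hypothetical, and the harness renders these fields through its own templates rather than through functions like these.

```python
# Hypothetical sample document in the ARC layout; not taken from the dataset.
sample_doc = {
    "question": "Which object is attracted to a magnet?",
    "choices": {
        "text": ["a wooden spoon", "a glass cup", "an iron nail", "a rubber band"],
        "label": ["A", "B", "C", "D"],
    },
    "answerKey": "C",
}


def doc_to_text(doc):
    # Mirrors doc_to_text: "Question: {{question}}\nAnswer:"
    return f"Question: {doc['question']}\nAnswer:"


def doc_to_choice(doc):
    # Mirrors doc_to_choice: "{{choices.text}}"
    return doc["choices"]["text"]


def doc_to_target(doc):
    # Mirrors doc_to_target: "{{choices.label.index(answerKey)}}"
    return doc["choices"]["label"].index(doc["answerKey"])


prompt = doc_to_text(sample_doc)
target = doc_to_target(sample_doc)
gold_answer = doc_to_choice(sample_doc)[target]
```

In this sketch, `list.index` raises `ValueError` when `answerKey` does not appear in `choices.label`, so a document whose key and labels use different schemes fails loudly instead of being mis-scored.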
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_hu
dataset_name: hu
group:
- arc_challenge_mt
task: arc_challenge_mt_is
dataset_path: mideind/icelandic-arc-challenge
output_type: multiple_choice
training_split: train
validation_split: validation
test_split: test
doc_to_text: "Question: {{question}}\nAnswer:"
doc_to_target: "{{choices.label.index(answerKey)}}"
doc_to_choice: "{{choices.text}}"
should_decontaminate: true
doc_to_decontamination_query: "Question: {{question}}\nAnswer:"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
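Both full configs report `acc` and `acc_norm`. A minimal sketch of how the two can differ, assuming that `acc` picks the choice with the highest log-likelihood and that `acc_norm` first divides each log-likelihood by the choice's character length. That normalization is an assumption about the harness, not something this commit states, and the function name and numbers are hypothetical.

```python
def score_multiple_choice(loglikelihoods, choices, gold_index):
    """Return (acc, acc_norm) for one document, each as 0.0 or 1.0."""
    indices = range(len(choices))
    # acc: argmax over the raw log-likelihoods.
    best_raw = max(indices, key=lambda i: loglikelihoods[i])
    # acc_norm: argmax after dividing by character length (assumed normalization).
    best_norm = max(indices, key=lambda i: loglikelihoods[i] / len(choices[i]))
    return float(best_raw == gold_index), float(best_norm == gold_index)


# Hypothetical values: the long gold answer has a lower total log-likelihood
# than the short distractor, but a better per-character one.
choices = ["ice", "water vapor in the air"]
loglikelihoods = [-6.0, -11.0]
acc, acc_norm = score_multiple_choice(loglikelihoods, choices, gold_index=1)
```

The sketch does not guard against empty choice strings, which would divide by zero.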
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_it
dataset_name: it
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_nb
dataset_name: nb
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_pl
dataset_name: pl
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_pt
dataset_name: pt
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_sv
dataset_name: sv
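The per-language files in this commit share one pattern: include the Finnish config, then override `task` and `dataset_name`. A minimal sketch of that overlay with plain dictionaries, assuming the included keys are loaded first and the including file's keys replace them. `BASE_FI` is abridged, and `language_config` is a hypothetical helper, not a harness function.

```python
# Abridged copy of the shared keys from arc_challenge_mt_fi.yaml.
BASE_FI = {
    "tag": ["arc_challenge_mt"],
    "task": "arc_challenge_mt_fi",
    "dataset_path": "LumiOpen/arc_challenge_mt",
    "dataset_name": "fi",
    "output_type": "multiple_choice",
}

# Language codes of the configs that include the Finnish base in this commit.
INCLUDING_CODES = ["da", "de", "el", "es", "hu", "it", "nb", "pl", "pt", "sv"]


def language_config(code):
    # Later keys win, so the per-language values replace the Finnish ones.
    overrides = {"task": f"arc_challenge_mt_{code}", "dataset_name": code}
    return {**BASE_FI, **overrides}


configs = {code: language_config(code) for code in INCLUDING_CODES}
task_names = [cfg["task"] for cfg in configs.values()]
```

The Icelandic config is not part of this pattern: it is a standalone file with its own `dataset_path`.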
@@ -27,9 +27,9 @@ Homepage: https://github.com/openai/gpt-3/tree/master/data
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
-#### Groups
+#### Tags
 * `arithmetic`: Evaluates `1dc` to `5ds`
......
-group:
+tag:
 - arithmetic
 task: arithmetic_1dc
 dataset_path: EleutherAI/arithmetic
......
@@ -32,7 +32,7 @@ Homepage: https://github.com/chaochun/nlu-asdiv-dataset
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
......
@@ -21,12 +21,16 @@ Homepage: https://github.com/facebookarchive/bAbI-tasks
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
 * Not part of a group yet
+#### Tags
+* No tags applied.
 #### Tasks
 * `babi`
......
@@ -43,11 +43,15 @@ Homepage: `https://github.com/hitz-zentroa/latxa`
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
-* `basque-glue`: First version of the implementation
+None.
+#### Tags
+* `basque-glue`: First version of the implementation. Calls all subtasks, but does not average.
 #### Tasks
......
-group: basque-glue
+tag: basque-glue
 task: bec2016eu
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: bec
......
-group: basque-glue
+tag: basque-glue
 task: bhtc_v2
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: bhtc
......
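The YAML hunks in this commit all make the same one-line change: a top-level `group:` key becomes `tag:`. A minimal sketch of that rename on config text, assuming the key starts at column 0 as in the hunks shown here. `rename_group_to_tag` is a hypothetical helper and not part of the harness.

```python
import re


def rename_group_to_tag(yaml_text):
    # Only rewrite a `group:` key at the start of a line, so indented keys
    # and values that merely contain the word are left alone.
    return re.sub(r"^group:", "tag:", yaml_text, flags=re.MULTILINE)


before = (
    "group: basque-glue\n"
    "task: bec2016eu\n"
    "dataset_path: orai-nlp/basqueGLUE\n"
    "dataset_name: bec\n"
)
after = rename_group_to_tag(before)
```

The same substitution covers the list form used by the arithmetic config, where `group:` sits on its own line above the `- arithmetic` entry.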