Commit c1e63555 authored by Yu Shi Jie

Merge branch 'upstream' into 'mmlu-pro'

add tokenizer logs info (#1731)

See merge request shijie.yu/lm-evaluation-harness!4
parents e361687c 42dc2448
# arc mt
arc mt implements tasks for machine-translated versions of the ARC Challenge
evals, extending eval support to a number of additional languages.
The main page for the effort is
[here](https://huggingface.co/datasets/LumiOpen/arc_challenge_mt), and we will
include more data and analysis there.
The initial datasets cover a number of European languages, and we plan to add
more in the future.
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_da
dataset_name: da
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_de
dataset_name: de
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_el
dataset_name: el
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_es
dataset_name: es
tag:
- arc_challenge_mt
task: arc_challenge_mt_fi
dataset_path: LumiOpen/arc_challenge_mt
dataset_name: fi
output_type: multiple_choice
training_split: train
validation_split: validation
test_split: test
doc_to_text: "Question: {{question}}\nAnswer:"
doc_to_target: "{{choices.label.index(answerKey)}}"
doc_to_choice: "{{choices.text}}"
should_decontaminate: true
doc_to_decontamination_query: "Question: {{question}}\nAnswer:"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
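The three `doc_to_*` templates in the config above can be read as plain field lookups on one dataset record. The following is a minimal sketch in Python of what each template evaluates to. The record is a made-up example, not taken from the dataset, and the helper functions are illustrative stand-ins for the Jinja templates, not the harness's own code.

```python
# Hypothetical ARC-style record; field names follow the templates in the config.
doc = {
    "question": "Which gas do plants absorb from the air?",
    "choices": {
        "text": ["Oxygen", "Carbon dioxide", "Nitrogen", "Helium"],
        "label": ["A", "B", "C", "D"],
    },
    "answerKey": "B",
}


def doc_to_text(doc):
    # Mirrors "Question: {{question}}\nAnswer:"
    return f"Question: {doc['question']}\nAnswer:"


def doc_to_choice(doc):
    # Mirrors "{{choices.text}}": the candidate continuations to score
    return doc["choices"]["text"]


def doc_to_target(doc):
    # Mirrors "{{choices.label.index(answerKey)}}": position of the gold label
    return doc["choices"]["label"].index(doc["answerKey"])


prompt = doc_to_text(doc)
choices = doc_to_choice(doc)
target = doc_to_target(doc)

print(prompt)
print("gold choice:", choices[target])
```

Because the target is an index into `choices.label`, the same templates work whether a record labels its options with letters or with digits.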
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_hu
dataset_name: hu
group:
- arc_challenge_mt
task: arc_challenge_mt_is
dataset_path: mideind/icelandic-arc-challenge
output_type: multiple_choice
training_split: train
validation_split: validation
test_split: test
doc_to_text: "Question: {{question}}\nAnswer:"
doc_to_target: "{{choices.label.index(answerKey)}}"
doc_to_choice: "{{choices.text}}"
should_decontaminate: true
doc_to_decontamination_query: "Question: {{question}}\nAnswer:"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
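Both configs report `acc` and `acc_norm`. The sketch below illustrates the difference. It assumes `acc` picks the choice with the highest raw log-likelihood and `acc_norm` first divides each log-likelihood by the length of the choice text. That assumption is not stated in this commit, and the scores are invented for illustration.

```python
def predict(loglikelihoods, choices, normalize=False):
    """Return the index of the best-scoring choice.

    With normalize=True, each score is divided by the character length of
    its choice, so longer answers are not penalized for being longer.
    """
    scores = [
        ll / len(choice) if normalize else ll
        for ll, choice in zip(loglikelihoods, choices)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Invented example: the gold answer is the longest choice.
choices = ["Oxygen", "Carbon dioxide", "Nitrogen", "Helium"]
loglikelihoods = [-6.0, -9.8, -8.0, -7.5]
gold = 1

acc = int(predict(loglikelihoods, choices) == gold)
acc_norm = int(predict(loglikelihoods, choices, normalize=True) == gold)
print("acc:", acc, "acc_norm:", acc_norm)
```

In this example the raw scores favor the short answer "Oxygen", while the length-normalized scores favor the longer gold answer, which is why the two metrics can disagree on the same document.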
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_it
dataset_name: it
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_nb
dataset_name: nb
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_pl
dataset_name: pl
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_pt
dataset_name: pt
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_sv
dataset_name: sv
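The per-language configs above each include `arc_challenge_mt_fi.yaml` and set only `task` and `dataset_name`. A minimal sketch of that pattern follows. It assumes a shallow merge in which keys from the including file override the included one. The `resolve` helper and the file name `arc_challenge_mt_da.yaml` are hypothetical, and the configs are written as Python dicts rather than parsed from YAML.

```python
# Configs keyed by file name. The base entry is abbreviated, and the name of
# the Danish file is an assumption (it is not shown in the diff).
files = {
    "arc_challenge_mt_fi.yaml": {
        "tag": ["arc_challenge_mt"],
        "task": "arc_challenge_mt_fi",
        "dataset_path": "LumiOpen/arc_challenge_mt",
        "dataset_name": "fi",
        "output_type": "multiple_choice",
    },
    "arc_challenge_mt_da.yaml": {
        "include": "arc_challenge_mt_fi.yaml",
        "task": "arc_challenge_mt_da",
        "dataset_name": "da",
    },
}


def resolve(name, files):
    """Hypothetical helper: expand `include`, letting local keys win."""
    config = dict(files[name])
    include = config.pop("include", None)
    if include is None:
        return config
    merged = resolve(include, files)  # start from the included config
    merged.update(config)             # local keys override included ones
    return merged


resolved = resolve("arc_challenge_mt_da.yaml", files)
print(resolved["task"], resolved["dataset_name"], resolved["dataset_path"])
```

Under this reading, a new language needs only a three-line file, and a change to the prompt or metrics in the Finnish config propagates to every language that includes it.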
@@ -27,9 +27,9 @@ Homepage: https://github.com/openai/gpt-3/tree/master/data
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
-#### Groups
+#### Tags
 * `arithmetic`: Evaluates `1dc` to `5ds`
......
-group:
+tag:
   - arithmetic
 task: arithmetic_1dc
 dataset_path: EleutherAI/arithmetic
......
@@ -32,7 +32,7 @@ Homepage: https://github.com/chaochun/nlu-asdiv-dataset
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
......
@@ -21,12 +21,16 @@ Homepage: https://github.com/facebookarchive/bAbI-tasks
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
 * Not part of a group yet
+#### Tags
+* No tags applied.
 #### Tasks
 * `babi`
......
@@ -43,11 +43,15 @@ Homepage: `https://github.com/hitz-zentroa/latxa`
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
-* `basque-glue`: First version of the implementation
+None.
+#### Tags
+* `basque-glue`: First version of the implementation. Calls all subtasks, but does not average.
 #### Tasks
......
-group: basque-glue
+tag: basque-glue
 task: bec2016eu
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: bec
......
-group: basque-glue
+tag: basque-glue
 task: bhtc_v2
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: bhtc
......