"sgl-router/src/vscode:/vscode.git/clone" did not exist on "38907fe639047fa21dfa22eadbeb7512b1ecd053"
Unverified Commit a2af2101 authored by Yen-Ting Lin, committed by GitHub

Merge branch 'EleutherAI:main' into main

parents 82cb25c1 d5f39bf8
group: → tag:
- crows_pairs
- social_bias
- loglikelihood
task: crows_pairs_english
dataset_path: BigScienceBiasEval/crows_pairs_multilingual
dataset_name: english
......
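The `group:` → `tag:` rename above reflects the harness's distinction between aggregating groups and plain tags: a tag is just a label that expands to every task carrying it, with no aggregate group metric computed. A minimal, hedged sketch of selecting tagged tasks through the Python API; the model choice below is only a placeholder and is not part of this diff:

```python
# Hedged sketch: selecting every task that carries a tag such as `crows_pairs`.
# `simple_evaluate` is the harness's Python entry point; the model and
# model_args values here are placeholders for illustration only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-70m",
    tasks=["crows_pairs"],  # a tag name expands to crows_pairs_english and the other tagged subtasks
)
print(results["results"].keys())
```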
group: csatqa
task:
  - csatqa_gr
  - csatqa_li
  - csatqa_rch
  - csatqa_rcs
  - csatqa_rcss
  - csatqa_wr
aggregate_metric_list:
  - metric: acc
    aggregation: mean
    weight_by_size: true
  - metric: acc_norm
    aggregation: mean
    weight_by_size: true
metadata:
  version: 0.0
group: csatqa
dataset_path: EleutherAI/csatqa
test_split: test
output_type: multiple_choice
......
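The `aggregate_metric_list` block above is what turns `csatqa` into a scoring group: each listed metric is averaged across the subtasks, and `weight_by_size: true` weights each subtask by its number of documents. A minimal sketch of that computation; the scores and document counts below are made-up numbers for illustration:

```python
# Size-weighted mean over subtask scores, as implied by
# `aggregation: mean` + `weight_by_size: true`. Values are illustrative only.
def weighted_mean(scores, sizes):
    return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)

subtask_acc = {"csatqa_gr": 0.42, "csatqa_li": 0.55, "csatqa_wr": 0.61}
subtask_docs = {"csatqa_gr": 120, "csatqa_li": 80, "csatqa_wr": 200}

scores = [subtask_acc[t] for t in subtask_acc]
sizes = [subtask_docs[t] for t in subtask_acc]
print(f"csatqa acc (size-weighted): {weighted_mean(scores, sizes):.4f}")
```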
""" """
Take in a YAML, and output all other splits with this YAML Take in a YAML, and output all other splits with this YAML
""" """
import argparse import argparse
import os import os
......
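The docstring in the hunk above ("Take in a YAML, and output all other splits with this YAML") describes the usual pattern for these helper scripts: read a base config and emit one task YAML per subset that `include`s it. A rough, hypothetical sketch of that pattern, not the script in this commit; the argument names and subset list are assumptions:

```python
# Hypothetical sketch of a config-expanding helper; the CLI argument names
# and the subset list are illustrative, not taken from this commit.
import argparse
import os

import yaml

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_yaml_path", required=True)
    parser.add_argument("--save_prefix_path", default="csatqa")
    args = parser.parse_args()

    base_yaml_name = os.path.basename(args.base_yaml_path)
    subsets = ["GR", "LI", "WR"]  # assumed subset names

    for subset in subsets:
        task_cfg = {
            "include": base_yaml_name,  # inherit shared settings from the base YAML
            "task": f"{args.save_prefix_path}_{subset.lower()}",
            "dataset_name": subset,
        }
        with open(f"{args.save_prefix_path}_{subset.lower()}.yaml", "w") as f:
            yaml.dump(task_cfg, f, allow_unicode=True)
```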
"""
"""
import re
from typing import List
@@ -14,7 +12,7 @@ class FDA(ConfigurableTask):
DATASET_PATH = "hazyresearch/based-fda" DATASET_PATH = "hazyresearch/based-fda"
DATASET_NAME = "default" DATASET_NAME = "default"
def __init__(self): def __init__(self, **kwargs):
super().__init__(config={"metadata": {"version": self.VERSION}}) super().__init__(config={"metadata": {"version": self.VERSION}})
def has_training_docs(self): def has_training_docs(self):
......
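The `__init__(self, **kwargs)` change lets the harness instantiate this Python-defined task with extra keyword arguments without raising a `TypeError`, while the task still pins its own config. A hedged sketch of the same constructor pattern applied to a hypothetical task class:

```python
# Hedged sketch of the constructor pattern shown above, applied to a
# hypothetical task. Extra kwargs are accepted (and ignored here) so the
# harness can pass additional construction arguments safely.
from lm_eval.api.task import ConfigurableTask


class MyCustomTask(ConfigurableTask):  # hypothetical example task, not in this commit
    VERSION = 0
    DATASET_PATH = "some/dataset"  # placeholder dataset path

    def __init__(self, **kwargs):
        super().__init__(config={"metadata": {"version": self.VERSION}})
```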
@@ -38,18 +38,19 @@ Homepage: https://github.com/hitachi-nlp/FLD
### Groups and Tasks
#### Groups
* `fld`
#### Tasks
This release is the simplified version of FLD where a model is required to predict only an answer.
This setting is described by "answer accuracy" in the original paper.
#### Tasks in Group `fld`
* `fld_default` is a basic task based on [FLD.v2](https://huggingface.co/datasets/hitachi-nlp/FLD.v2/viewer/star)
* `fld_star` is a more challenging version based on [FLD.v2-star](https://huggingface.co/datasets/hitachi-nlp/FLD.v2/viewer/star)
#### Tasks in Group `fld_logical_formula`
Further, we have "logical formula" versions of the benchmarks, which evaluate LLMs' pure logical reasoning capabilities within the domain of logical formulas, rather than natural language:
* `fld_logical_formula_default`
* `fld_logical_formula_fld_star`
### Checklist
For adding novel benchmarks/datasets to the library:
......
group:
- fld
task: fld_default
dataset_path: hitachi-nlp/FLD.v2
dataset_name: default
......
group:
- fld_logical_formula
task: fld_logical_formula_default
dataset_path: hitachi-nlp/FLD.v2
dataset_name: default
training_split: train
validation_split: validation
test_split: test
doc_to_text: "Based on the provided facts ($context$), either prove or disprove the hypothesis or state that it is unknown. The facts and the hypothesis are written in logical formulas as follows: capital letters such as \"{A}\", \"{B}\", \"{AB}\" are predicates, small letters such as \"{a}\", \"{b}\", \"{ab}\" are constants, \"&\" is logical conjunction, \"v\" is logical disjunction, \"¬\" is negation, \"->\" is implication, \"(x)\" is \"for all x\", and \"(Ex)\" is \"for some x\". $hypothesis$ = {{hypothesis_formula}} ; $context$ = {{context_formula}} ; $proof$ = "
doc_to_target: world_assump_label
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
filter_list:
  - name: remove_whitespace
    filter:
      - function: remove_whitespace
      - function: take_first
metadata:
  version: 2.0
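The `doc_to_text` field above is a Jinja2 template: `{{hypothesis_formula}}` and `{{context_formula}}` are filled from each dataset row, and the `remove_whitespace` / `take_first` filters then clean the model's generated answer before `exact_match` against `world_assump_label` is scored. A small hedged illustration of the rendering step, using only the tail of the template and an invented row:

```python
# Hedged illustration of how the doc_to_text template is rendered per document.
# Only the (shortened) tail of the template from the config above is used here,
# and the example row is invented; real rows come from hitachi-nlp/FLD.v2.
from jinja2 import Template

doc_to_text = (
    "$hypothesis$ = {{hypothesis_formula}} ; "
    "$context$ = {{context_formula}} ; $proof$ = "
)
doc = {
    "hypothesis_formula": "{B}",
    "context_formula": "fact1: {A} fact2: {A} -> {B}",
}
print(Template(doc_to_text).render(**doc))
# -> $hypothesis$ = {B} ; $context$ = fact1: {A} fact2: {A} -> {B} ; $proof$ =
```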
include: fld_logical_formula_default.yaml
task: fld_logical_formula_star
dataset_name: star
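`fld_logical_formula_star` is just the default config with `task` and `dataset_name` overridden through `include`. The effect is roughly the following merge; this is a hedged sketch of what the harness does internally when it resolves `include:`, not its actual code:

```python
# Hedged sketch of what `include: fld_logical_formula_default.yaml` amounts to:
# load the base config, then override the keys set in the including file.
import yaml

with open("fld_logical_formula_default.yaml") as f:
    config = yaml.safe_load(f)

config.update({"task": "fld_logical_formula_star", "dataset_name": "star"})
print(config["task"], config["dataset_name"])
```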
@@ -20,9 +20,9 @@ This benchmark is constructed both from openly available datasets, as well as ne
}
```
### Groups and Tasks → ### Groups, Tags, and Tasks
#### Groups → #### Tags
- `french_bench`: All tasks (non-perplexity based)
- `french_bench_gen`: All official generative tasks
......
group: → tag:
- french_bench
- french_bench_mc
task: french_bench_arc_challenge
......
include: "_default_template_yaml" include: "_default_template_yaml"
group: tag:
- french_bench - french_bench
- french_bench_extra - french_bench_extra
description: "D'après l'information dans le contexte donné, quelle est la réponse à la question ?" description: "D'après l'information dans le contexte donné, quelle est la réponse à la question ?"
......
include: "_default_template_yaml" include: "_default_template_yaml"
group: tag:
- french_bench - french_bench
- french_bench_extra - french_bench_extra
description: "D'après l'information dans le contexte donné, donne la réponse à la question en citant quelques mots du contexte. Si il est impossible de répondre avec les informations du contexte, répond 'Impossible'." description: "D'après l'information dans le contexte donné, donne la réponse à la question en citant quelques mots du contexte. Si il est impossible de répondre avec les informations du contexte, répond 'Impossible'."
......
include: "_default_template_yaml" include: "_default_template_yaml"
group: tag:
- french_bench - french_bench
- french_bench_extra - french_bench_extra
description: "D'après l'information présente dans le contexte, est il possible de répondre à la question ?" description: "D'après l'information présente dans le contexte, est il possible de répondre à la question ?"
......
include: "_default_template_yaml" include: "_default_template_yaml"
group: tag:
- french_bench - french_bench
- french_bench_gen - french_bench_gen
description: "D'après l'information dans le contexte donné, quelle question a été posée pour obtenir la réponse donnée ?" description: "D'après l'information dans le contexte donné, quelle question a été posée pour obtenir la réponse donnée ?"
......
include: "_default_template_yaml" include: "_default_template_yaml"
group: tag:
- french_bench - french_bench
- french_bench_gen - french_bench_gen
description: "D'après l'information dans le contexte donné, donne la réponse à la question en citant quelques mots du contexte. Si il est impossible de répondre avec les informations du contexte, répond 'Impossible'." description: "D'après l'information dans le contexte donné, donne la réponse à la question en citant quelques mots du contexte. Si il est impossible de répondre avec les informations du contexte, répond 'Impossible'."
......
include: "_default_template_yaml" include: "_default_template_yaml"
group: tag:
- french_bench - french_bench
- french_bench_mc - french_bench_mc
description: "Répond au mieux en complétant la question avec une des réponses proposées." description: "Répond au mieux en complétant la question avec une des réponses proposées."
......
group: → tag:
- french_bench
- french_bench_mc
task: french_bench_hellaswag
......
include: "_default_template_yaml" include: "_default_template_yaml"
group: tag:
- french_bench - french_bench
- french_bench_gen - french_bench_gen
description: "D'après l'information dans le contexte donné, donne la réponse à la question en citant quelques extraits du contexte." description: "D'après l'information dans le contexte donné, donne la réponse à la question en citant quelques extraits du contexte."
......
group: → tag:
- french_bench_perplexity
task: french_bench_opus_perplexity
dataset_path: manu/opus100-en-fr
......