Commit bdddfec2 authored by Hailey Schoelkopf, committed by GitHub

Merge pull request #864 from EleutherAI/add-fewshot-config

[Refactor] CMMLU, C-Eval port; Add fewshot config
parents 0f6cd358 f88ffeee
"dataset_name": "metrology_engineer"
"description": "以下是中国关于注册计量师的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_metrology_engineer"
"dataset_name": "middle_school_biology"
"description": "以下是中国关于初中生物的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_middle_school_biology"
"dataset_name": "middle_school_chemistry"
"description": "以下是中国关于初中化学的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_middle_school_chemistry"
"dataset_name": "middle_school_geography"
"description": "以下是中国关于初中地理的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_middle_school_geography"
"dataset_name": "middle_school_history"
"description": "以下是中国关于初中历史的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_middle_school_history"
"dataset_name": "middle_school_mathematics"
"description": "以下是中国关于初中数学的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_middle_school_mathematics"
"dataset_name": "middle_school_physics"
"description": "以下是中国关于初中物理的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_middle_school_physics"
"dataset_name": "middle_school_politics"
"description": "以下是中国关于初中政治的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_middle_school_politics"
"dataset_name": "modern_chinese_history"
"description": "以下是中国关于近代史纲要的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_modern_chinese_history"
"dataset_name": "operating_system"
"description": "以下是中国关于操作系统的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_operating_system"
"dataset_name": "physician"
"description": "以下是中国关于医师资格的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_physician"
"dataset_name": "plant_protection"
"description": "以下是中国关于植物保护的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_plant_protection"
"dataset_name": "probability_and_statistics"
"description": "以下是中国关于概率统计的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_probability_and_statistics"
"dataset_name": "professional_tour_guide"
"description": "以下是中国关于导游资格的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_professional_tour_guide"
"dataset_name": "sports_science"
"description": "以下是中国关于体育学的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_sports_science"
"dataset_name": "tax_accountant"
"description": "以下是中国关于税务师的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_tax_accountant"
"dataset_name": "teacher_qualification"
"description": "以下是中国关于教师资格的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_teacher_qualification"
"dataset_name": "urban_and_rural_planner"
"description": "以下是中国关于注册城乡规划师的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_urban_and_rural_planner"
"dataset_name": "veterinary_medicine"
"description": "以下是中国关于兽医学的单项选择题,请选出其中的正确答案。\n\n"
"include": "_default_ceval_yaml"
"task": "ceval-valid_veterinary_medicine"
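The per-subject fragments above all follow one pattern: a `dataset_name`, a Chinese `description` that differs only in the subject name, a shared `include`, and a `task` name prefixed with `ceval-valid_`. A minimal sketch of how such fragments could be generated from a subject map is shown here. The function names, the `SUBJECTS` dictionary, and the use of `json.dumps` for quoting are assumptions made for illustration; this is not necessarily how the files in this commit were produced.

```python
import json

# Hypothetical subject map: English dataset name -> Chinese subject name.
# Only a few entries from the listing above are reproduced here.
SUBJECTS = {
    "operating_system": "操作系统",
    "physician": "医师资格",
    "sports_science": "体育学",
}

# Shared prompt header; only the subject name changes between configs.
DESCRIPTION_TEMPLATE = "以下是中国关于{subject}的单项选择题,请选出其中的正确答案。\n\n"


def build_config(dataset_name: str, subject_zh: str) -> dict:
    """Return the four per-subject fields seen in each fragment."""
    return {
        "dataset_name": dataset_name,
        "description": DESCRIPTION_TEMPLATE.format(subject=subject_zh),
        "include": "_default_ceval_yaml",
        "task": f"ceval-valid_{dataset_name}",
    }


def render_config(config: dict) -> str:
    """Render a config as `"key": "value"` lines with newlines escaped."""
    return "\n".join(
        f"{json.dumps(key)}: {json.dumps(value, ensure_ascii=False)}"
        for key, value in config.items()
    )


if __name__ == "__main__":
    for name, subject in SUBJECTS.items():
        print(render_config(build_config(name, subject)))
```

Keeping the shared settings in the included `_default_ceval_yaml` file means each subject file only has to state what differs, which is why every fragment is four lines long.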
# CMMLU
### Paper
CMMLU: Measuring massive multitask language understanding in Chinese
https://arxiv.org/abs/2306.09212
CMMLU is a comprehensive benchmark designed to evaluate the knowledge and reasoning abilities of LLMs in the context of Chinese language and culture.
It covers 67 topics that span from elementary to advanced professional levels.
Homepage: https://github.com/haonan-li/CMMLU
### Citation
```bibtex
@misc{li2023cmmlu,
title={CMMLU: Measuring massive multitask language understanding in Chinese},
author={Haonan Li and Yixuan Zhang and Fajri Koto and Yifei Yang and Hai Zhao and Yeyun Gong and Nan Duan and Timothy Baldwin},
year={2023},
eprint={2306.09212},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
### Groups and Tasks
#### Groups
- `cmmlu`: All 67 subjects of the CMMLU dataset, evaluated following the methodology in MMLU's original implementation.
#### Tasks
The following tasks evaluate subjects in the CMMLU dataset using loglikelihood-based multiple-choice scoring:
- `cmmlu_{subject_english}`: one task per subject, where `{subject_english}` stands for the subject's English name.
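
As a rough illustration of loglikelihood-based multiple-choice scoring: the model assigns a log-likelihood to each answer option, the option with the highest value is taken as the prediction, and accuracy is the fraction of questions where that prediction matches the gold answer. The sketch below shows only this general idea. The function names, the numbers, and the `example_subject` placeholder are assumptions for illustration, and the harness's actual implementation may differ (for example, by also reporting length-normalized variants).

```python
from typing import Sequence


def pick_choice(loglikelihoods: Sequence[float]) -> int:
    """Return the index of the option with the highest log-likelihood."""
    return max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])


def accuracy(
    all_loglikelihoods: Sequence[Sequence[float]],
    gold_indices: Sequence[int],
) -> float:
    """Fraction of questions whose top-scoring option is the gold answer."""
    correct = sum(
        pick_choice(lls) == gold
        for lls, gold in zip(all_loglikelihoods, gold_indices)
    )
    return correct / len(gold_indices)


def task_name(subject_english: str) -> str:
    """Build a task name from the `cmmlu_{subject_english}` pattern."""
    return f"cmmlu_{subject_english}"


if __name__ == "__main__":
    # Made-up log-likelihoods for two four-option questions (A, B, C, D).
    scores = [
        [-4.2, -1.3, -3.8, -2.9],  # option B scores highest
        [-0.5, -2.0, -3.0, -4.0],  # option A scores highest
    ]
    gold = [1, 2]  # gold answers: B for the first question, C for the second
    print(task_name("example_subject"), accuracy(scores, gold))
```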
### Checklist
* [x] Is the task an existing benchmark in the literature?
  * [x] Have you referenced the original paper that introduced the task?
  * [x] If yes, does the original paper provide a reference implementation?
    * [x] Yes, original implementation contributed by author of the benchmark
If other tasks on this dataset are already supported:
* [x] Is the "Main" variant of this task clearly denoted?
* [x] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [x] Have you noted which, if any, published evaluation setups are matched by this variant?