# CMMLU

### Paper

CMMLU: Measuring massive multitask language understanding in Chinese
https://arxiv.org/abs/2306.09212

CMMLU is a comprehensive benchmark designed to evaluate the knowledge and reasoning abilities of large language models (LLMs) in the context of Chinese language and culture.
It covers 67 topics, ranging from elementary to advanced professional levels.

Homepage: https://github.com/haonan-li/CMMLU

### Citation

```bibtex
@misc{li2023cmmlu,
      title={CMMLU: Measuring massive multitask language understanding in Chinese},
      author={Haonan Li and Yixuan Zhang and Fajri Koto and Yifei Yang and Hai Zhao and Yeyun Gong and Nan Duan and Timothy Baldwin},
      year={2023},
      eprint={2306.09212},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```


|                   Tasks                    |Version|Filter|       Metric       |Value |   |Stderr|
|--------------------------------------------|-------|------|--------------------|-----:|---|-----:|
|cmmlu                                       |N/A    |none  |acc                 |0.2480|   |      |
|                                            |       |none  |acc(sample agg)     |0.2494|   |      |
|                                            |       |none  |acc_norm            |0.2480|   |      |
|                                            |       |none  |acc_norm(sample agg)|0.2494|   |      |
|-cmmlu_modern_chinese                       |Yaml   |none  |acc                 |0.2500|±  |0.0404|
|                                            |       |none  |acc_norm            |0.2500|±  |0.0404|
|-cmmlu_world_history                        |Yaml   |none  |acc                 |0.2484|±  |0.0342|
|                                            |       |none  |acc_norm            |0.2484|±  |0.0342|
|-cmmlu_college_education                    |Yaml   |none  |acc                 |0.2523|±  |0.0422|
|                                            |       |none  |acc_norm            |0.2523|±  |0.0422|
|-cmmlu_international_law                    |Yaml   |none  |acc                 |0.2486|±  |0.0319|
|                                            |       |none  |acc_norm            |0.2486|±  |0.0319|
|-cmmlu_philosophy                           |Yaml   |none  |acc                 |0.1905|±  |0.0385|
|                                            |       |none  |acc_norm            |0.1905|±  |0.0385|
|-cmmlu_professional_psychology              |Yaml   |none  |acc                 |0.2457|±  |0.0283|
|                                            |       |none  |acc_norm            |0.2457|±  |0.0283|
|-cmmlu_college_engineering_hydrology        |Yaml   |none  |acc                 |0.2830|±  |0.0440|
|                                            |       |none  |acc_norm            |0.2830|±  |0.0440|
|-cmmlu_electrical_engineering               |Yaml   |none  |acc                 |0.2442|±  |0.0329|
|                                            |       |none  |acc_norm            |0.2442|±  |0.0329|
|-cmmlu_ancient_chinese                      |Yaml   |none  |acc                 |0.2378|±  |0.0333|
|                                            |       |none  |acc_norm            |0.2378|±  |0.0333|
|-cmmlu_chinese_food_culture                 |Yaml   |none  |acc                 |0.2353|±  |0.0365|
|                                            |       |none  |acc_norm            |0.2353|±  |0.0365|
|-cmmlu_chinese_literature                   |Yaml   |none  |acc                 |0.2598|±  |0.0308|
|                                            |       |none  |acc_norm            |0.2598|±  |0.0308|
|-cmmlu_legal_and_moral_basis                |Yaml   |none  |acc                 |0.2477|±  |0.0296|
|                                            |       |none  |acc_norm            |0.2477|±  |0.0296|
|-cmmlu_construction_project_management      |Yaml   |none  |acc                 |0.2374|±  |0.0362|
|                                            |       |none  |acc_norm            |0.2374|±  |0.0362|
|-cmmlu_ethnology                            |Yaml   |none  |acc                 |0.2519|±  |0.0375|
|                                            |       |none  |acc_norm            |0.2519|±  |0.0375|
|-cmmlu_high_school_geography                |Yaml   |none  |acc                 |0.2542|±  |0.0403|
|                                            |       |none  |acc_norm            |0.2542|±  |0.0403|
|-cmmlu_professional_medicine                |Yaml   |none  |acc                 |0.2500|±  |0.0224|
|                                            |       |none  |acc_norm            |0.2500|±  |0.0224|
|-cmmlu_global_facts                         |Yaml   |none  |acc                 |0.2349|±  |0.0348|
|                                            |       |none  |acc_norm            |0.2349|±  |0.0348|
|-cmmlu_astronomy                            |Yaml   |none  |acc                 |0.2303|±  |0.0329|
|                                            |       |none  |acc_norm            |0.2303|±  |0.0329|
|-cmmlu_machine_learning                     |Yaml   |none  |acc                 |0.2541|±  |0.0396|
|                                            |       |none  |acc_norm            |0.2541|±  |0.0396|
|-cmmlu_high_school_politics                 |Yaml   |none  |acc                 |0.2378|±  |0.0357|
|                                            |       |none  |acc_norm            |0.2378|±  |0.0357|
|-cmmlu_chinese_civil_service_exam           |Yaml   |none  |acc                 |0.2562|±  |0.0346|
|                                            |       |none  |acc_norm            |0.2562|±  |0.0346|
|-cmmlu_professional_law                     |Yaml   |none  |acc                 |0.2512|±  |0.0299|
|                                            |       |none  |acc_norm            |0.2512|±  |0.0299|
|-cmmlu_college_medical_statistics           |Yaml   |none  |acc                 |0.2453|±  |0.0420|
|                                            |       |none  |acc_norm            |0.2453|±  |0.0420|
|-cmmlu_computer_security                    |Yaml   |none  |acc                 |0.2573|±  |0.0335|
|                                            |       |none  |acc_norm            |0.2573|±  |0.0335|
|-cmmlu_food_science                         |Yaml   |none  |acc                 |0.2238|±  |0.0350|
|                                            |       |none  |acc_norm            |0.2238|±  |0.0350|
|-cmmlu_security_study                       |Yaml   |none  |acc                 |0.2519|±  |0.0375|
|                                            |       |none  |acc_norm            |0.2519|±  |0.0375|
|-cmmlu_high_school_physics                  |Yaml   |none  |acc                 |0.2545|±  |0.0417|
|                                            |       |none  |acc_norm            |0.2545|±  |0.0417|
|-cmmlu_management                           |Yaml   |none  |acc                 |0.2476|±  |0.0299|
|                                            |       |none  |acc_norm            |0.2476|±  |0.0299|
|-cmmlu_professional_accounting              |Yaml   |none  |acc                 |0.2514|±  |0.0329|
|                                            |       |none  |acc_norm            |0.2514|±  |0.0329|
|-cmmlu_human_sexuality                      |Yaml   |none  |acc                 |0.2222|±  |0.0372|
|                                            |       |none  |acc_norm            |0.2222|±  |0.0372|
|-cmmlu_marxist_theory                       |Yaml   |none  |acc                 |0.2487|±  |0.0315|
|                                            |       |none  |acc_norm            |0.2487|±  |0.0315|
|-cmmlu_agronomy                             |Yaml   |none  |acc                 |0.2426|±  |0.0331|
|                                            |       |none  |acc_norm            |0.2426|±  |0.0331|
|-cmmlu_chinese_teacher_qualification        |Yaml   |none  |acc                 |0.2626|±  |0.0330|
|                                            |       |none  |acc_norm            |0.2626|±  |0.0330|
|-cmmlu_genetics                             |Yaml   |none  |acc                 |0.2273|±  |0.0317|
|                                            |       |none  |acc_norm            |0.2273|±  |0.0317|
|-cmmlu_sports_science                       |Yaml   |none  |acc                 |0.2727|±  |0.0348|
|                                            |       |none  |acc_norm            |0.2727|±  |0.0348|
|-cmmlu_elementary_commonsense               |Yaml   |none  |acc                 |0.2424|±  |0.0305|
|                                            |       |none  |acc_norm            |0.2424|±  |0.0305|
|-cmmlu_logical                              |Yaml   |none  |acc                 |0.1951|±  |0.0359|
|                                            |       |none  |acc_norm            |0.1951|±  |0.0359|
|-cmmlu_chinese_history                      |Yaml   |none  |acc                 |0.2508|±  |0.0242|
|                                            |       |none  |acc_norm            |0.2508|±  |0.0242|
|-cmmlu_traditional_chinese_medicine         |Yaml   |none  |acc                 |0.2378|±  |0.0314|
|                                            |       |none  |acc_norm            |0.2378|±  |0.0314|
|-cmmlu_elementary_mathematics               |Yaml   |none  |acc                 |0.2609|±  |0.0290|
|                                            |       |none  |acc_norm            |0.2609|±  |0.0290|
|-cmmlu_nutrition                            |Yaml   |none  |acc                 |0.2552|±  |0.0363|
|                                            |       |none  |acc_norm            |0.2552|±  |0.0363|
|-cmmlu_chinese_foreign_policy               |Yaml   |none  |acc                 |0.1776|±  |0.0371|
|                                            |       |none  |acc_norm            |0.1776|±  |0.0371|
|-cmmlu_journalism                           |Yaml   |none  |acc                 |0.2616|±  |0.0336|
|                                            |       |none  |acc_norm            |0.2616|±  |0.0336|
|-cmmlu_jurisprudence                        |Yaml   |none  |acc                 |0.2506|±  |0.0214|
|                                            |       |none  |acc_norm            |0.2506|±  |0.0214|
|-cmmlu_sociology                            |Yaml   |none  |acc                 |0.2478|±  |0.0288|
|                                            |       |none  |acc_norm            |0.2478|±  |0.0288|
|-cmmlu_college_mathematics                  |Yaml   |none  |acc                 |0.2190|±  |0.0406|
|                                            |       |none  |acc_norm            |0.2190|±  |0.0406|
|-cmmlu_computer_science                     |Yaml   |none  |acc                 |0.2549|±  |0.0306|
|                                            |       |none  |acc_norm            |0.2549|±  |0.0306|
|-cmmlu_conceptual_physics                   |Yaml   |none  |acc                 |0.2517|±  |0.0359|
|                                            |       |none  |acc_norm            |0.2517|±  |0.0359|
|-cmmlu_elementary_chinese                   |Yaml   |none  |acc                 |0.2817|±  |0.0284|
|                                            |       |none  |acc_norm            |0.2817|±  |0.0284|
|-cmmlu_marketing                            |Yaml   |none  |acc                 |0.2500|±  |0.0324|
|                                            |       |none  |acc_norm            |0.2500|±  |0.0324|
|-cmmlu_high_school_chemistry                |Yaml   |none  |acc                 |0.2576|±  |0.0382|
|                                            |       |none  |acc_norm            |0.2576|±  |0.0382|
|-cmmlu_college_law                          |Yaml   |none  |acc                 |0.2315|±  |0.0408|
|                                            |       |none  |acc_norm            |0.2315|±  |0.0408|
|-cmmlu_chinese_driving_rule                 |Yaml   |none  |acc                 |0.2595|±  |0.0384|
|                                            |       |none  |acc_norm            |0.2595|±  |0.0384|
|-cmmlu_clinical_knowledge                   |Yaml   |none  |acc                 |0.2532|±  |0.0283|
|                                            |       |none  |acc_norm            |0.2532|±  |0.0283|
|-cmmlu_education                            |Yaml   |none  |acc                 |0.2761|±  |0.0351|
|                                            |       |none  |acc_norm            |0.2761|±  |0.0351|
|-cmmlu_high_school_mathematics              |Yaml   |none  |acc                 |0.2927|±  |0.0356|
|                                            |       |none  |acc_norm            |0.2927|±  |0.0356|
|-cmmlu_college_actuarial_science            |Yaml   |none  |acc                 |0.2736|±  |0.0435|
|                                            |       |none  |acc_norm            |0.2736|±  |0.0435|
|-cmmlu_arts                                 |Yaml   |none  |acc                 |0.2313|±  |0.0334|
|                                            |       |none  |acc_norm            |0.2313|±  |0.0334|
|-cmmlu_public_relations                     |Yaml   |none  |acc                 |0.2471|±  |0.0328|
|                                            |       |none  |acc_norm            |0.2471|±  |0.0328|
|-cmmlu_college_medicine                     |Yaml   |none  |acc                 |0.2418|±  |0.0260|
|                                            |       |none  |acc_norm            |0.2418|±  |0.0260|
|-cmmlu_economics                            |Yaml   |none  |acc                 |0.2453|±  |0.0342|
|                                            |       |none  |acc_norm            |0.2453|±  |0.0342|
|-cmmlu_elementary_information_and_technology|Yaml   |none  |acc                 |0.2731|±  |0.0289|
|                                            |       |none  |acc_norm            |0.2731|±  |0.0289|
|-cmmlu_anatomy                              |Yaml   |none  |acc                 |0.2432|±  |0.0354|
|                                            |       |none  |acc_norm            |0.2432|±  |0.0354|
|-cmmlu_world_religions                      |Yaml   |none  |acc                 |0.2875|±  |0.0359|
|                                            |       |none  |acc_norm            |0.2875|±  |0.0359|
|-cmmlu_virology                             |Yaml   |none  |acc                 |0.2485|±  |0.0333|
|                                            |       |none  |acc_norm            |0.2485|±  |0.0333|
|-cmmlu_high_school_biology                  |Yaml   |none  |acc                 |0.2485|±  |0.0333|
|                                            |       |none  |acc_norm            |0.2485|±  |0.0333|
|-cmmlu_business_ethics                      |Yaml   |none  |acc                 |0.2584|±  |0.0304|
|                                            |       |none  |acc_norm            |0.2584|±  |0.0304|
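The `±` column can be sanity-checked against accuracy and question count. The sketch below assumes it is the standard error of the mean of per-question 0/1 scores, computed with the sample (n - 1) standard deviation; that assumption, the helper names, and the 29/116 counts are illustrative only and do not come from this README. Under it, an accuracy of 0.2500 with a standard error of 0.0404 (the `cmmlu_modern_chinese` row) corresponds to roughly 116 questions.

```python
import math


def binary_stderr(num_correct: int, num_total: int) -> float:
    """Standard error of the mean of per-question 0/1 scores.

    Uses the sample standard deviation (n - 1 in the denominator); for
    binary scores this simplifies to sqrt(p * (1 - p) / (n - 1)).
    """
    if num_total < 2:
        raise ValueError("need at least two questions")
    p = num_correct / num_total
    return math.sqrt(p * (1 - p) / (num_total - 1))


def implied_question_count(acc: float, stderr: float) -> float:
    """Invert binary_stderr to estimate n from a reported (acc, stderr) pair."""
    return acc * (1 - acc) / stderr**2 + 1


# Hypothetical counts: 29 of 116 correct reproduces 0.2500 ± 0.0404.
print(f"{binary_stderr(29, 116):.4f}")            # 0.0404
print(round(implied_question_count(0.2500, 0.0404)))  # 116
```

Because the formula only depends on accuracy and n, subtasks with fewer questions (for example `cmmlu_college_engineering_hydrology`, ± 0.0440) show wider intervals than larger ones such as `cmmlu_jurisprudence` (± 0.0214).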

|Groups|Version|Filter|       Metric       |Value |   |Stderr|
|------|-------|------|--------------------|-----:|---|------|
|cmmlu |N/A    |none  |acc                 |0.2480|   |      |
|      |       |none  |acc(sample agg)     |0.2494|   |      |
|      |       |none  |acc_norm            |0.2480|   |      |
|      |       |none  |acc_norm(sample agg)|0.2494|   |      |
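The `cmmlu` group row reports two aggregates. Averaging the 67 per-subtask `acc` values from the task table with equal weight gives 0.2480, which matches the plain `acc` row, so this sketch assumes `acc` is that unweighted (macro) mean and that `acc(sample agg)` instead pools every question across subtasks (a micro mean), which gives larger subtasks more weight. The function names and the toy counts are hypothetical, and since per-subtask question counts are not listed in this README, the 0.2494 figure is not reproduced here.

```python
from typing import Sequence, Tuple

# Per-subtask `acc` values, in the order they appear in the cmmlu task table.
SUBTASK_ACC = [
    0.2500, 0.2484, 0.2523, 0.2486, 0.1905, 0.2457, 0.2830, 0.2442,
    0.2378, 0.2353, 0.2598, 0.2477, 0.2374, 0.2519, 0.2542, 0.2500,
    0.2349, 0.2303, 0.2541, 0.2378, 0.2562, 0.2512, 0.2453, 0.2573,
    0.2238, 0.2519, 0.2545, 0.2476, 0.2514, 0.2222, 0.2487, 0.2426,
    0.2626, 0.2273, 0.2727, 0.2424, 0.1951, 0.2508, 0.2378, 0.2609,
    0.2552, 0.1776, 0.2616, 0.2506, 0.2478, 0.2190, 0.2549, 0.2517,
    0.2817, 0.2500, 0.2576, 0.2315, 0.2595, 0.2532, 0.2761, 0.2927,
    0.2736, 0.2313, 0.2471, 0.2418, 0.2453, 0.2731, 0.2432, 0.2875,
    0.2485, 0.2485, 0.2584,
]


def macro_average(accs: Sequence[float]) -> float:
    """Unweighted mean: every subtask counts equally, regardless of size."""
    return sum(accs) / len(accs)


def micro_average(counts: Sequence[Tuple[int, int]]) -> float:
    """Sample-weighted mean: pool (num_correct, num_total) over all subtasks."""
    correct = sum(c for c, _ in counts)
    total = sum(n for _, n in counts)
    return correct / total


print(f"{macro_average(SUBTASK_ACC):.4f}")  # 0.2480

# Toy, hypothetical counts: a 4-question subtask at 0.25 and a
# 100-question subtask at 0.30. The pooled mean leans toward the larger one.
print(f"{macro_average([0.25, 0.30]):.4f}")              # 0.2750
print(f"{micro_average([(1, 4), (30, 100)]):.4f}")       # 0.2981
```

The two aggregates coincide only when every subtask has the same number of questions, which is why the group row can differ slightly between `acc` and `acc(sample agg)`.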