<div align="center">
  <img src="docs/zh_cn/_static/image/logo.svg" width="500px"/>
  <br />
  <br />

[![docs](https://readthedocs.org/projects/opencompass/badge)](https://opencompass.readthedocs.io/zh_CN)
[![license](https://img.shields.io/github/license/InternLM/opencompass.svg)](https://github.com/open-compass/opencompass/blob/main/LICENSE)

<!-- [![PyPI](https://badge.fury.io/py/opencompass.svg)](https://pypi.org/project/opencompass/) -->

[🌐Website](https://opencompass.org.cn/) |
[📘Documentation](https://opencompass.readthedocs.io/zh_CN/latest/index.html) |
[🛠️Installation](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id1) |
[🤔Reporting Issues](https://github.com/open-compass/opencompass/issues/new/choose)

[English](/README.md) | 简体中文

</div>

<p align="center">
    👋 Join us on <a href="https://discord.gg/KKwfEbFj7U" target="_blank">Discord</a> and <a href="https://r.vansin.top/?r=opencompass" target="_blank">WeChat</a>
</p>

## 🧭 Welcome

to **OpenCompass**!

Just as a compass guides us on our journey, we hope OpenCompass will guide you through the fog of evaluating large language models. OpenCompass provides a rich set of algorithms and features, and we hope it helps the community evaluate NLP models fairly and comprehensively with less effort.

> **🔥 Attention**<br />
> We are officially launching the OpenCompass collaboration project and invite community users to contribute more representative and credible objective evaluation datasets to OpenCompass!
> Click the [Issue](https://github.com/open-compass/opencompass/issues/248) for more datasets.
> Let's work together to build a powerful and easy-to-use LLM evaluation platform!

## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

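- **\[2023.09.18\]** We have released the [long-context evaluation guide](docs/zh_cn/advanced_guides/longeval.md). 🔥🔥🔥
- **\[2023.09.08\]** We have updated the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Visit the [official website](https://opencompass.org.cn) for details. 🔥🔥🔥
- **\[2023.09.06\]** We welcome the [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team's adoption of OpenCompass for systematic model evaluation. We greatly appreciate the community's efforts to improve the transparency and reproducibility of LLM evaluation. 🔥🔥🔥
- **\[2023.09.02\]** We have added evaluation support for [Qwen-VL](https://github.com/QwenLM/Qwen-VL).
- **\[2023.08.25\]** We welcome the [**TigerBot**](https://github.com/TigerResearch/TigerBot) team's adoption of OpenCompass for systematic model evaluation. We greatly appreciate the community's efforts to improve the transparency and reproducibility of LLM evaluation.
- **\[2023.08.21\]** [**Lagent**](https://github.com/InternLM/lagent) has been officially released. It is a lightweight, open-source framework for building LLM-based agents. We are working closely with the Lagent team to support Lagent-based evaluation of LLM tool-use capabilities!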

> [More](docs/zh_cn/notes/news.md)

## ✨ Introduction

![image](https://github.com/open-compass/opencompass/assets/22607038/30bcb2e2-3969-4ac5-9f29-ad3f4abb4f3b)

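OpenCompass is a one-stop platform for large model evaluation. Its main features are:

- **Open-source and reproducible**: a fair, open, and reproducible evaluation scheme for large models.

- **Comprehensive capability dimensions**: five capability dimensions covering 50+ datasets and about 300,000 questions, for an all-round assessment of model capabilities.

- **Extensive model support**: 20+ HuggingFace and API models are already supported.

- **Efficient distributed evaluation**: a single command handles task partitioning and distributed evaluation, so a full evaluation of a hundred-billion-parameter model finishes within hours.

- **Diverse evaluation paradigms**: zero-shot, few-shot, and chain-of-thought evaluation are supported, combined with standard or dialogue-style prompt templates, to bring out the best performance of each model.

- **Flexible extension**: Want to add new models or datasets, customize a more advanced task partitioning strategy, or even plug in a new cluster management system? Everything in OpenCompass can be easily extended!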
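
To make the zero-shot, few-shot, and chain-of-thought paradigms listed above more concrete, here is a minimal Python sketch of how the three prompt styles differ. The `build_prompt` helper and its `Q:`/`A:` layout are hypothetical illustrations and are not part of the OpenCompass API; the actual prompt templates are defined in the project's dataset configurations.

```python
from typing import List, Sequence, Tuple


def build_prompt(question: str,
                 examples: Sequence[Tuple[str, str]] = (),
                 chain_of_thought: bool = False) -> str:
    """Assemble a plain-text prompt (hypothetical helper, not an OpenCompass API)."""
    parts: List[str] = []
    # Few-shot: prepend solved examples. Zero-shot: `examples` is empty.
    for ex_question, ex_answer in examples:
        parts.append(f"Q: {ex_question}\nA: {ex_answer}")
    # Chain-of-thought: nudge the model to reason before it answers.
    answer_prefix = "A: Let's think step by step." if chain_of_thought else "A:"
    parts.append(f"Q: {question}\n{answer_prefix}")
    return "\n\n".join(parts)


zero_shot = build_prompt("What is 2 + 3?")
few_shot = build_prompt("What is 2 + 3?", examples=[("What is 1 + 1?", "2")])
cot = build_prompt("What is 2 + 3?", chain_of_thought=True)
```

The only difference between the three calls is what surrounds the question: nothing (zero-shot), worked examples (few-shot), or a reasoning cue (chain-of-thought).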

## 📊 Leaderboard

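We will progressively publish detailed leaderboards for open-source and API models; see the [OpenCompass Leaderboard](https://opencompass.org.cn/rank). To have your model evaluated, send the model repository URL or a standard API interface to `opencompass@pjlab.org.cn`.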

<p align="right"><a href="#top">🔝Back to top</a></p>

## 📖 Dataset Support

<table align="center">
  <tbody>
    <tr align="center" valign="bottom">
      <td>
        <b>Language</b>
      </td>
      <td>
        <b>Knowledge</b>
      </td>
      <td>
        <b>Reasoning</b>
      </td>
      <td>
        <b>Examination</b>
      </td>
      <td>
        <b>Understanding</b>
      </td>
    </tr>
    <tr valign="top">
      <td>
<details open>
<summary><b>Word Definition</b></summary>

- WiC
- SummEdits

</details>

<details open>
<summary><b>Idiom Learning</b></summary>

- CHID

</details>

<details open>
<summary><b>Semantic Similarity</b></summary>

- AFQMC
- BUSTM

</details>

<details open>
<summary><b>Coreference Resolution</b></summary>

- CLUEWSC
- WSC
- WinoGrande

</details>

<details open>
<summary><b>Translation</b></summary>

- Flores

</details>
      </td>
      <td>
<details open>
<summary><b>Knowledge Question Answering</b></summary>

- BoolQ
- CommonSenseQA
- NaturalQuestion
- TriviaQA

</details>

<details open>
<summary><b>Multilingual Question Answering</b></summary>

- TyDi-QA

</details>
      </td>
      <td>
<details open>
<summary><b>Textual Entailment</b></summary>

- CMNLI
- OCNLI
- OCNLI_FC
- AX-b
- AX-g
- CB
- RTE

</details>

<details open>
<summary><b>Commonsense Reasoning</b></summary>

- StoryCloze
- StoryCloze-CN (coming soon)
- COPA
- ReCoRD
- HellaSwag
- PIQA
- SIQA

</details>

<details open>
<summary><b>Mathematical Reasoning</b></summary>

- MATH
- GSM8K

</details>

<details open>
<summary><b>Theorem Application</b></summary>

- TheoremQA

</details>

<details open>
<summary><b>Code</b></summary>

- HumanEval
- MBPP

</details>

<details open>
<summary><b>Comprehensive Reasoning</b></summary>

- BBH

</details>
      </td>
      <td>
<details open>
<summary><b>Junior High, High School, University, and Professional Examinations</b></summary>

- GAOKAO-2023
- CEval
- AGIEval
- MMLU
- GAOKAO-Bench
- CMMLU
- ARC

</details>
      </td>
      <td>
<details open>
<summary><b>Reading Comprehension</b></summary>

- C3
- CMRC
- DRCD
- MultiRC
- RACE

</details>

<details open>
<summary><b>Content Summary</b></summary>

- CSL
- LCSTS
- XSum

</details>

<details open>
<summary><b>Content Analysis</b></summary>

- EPRSTMT
- LAMBADA
- TNEWS

</details>
      </td>
    </tr>
  </tbody>
</table>

<p align="right"><a href="#top">🔝Back to top</a></p>

## 📖 Model Support

<table align="center">
  <tbody>
    <tr align="center" valign="bottom">
      <td>
        <b>Open-source Models</b>
      </td>
      <td>
        <b>API Models</b>
      </td>
      <!-- <td>
        <b>Custom Models</b>
      </td> -->
    </tr>
    <tr valign="top">
      <td>

- LLaMA
- Vicuna
- Alpaca
- Baichuan
- WizardLM
- ChatGLM-6B
- ChatGLM2-6B
- MPT
- Falcon
- TigerBot
- MOSS
- ……

</td>
<td>

- OpenAI
- Claude (coming soon)
- PaLM (coming soon)
- ……

</td>
<!-- <td>

- GLM
- ……

</td> -->
</tr>
  </tbody>
</table>

## 🛠️ Installation

The steps below show a quick installation and how to prepare the datasets.

```bash
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
# Download the datasets into data/
wget https://github.com/open-compass/opencompass/releases/download/0.1.1/OpenCompassData.zip
unzip OpenCompassData.zip
```

Some third-party features, such as HumanEval and Llama, may require additional steps to work properly. For details, see the [Installation Guide](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html).

<p align="right"><a href="#top">🔝Back to top</a></p>

## 🏗️ Evaluation

After installing OpenCompass and preparing the datasets as described above, you can evaluate the LLaMA-7b model on the MMLU and C-Eval datasets with the following command:

```bash
python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl
```

OpenCompass predefines configurations for many models and datasets. You can list all available model and dataset configurations with this [tool](./docs/zh_cn/tools.md#ListConfigs).

```bash
# List all configurations
python tools/list_configs.py
# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu
```
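
As a rough illustration of what keyword filtering like the second command does, the sketch below keeps every configuration name that contains at least one of the given keywords. This is an assumption about the behaviour, not the actual implementation of `tools/list_configs.py`; `filter_configs` and the sample name `example_model_a` are hypothetical.

```python
from fnmatch import fnmatch
from typing import Iterable, List


def filter_configs(names: Iterable[str], keywords: List[str]) -> List[str]:
    """Return names containing any keyword; all names when no keyword is given.

    Hypothetical helper for illustration only.
    """
    if not keywords:
        return sorted(names)
    return sorted(
        name for name in names
        if any(fnmatch(name, f"*{keyword}*") for keyword in keywords)
    )


# Sample names: the first three appear in this README, the last one is made up.
available = ["hf_llama_7b", "mmlu_ppl", "ceval_ppl", "example_model_a"]
matches = filter_configs(available, ["llama", "mmlu"])
print(matches)
```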

You can also evaluate other HuggingFace models from the command line. Again taking LLaMA-7b as an example:

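```bash
# Options used below:
#   --hf-path           HuggingFace model path
#   --model-kwargs      arguments for constructing the model
#   --tokenizer-kwargs  arguments for constructing the tokenizer
#   --max-out-len       maximum number of generated tokens
#   --max-seq-len       maximum sequence length the model accepts
#   --batch-size        batch size
#   --no-batch-padding  disable batch padding and infer in a for loop to avoid accuracy loss
#   --num-gpus          number of GPUs required
python run.py --datasets ceval_ppl mmlu_ppl \
--hf-path huggyllama/llama-7b \
--model-kwargs device_map='auto' \
--tokenizer-kwargs padding_side='left' truncation='left' use_fast=False \
--max-out-len 100 \
--max-seq-len 2048 \
--batch-size 8 \
--no-batch-padding \
--num-gpus 1
```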
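
If you prefer to launch the same evaluation from a script, the command can be assembled as an argument list. The sketch below is only one possible way to do this: `kv_args` and `build_run_command` are hypothetical helpers, and the flags are exactly those shown in the command above.

```python
import shlex
from typing import Dict, List


def kv_args(kwargs: Dict[str, object]) -> List[str]:
    """Render a dict as the key=value tokens that follow --model-kwargs and similar flags."""
    return [f"{key}={value}" for key, value in kwargs.items()]


def build_run_command(datasets: List[str], hf_path: str,
                      model_kwargs: Dict[str, object],
                      tokenizer_kwargs: Dict[str, object]) -> List[str]:
    """Assemble the argv list for the run.py command shown above (hypothetical helper)."""
    return [
        "python", "run.py",
        "--datasets", *datasets,
        "--hf-path", hf_path,
        "--model-kwargs", *kv_args(model_kwargs),
        "--tokenizer-kwargs", *kv_args(tokenizer_kwargs),
        "--max-out-len", "100",
        "--max-seq-len", "2048",
        "--batch-size", "8",
        "--no-batch-padding",
        "--num-gpus", "1",
    ]


command = build_run_command(
    datasets=["ceval_ppl", "mmlu_ppl"],
    hf_path="huggyllama/llama-7b",
    model_kwargs={"device_map": "auto"},
    tokenizer_kwargs={"padding_side": "left", "truncation": "left", "use_fast": False},
)
print(shlex.join(command))  # pass `command` to subprocess.run(...) to launch it
```

Passing the list to `subprocess.run` avoids shell quoting issues, because each `key=value` pair reaches `run.py` as a single argument.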

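Through the command line or configuration files, OpenCompass also supports evaluating API or custom models, as well as more diverse evaluation strategies. Read the [Quick Start](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id3) to learn how to run an evaluation task.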

For more tutorials, see our [documentation](https://opencompass.readthedocs.io/zh_CN/latest/index.html).

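## 🔜 Roadmap

- [ ] Subjective evaluation
  - [ ] Release a subjective evaluation leaderboard
  - [ ] Release subjective evaluation datasets
- [ ] Long context
  - [ ] Support a wide range of long-context evaluation sets
  - [ ] Release a long-context evaluation leaderboard
- [ ] Coding
  - [ ] Release a coding evaluation leaderboard
  - [ ] Provide evaluation services for languages other than Python
- [ ] Agents
  - [ ] Support a rich set of agent frameworks
  - [ ] Provide an agent evaluation leaderboard
- [ ] Robustness
  - [ ] Support various attack methods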

## 👷‍♂️ Contributing

We thank all contributors for their efforts to improve OpenCompass. See the [Contribution Guide](https://opencompass.readthedocs.io/zh_CN/latest/notes/contribution_guide.html) for guidance on contributing to the project.

## 🤝 Acknowledgements

Some code in this project is adapted from [OpenICL](https://github.com/Shark-NLP/OpenICL).

Some datasets and prompt implementations in this project are adapted from [chain-of-thought-hub](https://github.com/FranxYao/chain-of-thought-hub) and [instruct-eval](https://github.com/declare-lab/instruct-eval).

## 🖊️ Citation

```bibtex
@misc{2023opencompass,
    title={OpenCompass: A Universal Evaluation Platform for Foundation Models},
    author={OpenCompass Contributors},
    howpublished = {\url{https://github.com/open-compass/opencompass}},
    year={2023}
}
```

<p align="right"><a href="#top">🔝Back to top</a></p>