Unverified Commit d6261e10 authored by Leymore, committed by GitHub

[Doc] Update dataset list (#437)



* add new dataset list

* add new dataset list

* add new dataset list

* update

* update

* update readme

---------
Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
parent dc1b82c3
@@ -34,9 +34,10 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
- **\[2023.09.26\]** We update the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available. Welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥
- **\[2023.09.20\]** We update the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM). Welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥
- **\[2023.09.19\]** We update the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B. Welcome to our [homepage](https://opencompass.org.cn) for more details.
- **\[2023.09.18\]** We have released the [long context evaluation guidance](docs/en/advanced_guides/longeval.md).
- **\[2023.09.08\]** We update the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Welcome to our [homepage](https://opencompass.org.cn) for more details.
- **\[2023.09.06\]** The [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
- **\[2023.09.02\]** We have supported the evaluation of [Qwen-VL](https://github.com/QwenLM/Qwen-VL) in OpenCompass.
@@ -51,7 +52,7 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark. Its main features include:
- **Comprehensive support for models and datasets**: Pre-support for 20+ HuggingFace and API models, plus an evaluation scheme covering 70+ datasets with about 400,000 questions, comprehensively evaluating model capabilities in five dimensions.
- **Efficient distributed evaluation**: One-line command to implement task division and distributed evaluation, completing the full evaluation of billion-scale models in just a few hours.
@@ -67,6 +68,60 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community
<p align="right"><a href="#top">🔝Back to top</a></p>
## 🛠️ Installation
Below are the steps for quick installation and dataset preparation.
```bash
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
# Download the datasets to the data/ folder
wget https://github.com/open-compass/opencompass/releases/download/0.1.1/OpenCompassData.zip
unzip OpenCompassData.zip
```
Some third-party features, like HumanEval and Llama, may require additional steps to work properly; for detailed steps, please refer to the [Installation Guide](https://opencompass.readthedocs.io/en/latest/get_started.html).
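For instance, HumanEval evaluation depends on OpenAI's `human-eval` package, which the base install above does not include. The following is a minimal sketch of that extra step, assuming the upstream `openai/human-eval` repository is the dependency; the Installation Guide remains the authoritative reference:
```bash
# Assumption: HumanEval support comes from OpenAI's human-eval package,
# installed from source next to the OpenCompass checkout.
git clone https://github.com/openai/human-eval.git
cd human-eval
pip install -e .
cd ..
```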
<p align="right"><a href="#top">🔝Back to top</a></p>
## 🏗️ Evaluation
After ensuring that OpenCompass is installed correctly according to the above steps and the datasets are prepared, you can evaluate the performance of the LLaMA-7b model on the MMLU and C-Eval datasets using the following command:
```bash
python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl
```
OpenCompass has predefined configurations for many models and datasets. You can list all available model and dataset configurations using the [tools](./docs/en/tools.md#list-configs).
```bash
# List all configurations
python tools/list_configs.py
# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu
```
You can also evaluate other HuggingFace models via the command line, again taking LLaMA-7b as an example:
```bash
# --hf-path:          HuggingFace model path
# --model-kwargs:     arguments for model construction
# --tokenizer-kwargs: arguments for tokenizer construction
# --max-out-len:      maximum number of tokens generated
# --max-seq-len:      maximum sequence length the model can accept
# --batch-size:       batch size
# --no-batch-padding: disable batch padding and infer in a for loop to avoid accuracy loss
# --num-gpus:         number of required GPUs
python run.py --datasets ceval_ppl mmlu_ppl \
    --hf-path huggyllama/llama-7b \
    --model-kwargs device_map='auto' \
    --tokenizer-kwargs padding_side='left' truncation='left' use_fast=False \
    --max-out-len 100 \
    --max-seq-len 2048 \
    --batch-size 8 \
    --no-batch-padding \
    --num-gpus 1
```
Through the command line or configuration files, OpenCompass also supports evaluating APIs or custom models, as well as more diversified evaluation strategies. Please read the [Quick Start](https://opencompass.readthedocs.io/en/latest/get_started.html) to learn how to run an evaluation task.
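Beyond ad-hoc CLI flags, an evaluation is usually described by a config file passed straight to `run.py`. A minimal sketch, assuming a demo config `configs/eval_demo.py` ships with the repository and that `-w` chooses where predictions and results are written (verify both in the Quick Start):
```bash
# Run the models/datasets declared in a config file
# (assumptions: configs/eval_demo.py exists; -w sets the output work dir).
python run.py configs/eval_demo.py -w outputs/demo
```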
<p align="right"><a href="#top">🔝Back to top</a></p>
## 📖 Dataset Support
<table align="center">
@@ -82,10 +137,7 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community
<b>Reasoning</b>
</td>
<td>
<b>Examination</b>
</td>
</tr>
<tr valign="top">
@@ -126,24 +178,33 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community
<summary><b>Translation</b></summary>
- Flores
- IWSLT2017
</details>
</td>
<td>
<details open>
<summary><b>Multi-language Question Answering</b></summary>
- TyDi-QA
- XCOPA
</details>
<details open>
<summary><b>Multi-language Summary</b></summary>
- XLSum
</details>
</td>
<td>
<details open>
<summary><b>Knowledge Question Answering</b></summary>
- BoolQ
- CommonSenseQA
- NaturalQuestions
- TriviaQA
</details>
</td>
@@ -158,6 +219,7 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community
- AX-g
- CB
- RTE
- ANLI
</details>
@@ -165,7 +227,6 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community
<summary><b>Commonsense Reasoning</b></summary>
- StoryCloze
- COPA
- ReCoRD
- HellaSwag
@@ -186,14 +247,8 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community
<summary><b>Theorem Application</b></summary>
- TheoremQA
- SciBench
</details>
@@ -208,17 +263,44 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community
<details open>
<summary><b>Junior High, High School, University, Professional Examinations</b></summary>
- C-Eval
- AGIEval
- MMLU
- GAOKAO-Bench
- CMMLU
- ARC
- Xiezhi
</details>
<details open>
<summary><b>Medical Examinations</b></summary>
- CMB
</details>
</td>
</tr>
</tbody>
<tbody>
<tr align="center" valign="bottom">
<td>
<b>Understanding</b>
</td> </td>
<td> <td>
<b>Long Context</b>
</td>
<td>
<b>Safety</b>
</td>
<td>
<b>Code</b>
</td>
</tr>
<tr valign="top">
<td>
<details open>
<summary><b>Reading Comprehension</b></summary>
@@ -227,6 +309,9 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community
- DRCD
- MultiRC
- RACE
- DROP
- OpenBookQA
- SQuAD2.0
</details>
@@ -236,6 +321,7 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community
- CSL
- LCSTS
- XSum
- SummScreen
</details>
@@ -246,6 +332,48 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community
- LAMBADA
- TNEWS
</details>
</td>
<td>
<details open>
<summary><b>Long Context Understanding</b></summary>
- LEval
- LongBench
- GovReports
- NarrativeQA
- Qasper
</details>
</td>
<td>
<details open>
<summary><b>Safety</b></summary>
- CivilComments
- CrowsPairs
- CValues
- JigsawMultilingual
- TruthfulQA
</details>
<details open>
<summary><b>Robustness</b></summary>
- AdvGLUE
</details>
</td>
<td>
<details open>
<summary><b>Code</b></summary>
- HumanEval
- HumanEvalX
- MBPP
- APPs
- DS1000
</details>
</td>
</tr>
@@ -280,86 +408,28 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community
- Alpaca
- Baichuan
- WizardLM
- ChatGLM2
- Falcon
- TigerBot
- Qwen
- ...
</td>
<td>
- OpenAI
- Claude
- PaLM (coming soon)
- ...
</td>
</tr>
</tbody>
</table>
<p align="right"><a href="#top">🔝Back to top</a></p>
## 🔜 Roadmap
- [ ] Subjective Evaluation
......
@@ -34,9 +34,10 @@
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
- **\[2023.09.26\]** We update the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available. Welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥
- **\[2023.09.20\]** We update the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM). Welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥
- **\[2023.09.19\]** We update the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B. Welcome to our [homepage](https://opencompass.org.cn) for more details.
- **\[2023.09.18\]** We have released the [long context evaluation guidance](docs/zh_cn/advanced_guides/longeval.md).
- **\[2023.09.08\]** We update the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Welcome to our [homepage](https://opencompass.org.cn) for more details.
- **\[2023.09.06\]** The [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
- **\[2023.09.02\]** We have supported the evaluation of [Qwen-VL](https://github.com/QwenLM/Qwen-VL) in OpenCompass.
@@ -53,7 +54,7 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:
- **Open and reproducible**: a fair, open, and reproducible evaluation scheme for large models
- **Comprehensive capability dimensions**: a five-dimension design providing an evaluation scheme of 70+ datasets and about 400,000 questions, comprehensively assessing model capabilities
- **Rich model support**: 20+ HuggingFace and API models supported
@@ -69,6 +70,62 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:
<p align="right"><a href="#top">🔝Back to top</a></p>
## 🛠️ Installation
Below are the steps for quick installation and dataset preparation.
```bash
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
# Download the datasets to the data/ folder
wget https://github.com/open-compass/opencompass/releases/download/0.1.1/OpenCompassData.zip
unzip OpenCompassData.zip
```
Some third-party features, like HumanEval and Llama, may require additional steps to work properly; for detailed steps, please refer to the [Installation Guide](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html).
<p align="right"><a href="#top">🔝Back to top</a></p>
## 🏗️ Evaluation
After ensuring that OpenCompass is installed correctly according to the steps above and the datasets are prepared, you can evaluate the performance of the LLaMA-7b model on the MMLU and C-Eval datasets with the following command:
```bash
python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl
```
OpenCompass has predefined configurations for many models and datasets. You can list all available model and dataset configurations using the [tools](./docs/zh_cn/tools.md#ListConfigs).
```bash
# List all configurations
python tools/list_configs.py
# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu
```
You can also evaluate other HuggingFace models via the command line, again taking LLaMA-7b as an example:
```bash
# --hf-path:          HuggingFace model path
# --model-kwargs:     arguments for model construction
# --tokenizer-kwargs: arguments for tokenizer construction
# --max-out-len:      maximum number of tokens generated
# --max-seq-len:      maximum sequence length the model can accept
# --batch-size:       batch size
# --no-batch-padding: disable batch padding and infer in a for loop to avoid accuracy loss
# --num-gpus:         number of required GPUs
python run.py --datasets ceval_ppl mmlu_ppl \
    --hf-path huggyllama/llama-7b \
    --model-kwargs device_map='auto' \
    --tokenizer-kwargs padding_side='left' truncation='left' use_fast=False \
    --max-out-len 100 \
    --max-seq-len 2048 \
    --batch-size 8 \
    --no-batch-padding \
    --num-gpus 1
```
Through the command line or configuration files, OpenCompass also supports evaluating APIs or custom models, as well as more diversified evaluation strategies. Please read the [Quick Start](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id3) to learn how to run an evaluation task.
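As a sketch of those more diversified launch options, the commands below assume a demo config `configs/eval_demo.py` and two flags on the `run.py` entry point: `--debug` for sequential, console-logged troubleshooting runs, and `--slurm` with `-p` for dispatching tasks to a Slurm partition; treat all three as assumptions to confirm in the Quick Start.
```bash
# Run tasks one by one with logs streamed to the console
# (assumption: --debug disables parallel task dispatch).
python run.py configs/eval_demo.py --debug

# Distribute evaluation tasks through a Slurm cluster
# (assumption: --slurm enables Slurm dispatch, -p names the partition).
python run.py configs/eval_demo.py --slurm -p my_partition
```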
For more tutorials, please check our [documentation](https://opencompass.readthedocs.io/zh_CN/latest/index.html).
<p align="right"><a href="#top">🔝Back to top</a></p>
## 📖 Dataset Support
<table align="center">
@@ -84,10 +141,7 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:
<b>Reasoning</b>
</td>
<td>
<b>Examination</b>
</td>
</tr>
<tr valign="top">
@@ -128,24 +182,33 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:
<summary><b>Translation</b></summary>
- Flores
- IWSLT2017
</details>
</td>
<td>
<details open>
<summary><b>Multi-language Question Answering</b></summary>
- TyDi-QA
- XCOPA
</details>
<details open>
<summary><b>Multi-language Summary</b></summary>
- XLSum
</details>
</td>
<td>
<details open>
<summary><b>Knowledge Question Answering</b></summary>
- BoolQ
- CommonSenseQA
- NaturalQuestions
- TriviaQA
</details>
</td>
@@ -160,6 +223,7 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:
- AX-g
- CB
- RTE
- ANLI
</details>
@@ -167,7 +231,6 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:
<summary><b>Commonsense Reasoning</b></summary>
- StoryCloze
- COPA
- ReCoRD
- HellaSwag
@@ -188,14 +251,8 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:
<summary><b>Theorem Application</b></summary>
- TheoremQA
- SciBench
</details>
@@ -210,16 +267,43 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:
<details open>
<summary><b>Junior High, High School, University, Professional Examinations</b></summary>
- C-Eval
- AGIEval
- MMLU
- GAOKAO-Bench
- CMMLU
- ARC
- Xiezhi
</details>
<details open>
<summary><b>Medical Examinations</b></summary>
- CMB
</details>
</td>
</tr>
</tbody>
<tbody>
<tr align="center" valign="bottom">
<td>
<b>Understanding</b>
</td>
<td>
<b>Long Context</b>
</td>
<td>
<b>Safety</b>
</td>
<td>
<b>Code</b>
</td>
</tr>
<tr valign="top">
<td>
<details open>
<summary><b>Reading Comprehension</b></summary>
@@ -229,6 +313,9 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:
- DRCD
- MultiRC
- RACE
- DROP
- OpenBookQA
- SQuAD2.0
</details>
@@ -238,6 +325,7 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:
- CSL
- LCSTS
- XSum
- SummScreen
</details>
@@ -248,6 +336,48 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:
- LAMBADA
- TNEWS
</details>
</td>
<td>
<details open>
<summary><b>Long Context Understanding</b></summary>
- LEval
- LongBench
- GovReports
- NarrativeQA
- Qasper
</details>
</td>
<td>
<details open>
<summary><b>Safety</b></summary>
- CivilComments
- CrowsPairs
- CValues
- JigsawMultilingual
- TruthfulQA
</details>
<details open>
<summary><b>Robustness</b></summary>
- AdvGLUE
</details>
</td>
<td>
<details open>
<summary><b>Code</b></summary>
- HumanEval
- HumanEvalX
- MBPP
- APPs
- DS1000
</details>
</td>
</tr>
@@ -276,92 +406,34 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:
<tr valign="top">
<td>
- InternLM
- LLaMA
- Vicuna
- Alpaca
- Baichuan
- WizardLM
- ChatGLM2
- Falcon
- TigerBot
- Qwen
- ...
</td>
<td>
- OpenAI
- Claude
- PaLM (coming soon)
- ...
</td>
</tr>
</tbody>
</table>
<p align="right"><a href="#top">🔝Back to top</a></p>
## 🔜 Roadmap
- [ ] Subjective Evaluation
......