"git@developer.sourcefind.cn:gaoqiong/composable_kernel.git" did not exist on "0b41ca2d9e20ad41dc522942aaffb2ebc8ef9b92"
Unverified commit d6261e10, authored by Leymore, committed by GitHub

[Doc] Update dataset list (#437)



* add new dataset list

* add new dataset list

* add new dataset list

* update

* update

* update readme

---------
Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
parent dc1b82c3
......@@ -34,9 +34,10 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
- **\[2023.09.26\]** We updated the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available. Welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥
- **\[2023.09.20\]** We updated the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM). Welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥
- **\[2023.09.19\]** We updated the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B. Welcome to our [homepage](https://opencompass.org.cn) for more details.
- **\[2023.09.18\]** We have released [long context evaluation guidance](docs/en/advanced_guides/longeval.md).
- **\[2023.09.08\]** We updated the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Welcome to our [homepage](https://opencompass.org.cn) for more details.
- **\[2023.09.06\]** The [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
- **\[2023.09.02\]** We have supported the evaluation of [Qwen-VL](https://github.com/QwenLM/Qwen-VL) in OpenCompass.
......@@ -51,7 +52,7 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark. Its main features include:
- **Comprehensive support for models and datasets**: Pre-support for 20+ HuggingFace and API models, and an evaluation scheme covering 70+ datasets with about 400,000 questions, comprehensively evaluating model capabilities in five dimensions.
- **Efficient distributed evaluation**: A single command implements task division and distributed evaluation, completing a full evaluation of billion-scale models in just a few hours.
......@@ -67,6 +68,60 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
<p align="right"><a href="#top">🔝Back to top</a></p>
## 🛠️ Installation
Below are the steps for quick installation and dataset preparation.
```bash
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
# Download dataset to data/ folder
wget https://github.com/open-compass/opencompass/releases/download/0.1.1/OpenCompassData.zip
unzip OpenCompassData.zip
```
Some third-party features, such as HumanEval and Llama, may require additional steps to work properly. For detailed steps, please refer to the [Installation Guide](https://opencompass.readthedocs.io/en/latest/get_started.html).
<p align="right"><a href="#top">🔝Back to top</a></p>
## 🏗️ Evaluation
After ensuring that OpenCompass is installed correctly according to the above steps and the datasets are prepared, you can evaluate the performance of the LLaMA-7b model on the MMLU and C-Eval datasets using the following command:
```bash
python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl
```
OpenCompass has predefined configurations for many models and datasets. You can list all available model and dataset configurations using the [tools](./docs/en/tools.md#list-configs).
```bash
# List all configurations
python tools/list_configs.py
# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu
```
You can also evaluate other HuggingFace models via the command line. Taking LLaMA-7b as an example:
```bash
python run.py --datasets ceval_ppl mmlu_ppl \
--hf-path huggyllama/llama-7b \ # HuggingFace model path
--model-kwargs device_map='auto' \ # Arguments for model construction
--tokenizer-kwargs padding_side='left' truncation='left' use_fast=False \ # Arguments for tokenizer construction
--max-out-len 100 \ # Maximum number of tokens generated
--max-seq-len 2048 \ # Maximum sequence length the model can accept
--batch-size 8 \ # Batch size
--no-batch-padding \  # Disable batch padding and infer in a for loop to avoid performance loss
--num-gpus 1 # Number of required GPUs
```
Through the command line or configuration files, OpenCompass also supports evaluating APIs or custom models, as well as more diversified evaluation strategies. Please read the [Quick Start](https://opencompass.readthedocs.io/en/latest/get_started.html) to learn how to run an evaluation task.
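Beyond command-line flags, an evaluation can also be described entirely in a config file. The sketch below shows the general shape such a file can take, assuming the `read_base`-style Python configs used by OpenCompass; the exact module paths and variable names (`mmlu_datasets`, `hf_llama_7b`, etc.) vary by version, so treat them as illustrative and check the `configs/` directory of your checkout for the real ones.

```python
# configs/eval_llama_7b.py -- hypothetical file name
from mmengine.config import read_base

# read_base() lets one config pull in other config files as Python modules
with read_base():
    # illustrative import paths; look under configs/ for the actual ones
    from .datasets.mmlu.mmlu_ppl import mmlu_datasets
    from .datasets.ceval.ceval_ppl import ceval_datasets
    from .models.hf_llama_7b import models

# the runner reads these two top-level lists
datasets = [*mmlu_datasets, *ceval_datasets]
```

Such a config would then be run with `python run.py configs/eval_llama_7b.py`.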
<p align="right"><a href="#top">🔝Back to top</a></p>
## 📖 Dataset Support
<table align="center">
......@@ -82,10 +137,7 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
<b>Reasoning</b>
</td>
<td>
<b>Examination</b>
</td>
</tr>
<tr valign="top">
......@@ -126,24 +178,33 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
<summary><b>Translation</b></summary>
- Flores
- IWSLT2017
</details>
</td>
<td>
<details open>
<summary><b>Multi-language Question Answering</b></summary>
- TyDi-QA
- XCOPA
</details>
<details open>
<summary><b>Multi-language Summary</b></summary>
- XLSum
</details>
</td>
<td>
<details open>
<summary><b>Knowledge Question Answering</b></summary>
- BoolQ
- CommonSenseQA
- NaturalQuestions
- TriviaQA
</details>
</td>
......@@ -158,6 +219,7 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
- AX-g
- CB
- RTE
- ANLI
</details>
......@@ -165,7 +227,6 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
<summary><b>Commonsense Reasoning</b></summary>
- StoryCloze
- COPA
- ReCoRD
- HellaSwag
......@@ -186,14 +247,8 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
<summary><b>Theorem Application</b></summary>
- TheoremQA
- StrategyQA
- SciBench
</details>
......@@ -208,17 +263,44 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
<details open>
<summary><b>Junior High, High School, University, Professional Examinations</b></summary>
- C-Eval
- AGIEval
- MMLU
- GAOKAO-Bench
- CMMLU
- ARC
- Xiezhi
</details>
<details open>
<summary><b>Medical Examinations</b></summary>
- CMB
</details>
</td>
</tr>
</tbody>
<tbody>
<tr align="center" valign="bottom">
<td>
<b>Understanding</b>
</td>
<td>
<b>Long Context</b>
</td>
<td>
<b>Safety</b>
</td>
<td>
<b>Code</b>
</td>
</tr>
<tr valign="top">
<td>
<details open>
<summary><b>Reading Comprehension</b></summary>
......@@ -227,6 +309,9 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
- DRCD
- MultiRC
- RACE
- DROP
- OpenBookQA
- SQuAD2.0
</details>
......@@ -236,6 +321,7 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
- CSL
- LCSTS
- XSum
- SummScreen
</details>
......@@ -246,6 +332,48 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
- LAMBADA
- TNEWS
</details>
</td>
<td>
<details open>
<summary><b>Long Context Understanding</b></summary>
- LEval
- LongBench
- GovReports
- NarrativeQA
- Qasper
</details>
</td>
<td>
<details open>
<summary><b>Safety</b></summary>
- CivilComments
- CrowsPairs
- CValues
- JigsawMultilingual
- TruthfulQA
</details>
<details open>
<summary><b>Robustness</b></summary>
- AdvGLUE
</details>
</td>
<td>
<details open>
<summary><b>Code</b></summary>
- HumanEval
- HumanEvalX
- MBPP
- APPs
- DS1000
</details>
</td>
</tr>
......@@ -280,86 +408,28 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
- Alpaca
- Baichuan
- WizardLM
- MPT
- ChatGLM2
- Falcon
- TigerBot
- MOSS
- Qwen
- ...
</td>
<td>
- OpenAI
- Claude
- PaLM (coming soon)
- ……
</td>
<!--
- GLM
- ...
</td> -->
</tr>
</tbody>
</table>
## 🔜 Roadmap
- [ ] Subjective Evaluation
......
......@@ -34,9 +34,10 @@
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
- **\[2023.09.26\]** We updated the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available. Welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥
- **\[2023.09.20\]** We updated the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM). Welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥
- **\[2023.09.19\]** We updated the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B. Welcome to our [homepage](https://opencompass.org.cn) for more details.
- **\[2023.09.18\]** We have released [long context evaluation guidance](docs/zh_cn/advanced_guides/longeval.md).
- **\[2023.09.08\]** We updated the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Welcome to our [homepage](https://opencompass.org.cn) for more details.
- **\[2023.09.06\]** The [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
- **\[2023.09.02\]** We have supported the evaluation of [Qwen-VL](https://github.com/QwenLM/Qwen-VL) in OpenCompass.
......@@ -53,7 +54,7 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
- **Open and reproducible**: a fair, open, and reproducible evaluation scheme for large models
- **Comprehensive capability dimensions**: a five-dimension design covering 70+ datasets with about 400,000 questions, comprehensively evaluating model capabilities
- **Rich model support**: 20+ HuggingFace and API models already supported
......@@ -69,6 +70,62 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
<p align="right"><a href="#top">🔝返回顶部</a></p>
## 🛠️ 安装
下面展示了快速安装以及准备数据集的步骤。
```bash
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
# Download dataset to data/ folder
wget https://github.com/open-compass/opencompass/releases/download/0.1.1/OpenCompassData.zip
unzip OpenCompassData.zip
```
Some third-party features, such as HumanEval and Llama, may require additional steps to work properly. For detailed steps, please refer to the [Installation Guide](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html).
<p align="right"><a href="#top">🔝Back to top</a></p>
## 🏗️ Evaluation
After ensuring that OpenCompass is installed correctly according to the above steps and the datasets are prepared, you can evaluate the performance of the LLaMA-7b model on the MMLU and C-Eval datasets with the following command:
```bash
python run.py --models hf_llama_7b --datasets mmlu_ppl ceval_ppl
```
OpenCompass has predefined configurations for many models and datasets. You can list all available model and dataset configurations using the [tools](./docs/zh_cn/tools.md#ListConfigs).
```bash
# List all configurations
python tools/list_configs.py
# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu
```
You can also evaluate other HuggingFace models via the command line. Again taking LLaMA-7b as an example:
```bash
python run.py --datasets ceval_ppl mmlu_ppl \
--hf-path huggyllama/llama-7b \  # HuggingFace model path
--model-kwargs device_map='auto' \  # Arguments for model construction
--tokenizer-kwargs padding_side='left' truncation='left' use_fast=False \  # Arguments for tokenizer construction
--max-out-len 100 \  # Maximum number of tokens generated
--max-seq-len 2048 \  # Maximum sequence length the model can accept
--batch-size 8 \  # Batch size
--no-batch-padding \  # Disable batch padding and infer in a for loop to avoid accuracy loss
--num-gpus 1  # Number of required GPUs
```
Through the command line or configuration files, OpenCompass also supports evaluating APIs or custom models, as well as more diversified evaluation strategies. Please read the [Quick Start](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id3) to learn how to run an evaluation task.
For more tutorials, please check our [documentation](https://opencompass.readthedocs.io/zh_CN/latest/index.html).
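As in the English README above, config-file-based evaluation can be sketched roughly as follows, assuming the `read_base`-style Python configs used by OpenCompass; the file name and import paths here are illustrative, not verbatim, so check the `configs/` directory for the real ones.

```python
# configs/eval_llama_7b.py -- hypothetical file name
from mmengine.config import read_base

# read_base() lets one config pull in other config files as Python modules
with read_base():
    # illustrative import paths; look under configs/ for the actual ones
    from .datasets.mmlu.mmlu_ppl import mmlu_datasets
    from .datasets.ceval.ceval_ppl import ceval_datasets
    from .models.hf_llama_7b import models

# the runner reads these two top-level lists
datasets = [*mmlu_datasets, *ceval_datasets]
```

Such a config would then be run with `python run.py configs/eval_llama_7b.py`.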
<p align="right"><a href="#top">🔝返回顶部</a></p>
## 📖 Dataset Support
<table align="center">
......@@ -84,10 +141,7 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
<b>Reasoning</b>
</td>
<td>
<b>Examination</b>
</td>
</tr>
<tr valign="top">
......@@ -128,24 +182,33 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
<summary><b>Translation</b></summary>
- Flores
- IWSLT2017
</details>
</td>
<td>
<details open>
<summary><b>Multi-language Question Answering</b></summary>
- TyDi-QA
- XCOPA
</details>
<details open>
<summary><b>Multi-language Summary</b></summary>
- XLSum
</details>
</td>
<td>
<details open>
<summary><b>Knowledge Question Answering</b></summary>
- BoolQ
- CommonSenseQA
- NaturalQuestions
- TriviaQA
</details>
</td>
......@@ -160,6 +223,7 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
- AX-g
- CB
- RTE
- ANLI
</details>
......@@ -167,7 +231,6 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
<summary><b>Commonsense Reasoning</b></summary>
- StoryCloze
- COPA
- ReCoRD
- HellaSwag
......@@ -188,14 +251,8 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
<summary><b>Theorem Application</b></summary>
- TheoremQA
- StrategyQA
- SciBench
</details>
......@@ -210,16 +267,43 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
<details open>
<summary><b>Junior High, High School, University, Professional Examinations</b></summary>
- C-Eval
- AGIEval
- MMLU
- GAOKAO-Bench
- CMMLU
- ARC
- Xiezhi
</details>
<details open>
<summary><b>Medical Examinations</b></summary>
- CMB
</details>
</td>
</tr>
</tbody>
<tbody>
<tr align="center" valign="bottom">
<td>
<b>Understanding</b>
</td>
<td>
<b>Long Context</b>
</td>
<td>
<b>Safety</b>
</td>
<td>
<b>Code</b>
</td>
</tr>
<tr valign="top">
<td>
<details open>
<summary><b>Reading Comprehension</b></summary>
......@@ -229,6 +313,9 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
- DRCD
- MultiRC
- RACE
- DROP
- OpenBookQA
- SQuAD2.0
</details>
......@@ -238,6 +325,7 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
- CSL
- LCSTS
- XSum
- SummScreen
</details>
......@@ -248,6 +336,48 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
- LAMBADA
- TNEWS
</details>
</td>
<td>
<details open>
<summary><b>Long Context Understanding</b></summary>
- LEval
- LongBench
- GovReports
- NarrativeQA
- Qasper
</details>
</td>
<td>
<details open>
<summary><b>Safety</b></summary>
- CivilComments
- CrowsPairs
- CValues
- JigsawMultilingual
- TruthfulQA
</details>
<details open>
<summary><b>Robustness</b></summary>
- AdvGLUE
</details>
</td>
<td>
<details open>
<summary><b>Code</b></summary>
- HumanEval
- HumanEvalX
- MBPP
- APPs
- DS1000
</details>
</td>
</tr>
......@@ -276,92 +406,34 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
<tr valign="top">
<td>
- InternLM
- LLaMA
- Vicuna
- Alpaca
- Baichuan
- WizardLM
- MPT
- ChatGLM2
- Falcon
- TigerBot
- MOSS
- Qwen
- ……
</td>
<td>
- OpenAI
- Claude
- PaLM (coming soon)
- ……
</td>
<!-- <td>
- GLM
- ……
</td> -->
</tr>
</tbody>
</table>
## 🔜 Roadmap
- [ ] Subjective Evaluation
......