## Prepare the Dataset
Create a `data` folder in the repository directory and place the dataset files in it.
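A minimal sketch of the expected layout is shown below; the archive name is purely illustrative, and the actual files to place in `data` depend on the benchmarks you intend to run:

```bash
# Run from the OpenCompass repository root (next to run.py)
mkdir -p data
# Unpack your downloaded dataset files into data/, e.g. (illustrative archive name):
# unzip OpenCompassData.zip -d ./
ls data
```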
To start a simple evaluation task using OpenCompass, you generally need to follow three steps:
1. **Prepare dataset configurations** - [`configs/datasets`](https://github.com/open-mmlab/OpenCompass/tree/main/configs/datasets) provides configurations for more than 50 datasets supported by OpenCompass.
2. **Prepare model configurations** - [`configs/models`](https://github.com/open-mmlab/OpenCompass/tree/main/configs/models) contains sample configuration files for models that are already supported, including HuggingFace-based models and API models such as ChatGPT.
3. **Use the `run` script to launch** - The script supports running locally or on Slurm, and can evaluate multiple datasets and models in a single run.
In this example, we will demonstrate how to test the performance of the pre-trained LLaMA-7B base model on two benchmark tasks, SIQA and PIQA. Before proceeding, ensure that you have installed OpenCompass and have access to GPU computing resources that meet the minimum requirements for LLaMA-7B.
To initiate the evaluation task on your local machine, use the following command:
```bash
python run.py configs/eval_llama_7b.py --debug
```
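In debug mode, OpenCompass runs the tasks sequentially and streams their output, so configuration problems surface immediately. Once a debug run completes cleanly, the same evaluation can be relaunched without the flag:

```bash
# Relaunch without --debug after the configuration has been verified
python run.py configs/eval_llama_7b.py
```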
Here's a detailed step-by-step explanation of this case study:
## Step by Step
<details>
<summary>Prepare Datasets</summary>
## Prepare the Evaluation Configuration File
The SIQA and PIQA datasets are downloaded automatically, so no manual preparation is required for this example. Some other datasets, however, must be downloaded manually; please refer to [Prepare Datasets](docs/zh_cn/user_guides/dataset_prepare.md) for details.
Create the configuration file `configs/eval_llama_7b.py` (the file referenced by the launch command above) and add the following content:
```python
from mmengine.config import read_base

with read_base():
    ...
    ...

# Concatenate the datasets to be evaluated into the datasets field
datasets = [*piqa_datasets, *siqa_datasets]
```
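As a quick sanity check, the assembled config can be loaded with `mmengine` to confirm that both benchmarks made it into `datasets`. This is an optional sketch; it assumes each dataset entry carries an `abbr` field, as OpenCompass dataset configs usually do:

```python
# Optional check: load the config and list the datasets that will be evaluated
from mmengine.config import Config

cfg = Config.fromfile('configs/eval_llama_7b.py')
for ds in cfg['datasets']:
    print(ds.get('abbr', ds['type']))
```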
</details>
<details>
<summary>Prepare Models</summary>
The pretrained model `huggyllama/llama-7b` is downloaded automatically from HuggingFace. Add the following to your configuration file:
```python
# Evaluate models supported by HuggingFace's `AutoModelForCausalLM` using `HuggingFaceCausalLM`
from opencompass.models import HuggingFaceCausalLM

llama_7b = dict(
    type=HuggingFaceCausalLM,
    # Initialization parameters for `HuggingFaceCausalLM`
    path='huggyllama/llama-7b',
    ...
    ...
    batch_size=16,
    # Run configuration for specifying resource requirements
    run_cfg=dict(num_gpus=1),
)

models = [llama_7b]
```
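Before launching the full evaluation, you can optionally verify that the checkpoint is reachable by loading it directly with `transformers`; this standalone snippet is only an illustration and is not part of the OpenCompass workflow:

```python
# Optional standalone check that the HuggingFace checkpoint downloads and loads
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('huggyllama/llama-7b')
model = AutoModelForCausalLM.from_pretrained('huggyllama/llama-7b')
print(type(model).__name__)  # expect a causal LM class such as LlamaForCausalLM
```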
</details>
<details>
<summary>Launch Evaluation</summary>
## Start the Evaluation
First, we can start the task in **debug mode** to check for issues such as model loading failures, dataset reading errors, or incorrect cache usage.
...
...
If you are not performing the evaluation on your local machine but using a Slurm cluster, you can specify the following parameters:
- `--partition my_part`: Slurm cluster partition.
- `--retry 2`: Number of retries for failed tasks.
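Putting these flags together, a Slurm launch might look like the sketch below; the `--slurm` switch is an assumption here, so confirm the exact flag names with `python run.py --help` for your OpenCompass version:

```bash
# Sketch of a Slurm launch (the --slurm flag is assumed; verify with --help)
python run.py configs/eval_llama_7b.py --slurm --partition my_part --retry 2
```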
</details>
## Obtaining Evaluation Results
After the evaluation is complete, the evaluation results table will be printed as follows: