[Docs] Update dataset docs (#19)

* [Docs] Update dataset docs
* [Docs] Update dataset docs

Unverified Commit 30a988a6 authored Jul 06, 2023 by Tong Gao Committed by GitHub Jul 06, 2023
7 changed files
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ English | [简体中文](README_zh-CN.md)
 </div>
-Welcome to **OpenCompass**! 
+Welcome to **OpenCompass**!
 Just like a compass guides us on our journey, OpenCompass will guide you through the complex landscape of evaluating large language models. With its powerful algorithms and intuitive interface, OpenCompass makes it easy to assess the quality and effectiveness of your NLP models.
@@ -37,7 +37,6 @@ OpenCompass is a one-stop platform for large model evaluation, aiming to provide
 We provide the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community to rank all public models and API models. If you would like to join the evaluation, please provide the model repository URL or a standard API interface to the email address `opencompass@pjlab.org.cn`.
 [![image](https://github.com/InternLM/OpenCompass/assets/7881589/475b0c8e-28b8-43e9-b2fd-4dd558e22491)](https://opencompass.org.cn/rank)
 ## Dataset Support
@@ -289,7 +288,8 @@ git clone https://github.com/InternLM/opencompass opencompass
 cd opencompass
 pip install -e .
 # Download dataset to data/ folder
-# TODO: ....
+wget https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompassData.zip
+unzip OpenCompassData.zip
 ```
 ## Evaluation

--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -290,7 +290,8 @@ git clone https://github.com/InternLM/opencompass opencompass
 cd opencompass
 pip install -e .
 # Download dataset to data/ folder
-# TODO: ....
+wget https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompassData.zip
+unzip OpenCompassData.zip
 ```
 ## Evaluation

--- a/docs/en/get_started.md
+++ b/docs/en/get_started.md
@@ -60,7 +60,7 @@ Here's a detailed step-by-step explanation of this case study:
 <details>
 <summary>prepare datasets</summary>
-The SiQA and PiQA benchmarks can be automatically downloaded through their respective links here and here, so no manual downloading is required here. However, some other datasets may require manual downloads. Please refer to the documentation [Prepare Datasets](docs/zh_cn/user_guides/dataset_prepare.md) for more information.
+The SiQA and PiQA benchmarks can be automatically downloaded through their respective links here and here, so no manual downloading is required here. However, some other datasets may require manual downloads. Please refer to the documentation [Prepare Datasets](./user_guides/dataset_prepare.md) for more information.
 Create a '.py' configuration file and add the following content:

--- a/docs/en/index.rst
+++ b/docs/en/index.rst
@@ -29,7 +29,7 @@ We always welcome *PRs* and *Issues* for the betterment of MMPretrain.
 .. _UserGuides:
 .. toctree::
   :maxdepth: 1
-   :caption: UserGuides
+   :caption: User Guides
   user_guides/config.md
   user_guides/dataset_prepare.md
@@ -40,7 +40,7 @@ We always welcome *PRs* and *Issues* for the betterment of MMPretrain.
 .. _AdvancedGuides:
 .. toctree::
   :maxdepth: 1
-   :caption: AdvancedGuides
+   :caption: Advanced Guides
   advanced_guides/new_dataset.md
   advanced_guides/new_model.md

--- a/docs/en/user_guides/dataset_prepare.md
+++ b/docs/en/user_guides/dataset_prepare.md
@@ -39,11 +39,17 @@ The datasets supported by OpenCompass mainly include two parts:
 [Huggingface Dataset](https://huggingface.co/datasets) provides a large number of datasets. OpenCompass has supported most of the datasets commonly used for performance comparison, please refer to `configs/dataset` for the specific list of supported datasets.
-2. OpenCompass Self-built Datasets
+2. Third-party Datasets
-In addition to supporting Huggingface's existing datasets, OpenCompass also provides some self-built CN datasets. In the future, a dataset-related link will be provided for users to download and use. Following the instructions in the document to place the datasets uniformly in the `./data` directory can complete dataset preparation.
+In addition to supporting Huggingface's existing datasets, OpenCompass also provides some third-party and self-built datasets. Run the following commands to download the datasets and place them in the `./data` directory to complete dataset preparation.
-It is important to note that the Repo not only contains self-built datasets, but also includes some HF-supported datasets for testing convenience.
+```bash
+# Run in the OpenCompass directory
+wget https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompassData.zip
+unzip OpenCompassData.zip
+```
+Note that the Repo not only contains self-built datasets, but also includes some HF-supported datasets for testing convenience.
 ## Dataset Selection
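For environments without `wget` or `unzip`, the two shell commands added above can be approximated in Python with the standard library. This is a minimal sketch, not part of the commit: the function name `prepare_datasets` and its `fetch` parameter are hypothetical, and it assumes, based on the surrounding text, that the archive unpacks into a `data/` directory.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the commands in this commit.
DATA_URL = (
    "https://github.com/InternLM/opencompass/releases/download/"
    "0.1.0/OpenCompassData.zip"
)


def prepare_datasets(root=".", url=DATA_URL, fetch=urllib.request.urlretrieve):
    """Download the dataset archive into `root` and extract it there.

    `fetch` is a hypothetical hook (same call shape as urlretrieve) so the
    download step can be swapped out, for example in offline tests.
    """
    root = Path(root)
    archive = root / url.rsplit("/", 1)[-1]  # e.g. OpenCompassData.zip

    # Skip the download when the archive is already present.
    if not archive.exists():
        fetch(url, str(archive))

    # Equivalent of `unzip OpenCompassData.zip` run inside `root`.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(root)

    # Assumption: the archive contains a top-level data/ directory.
    return root / "data"
```

Calling `prepare_datasets()` from the OpenCompass directory would mirror the shell commands; the returned path is only meaningful if the assumption about the archive layout holds.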

--- a/docs/zh_cn/get_started.md
+++ b/docs/zh_cn/get_started.md
@@ -55,7 +55,7 @@ python run.py configs/eval_llama_7b.py --debug
 <details>
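 <summary>Prepare datasets and their configuration</summary>
-Because [siqa](https://huggingface.co/datasets/siqa) and [piqa](https://huggingface.co/datasets/piqa) support automatic download, no manual download is needed here. Some datasets may still require manual download; see [Prepare Datasets](docs/zh_cn/user_guides/dataset_prepare.md) for details.
+Because [siqa](https://huggingface.co/datasets/siqa) and [piqa](https://huggingface.co/datasets/piqa) support automatic download, no manual download is needed here. Some datasets may still require manual download; see [Prepare Datasets](./user_guides/dataset_prepare.md) for details.
 Create a '.py' configuration file and add the following content: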
@@ -66,7 +66,7 @@ with read_base():
    # Read the required dataset configs directly from the preset dataset configs
    from .datasets.piqa.piqa_ppl import piqa_datasets
    from .datasets.siqa.siqa_gen import siqa_datasets
 datasets = [*piqa_datasets, *siqa_datasets]          # The final config must contain `datasets`, the list of evaluation datasets
 ```
@@ -97,7 +97,7 @@ llama_7b = dict(
        batch_size=16,              # Batch size
        run_cfg=dict(num_gpus=1),   # Run configuration, used to specify resource requirements
    )
 models = [llama_7b]                                     # The final config must contain `models`, the list of models
 ```
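The link change in both `get_started.md` files can be checked with a small path calculation. This is a sketch under the assumption that the documentation builder resolves relative Markdown links against the directory of the file that contains them; `resolve_link` is a hypothetical helper, not part of OpenCompass.

```python
import posixpath


def resolve_link(source_file: str, link: str) -> str:
    """Resolve a relative link against the directory of the linking file."""
    base_dir = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base_dir, link))


source = "docs/zh_cn/get_started.md"

# Old target: written relative to the repository root, so it resolves
# to a nested path that does not match the file's real location.
old_target = resolve_link(source, "docs/zh_cn/user_guides/dataset_prepare.md")

# New target: written relative to the linking file.
new_target = resolve_link(source, "./user_guides/dataset_prepare.md")

print(old_target)  # docs/zh_cn/docs/zh_cn/user_guides/dataset_prepare.md
print(new_target)  # docs/zh_cn/user_guides/dataset_prepare.md
```

The same reasoning applies to `docs/en/get_started.md`, where the old link additionally pointed into the `zh_cn` tree.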

--- a/docs/zh_cn/user_guides/dataset_prepare.md
+++ b/docs/zh_cn/user_guides/dataset_prepare.md
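@@ -39,9 +39,15 @@ The datasets supported by OpenCompass mainly include two parts:
 [Huggingface Dataset](https://huggingface.co/datasets) provides a large number of datasets. OpenCompass already supports most of the datasets commonly used for performance comparison; look under `configs/dataset` for the specific list of supported datasets.
-2. OpenCompass Self-built Datasets
+2. Third-party Datasets
-In addition to supporting Huggingface's existing datasets, OpenCompass also provides some self-built CN datasets. A download link for these datasets will be provided in the future. Placing the datasets in the `./data` directory as instructed in the documentation completes dataset preparation.
+In addition to supporting Huggingface's existing datasets, OpenCompass also provides some third-party datasets and self-built CN datasets. Run the following commands to download the datasets and place them in the `./data` directory to complete dataset preparation.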
+```bash
+# Run in the OpenCompass directory
+wget https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompassData.zip
+unzip OpenCompassData.zip
+```
 Note that the Repo not only contains self-built datasets, but also includes some HF-supported datasets for testing convenience.