[codespell]
skip = *.ipynb
count =
quiet-level = 3
ignore-words-list = nd, ans, ques, rouge, softwares, wit
name: 🐞 Bug report
description: Create a report to help us improve
labels: ["bug"]
title: "[Bug] "
body:
- type: markdown
attributes:
value: |
For general questions or idea discussions, please post it to our [**Forum**](https://github.com/open-compass/opencompass/discussions).
If you have already identified the reason, we strongly appreciate you creating a new PR according to [the tutorial](https://opencompass.readthedocs.io/en/master/community/CONTRIBUTING.html)!
If you need our help, please fill in the following form to help us identify the bug.
- type: checkboxes
attributes:
label: Prerequisite
description: Please check the following items before creating a new issue.
options:
- label: I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help.
required: true
- label: The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass).
required: true
- type: dropdown
id: task
attributes:
label: Type
description: The problem arises when
options:
- I'm evaluating with the officially supported tasks/models/datasets.
- I have modified the code (config is not considered code), or I'm working on my own tasks/models/datasets.
validations:
required: true
- type: textarea
id: environment
validations:
required: true
attributes:
label: Environment
description: |
Please run `python -c "import opencompass.utils;import pprint;pprint.pprint(dict(opencompass.utils.collect_env()))"` to collect necessary environment information and paste it here.
placeholder: |
```python
# The output of the above command
```
- type: textarea
attributes:
label: Reproduces the problem - code/configuration sample
description: |
Please provide a code or configuration sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
placeholder: |
```python
# Sample code to reproduce the problem
```
validations:
required: true
- type: textarea
attributes:
label: Reproduces the problem - command or script
description: |
What command or script did you run?
placeholder: |
```shell
The command or script you run.
```
validations:
required: true
- type: textarea
attributes:
label: Reproduces the problem - error message
description: |
Please provide the error message or logs you got, with the full traceback.
Tip: You can attach images or log files by dragging them into the text area.
placeholder: |
```
The error message or logs you got, with the full traceback.
```
validations:
required: true
- type: textarea
id: other
attributes:
label: Other information
description: |
Tell us anything else you think we should know.
1. What's your expected result?
2. What dataset did you use?
3. What do you think might be the reason?
name: 🚀 Feature request
description: Suggest an idea for this project
labels: ["enhancement"]
title: "[Feature] "
body:
- type: markdown
attributes:
value: |
For general questions or idea discussions, please post it to our [**Forum**](https://github.com/open-compass/opencompass/discussions).
If you have already implemented the feature, we strongly appreciate you creating a new PR according to [the tutorial](https://opencompass.readthedocs.io/en/master/community/CONTRIBUTING.html)!
- type: textarea
id: describe
validations:
required: true
attributes:
label: Describe the feature
description: |
What feature would you like OpenCompass to add? If there is an official code release or third-party implementation, please also provide the information here; that would be very helpful.
placeholder: |
A clear and concise description of the motivation of the feature.
Ex1. It is inconvenient when \[....\].
Ex2. There is a recent paper \[....\], which is very helpful for \[....\].
- type: checkboxes
id: pr
attributes:
label: Will you implement it?
options:
- label: I would like to implement this feature and create a PR!
name: 🐞 Report a bug
description: Report unexpected behavior you ran into while using OpenCompass
labels: ["bug"]
title: "[Bug] "
body:
- type: markdown
attributes:
value: |
We recommend using the English "Bug report" template so that your issue can help more people.
For general questions or idea discussions, please post them to our [**Forum**](https://github.com/open-compass/opencompass/discussions).
If you already have a solution, we strongly encourage you to create a new PR directly; the PR workflow is described in [the documentation](https://opencompass.readthedocs.io/zh_CN/master/community/CONTRIBUTING.html).
If you need our help, please fill in the following form to help us locate the bug.
- type: checkboxes
attributes:
label: Prerequisite
description: Please check the following items before creating a new issue.
options:
- label: I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but could not get the expected help.
required: true
- label: The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass).
required: true
- type: dropdown
id: task
attributes:
label: Type
description: The problem arises when
options:
- I'm evaluating with the officially supported tasks/models/datasets.
- I have modified the code (config is not considered code), or I'm working on my own tasks/models/datasets.
validations:
required: true
- type: textarea
id: environment
validations:
required: true
attributes:
label: Environment
description: |
Please run `python -c "import opencompass.utils;import pprint;pprint.pprint(dict(opencompass.utils.collect_env()))"` to collect the necessary environment information and paste it here.
placeholder: |
```python
# The output of the above command
```
- type: textarea
attributes:
label: Reproduces the problem - code/configuration sample
description: |
Please provide a code or configuration sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
placeholder: |
```python
# Sample code that reproduces the problem
```
validations:
required: true
- type: textarea
attributes:
label: Reproduces the problem - command or script
description: |
What command or script did you run?
placeholder: |
```shell
The command or script you ran.
```
validations:
required: true
- type: textarea
attributes:
label: Reproduces the problem - error message
description: |
Please provide the error message or logs you got, with the full traceback.
Tip: You can attach images or log files by dragging them into the text area.
placeholder: |
```
The error message or logs you got, with the full traceback.
```
validations:
required: true
- type: textarea
id: other
attributes:
label: Other information
description: |
Tell us anything else you think we should know.
1. Did you make any modifications to the code or config files?
2. What do you think might be the reason?
name: 🚀 Feature request
description: Suggest a new feature
labels: ["enhancement"]
title: "[Feature] "
body:
- type: markdown
attributes:
value: |
We recommend using the English "Feature request" template so that your issue can help more people.
For general questions or idea discussions, please post them to our [**Forum**](https://github.com/open-compass/opencompass/discussions).
If you have already implemented the feature, we strongly encourage you to create a new PR directly; the PR workflow is described in [the documentation](https://opencompass.readthedocs.io/zh_CN/master/community/CONTRIBUTING.html).
- type: textarea
id: describe
validations:
required: true
attributes:
label: Describe the feature
description: |
What feature would you like OpenCompass to add? If there is a related paper, an official implementation, or a third-party implementation, please also include links here; that would be very helpful.
placeholder: |
A clear and concise description of the feature and why it is needed.
Ex1. It is currently inconvenient to do xxx.
Ex2. A recent paper proposed xx, which would be very helpful.
- type: checkboxes
id: pr
attributes:
label: Will you implement it yourself?
options:
- label: I would like to implement this feature myself and contribute the code to OpenCompass!
blank_issues_enabled: false
contact_links:
- name: 📚 OpenCompass Documentation (official docs)
url: https://opencompass.readthedocs.io/en/latest/
about: Check if your question is answered in the docs
- name: 💬 General questions (ask for help)
url: https://github.com/open-compass/opencompass/discussions
about: Ask general usage questions and discuss with other OpenCompass community members
- name: 🌐 Explore OpenCompass (official website)
url: https://opencompass.org.cn/
about: Get to know more about OpenCompass
Thanks for your contribution, and we appreciate it a lot. The following instructions will help keep your pull request healthy and make it easier to get feedback. If you do not understand some items, don't worry; just create the pull request and ask the maintainers for help.
## Motivation
Please describe the motivation for this PR and the goal you want to achieve with it.
## Modification
Please briefly describe the modifications made in this PR.
## BC-breaking (Optional)
Does this modification introduce changes that break backward compatibility for downstream repositories?
If so, please describe how it breaks compatibility and how downstream projects should modify their code to remain compatible with this PR.
## Use cases (Optional)
If this PR introduces a new feature, please list some use cases here and update the documentation accordingly.
## Checklist
**Before PR**:
- [ ] Pre-commit or other linting tools have been used to fix potential lint issues.
- [ ] Bug fixes are fully covered by unit tests, and the case that caused the bug has been added to the unit tests.
- [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
- [ ] The documentation has been modified accordingly, like docstring or example tutorials.
**After PR**:
- [ ] If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
- [ ] CLA has been signed and all committers have signed the CLA in this PR.
name: daily_run_test
on:
workflow_dispatch:
inputs:
repo_org:
required: false
description: 'Tested repository organization name. Default is open-compass/opencompass'
type: string
default: 'open-compass/opencompass'
repo_ref:
required: false
description: 'Set branch or tag or commit id. Default is "main"'
type: string
default: 'main'
build_lmdeploy:
required: false
description: 'whether to build lmdeploy'
type: boolean
default: true
repo_org_lmdeploy:
required: false
description: 'Tested repository organization name. Default is internlm/lmdeploy'
type: string
default: 'InternLM/lmdeploy'
repo_ref_lmdeploy:
required: false
description: 'Set branch or tag or commit id. Default is "main"'
type: string
default: 'main'
regression_func_volc:
required: true
description: 'Regression functions to run on the volc runner'
type: string
default: "['chat_models','base_models', 'chat_obj_fullbench', 'base_fullbench']"
regression_func_local:
required: true
description: 'Regression functions to run on the local runner'
type: string
default: "['cmd', 'api', 'chat_sub_fullbench']"
fullbench_eval:
required: true
description: 'fullbench volc functions'
type: string
default: "['base_objective','chat_objective','chat_subjective','base_long_context','chat_long_context']"
schedule:
- cron: '15 14 * * 0,2'
env:
HF_DATASETS_OFFLINE: 1
HF_EVALUATE_OFFLINE: 1
TRANSFORMERS_OFFLINE: 1
VLLM_USE_MODELSCOPE: false
LMDEPLOY_USE_MODELSCOPE: false
HF_HUB_OFFLINE: 1
OUTPUT_FOLDER: cuda12.1_dist_${{ github.run_id }}
CONDA_PATH: /fs-computility/llm/qa-llm-cicd/miniconda3
PIP_CACHE_PATH: /fs-computility/llm/qa-llm-cicd/.cache/pip
REPORT_ROOT: /fs-computility/llm/qa-llm-cicd/eval_report/regression
COMPASS_DATA_CACHE: /fs-computility/llm/shared/llmeval/datasets/compass_data_cache
HUGGINGFACE_HUB_CACHE: /fs-computility/llm/shared/llmeval/models/opencompass_hf_hub
HF_HUB_CACHE: /fs-computility/llm/shared/llmeval/models/opencompass_hf_hub
CONDA_ENV: regression_test
jobs:
build-pypi:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
repository: ${{ github.event.inputs.repo_org || 'open-compass/opencompass' }}
ref: ${{github.event.inputs.repo_ref || 'main'}}
- name: Set up Python 3.10
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Build opencompass
run: |
pip install wheel setuptools
python setup.py sdist bdist_wheel
- name: Upload Artifacts
uses: actions/upload-artifact@v4
with:
if-no-files-found: error
path: dist/*
retention-days: 1
name: my-artifact-${{ github.run_id }}
build-pypi-lmdeploy:
if: ${{!cancelled() && (github.event_name == 'schedule' || inputs.build_lmdeploy)}}
strategy:
matrix:
pyver: [py310]
runs-on: ubuntu-latest
environment: 'prod'
env:
PYTHON_VERSION: ${{ matrix.pyver }}
PLAT_NAME: manylinux2014_x86_64
DOCKER_TAG: cuda12.1
steps:
- name: Checkout repository
uses: actions/checkout@v3
with:
repository: ${{ github.event.inputs.repo_org_lmdeploy || 'InternLM/lmdeploy' }}
ref: ${{github.event.inputs.repo_ref_lmdeploy || 'main'}}
- name: Build
run: |
echo ${PYTHON_VERSION}
echo ${PLAT_NAME}
echo ${DOCKER_TAG}
echo ${OUTPUT_FOLDER}
echo ${GITHUB_RUN_ID}
# remove -it
sed -i 's/docker run --rm -it/docker run --rm/g' builder/manywheel/build_wheel.sh
bash builder/manywheel/build_wheel.sh ${PYTHON_VERSION} ${PLAT_NAME} ${DOCKER_TAG} ${OUTPUT_FOLDER}
- name: Upload Artifacts
uses: actions/upload-artifact@v4
with:
if-no-files-found: error
path: builder/manywheel/${{ env.OUTPUT_FOLDER }}
retention-days: 1
name: my-artifact-${{ github.run_id }}-${{ matrix.pyver }}
prepare_env:
if: ${{!cancelled()}}
needs: ['build-pypi', 'build-pypi-lmdeploy']
runs-on: volc_cu12
environment: 'prod'
timeout-minutes: 120 #2hours
steps:
- name: Clone repository
uses: actions/checkout@v2
with:
repository: ${{ github.event.inputs.repo_org || 'open-compass/opencompass' }}
ref: ${{github.event.inputs.repo_ref || 'main'}}
- name: Download Artifacts
uses: actions/download-artifact@v4
with:
name: my-artifact-${{ github.run_id }}
- name: Remove Conda Env
if: always()
run: |
. /fs-computility/llm/qa-llm-cicd/miniconda3/bin/activate
conda env remove -y --name ${{env.CONDA_ENV}}
conda info --envs
- name: Prepare - create conda env and install torch - cu12
uses: nick-fields/retry@v3
with:
max_attempts: 1
timeout_minutes: 120
command: |
. ${{env.CONDA_PATH}}/bin/activate
conda create -y --name ${{env.CONDA_ENV}} python=3.10
conda activate ${{env.CONDA_ENV}}
pip install -r /fs-computility/llm/qa-llm-cicd/config/requirements.txt --cache-dir ${{env.PIP_CACHE_PATH}}
pip install opencompass*.whl --cache-dir ${{env.PIP_CACHE_PATH}}
pip install opencompass[lmdeploy] --cache-dir ${{env.PIP_CACHE_PATH}}
pip install opencompass[vllm] --cache-dir ${{env.PIP_CACHE_PATH}}
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --cache-dir ${{env.PIP_CACHE_PATH}}
FLASH_ATTENTION_FORCE_BUILD=TRUE pip install /fs-computility/llm/qa-llm-cicd/packages/flash_attn-2.7.0.post2+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install xformers --index-url https://download.pytorch.org/whl/cu121 --cache-dir ${{env.PIP_CACHE_PATH}}
cp -r /root/nltk_data ${{env.CONDA_PATH}}/envs/${{env.CONDA_ENV}}/nltk_data
- name: Prepare - download lmdeploy artifact - cu12
if: ${{github.event_name == 'schedule' || inputs.build_lmdeploy}}
uses: actions/download-artifact@v4
with:
name: my-artifact-${{ github.run_id }}-py310
- name: Prepare - reinstall lmdeploy - cu12
if: ${{github.event_name == 'schedule' || inputs.build_lmdeploy}}
run: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
pip uninstall -y lmdeploy
pip install lmdeploy-*.whl --no-deps
- name: conda env
run: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
conda info --envs
pip list
daily_run_test_volc:
if: ${{!cancelled()}}
needs: prepare_env
strategy:
fail-fast: false
matrix:
regression_func: ${{fromJSON(github.event.inputs.regression_func_volc || '["chat_models","base_models","chat_obj_fullbench","base_fullbench"]')}}
runs-on: volc_cu12_daily
environment: 'prod'
timeout-minutes: 180 #3hours
steps:
- name: Clone repository
uses: actions/checkout@v2
with:
repository: ${{ github.event.inputs.repo_org || 'open-compass/opencompass' }}
ref: ${{github.event.inputs.repo_ref || 'main'}}
- name: conda env
run: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
conda info --envs
pip list
- name: modify config
if: matrix.regression_func != 'chat_sub_fullbench'
run: |
cp -r /fs-computility/llm/qa-llm-cicd/ocplayground/template/configs_cluster/volc.py .
cat /fs-computility/llm/qa-llm-cicd/config/test_config.txt >> .github/scripts/eval_regression_${{matrix.regression_func}}.py
- name: Run test
uses: nick-fields/retry@v3
with:
max_attempts: 1
timeout_minutes: 180
command: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
conda info --envs
opencompass .github/scripts/eval_regression_${{matrix.regression_func}}.py --work-dir ${{env.REPORT_ROOT}}/${{ github.run_id }}/${{matrix.regression_func}} --reuse --dump-eval-details
rm regression_result_daily -f && ln -s ${{env.REPORT_ROOT}}/${{ github.run_id }}/${{matrix.regression_func}}/*/summary regression_result_daily
python -m pytest -m ${{matrix.regression_func}} -s -v --color=yes .github/scripts/oc_score_assert.py
daily_run_test_local:
if: ${{!cancelled()}}
needs: prepare_env
strategy:
fail-fast: false
matrix:
regression_func: ${{fromJSON(github.event.inputs.regression_func_local || '["cmd","api","chat_sub_fullbench"]')}}
runs-on: volc_cu12_local
environment: 'prod'
timeout-minutes: 480 #8hours
steps:
- name: Clone repository
uses: actions/checkout@v2
with:
repository: ${{ github.event.inputs.repo_org || 'open-compass/opencompass' }}
ref: ${{github.event.inputs.repo_ref || 'main'}}
- name: conda env
run: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
conda info --envs
pip list
- name: modify config
if: matrix.regression_func == 'chat_sub_fullbench'
run: |
cp -r /fs-computility/llm/qa-llm-cicd/ocplayground/template/configs_cluster/volc.py .
cat /fs-computility/llm/qa-llm-cicd/config/test_config_sub.txt >> .github/scripts/eval_regression_${{matrix.regression_func}}.py
- name: Run command testcase
if: matrix.regression_func == 'cmd'
run: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
conda info --envs
export from_tf=TRUE
python tools/list_configs.py internlm2_5 mmlu
opencompass --models hf_internlm2_5_7b hf_internlm2_1_8b --datasets race_ppl demo_gsm8k_chat_gen --work-dir ${{env.REPORT_ROOT}}/${{ github.run_id }}/cmd1 --reuse --max-num-workers 2 --dump-eval-details
rm regression_result_daily -f && ln -s ${{env.REPORT_ROOT}}/${{ github.run_id }}/cmd1/*/summary regression_result_daily
python -m pytest -m case1 -s -v --color=yes .github/scripts/oc_score_assert.py
opencompass --models hf_internlm2_5_7b_chat hf_internlm2_chat_1_8b --datasets race_gen demo_gsm8k_chat_gen -a lmdeploy --work-dir ${{env.REPORT_ROOT}}/${{ github.run_id }}/cmd2 --reuse --max-num-workers 2 --dump-eval-details
rm regression_result_daily -f && ln -s ${{env.REPORT_ROOT}}/${{ github.run_id }}/cmd2/*/summary regression_result_daily
python -m pytest -m case2 -s -v --color=yes .github/scripts/oc_score_assert.py
opencompass --datasets race_ppl demo_gsm8k_chat_gen --hf-type base --hf-path internlm/internlm2_5-7b --work-dir ${{env.REPORT_ROOT}}/${{ github.run_id }}/cmd3 --reuse --max-num-workers 2 --dump-eval-details
rm regression_result_daily -f && ln -s ${{env.REPORT_ROOT}}/${{ github.run_id }}/cmd3/*/summary regression_result_daily
python -m pytest -m case3 -s -v --color=yes .github/scripts/oc_score_assert.py
opencompass --datasets race_gen demo_gsm8k_chat_gen --hf-type chat --hf-path internlm/internlm2_5-7b-chat --work-dir ${{env.REPORT_ROOT}}/${{ github.run_id }}/cmd4 --reuse --max-num-workers 2 --dump-eval-details
rm regression_result_daily -f && ln -s ${{env.REPORT_ROOT}}/${{ github.run_id }}/cmd4/*/summary regression_result_daily
python -m pytest -m case4 -s -v --color=yes .github/scripts/oc_score_assert.py
- name: Run model test - api
if: matrix.regression_func == 'api'
run: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
conda info --envs
lmdeploy serve api_server internlm/internlm2_5-7b-chat --max-batch-size 256 --model-name internlm2 > ${{env.REPORT_ROOT}}/${{ github.run_id }}/restful.log 2>&1 &
echo "restful_pid=$!" >> "$GITHUB_ENV"
sleep 180s
opencompass .github/scripts/eval_regression_api.py --work-dir ${{env.REPORT_ROOT}}/${{ github.run_id }}/api --reuse --max-num-workers 2 --dump-eval-details
rm regression_result_daily -f && ln -s ${{env.REPORT_ROOT}}/${{ github.run_id }}/api/*/summary regression_result_daily
python -m pytest -m api -s -v --color=yes .github/scripts/oc_score_assert.py
- name: Run model test - api kill
if: always() && matrix.regression_func == 'api'
run: |
kill -15 "$restful_pid"
- name: Run testcase
if: matrix.regression_func == 'chat_sub_fullbench'
env:
COMPASS_DATA_CACHE: /fs-computility/llm/shared/llmeval/datasets/compass_data_cache_subset
run: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
conda info --envs
export from_tf=TRUE
opencompass .github/scripts/eval_regression_${{matrix.regression_func}}.py --work-dir ${{env.REPORT_ROOT}}/${{ github.run_id }}/${{matrix.regression_func}} --reuse --dump-eval-details
rm regression_result_daily -f && ln -s ${{env.REPORT_ROOT}}/${{ github.run_id }}/${{matrix.regression_func}}/*/summary regression_result_daily
python -m pytest -m ${{matrix.regression_func}} -s -v --color=yes .github/scripts/oc_score_assert.py
fullbench_run_test:
if: ${{!cancelled()}}
needs: prepare_env
strategy:
fail-fast: false
matrix:
function_type: ${{fromJSON(github.event.inputs.fullbench_eval || '["base_objective","chat_objective","chat_subjective","base_long_context","chat_long_context"]')}}
runs-on: volc_cu12
environment: 'prod'
timeout-minutes: 480 #8hours
steps:
- name: Clone repository
uses: actions/checkout@v2
with:
repository: ${{ github.event.inputs.repo_org || 'open-compass/opencompass' }}
ref: ${{github.event.inputs.repo_ref || 'main'}}
- name: conda env
run: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
conda info --envs
pip list
- name: Run testcase
uses: nick-fields/retry@v3
with:
max_attempts: 1
timeout_minutes: 480
command: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
conda info --envs
export from_tf=TRUE
opencompass /fs-computility/llm/qa-llm-cicd/ocplayground/template/regression/eval_${{ matrix.function_type }}.py --work-dir ${{env.REPORT_ROOT}}/${{ github.run_id }}/${{ matrix.function_type }} --reuse
rm regression_result_daily -f && ln -s ${{env.REPORT_ROOT}}/${{ github.run_id }}/${{ matrix.function_type }}/*/summary regression_result_daily
python -m pytest -m ${{ matrix.function_type }} -s -v --color=yes .github/scripts/oc_score_assert.py
notify_to_feishu:
if: ${{ always() && github.event_name == 'schedule' && !cancelled() && contains(needs.*.result, 'failure') && (github.ref_name == 'develop' || github.ref_name == 'main') }}
needs: [daily_run_test_volc, daily_run_test_local, fullbench_run_test]
timeout-minutes: 5
runs-on: self-hosted
environment: 'prod'
steps:
- name: notify
run: |
curl -X POST -H "Content-Type: application/json" -d '{"msg_type":"post","content":{"post":{"zh_cn":{"title":"Opencompass- Daily test failed","content":[[{"tag":"text","text":"branch: ${{github.ref_name}}, run action: ${{github.workflow}} failed. "},{"tag":"a","text":"Please click here for details ","href":"https://github.com/'${{ github.repository }}'/actions/runs/'${GITHUB_RUN_ID}'"},{"tag":"at","user_id":"'${{ secrets.USER_ID }}'"}]]}}}}' ${{ secrets.WEBHOOK_URL }}
name: 'Link check'
on:
schedule:
# check links at 01:30 a.m. every day
- cron: '30 1 * * *'
workflow_dispatch: # allow manual trigger
jobs:
link-check:
runs-on: ubuntu-latest
steps:
# - uses: actions/checkout@v3
- name: Install linkchecker
run: |
pip install linkchecker
- name: Run linkchecker
run: |
linkchecker https://opencompass.readthedocs.io/ --no-robots -t 30 --no-warnings \
--ignore-url "https://opencompass.readthedocs.io/.*/static/images/opencompass_logo.svg" \
--ignore-url "https://opencompass.readthedocs.io/.*/_static/images/icon-menu-dots.svg" \
--ignore-url "https://opencompass.readthedocs.io/policy" \
--ignore-url "https://opencompass.readthedocs.io/(en|zh_CN)/[0-9a-f]{40}/.*"
name: lint
on: [push, pull_request]
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python 3.10
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Install pre-commit hook
run: |
pip install pre-commit==3.8.0 mmengine==0.10.5
pre-commit install
- name: Linting
run: pre-commit run --all-files
name: pr_run_test
on:
pull_request:
paths-ignore:
- 'README.md'
- 'README_zh-CN.md'
- 'docs/**'
- 'configs/**'
- 'tools/**'
workflow_dispatch:
schedule:
- cron: '56 22 * * *'
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
CONDA_ENV: pr_test
HF_DATASETS_OFFLINE: 1
HF_EVALUATE_OFFLINE: 1
TRANSFORMERS_OFFLINE: 1
VLLM_USE_MODELSCOPE: false
LMDEPLOY_USE_MODELSCOPE: false
HF_HUB_OFFLINE: 1
CONDA_PATH: /fs-computility/llm/qa-llm-cicd/miniconda3
PIP_CACHE_PATH: /fs-computility/llm/qa-llm-cicd/.cache/pip
REPORT_ROOT: /fs-computility/llm/qa-llm-cicd/eval_report/prtest
COMPASS_DATA_CACHE: /fs-computility/llm/shared/llmeval/datasets/compass_data_cache
HUGGINGFACE_HUB_CACHE: /fs-computility/llm/shared/llmeval/models/opencompass_hf_hub
HF_HUB_CACHE: /fs-computility/llm/shared/llmeval/models/opencompass_hf_hub
jobs:
pr_run_test:
runs-on: volc_cu12_local
environment: 'prod'
timeout-minutes: 30
steps:
- name: Checkout repository
uses: actions/checkout@v2
- name: Prepare - Install opencompass
run: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
python3 -m pip uninstall opencompass -y
python3 -m pip install -e . --cache-dir ${{env.PIP_CACHE_PATH}}
conda info --envs
- name: conda env
run: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
conda info --envs
pip list
lmdeploy check_env
- name: Run test
run: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
conda info --envs
rm -rf regression_result
opencompass --models hf_internlm2_5_20b_chat --datasets demo_gsm8k_chat_gen --work-dir ${{env.REPORT_ROOT}}/${{ github.run_id }}/regression_result1 --debug
opencompass --models hf_internlm2_5_7b_chat --datasets demo_gsm8k_chat_gen --work-dir ${{env.REPORT_ROOT}}/${{ github.run_id }}/regression_result2 --debug --max-num-workers 2
opencompass --models hf_internlm2_5_7b_chat --datasets demo_gsm8k_chat_gen -a lmdeploy --work-dir ${{env.REPORT_ROOT}}/${{ github.run_id }}/regression_result3 --debug --max-num-workers 2
- name: Get result
run: |
score=$(sed -n '$p' ${{env.REPORT_ROOT}}/${{ github.run_id }}/regression_result1/*/summary/*.csv | awk -F ',' '{print $NF}')
if (( ${score%.*} >= 88 && ${score%.*} <= 89 )); then
echo "score is $score between 88 and 89"
else
echo "score is $score not between 88 and 89"
exit 1
fi
score=$(sed -n '$p' ${{env.REPORT_ROOT}}/${{ github.run_id }}/regression_result2/*/summary/*.csv | awk -F ',' '{print $NF}')
if (( ${score%.*} >= 87 && ${score%.*} <= 88 )); then
echo "score is $score between 87 and 88"
else
echo "score is $score not between 87 and 88"
exit 1
fi
score=$(sed -n '$p' ${{env.REPORT_ROOT}}/${{ github.run_id }}/regression_result3/*/summary/*.csv | awk -F ',' '{print $NF}')
if (( ${score%.*} >= 87 && ${score%.*} <= 91 )); then
echo "score is $score between 87 and 91"
else
echo "score is $score not between 87 and 91"
exit 1
fi
- name: Uninstall opencompass
if: always()
run: |
. ${{env.CONDA_PATH}}/bin/activate
conda activate ${{env.CONDA_ENV}}
python3 -m pip uninstall opencompass -y
conda info --envs
notify_to_feishu:
if: ${{ always() && !cancelled() && contains(needs.*.result, 'failure') && (github.ref_name == 'develop' || github.ref_name == 'main') }}
needs: [pr_run_test]
timeout-minutes: 5
runs-on: self-hosted
environment: 'prod'
steps:
- name: notify
run: |
curl -X POST -H "Content-Type: application/json" -d '{"msg_type":"post","content":{"post":{"zh_cn":{"title":"Opencompass- pr test failed","content":[[{"tag":"text","text":"branch: ${{github.ref_name}}, run action: ${{github.workflow}} failed. "},{"tag":"a","text":"Please click here for details ","href":"https://github.com/'${{ github.repository }}'/actions/runs/'${GITHUB_RUN_ID}'"},{"tag":"at","user_id":"'${{ secrets.USER_ID }}'"}]]}}}}' ${{ secrets.WEBHOOK_URL }}
name: pr_stage_test
on:
pull_request:
paths-ignore:
- 'README.md'
- 'README_zh-CN.md'
- 'docs/**'
- 'configs/**'
- 'tools/**'
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build:
runs-on: ubuntu-22.04
strategy:
matrix:
python-version: ['3.10']
include:
- torch: 2.0.0
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Upgrade pip
run: python -m pip install --upgrade pip
- name: Install PyTorch
run: pip install torch==${{matrix.torch}}+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
- name: Install system dependencies
run: |
sudo sed -i '$ a deb http://th.archive.ubuntu.com/ubuntu jammy main' /etc/apt/sources.list
sudo apt-get update && sudo apt-get install -y libc6 libffi-dev libncursesw6 wget unzip
- name: Upgrade pip
run: python -m pip install pip --upgrade
- name: Install opencompass dependencies
run: |
python -m pip install -r requirements.txt
- name: Build and install
run: python -m pip install -e .
- name: Prepare dataset
run: |
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip
- name: Dry run test
run: |
python run.py --models hf_opt_125m --datasets siqa_gen winograd_ppl --dry-run
build_cu117:
runs-on: ubuntu-22.04
container:
image: nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04
strategy:
matrix:
python-version: ['3.10']
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Fetch GPG keys
run: |
apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub
- name: Install Python-dev
run: apt-get update && apt-get install -y python${{matrix.python-version}}-dev
if: ${{matrix.python-version != 3.10}}
- name: Install system dependencies
run: |
apt-get update
apt-get install -y ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libxrender-dev libc6 libc6-dev
sed -i '$ a deb http://th.archive.ubuntu.com/ubuntu jammy main' /etc/apt/sources.list
apt-get update && apt-get install -y libc6 libffi-dev libncursesw6 wget unzip
- name: Upgrade pip
run: python -m pip install pip --upgrade
- name: Install opencompass dependencies
run: |
python -m pip install -r requirements.txt
- name: Build and install
run: python -m pip install -e .
- name: Prepare dataset
run: |
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip
- name: Dry run test
run: |
python run.py --models hf_opt_125m --datasets siqa_gen winograd_ppl --dry-run
build_windows:
runs-on: windows-2022
strategy:
matrix:
python-version: ['3.10']
platform: [cpu]
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Upgrade pip
run: python -m pip install pip --upgrade
- name: Install PyTorch
run: pip install torch==2.0.0+${{matrix.platform}} -f https://download.pytorch.org/whl/${{matrix.platform}}/torch_stable.html
- name: Install opencompass dependencies
run: |
pip install -r requirements.txt
- name: Build and install
run: pip install -e .
- name: Prepare dataset
run: |
Invoke-WebRequest -Uri https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip -OutFile OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip
- name: Dry run test
run: |
python run.py --models hf_opt_125m --datasets siqa_gen winograd_ppl --dry-run
name: deploy
on:
push:
workflow_dispatch:
inputs:
confirm_publish:
description: 'Type YES to confirm publishing to PyPI'
required: true
type: string
jobs:
build-n-publish:
runs-on: ubuntu-latest
if: |
github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags') ||
(github.event_name == 'workflow_dispatch' && inputs.confirm_publish == 'YES')
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.10
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Build opencompass
run: |
pip install wheel
python setup.py sdist bdist_wheel
- name: Publish distribution to PyPI
run: |
pip install twine
twine upload dist/* -u __token__ -p ${{ secrets.pypi_password }}
.DS_Store
output_*/
outputs/
scripts/
icl_inference_output/
.vscode/
tmp/
configs/eval_subjective_alignbench_test.py
configs/openai_key.py
configs/secrets.py
configs/datasets/log.json
configs/eval_debug*.py
configs/viz_*.py
configs/**/*_bkup.py
opencompass/**/*_bkup.py
data
work_dirs
outputs
models/*
configs/internal/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
*.ipynb
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
.idea
# Auto generate documentation
docs/en/_build/
docs/zh_cn/_build/
# .zip
*.zip
# sft config ignore list
configs/sft_cfg/*B_*
configs/sft_cfg/1B/*
configs/sft_cfg/7B/*
configs/sft_cfg/20B/*
configs/sft_cfg/60B/*
configs/sft_cfg/100B/*
configs/cky/
configs/_internal_legacy*
# in case llama clone in the opencompass
llama/
# in case ilagent clone in the opencompass
ilagent/
# ignore the config file for criticbench evaluation
configs/sft_cfg/criticbench_eval/*
# path of turbomind's model after running `lmdeploy.serve.turbomind.deploy`
turbomind/
# cibench output
*.db
*.pth
*.pt
*.onnx
*.gz
*.gz.*
*.png
*.txt
*.jpg
*.json
*.jsonl
*.csv
*.npy
*.c
# aliyun
core.*
assign:
issues: enabled
pull_requests: disabled
strategy:
# random
daily-shift-based
schedule:
'*/1 * * * *'
assignees:
- bittersweet1999
- liushz
- MaiziXiao
- acylam
- tonysy
exclude: |
(?x)^(
tests/data/|
tests/dataset/|
opencompass/models/internal/|
opencompass/utils/internal/|
opencompass/openicl/icl_evaluator/hf_metrics/|
opencompass/datasets/lawbench/utils|
opencompass/datasets/lawbench/evaluation_functions/|
opencompass/datasets/medbench/|
opencompass/datasets/teval/|
opencompass/datasets/NPHardEval/|
opencompass/datasets/TheoremQA|
opencompass/datasets/subjective/mtbench101.py|
docs/zh_cn/advanced_guides/compassbench_intro.md |
docs/zh_cn/advanced_guides/compassbench_v2_0.md |
opencompass/utils/datasets.py |
opencompass/utils/datasets_info.py
)
repos:
- repo: https://gitee.com/openmmlab/mirrors-flake8
rev: 5.0.4
hooks:
- id: flake8
exclude: |
(?x)^(
opencompass/configs/|
examples/
)
- repo: https://gitee.com/openmmlab/mirrors-isort
rev: 5.11.5
hooks:
- id: isort
exclude: |
(?x)^(
opencompass/configs/|
examples/
)
- repo: https://gitee.com/openmmlab/mirrors-yapf
rev: v0.32.0
hooks:
- id: yapf
exclude: |
(?x)^(
opencompass/configs/|
examples/
)
- repo: https://gitee.com/openmmlab/mirrors-codespell
rev: v2.2.1
hooks:
- id: codespell
exclude: |
(?x)^(
.*\.jsonl|
.*\.md.template|
opencompass/configs/ |
examples/
)
- repo: https://gitee.com/openmmlab/mirrors-pre-commit-hooks
rev: v4.3.0
hooks:
- id: trailing-whitespace
exclude: |
(?x)^(
dicts/|
projects/.*?/dicts/|
)
- id: check-yaml
- id: end-of-file-fixer
exclude: |
(?x)^(
dicts/|
projects/.*?/dicts/|
)
- id: requirements-txt-fixer
- id: double-quote-string-fixer
- id: check-merge-conflict
- id: fix-encoding-pragma
args: ["--remove"]
- id: mixed-line-ending
args: ["--fix=lf"]
- repo: https://gitee.com/openmmlab/mirrors-mdformat
rev: 0.7.9
hooks:
- id: mdformat
args: ["--number", "--table-width", "200"]
additional_dependencies:
- mdformat-openmmlab
- mdformat_frontmatter
- linkify-it-py
exclude: configs/
- repo: https://gitee.com/openmmlab/mirrors-docformatter
rev: v1.3.1
hooks:
- id: docformatter
args: ["--in-place", "--wrap-descriptions", "79"]
- repo: local
hooks:
- id: update-dataset-suffix
name: dataset suffix updater
entry: ./tools/update_dataset_suffix.py
language: script
pass_filenames: true
require_serial: true
files: ^opencompass/configs/datasets
- repo: local
hooks:
- id: update-dataset-suffix-package
name: dataset suffix updater (package)
entry: ./tools/update_dataset_suffix.py
language: script
pass_filenames: false
# require_serial: true
# files: ^opencompass/configs/datasets
args:
- --root_folder
- opencompass/configs/datasets
# - repo: https://github.com/open-mmlab/pre-commit-hooks
# rev: v0.2.0 # Use the ref you want to point at
# hooks:
# - id: check-algo-readme
# - id: check-copyright
# args: ["mmocr", "tests", "tools"] # these directories will be checked
exclude: |
(?x)^(
tests/data/|
tests/dataset/|
opencompass/models/internal/|
opencompass/utils/internal/|
opencompass/openicl/icl_evaluator/hf_metrics/|
opencompass/datasets/lawbench/utils|
opencompass/datasets/lawbench/evaluation_functions/|
opencompass/datasets/medbench/|
opencompass/datasets/teval/|
opencompass/datasets/NPHardEval/|
opencompass/datasets/TheoremQA|
opencompass/datasets/subjective/mtbench101.py|
docs/zh_cn/advanced_guides/compassbench_intro.md |
docs/zh_cn/advanced_guides/compassbench_v2_0.md |
opencompass/utils/datasets.py |
opencompass/utils/datasets_info.py
)
repos:
- repo: https://github.com/PyCQA/flake8
rev: 5.0.4
hooks:
- id: flake8
exclude: |
(?x)^(
opencompass/configs/|
examples/
)
- repo: https://github.com/PyCQA/isort
rev: 5.11.5
hooks:
- id: isort
exclude: |
(?x)^(
opencompass/configs/|
examples/
)
- repo: https://github.com/pre-commit/mirrors-yapf
rev: v0.32.0
hooks:
- id: yapf
exclude: |
(?x)^(
opencompass/configs/|
examples/
)
- repo: https://github.com/codespell-project/codespell
rev: v2.2.1
hooks:
- id: codespell
exclude: |
(?x)^(
.*\.jsonl|
.*\.md.template|
opencompass/configs/ |
examples/
)
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.3.0
hooks:
- id: trailing-whitespace
exclude: |
(?x)^(
dicts/|
projects/.*?/dicts/|
)
- id: check-yaml
- id: end-of-file-fixer
exclude: |
(?x)^(
dicts/|
projects/.*?/dicts/|
)
- id: requirements-txt-fixer
- id: double-quote-string-fixer
- id: check-merge-conflict
- id: fix-encoding-pragma
args: ["--remove"]
- id: mixed-line-ending
args: ["--fix=lf"]
- repo: https://github.com/executablebooks/mdformat
rev: 0.7.9
hooks:
- id: mdformat
args: ["--number", "--table-width", "200"]
additional_dependencies:
- mdformat-openmmlab
- mdformat_frontmatter
- linkify-it-py
exclude: configs/
- repo: https://github.com/myint/docformatter
rev: v1.3.1
hooks:
- id: docformatter
args: ["--in-place", "--wrap-descriptions", "79"]
- repo: local
hooks:
- id: update-dataset-suffix
name: dataset suffix updater
entry: ./tools/update_dataset_suffix.py
language: script
pass_filenames: true
require_serial: true
files: ^opencompass/configs/datasets
- repo: local
hooks:
- id: update-dataset-suffix-package
name: dataset suffix updater (package)
entry: ./tools/update_dataset_suffix.py
language: script
pass_filenames: false
# require_serial: true
# files: ^opencompass/configs/datasets
args:
- --root_folder
- opencompass/configs/datasets
# - repo: https://github.com/open-mmlab/pre-commit-hooks
# rev: v0.2.0 # Use the ref you want to point at
# hooks:
# - id: check-algo-readme
# - id: check-copyright
# args: ["mmocr", "tests", "tools"] # these directories will be checked
Copyright 2020 OpenCompass Authors. All rights reserved.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2020 OpenCompass Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
recursive-include opencompass/configs *.py *.yml *.json *.txt *.md
recursive-include opencompass/openicl/icl_evaluator/hf_metrics *.py
recursive-include opencompass/datasets *.py *.yml *.json *.txt *.md *.yaml
# <div align="center"><strong>opencompass</strong></div>
## Introduction
OpenCompass consists of three core modules: CompassKit, CompassHub, and CompassRank. CompassRank covers not only open-source benchmarks but also private ones. CompassHub provides a benchmark resource navigation platform that lets you search and use a diverse library of benchmarks. CompassKit is a collection of powerful evaluation tools built for large language models and large vision-language models; its comprehensive toolset enables precise measurement and scientific evaluation of these complex models.
## Installation
opencompass supports:
+ Python 3.8.
+ Python 3.9.
+ Python 3.10.
### Install by building from source
#### Build environment preparation
Two ways to prepare the environment are provided:
1. Based on the 光源 (sourcefind.cn) pytorch 2.3.0 base image. Image download address: [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch); download the image version that matches pytorch 2.3.0 and your Python, DTK, and OS versions.
2. Based on an existing Python environment: install pytorch 2.3.0. The pytorch whl packages can be downloaded from [https://cancon.hpccube.com:65024/4/main/pytorch/DAS1.3](https://cancon.hpccube.com:65024/4/main/pytorch/DAS1.3); pick the pytorch 2.3.0 whl that matches your Python and DTK versions. Install it as follows:
```shell
pip install torch*  # the downloaded torch whl package
pip install setuptools wheel
```
#### Build and install from source
```shell
git clone -b 0.3.7 http://developer.hpccube.com/codes/OpenDAS/opencompass.git
```
- Two build-and-install options are provided (run inside the opencompass directory):
```
Install the base dependencies:
pip install -r requirements.txt
pip install -r requirements/api.txt
pip install -r requirements/extra.txt
1. Build the whl package and install it
python setup.py bdist_wheel
cd dist
pip install opencompass*
2. Install from source (recommended)
pip install -e .
```
Install humaneval (optional)
```shell
git clone https://github.com/open-compass/human-eval.git
cd human-eval
pip install -e .
```
#### Notes
+ If pip install downloads are slow, add a mirror: -i https://pypi.tuna.tsinghua.edu.cn/simple/
## Usage
### Dataset preparation
```
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip  # unzips into a data directory
```
### Instructions
List all configurations:
```shell
python tools/list_configs.py
```
List the configurations related to the llama models and the mmlu datasets:
```shell
python tools/list_configs.py llama mmlu
```
Install the framework you need, then run:
#### Offline evaluation
1. Inference with vLLM
For environment setup and usage, see: [https://developer.hpccube.com/codes/OpenDAS/vllm](https://developer.hpccube.com/codes/OpenDAS/vllm)
```shell
python run.py examples/vllm/eval_llama2_vllm.py
```
For other models, refer to `examples/vllm/eval_xxx_vllm.py`. For multi-GPU runs, set `tensor_parallel_size` and `num_gpus` to the number of GPUs; a minimal config sketch follows.
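The sketch below shows, under stated assumptions, how such a vLLM model entry might look; the `abbr`, `path`, and size values are placeholders, and `tensor_parallel_size`/`num_gpus` are the two settings to match to your GPU count.
```python
# Hypothetical vLLM model config sketch; abbr and path are placeholders.
from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='llama-2-7b-vllm',                     # column name in the result summary
        path='/path/to/Llama-2-7b-hf',              # local model path
        model_kwargs=dict(tensor_parallel_size=2),  # vLLM tensor parallelism degree
        max_out_len=100,
        max_seq_len=2048,
        batch_size=32,
        run_cfg=dict(num_gpus=2),                   # GPUs requested per task
    )
]
```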
2. Inference with LMDeploy
For environment setup and usage, see: [https://developer.hpccube.com/codes/OpenDAS/lmdeploy](https://developer.hpccube.com/codes/OpenDAS/lmdeploy)
```shell
# Install the gpufusion tools
# Download the gpufusion tool from https://forum.hpccube.com/thread/483
# Unzip it into dtk-24.04
unzip gpufusion.zip -d /opt/dtk-24.04/
# Activate the related environment variables
source /opt/dtk-24.04/env.sh
source /opt/dtk-24.04/cuda/env.sh
# Enter the opencompass directory and run the evaluation
cd opencompass
# Evaluation with fp16 precision
python run.py examples/lmdeploy/eval_llama2_lmdeploy.py
# Evaluation with AWQ int4
# First convert the model to AWQ int4 format
# model_name: model name, e.g. llama2, qwen-7b
# awq_modelpath: path to the AWQ model, e.g. /dataset/llm-models/qwen/qwen-chat-7b-AWQ-4bit
# awq_lmdeploymodel_path: output path for the AWQ model in lmdeploy format
lmdeploy convert ${model_name} ${awq_modelpath} --model-format awq --group-size 128 --dst-path ${awq_lmdeploymodel_path}
# Replace Llama-2-7b-hf in eval_llama2_lmdeploy.py with the converted awq_lmdeploymodel_path
python run.py examples/lmdeploy/eval_llama2_lmdeploy.py
```
For other models, refer to `examples/lmdeploy/eval_xxx_lmdeploy.py`
3. Inference with TGI
For environment setup and usage, see: [https://developer.hpccube.com/codes/OpenDAS/text-generation-inference](https://developer.hpccube.com/codes/OpenDAS/text-generation-inference)
```shell
python run.py examples/tgi/eval_llama2_tgi.py
```
For other models, refer to `examples/tgi/eval_xxx_tgi.py`
4. Inference with torch
```shell
python run.py --datasets xxx --hf-type chat --hf-path $model_path
```
Parameter description:
(1) Dataset configuration parameters
`work_dir` is the directory where results are saved; `from .datasets.ARC_c.ARC_c_gen_1e0de5 import ARC_c_datasets` selects the dataset to use, which can be found and configured under `configs/datasets`. vLLM currently does not support ppl-based evaluation.
(2) Model configuration parameters
`abbr` is the column name under which the dataset scores for this model are reported, and `path` is the model path. A minimal config sketch is shown below.
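The following is a minimal sketch of a config that ties these parameters together; the model class, `abbr`, `path`, and `work_dir` values are placeholders chosen for illustration, and the dataset import is the one mentioned above.
```python
# Hypothetical evaluation config sketch; all values are placeholders.
from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM

with read_base():
    # dataset configs live under configs/datasets
    from .datasets.ARC_c.ARC_c_gen_1e0de5 import ARC_c_datasets

datasets = [*ARC_c_datasets]

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='my-model-7b',        # column name used for this model's scores
        path='/path/to/model',     # local model path
        max_out_len=100,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]

work_dir = './outputs/my_eval'     # directory where results are saved
```
Run it with `python run.py <your_config>.py`, as in the examples above.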
#### API evaluation
1. Inference with vLLM
Start the server:
```shell
python -m vllm.entrypoints.openai.api_server --model $model_path --trust-remote-code --enforce-eager --host 0.0.0.0 --port 8000 -tp 1 --dtype float16 --max-model-len 32768
```
Run:
```shell
export OPENAI_API_KEY="ENV"
python run.py examples/vllm/eval_xxx_openai_vllm.py
```
## Verification
- Run `python -c "import opencompass; print(opencompass.__version__)"` to query the software's version number; it is kept in sync with the official release, e.g. 0.4.1.
## Known Issue
-
## References
- [README_ORIGIN](README_ORIGIN.md)
- [https://github.com/open-compass/opencompass](https://github.com/open-compass/opencompass.git)