[flake8]
ignore = E501, F403, C901, W504, W605, E251, E122, E126, E127, E722, W503, E128, E741
select = E1, E3, E502, E7, E9, W1, W5, W6
max-line-length = 180
exclude = *.egg/*,build,dist,detection/configs/*
## Contributing to InternLM
Welcome to the InternLM community! All kinds of contributions are welcome, including but not limited to the following.
**Fix bug**
You can directly post a Pull Request to fix typos in code or documents.
The steps to fix a bug in the code implementation are as follows.
1. If the modification involves significant changes, you should create an issue first that describes the error and how to trigger the bug. Other developers will discuss it with you and propose a proper solution.
2. Post a pull request after fixing the bug and add the corresponding unit tests.
**New Feature or Enhancement**
1. If the modification involves significant changes, you should create an issue to discuss it with our developers and propose a proper design.
2. Post a Pull Request after implementing the new feature or enhancement and add the corresponding unit tests.
**Document**
You can directly post a pull request to fix documents. If you want to add a document, you should first create an issue to check whether it is reasonable.
### Pull Request Workflow
If you're not familiar with Pull Requests, don't worry! The following guidance will tell you how to create a Pull Request step by step. If you want to dive deeper into the Pull Request workflow, you can refer to the [official documents](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests).
#### 1. Fork and clone
If you are posting a pull request for the first time, you should fork the repository by clicking the **Fork** button in the top right corner of the GitHub page, and the forked repository will appear under your GitHub profile.
<img src="https://user-images.githubusercontent.com/57566630/167305749-43c7f4e9-449b-4e98-ade5-0c9276d5c9ce.png" width="1200">
Then, you can clone the repository locally:
```shell
git clone git@github.com:{username}/lmdeploy.git
```
After that, you should add the official repository as the upstream repository:
```bash
git remote add upstream git@github.com:InternLM/lmdeploy.git
```
Check whether the remote repository has been added successfully with `git remote -v`:
```bash
origin git@github.com:{username}/lmdeploy.git (fetch)
origin git@github.com:{username}/lmdeploy.git (push)
upstream git@github.com:InternLM/lmdeploy.git (fetch)
upstream git@github.com:InternLM/lmdeploy.git (push)
```
> Here's a brief introduction to origin and upstream. When we use `git clone`, we create an "origin" remote by default, which points to the repository we cloned from. "upstream" is a remote we add ourselves, pointing to the official target repository; of course, if you don't like the name "upstream", you can name it as you wish. Usually, we push code to "origin". If the pushed code conflicts with the latest code in the official repository ("upstream"), we pull the latest code from upstream to resolve the conflicts and then push to "origin" again. The open Pull Request will be updated automatically, as shown in the sketch below.
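For example, the sync-and-push cycle described above typically looks like this (the branch name is illustrative):
```shell
git pull upstream master   # fetch the latest official code and resolve any conflicts
git push origin yhc/refactor_contributing_doc   # the open Pull Request updates automatically
```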
#### 2. Configure pre-commit
You should configure [pre-commit](https://pre-commit.com/#intro) in the local development environment to make sure the code style matches that of InternLM. **Note**: The following code should be executed under the lmdeploy directory.
```shell
pip install -U pre-commit
pre-commit install
```
Check that pre-commit is configured successfully and that the hooks defined in `.pre-commit-config.yaml` are installed:
```shell
pre-commit run --all-files
```
<img src="https://user-images.githubusercontent.com/57566630/173660750-3df20a63-cb66-4d33-a986-1f643f1d8aaf.png" width="1200">
<img src="https://user-images.githubusercontent.com/57566630/202368856-0465a90d-8fce-4345-918e-67b8b9c82614.png" width="1200">
If the installation process is interrupted, you can re-run `pre-commit run ...` to continue it.
If the code does not conform to the code style specification, pre-commit will raise a warning and fix some of the errors automatically.
<img src="https://user-images.githubusercontent.com/57566630/202369176-67642454-0025-4023-a095-263529107aa3.png" width="1200">
If we want to commit our code bypassing the pre-commit hook, we can use the `--no-verify` option (**only for temporary commits**).
```shell
git commit -m "xxx" --no-verify
```
#### 3. Create a development branch
After configuring pre-commit, we should create a branch based on the master branch to develop the new feature or fix the bug. The proposed branch name format is `username/pr_name`:
```shell
git checkout -b yhc/refactor_contributing_doc
```
During subsequent development, if the master branch of the local repository falls behind the master branch of "upstream", we need to pull from upstream to synchronize, and then execute the checkout command above:
```shell
git pull upstream master
```
#### 4. Commit the code and pass the unit test
- lmdeploy introduces mypy to do static type checking to increase the robustness of the code. Therefore, we need to add type hints to our code and pass the mypy check. If you are not familiar with type hints, you can refer to [this tutorial](https://docs.python.org/3/library/typing.html); a short sketch also follows this list.
- The committed code should pass the unit tests:
```shell
# Pass all unit tests
pytest tests
# Pass the unit test of runner
pytest tests/test_runner/test_runner.py
```
If the unit tests fail due to missing dependencies, you can install the dependencies by referring to the [guidance](#unit-test).
- If the documents are modified/added, we should check the rendering result by referring to the [guidance](#document-rendering).
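As referenced above, here is a minimal sketch of the kind of type-hinted function mypy checks (the function is illustrative, not part of lmdeploy):
```python
from typing import List


def count_tokens(texts: List[str], sep: str = ' ') -> int:
    """Illustrative only: mypy verifies that the annotated types are used consistently."""
    return sum(len(text.split(sep)) for text in texts)
```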
#### 5. Push the code to remote
We can push the local commits to the remote after passing the unit tests and pre-commit checks. You can associate the local branch with the remote branch by adding the `-u` option:
```shell
git push -u origin {branch_name}
```
This allows you to push code with a plain `git push` next time, without having to specify the branch or the remote repository.
#### 6. Create a Pull Request
(1) Create a pull request in GitHub's Pull request interface
<img src="https://user-images.githubusercontent.com/57566630/201533288-516f7ac4-0b14-4dc8-afbd-912475c368b5.png" width="1200">
(2) Modify the PR description according to the guidelines so that other developers can better understand your changes
<img src="https://user-images.githubusercontent.com/57566630/202242953-c91a18ff-e388-4ff9-8591-5fae0ead6c1e.png" width="1200">
Find more details about Pull Request description in [pull request guidelines](#pr-specs).
**Note**
(a) The Pull Request description should contain the reason for the change, the content of the change, and the impact of the change, and be associated with the relevant issue (see the [documentation](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)).
(b) If it is your first contribution, please sign the CLA.
<img src="https://user-images.githubusercontent.com/57566630/167307569-a794b967-6e28-4eac-a942-00deb657815f.png" width="1200">
(c) Check whether the Pull Request passes the CI.
<img src="https://user-images.githubusercontent.com/57566630/167307490-f9ebf9fa-63c0-4d83-8ba1-081ea169eb3a.png" width="1200">
InternLM will run unit tests for the posted Pull Request on different platforms (Linux, Windows, macOS) with different versions of Python, PyTorch, and CUDA to make sure the code is correct. You can see the specific test information by clicking `Details` in the above image and modify the code accordingly.
(3) If the Pull Request passes the CI, you can wait for reviews from other developers. Modify the code based on the reviewers' comments, and repeat steps [4](#4-commit-the-code-and-pass-the-unit-test)-[5](#5-push-the-code-to-remote) until all reviewers approve it. Then, we will merge it ASAP.
<img src="https://user-images.githubusercontent.com/57566630/202145400-cc2cd8c4-10b0-472f-ba37-07e6f50acc67.png" width="1200">
#### 7. Resolve conflicts
If your local branch conflicts with the latest master branch of "upstream", you'll need to resolve them. There are two ways to do this:
```shell
git fetch --all --prune
git rebase upstream/master
```
or
```shell
git fetch --all --prune
git merge upstream/master
```
If you are adept at handling conflicts, you can use `rebase` to resolve them, as this keeps your commit history tidy. If you are not familiar with `rebase`, you can use `merge` instead.
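If a rebase stops on conflicts, the usual continuation looks like this:
```shell
# after manually fixing each conflicted file
git add <fixed-files>
git rebase --continue
# or give up and return to the pre-rebase state
git rebase --abort
```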
### Guidance
#### Document rendering
If the documents are modified/added, we should check the rendering result. We could install the dependencies and run the following command to render the documents and check the results:
```shell
pip install -r requirements/docs.txt
cd docs/zh_cn/
# or docs/en
make html
# check file in ./docs/zh_cn/_build/html/index.html
```
### Code style
#### Python
We adopt [PEP8](https://www.python.org/dev/peps/pep-0008/) as the preferred code style.
We use the following tools for linting and formatting:
- [flake8](https://github.com/PyCQA/flake8): A wrapper around some linter tools.
- [isort](https://github.com/timothycrosley/isort): A Python utility to sort imports.
- [yapf](https://github.com/google/yapf): A formatter for Python files.
- [codespell](https://github.com/codespell-project/codespell): A Python utility to fix common misspellings in text files.
- [mdformat](https://github.com/executablebooks/mdformat): Mdformat is an opinionated Markdown formatter that can be used to enforce a consistent style in Markdown files.
- [docformatter](https://github.com/myint/docformatter): A formatter for docstrings.
We use a [pre-commit hook](https://pre-commit.com/) that checks and formats `flake8`, `yapf`, `isort`, trailing whitespace, and Markdown files,
fixes end-of-file newlines, double-quoted strings, Python encoding pragmas, and mixed line endings, and sorts `requirements.txt` automatically on every commit.
The config for a pre-commit hook is stored in [.pre-commit-config](../.pre-commit-config.yaml).
#### C++ and CUDA
The clang-format config is stored in [.clang-format](../.clang-format). And it's recommended to use clang-format version **11**. Please do not use older or newer versions as they will result in differences after formatting, which can cause the [lint](https://github.com/InternLM/lmdeploy/blob/main/.github/workflows/lint.yml#L25) to fail.
### PR Specs
1. Use the [pre-commit](https://pre-commit.com) hook to avoid code style issues
2. One short-lived branch should be matched with only one PR
3. Accomplish one detailed change per PR. Avoid large PRs
   - Bad: Support Faster R-CNN
   - Acceptable: Add a box head to Faster R-CNN
   - Good: Add a parameter to box head to support custom conv-layer number
4. Provide clear and meaningful commit messages
5. Provide a clear and meaningful PR description
   - The task name should be clarified in the title. The general format is: \[Prefix\] Short description of the PR (Suffix); see the example after this list
   - Prefix: new feature \[Feature\], bug fix \[Fix\], documentation \[Docs\], work in progress \[WIP\] (which will not be reviewed for the time being)
   - Introduce the main changes, results, and influences on other modules in the short description
   - Associate related issues and pull requests with a milestone
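For example, a title following this format might read `[Docs] Fix typos in the contributing guide`.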
name: 🐞 Bug report
description: Create a report to help us reproduce and fix the bug
title: "[Bug] "
labels: ['Bug']
body:
- type: checkboxes
attributes:
label: Checklist
options:
- label: 1. I have searched related issues but cannot get the expected help.
- label: 2. The bug has not been fixed in the latest version.
- label: 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- type: textarea
attributes:
label: Describe the bug
description: A clear and concise description of what the bug is.
validations:
required: true
- type: textarea
attributes:
label: Reproduction
description: |
1. What command or script did you run?
placeholder: |
A placeholder for the command.
validations:
required: true
- type: textarea
attributes:
label: Environment
description: |
1. Please run `lmdeploy check_env` to collect necessary environment information and paste it here.
2. You may add additional information that may be helpful for locating the problem, such as
- Which **model** are you using?
- How you installed PyTorch \[e.g., pip, conda, source\]
- Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)
placeholder: Environment here.
render: Shell
validations:
required: true
- type: textarea
attributes:
label: Error traceback
description: |
If applicable, paste the error traceback here.
placeholder: Logs and traceback here.
render: Shell
- type: markdown
attributes:
value: >
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
Thanks for your bug report. We appreciate it a lot.
name: 🚀 Feature request
description: Suggest an idea for this project
title: "[Feature] "
body:
- type: markdown
attributes:
value: |
We would strongly appreciate it if you created a PR to implement this feature [here](https://github.com/OpenGVLab/InternVL/pulls)!
If you need our help, please fill in as much of the following form as you're able to.
**The less clear the description, the longer it will take to solve it.**
- type: textarea
attributes:
label: Motivation
description: |
A clear and concise description of the motivation of the feature.
Ex1. It is inconvenient when \[....\].
validations:
required: true
- type: textarea
attributes:
label: Related resources
description: |
If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.
- type: textarea
attributes:
label: Additional context
description: |
Add any other context or screenshots about the feature request here.
If you would like to implement the feature and create a PR, please leave a comment here and that would be much appreciated.
name: 📚 Documentation
description: Report an issue related to the documentation.
labels: "kind/doc,status/unconfirmed"
title: "[Docs] "
body:
- type: textarea
attributes:
label: 📚 The doc issue
description: >
A clear and concise description of the issue.
validations:
required: true
- type: textarea
attributes:
label: Suggest a potential alternative/fix
description: >
Tell us how we could improve the documentation in this regard.
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.idea/
.DS_Store
data_process/
internvl_chat/work_dirs/
internvl_chat/unittest/
internvl_chat/data/
Husky2/*
*distillation*
[isort]
line_length = 180
multi_line_output = 0
extra_standard_library = setuptools
known_third_party = PIL,asynctest,cityscapesscripts,cv2,gather_models,matplotlib,mmcv,numpy,onnx,onnxruntime,pycocotools,pytest,pytorch_sphinx_theme,requests,scipy,seaborn,six,terminaltables,torch,ts,yaml
no_lines_before = STDLIB,LOCALFOLDER
default_section = THIRDPARTY
[yapf]
BASED_ON_STYLE = pep8
BLANK_LINE_BEFORE_NESTED_CLASS_OR_DEF = true
SPLIT_BEFORE_EXPRESSION_AFTER_OPENING_PAREN = true
[codespell]
skip = *.ipynb
quiet-level = 3
ignore-words-list = patten,nd,ty,mot,hist,formating,winn,gool,datas,wan,confids,TOOD,tood
exclude: ^internvl_chat_llava/
repos:
- repo: https://github.com/PyCQA/flake8
rev: 5.0.4
hooks:
- id: flake8
- repo: https://github.com/PyCQA/isort
rev: 5.11.5
hooks:
- id: isort
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.3.0
hooks:
- id: trailing-whitespace
- id: check-yaml
- id: end-of-file-fixer
- id: requirements-txt-fixer
- id: double-quote-string-fixer
- id: check-merge-conflict
- id: fix-encoding-pragma
args: ["--remove"]
- id: mixed-line-ending
args: ["--fix=lf"]
- repo: https://github.com/executablebooks/mdformat
rev: 0.7.9
hooks:
- id: mdformat
args: ["--number"]
additional_dependencies:
- mdformat-openmmlab
- mdformat_frontmatter
- linkify-it-py
## 🛠️ Installation
- Clone this repository:
```bash
git clone https://github.com/OpenGVLab/InternVL.git
```
- Create a conda virtual environment and activate it:
```bash
conda create -n internvl python=3.9 -y
conda activate internvl
```
- Install dependencies using `requirements.txt`:
```bash
pip install -r requirements.txt
```
By default, our `requirements.txt` file includes the following dependencies:
- `-r requirements/internvl_chat.txt`
- `-r requirements/streamlit_demo.txt`
- `-r requirements/classification.txt`
- `-r requirements/segmentation.txt`
Note that `requirements/clip_benchmark.txt` is **not** included in the default installation. If you require the `clip_benchmark` functionality, please install it manually by running the following command:
```bash
pip install -r requirements/clip_benchmark.txt
```
### Additional Instructions
- Install `flash-attn==2.3.6`:
```bash
pip install flash-attn==2.3.6 --no-build-isolation
```
Alternatively, you can compile it from source:
```bash
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.3.6
python setup.py install
```
- Install `mmcv-full==1.6.2` (optional, for `segmentation`):
```bash
pip install -U openmim
mim install mmcv-full==1.6.2
```
- Install `apex` (optional, for `segmentation`):
```bash
git clone https://github.com/NVIDIA/apex.git
cd apex
git checkout 2386a912164b0c5cfcd8be7a2b890fbac5607c82 # https://github.com/NVIDIA/apex/issues/1735
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
```
If you encounter `ModuleNotFoundError: No module named 'fused_layer_norm_cuda'`, it is because apex's CUDA extensions were not installed successfully. You can try uninstalling apex, and the code will fall back to the PyTorch implementation of RMSNorm. Alternatively, if you prefer using apex, try adding a few lines to `setup.py` as shown below and then recompiling; a sketch of the fallback pattern follows the image.
<img src=https://github.com/OpenGVLab/InternVL/assets/23737120/c04a989c-8024-49fa-b62c-2da623e63729 width=50%>
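The fallback described above commonly follows a try/except import pattern; this is a minimal sketch of the general technique, not the project's exact code (the `HAS_APEX` flag is illustrative):
```python
import torch
import torch.nn as nn

try:
    from apex.normalization import FusedRMSNorm as RMSNorm  # fused CUDA kernel
    HAS_APEX = True
except ImportError:
    HAS_APEX = False

    class RMSNorm(nn.Module):
        """Plain-PyTorch RMSNorm used when apex is unavailable."""

        def __init__(self, hidden_size: int, eps: float = 1e-6):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(hidden_size))
            self.eps = eps

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            variance = x.pow(2).mean(-1, keepdim=True)
            return self.weight * x * torch.rsqrt(variance + self.eps)
```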
MIT License
Copyright (c) 2023 OpenGVLab
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# InternVL2
InternVL2 is an open-source multimodal large language model that aims to close the gap in multimodal understanding between open-source models and commercial proprietary models. It can be used for OCR, video understanding, and document question answering.
## Papers
- [InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks](https://arxiv.org/abs/2312.14238)
- [How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites](https://arxiv.org/abs/2404.16821)
## Model Architecture
The InternVL 2.0 architecture integrates a pretrained vision transformer (InternViT-6B) with a pretrained language model (InternLM2-20B), connected by a randomly initialized multi-layer perceptron (MLP) projector. InternViT-6B is a vision foundation model (VFM) improved during pretraining through a continuous learning strategy, which strengthens the model's understanding of visual content and improves its adaptability across different language models. InternLM2-20B serves as the language foundation model and provides strong initial language processing capabilities. During training, the MLP projector optimizes visual feature extraction by matching the vision encoder's outputs to the language model's inputs.
<div align="center">
<img src="./images/model2.png"/>
</div>
## Algorithm
InternVL 2.0 adopts a dynamic high-resolution training method that splits images into 448×448-pixel tiles, with the number of tiles varying from 1 to 12 depending on the input image's aspect ratio and resolution. At test time, this can be extended to 40 tiles (i.e., 4K resolution). To improve scalability at high resolution, the model uses a simple pixel-shuffle operation that reduces the number of visual tokens to a quarter of the original count, so a 448×448-pixel image is represented by 256 visual tokens. In the fine-tuning stage, carefully selected datasets are used to strengthen performance on multimodal tasks, covering image captioning, general question answering, scientific image understanding, chart interpretation, mathematical problem solving, knowledge-based question answering, OCR, and document understanding. A sketch of the pixel-shuffle operation follows the figure below.
<div align=center>
<img src="./images/train.png"/>
</div>
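To make the token-reduction step concrete, here is a minimal sketch of the pixel-shuffle operation described above, assuming ViT features laid out as (N, W, H, C); names and details are illustrative:
```python
import torch

def pixel_shuffle(x: torch.Tensor, scale_factor: float = 0.5) -> torch.Tensor:
    # x: (N, W, H, C) visual features from the ViT.
    # With scale_factor=0.5, W and H are halved and C grows 4x,
    # so the token count W*H drops to a quarter of the original.
    n, w, h, c = x.size()
    x = x.view(n, w, int(h * scale_factor), int(c / scale_factor))
    x = x.permute(0, 2, 1, 3).contiguous()
    x = x.view(n, int(h * scale_factor), int(w * scale_factor),
               int(c / (scale_factor * scale_factor)))
    return x.permute(0, 2, 1, 3).contiguous()

# A 448x448 tile with patch size 14 yields 32x32 = 1024 tokens;
# after pixel shuffle it becomes 16x16 = 256 tokens.
tokens = pixel_shuffle(torch.randn(1, 32, 32, 3200))
print(tokens.shape)  # torch.Size([1, 16, 16, 12800])
```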
## Environment Setup
### Docker (Option 1)
Pull the Docker image from [光源](https://www.sourcefind.cn/#/service-details) and use it with the following steps:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=128G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name internvl2 <your imageID> bash
cd /path/your_code_data/
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```
### Dockerfile (Option 2)
```
cd /path/your_code_data/docker
docker build --no-cache -t internvl2:latest .
docker run --shm-size=128G --name internvl2 -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v /path/your_code_data/:/path/your_code_data/ -it internvl2 bash
```
### Anaconda (Option 3)
The specialized deep learning libraries required by this project for DCU accelerator cards can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.
```
DTK driver: dtk24.04
Python: 3.10
torch: 2.1
torchvision: 0.16.0
deepspeed: 0.12.3
```
`Tip: the versions of the DTK driver, Python, torch, and the other DCU-related tools above must correspond exactly.`
```
conda create -n internvl2 python=3.10
conda activate internvl2
cd /path/your_code_data/
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple
```
## Dataset
Test dataset: [ai2d](https://allenai.org/data/diagrams)
For training you need to prepare your data: place all image samples under `playground/data/`, and store the text annotation files in JSONL format following the directory layout below. **ai2d_train_12k.jsonl** can be found in `playground/opensource`; for details, refer to the official [Fine-tune on a Custom Dataset](https://internvl.readthedocs.io/en/latest/internvl2.0/finetune.html) guide.
```
playground/
├── opensource
│ ├── ai2d_train_12k.jsonl
├── data
│ ├── ai2d
│ │ ├── abc_images
│ │ └── images
```
After downloading the pretrained model, prepare your custom SFT (supervised fine-tuning) data. Then create a JSON file in `internvl_chat/shell/data/` with the following format and name it **internvl_1_2_finetune_custom.json**:
```
{
"ai2d_train_12k": {
"root": "playground/data/ai2d/",
"annotation": "playground/opensource/ai2d_train_12k.jsonl",
"data_augment": false,
"repeat_time": 1,
"length": 12413
}
}
```
## Training
Modify the weight-related paths in the script according to your setup.
### Single node, multiple cards
```
sh finetune_lora_multi_dcu.sh
```
## Inference
### Single node, multiple cards
Before running inference, modify the model path and image path. A sketch of a typical complete inference script follows the commands below.
```
path = 'OpenGVLab/InternVL2-40B'
pixel_values = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
generation_config = dict(max_new_tokens=1024, do_sample=False)
```
```
python internvl_chat.py
```
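For reference, here is a minimal sketch of what a script like `internvl_chat.py` typically contains, following the standard InternVL chat interface published with the model. The `load_image` tiling helper comes from the official examples, and the 40B model may need its weights sharded across multiple cards (e.g., with `device_map`) rather than a plain `.cuda()`:
```python
import torch
from transformers import AutoModel, AutoTokenizer

path = 'OpenGVLab/InternVL2-40B'
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).eval().cuda()  # 40B may require multi-card sharding
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

# load_image tiles the image into at most max_num 448x448 crops (helper from the official examples)
pixel_values = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
generation_config = dict(max_new_tokens=1024, do_sample=False)

question = '<image>\nPlease describe the image in detail.'
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(response)
```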
## Results
### OCR
<div align=center>
<img src="./images/ocr_result.png"/>
</div>
### Question Answering
<div align=center>
<img src="./images/qa_result.png"/>
</div>
### Accuracy
Test data: [ai2d](https://allenai.org/data/diagrams); accelerator cards used: K100AI / A800.
| device | train_loss | samples/second | samples/step |
| :------: | :------: | :------: | :------: |
| K100AI | 0.1223 | 0.118 |0.019 |
| A800 | 0.1245 | 0.249 | 0.041 |
## Application Scenarios
### Algorithm Category
`OCR`
### Key Application Industries
`Finance, Education, Transportation, Government`
## Pretrained Weights
- [OpenGVLab/InternVL2-40B](https://modelscope.cn/models/OpenGVLab/InternVL2-40B)
Fast download center for pretrained weights: [SCNet AIModels](http://113.200.138.88:18080/aimodels)
The pretrained weights used in this project can be downloaded through the fast download channel: [OpenGVLab/InternVL2-40B](http://113.200.138.88:18080/aimodels/opengvlab/internvl2-40b)
## Source Repository and Issue Feedback
- https://developer.hpccube.com/codes/modelzoo/internvl2_pytorch
## References
- [OpenGVLab/InternVL github](https://github.com/OpenGVLab/InternVL)
- [InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks](https://arxiv.org/abs/2312.14238)
- [How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites](https://arxiv.org/abs/2404.16821)
# InternViT-6B for Image Classification
This folder contains the implementation of the InternViT-6B for image classification, which corresponds to Section 4.2.1 of our [InternVL 1.0 paper](https://arxiv.org/pdf/2312.14238).
The codebase for this part is derived from [InternImage](https://github.com/OpenGVLab/InternImage), with some code references to [EVA](https://github.com/baaivision/EVA/tree/master) and [DINOv2](https://github.com/facebookresearch/dinov2). Thanks for their great work.
In this part, we validate the visual perception capabilities of InternViT-6B, the core component of InternVL 1.0.
We evaluate the quality of the visual representation produced by InternViT-6B using the ImageNet-1K dataset. Following common practice, we adopt the linear probing evaluation, i.e., training a linear classifier while keeping the backbone frozen. In addition to the ImageNet-1K validation set,
we also report performance metrics on several ImageNet variants to benchmark the domain generalization capability.
InternViT-6B follows the structure of vanilla ViT, and its hyperparameters are listed in the table below.
<img width="558" alt="image" src="https://github.com/OpenGVLab/InternVL/assets/23737120/e6bb0151-ab2f-4436-982f-6c68c5a69bc4">
## 🛠️ Installation
Follow the [installation guide](../INSTALLATION.md) to set up the environment.
## 📦 Data Preparation
> Please prepare the dataset according to your needs.
- `ImageNet-1K`: We use the standard ImageNet dataset; you can download it from [http://image-net.org/](http://image-net.org/).
- `ImageNet-A`: Download it from [https://people.eecs.berkeley.edu/~hendrycks/imagenet-a.tar](https://people.eecs.berkeley.edu/~hendrycks/imagenet-a.tar).
- `ImageNet-R`: Download it from [https://people.eecs.berkeley.edu/~hendrycks/imagenet-r.tar](https://people.eecs.berkeley.edu/~hendrycks/imagenet-r.tar).
- `ImageNetV2`: Download it from [https://imagenetv2public.s3-us-west-2.amazonaws.com/imagenetv2-matched-frequency.tar.gz](https://imagenetv2public.s3-us-west-2.amazonaws.com/imagenetv2-matched-frequency.tar.gz).
- `ImageNet-Sketch`: Download it using `gdown`.
```shell
# GDown is needed to download the dataset.
# Please install it via `pip install gdown`
gdown --id 1Mj0i5HBthqH1p_yeXzsg22gZduvgoNeA
```
First, please prepare the `ImageNet-1K`, `ImageNet-A`, `ImageNet-R`, `ImageNetV2`, and `ImageNet-Sketch` datasets following the directory structure outlined below.
```bash
$ tree data
data
├── imagenet-1k
│ ├── train
│ ├── n01498041
│ └── ...
│ └── val
│ ├── ILSVRC2012_val_00000001.JPEG
│ └── ...
├── imagenet-a
│ ├── n01498041
│ └── ...
├── imagenet-r
│ ├── n01443537
│ └── ...
├── imagenet-sketch
│ ├── n01440764
│ └── ...
└── imagenetv2
└── ImageNetV2-matched-frequency
```
Then, unzip the `train.txt.zip` and `val.txt.zip` in `meta_data/`.
```shell
cd meta_data/
unzip train.txt.zip
unzip val.txt.zip
```
## 📦 Model Preparation
| model name | type | download | size |
| ---------------------------- | ------- | ---------------------------------------------------------------------------------------------- | :-----: |
| intern_vit_6b_224px.pth | pytorch | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL/blob/main/intern_vit_6b_224px.pth) | 12 GB |
| intern_vit_6b_224px_head.pth | pytorch | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL/blob/main/intern_vit_6b_224px_head.pth) | 25.7 MB |
Please download the above model weights and place them in the `pretrained/` folder.
```sh
cd pretrained
wget https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px.pth
wget https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px_head.pth
```
The directory structure is:
```sh
pretrained
├── intern_vit_6b_224px_head.pth
└── intern_vit_6b_224px.pth
```
## 🔍 Linear Probing on ImageNet-1K
> **Warning**: Please install `apex` before training (see [installation guide](../INSTALLATION.md#additional-instructions) for details).
To train a linear classifier for `InternViT-6B` on ImageNet with 8 GPUs, run:
```bash
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --cfg configs/intern_vit_6b_1k_224.yaml
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224.yaml --launcher slurm
```
Note: it is normal for the following information to appear during training, and it can be safely ignored:
> \_IncompatibleKeys(missing_keys=\[\], unexpected_keys=\['clip_projector.norm1_q.weight', 'clip_projector.norm1_q.bias', 'clip_projector.norm1_k.weight', 'clip_projector.norm1_k.bias', 'clip_projector.norm1_v.weight', 'clip_projector.norm1_v.bias', 'clip_projector.cross_attn.q_bias', 'clip_projector.cross_attn.k_bias', 'clip_projector.cross_attn.v_bias', 'clip_projector.cross_attn.q.weight', 'clip_projector.cross_attn.k.weight', 'clip_projector.cross_attn.v.weight', 'clip_projector.cross_attn.proj.weight', 'clip_projector.cross_attn.proj.bias'\])
## 📊 Evaluation
> **Warning**: Please install `apex` before evaluation (see [installation guide](../INSTALLATION.md#additional-instructions) for details).
| model name | IN-1K | IN-ReaL | IN-V2 | IN-A | IN-R | IN-Sketch | download |
| -------------------------------------------------------------- | :---: | :-----: | :---: | :--: | :--: | :-------: | :--------------------------------------------------------------------------------------------------------------------------------------------------: |
| [intern_vit_6b_1k_224.yaml](configs/intern_vit_6b_1k_224.yaml) | 88.2 | 90.4 | 79.9 | 77.5 | 89.8 | 69.1 | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px_head.pth) \| [log](./work_dirs/intern_vit_6b_1k_224/log_rank0.txt) |
<details>
<summary>Evaluate InternViT-6B on <b>ImageNet-1K val</b> with 8 GPUs (click to expand).</summary>
```bash
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
--cfg configs/intern_vit_6b_1k_224.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224.yaml --eval \
--resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm
```
Expected results:
```
* Acc@1 88.230 Acc@5 98.474
Accuracy of the network on the 50000 test images: 88.2%
```
</details>
<details>
<summary>Evaluate InternViT-6B on <b>ImageNet-ReaL</b> with 1 GPU (click to expand).</summary>
**Note: ImageNet-ReaL now only supports single-GPU testing.**
```bash
python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval \
--cfg configs/intern_vit_6b_1k_224_test_imagenet_real.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=1 GPUS_PER_NODE=1 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_real.yaml --eval \
--resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm
```
Expected results:
```
* ReaL Acc@1 90.437 Acc@5 98.567 loss 0.605
ReaL Accuracy of the network on the 50000 test images: 90.4%
```
</details>
<details>
<summary>Evaluate InternViT-6B on <b>ImageNetV2</b> with 8 GPUs (click to expand).</summary>
```bash
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
--cfg configs/intern_vit_6b_1k_224_test_imagenetv2.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenetv2.yaml --eval \
--resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm
```
Expected results:
```
* Acc@1 79.940 Acc@5 95.340
Accuracy of the network on the 10000 test images: 79.9%
```
</details>
<details>
<summary>Evaluate InternViT-6B on <b>ImageNet-A</b> with 8 GPUs (click to expand).</summary>
```bash
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
--cfg configs/intern_vit_6b_1k_224_test_imagenet_a.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_a.yaml --eval \
--resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm
```
Expected results:
```
* Acc@1 77.479 Acc@5 92.737
Accuracy of the network on the 7500 test images: 77.5%
```
</details>
<details>
<summary>Evaluate InternViT-6B on <b>ImageNet-R</b> with 8 GPUs (click to expand).</summary>
```bash
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
--cfg configs/intern_vit_6b_1k_224_test_imagenet_r.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_r.yaml --eval \
--resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm
```
Expected results:
```
* Acc@1 89.777 Acc@5 97.023
Accuracy of the network on the 30000 test images: 89.8%
```
</details>
<details>
<summary>Evaluate InternViT-6B on <b>ImageNet-Sketch</b> with 8 GPUs (click to expand).</summary>
```bash
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
--cfg configs/intern_vit_6b_1k_224_test_imagenet_sketch.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_sketch.yaml --eval \
--resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm
```
Expected results:
```
* Acc@1 69.117 Acc@5 88.341
Accuracy of the network on the 50889 test images: 69.1%
```
</details>
# --------------------------------------------------------
# InternVL
# Copyright (c) 2022 OpenGVLab
# Licensed under The MIT License [see LICENSE for details]
# --------------------------------------------------------
import os
import yaml
from yacs.config import CfgNode as CN
_C = CN()
# Base config files
_C.BASE = ['']
# -----------------------------------------------------------------------------
# Data settings
# -----------------------------------------------------------------------------
_C.DATA = CN()
# Batch size for a single GPU, could be overwritten by command line argument
_C.DATA.BATCH_SIZE = 128
# Path to dataset, could be overwritten by command line argument
_C.DATA.DATA_PATH = ''
# Dataset name
_C.DATA.DATASET = 'imagenet'
# Input image size
_C.DATA.IMG_SIZE = 224
# Interpolation to resize image (random, bilinear, bicubic)
_C.DATA.INTERPOLATION = 'bicubic'
# Use zipped dataset instead of folder dataset
# could be overwritten by command line argument
_C.DATA.ZIP_MODE = False
# Cache Data in Memory, could be overwritten by command line argument
_C.DATA.CACHE_MODE = 'part'
# Pin CPU memory in DataLoader for more efficient (sometimes) transfer to GPU.
_C.DATA.PIN_MEMORY = True
# Number of data loading threads
_C.DATA.NUM_WORKERS = 8
# Load data to memory
_C.DATA.IMG_ON_MEMORY = False
# Name of the build_transform function
_C.DATA.TRANSFORM = 'build_transform'
# -----------------------------------------------------------------------------
# Model settings
# -----------------------------------------------------------------------------
_C.MODEL = CN()
# Model type
_C.MODEL.TYPE = 'intern_vit_6b'
# Model name
_C.MODEL.NAME = 'intern_vit_6b'
# Pretrained weight from checkpoint, could be imagenet22k pretrained weight
# could be overwritten by command line argument
_C.MODEL.PRETRAINED = ''
# Checkpoint to resume, could be overwritten by command line argument
_C.MODEL.RESUME = ''
# Number of classes, overwritten in data preparation
_C.MODEL.NUM_CLASSES = 1000
# Dropout rate
_C.MODEL.DROP_RATE = 0.0
# Drop path rate
_C.MODEL.DROP_PATH_RATE = 0.1
# Drop path type
_C.MODEL.DROP_PATH_TYPE = 'linear' # linear, uniform
# Label Smoothing
_C.MODEL.LABEL_SMOOTHING = 0.1
# INTERN_VIT_6B parameters
_C.MODEL.INTERN_VIT_6B = CN()
_C.MODEL.INTERN_VIT_6B.PATCH_SIZE = 14
_C.MODEL.INTERN_VIT_6B.PRETRAIN_SIZE = 224
_C.MODEL.INTERN_VIT_6B.QKV_BIAS = False
_C.MODEL.INTERN_VIT_6B.EMBED_DIM = 3200
_C.MODEL.INTERN_VIT_6B.NUM_HEADS = 25
_C.MODEL.INTERN_VIT_6B.MLP_RATIO = 4
_C.MODEL.INTERN_VIT_6B.INIT_VALUES = 0.1
_C.MODEL.INTERN_VIT_6B.QK_NORMALIZATION = True
_C.MODEL.INTERN_VIT_6B.DEPTH = 48
_C.MODEL.INTERN_VIT_6B.USE_FLASH_ATTN = True
_C.MODEL.INTERN_VIT_6B.FREEZE_VIT = True
_C.MODEL.INTERN_VIT_6B.PRETRAINED = None
_C.MODEL.INTERN_VIT_6B.CLS_TARGET = 'cls_patch_concat'
_C.MODEL.INTERN_VIT_6B.HEAD_NORM_TYPE = 'bn'
# -----------------------------------------------------------------------------
# Training settings
# -----------------------------------------------------------------------------
_C.TRAIN = CN()
_C.TRAIN.START_EPOCH = 0
_C.TRAIN.EPOCHS = 300
_C.TRAIN.WARMUP_EPOCHS = 20
_C.TRAIN.WEIGHT_DECAY = 0.05
_C.TRAIN.BASE_LR = 5e-4
_C.TRAIN.WARMUP_LR = 5e-7
_C.TRAIN.MIN_LR = 5e-6
# Clip gradient norm
_C.TRAIN.CLIP_GRAD = 5.0
# Auto resume from latest checkpoint
_C.TRAIN.AUTO_RESUME = True
# Gradient accumulation steps
# could be overwritten by command line argument
_C.TRAIN.ACCUMULATION_STEPS = 0
# Whether to use gradient checkpointing to save memory
# could be overwritten by command line argument
_C.TRAIN.USE_CHECKPOINT = False
# LR scheduler
_C.TRAIN.LR_SCHEDULER = CN()
_C.TRAIN.LR_SCHEDULER.NAME = 'cosine'
# Epoch interval to decay LR, used in StepLRScheduler
_C.TRAIN.LR_SCHEDULER.DECAY_EPOCHS = 30
# LR decay rate, used in StepLRScheduler
_C.TRAIN.LR_SCHEDULER.DECAY_RATE = 0.1
# Optimizer
_C.TRAIN.OPTIMIZER = CN()
_C.TRAIN.OPTIMIZER.NAME = 'adamw'
# Optimizer Epsilon
_C.TRAIN.OPTIMIZER.EPS = 1e-8
# Optimizer Betas
_C.TRAIN.OPTIMIZER.BETAS = (0.9, 0.999)
# SGD momentum
_C.TRAIN.OPTIMIZER.MOMENTUM = 0.9
# ZeRO
_C.TRAIN.OPTIMIZER.USE_ZERO = False
# freeze backbone
_C.TRAIN.OPTIMIZER.FREEZE_BACKBONE = None
# dcn lr
_C.TRAIN.OPTIMIZER.DCN_LR_MUL = None
# EMA
_C.TRAIN.EMA = CN()
_C.TRAIN.EMA.ENABLE = False
_C.TRAIN.EMA.DECAY = 0.9998
# LR_LAYER_DECAY
_C.TRAIN.LR_LAYER_DECAY = False
_C.TRAIN.LR_LAYER_DECAY_RATIO = 0.875
# FT head init weights
_C.TRAIN.RAND_INIT_FT_HEAD = False
# -----------------------------------------------------------------------------
# Augmentation settings
# -----------------------------------------------------------------------------
_C.AUG = CN()
# Color jitter factor
_C.AUG.COLOR_JITTER = 0.4
# Use AutoAugment policy. "v0" or "original"
_C.AUG.AUTO_AUGMENT = 'rand-m9-mstd0.5-inc1'
# Random erase prob
_C.AUG.REPROB = 0.25
# Random erase mode
_C.AUG.REMODE = 'pixel'
# Random erase count
_C.AUG.RECOUNT = 1
# Mixup alpha, mixup enabled if > 0
_C.AUG.MIXUP = 0.8
# Cutmix alpha, cutmix enabled if > 0
_C.AUG.CUTMIX = 1.0
# Cutmix min/max ratio, overrides alpha and enables cutmix if set
_C.AUG.CUTMIX_MINMAX = None
# Probability of performing mixup or cutmix when either/both is enabled
_C.AUG.MIXUP_PROB = 1.0
# Probability of switching to cutmix when both mixup and cutmix enabled
_C.AUG.MIXUP_SWITCH_PROB = 0.5
# How to apply mixup/cutmix params. Per "batch", "pair", or "elem"
_C.AUG.MIXUP_MODE = 'batch'
# RandomResizedCrop
_C.AUG.RANDOM_RESIZED_CROP = False
_C.AUG.MEAN = (0.485, 0.456, 0.406)
_C.AUG.STD = (0.229, 0.224, 0.225)
# -----------------------------------------------------------------------------
# Testing settings
# -----------------------------------------------------------------------------
_C.TEST = CN()
# Whether to use center crop when testing
_C.TEST.CROP = True
# Whether to use SequentialSampler as validation sampler
_C.TEST.SEQUENTIAL = False
# -----------------------------------------------------------------------------
# Misc
# -----------------------------------------------------------------------------
# Mixed precision opt level, if O0, no amp is used ('O0', 'O1', 'O2')
# overwritten by command line argument
_C.AMP_OPT_LEVEL = ''
# Path to output folder, overwritten by command line argument
_C.OUTPUT = ''
# Tag of experiment, overwritten by command line argument
_C.TAG = 'default'
# Frequency to save checkpoint
_C.SAVE_FREQ = 1
# Frequency to logging info
_C.PRINT_FREQ = 10
# eval freq
_C.EVAL_FREQ = 1
# Fixed random seed
_C.SEED = 0
# Perform evaluation only, overwritten by command line argument
_C.EVAL_MODE = False
# Test throughput only, overwritten by command line argument
_C.THROUGHPUT_MODE = False
# local rank for DistributedDataParallel, given by command line argument
_C.LOCAL_RANK = 0
_C.EVAL_22K_TO_1K = False
_C.AMP_TYPE = 'float16'
def _update_config_from_file(config, cfg_file):
config.defrost()
with open(cfg_file, 'r') as f:
yaml_cfg = yaml.load(f, Loader=yaml.FullLoader)
for cfg in yaml_cfg.setdefault('BASE', ['']):
if cfg:
_update_config_from_file(
config, os.path.join(os.path.dirname(cfg_file), cfg))
print('=> merge config from {}'.format(cfg_file))
config.merge_from_file(cfg_file)
config.freeze()
def update_config(config, args):
_update_config_from_file(config, args.cfg)
config.defrost()
if hasattr(args, 'opts') and args.opts:
config.merge_from_list(args.opts)
# merge from specific arguments
if hasattr(args, 'batch_size') and args.batch_size:
config.DATA.BATCH_SIZE = args.batch_size
if hasattr(args, 'dataset') and args.dataset:
config.DATA.DATASET = args.dataset
if hasattr(args, 'data_path') and args.data_path:
config.DATA.DATA_PATH = args.data_path
if hasattr(args, 'zip') and args.zip:
config.DATA.ZIP_MODE = True
if hasattr(args, 'cache_mode') and args.cache_mode:
config.DATA.CACHE_MODE = args.cache_mode
if hasattr(args, 'pretrained') and args.pretrained:
config.MODEL.PRETRAINED = args.pretrained
if hasattr(args, 'resume') and args.resume:
config.MODEL.RESUME = args.resume
if hasattr(args, 'accumulation_steps') and args.accumulation_steps:
config.TRAIN.ACCUMULATION_STEPS = args.accumulation_steps
if hasattr(args, 'use_checkpoint') and args.use_checkpoint:
config.TRAIN.USE_CHECKPOINT = True
if hasattr(args, 'amp_opt_level') and args.amp_opt_level:
config.AMP_OPT_LEVEL = args.amp_opt_level
if hasattr(args, 'output') and args.output:
config.OUTPUT = args.output
if hasattr(args, 'tag') and args.tag:
config.TAG = args.tag
if hasattr(args, 'eval') and args.eval:
config.EVAL_MODE = True
if hasattr(args, 'throughput') and args.throughput:
config.THROUGHPUT_MODE = True
if hasattr(args, 'save_ckpt_num') and args.save_ckpt_num:
config.SAVE_CKPT_NUM = args.save_ckpt_num
if hasattr(args, 'use_zero') and args.use_zero:
config.TRAIN.OPTIMIZER.USE_ZERO = True
# set local rank for distributed training
if hasattr(args, 'local_rank') and args.local_rank:
config.LOCAL_RANK = args.local_rank
# output folder
config.MODEL.NAME = args.cfg.split('/')[-1].replace('.yaml', '')
config.OUTPUT = os.path.join(config.OUTPUT, config.MODEL.NAME)
# config.OUTPUT = os.path.join(config.OUTPUT, config.MODEL.NAME, config.TAG)
config.freeze()
def get_config(args):
"""Get a yacs CfgNode object with default values."""
# Return a clone so that the defaults will not be altered
# This is for the "local variable" use pattern
config = _C.clone()
update_config(config, args)
return config
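To show how these helpers are typically driven, here is a minimal sketch of a command-line entry point; the argument names mirror the attributes `update_config` reads, but the script itself is illustrative, not the project's actual `main.py` (assume the module above is saved as `config.py`):
```python
import argparse

from config import get_config  # assumes the module above is saved as config.py


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser('InternViT-6B linear probe')
    parser.add_argument('--cfg', required=True, help='path to a YAML config file')
    parser.add_argument('--batch-size', dest='batch_size', type=int, default=None)
    parser.add_argument('--eval', action='store_true', help='evaluation only')
    parser.add_argument('--opts', nargs='+', default=None,
                        help='config overrides, e.g. TRAIN.EPOCHS 20')
    return parser.parse_args()


if __name__ == '__main__':
    args = parse_args()
    # Merges the YAML file (and any BASE files it lists) over the defaults,
    # then applies the command-line overrides handled in update_config.
    config = get_config(args)
    print(config.MODEL.NAME, config.DATA.BATCH_SIZE, config.EVAL_MODE)
```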
DATA:
IMG_ON_MEMORY: False
BATCH_SIZE: 128
TRANSFORM: 'build_transform_for_linear_probe'
DATA_PATH: './data/imagenet-1k'
MODEL:
TYPE: intern_vit_6b
DROP_PATH_RATE: 0.0
INTERN_VIT_6B:
FREEZE_VIT: True
PATCH_SIZE: 14
PRETRAIN_SIZE: 224
QKV_BIAS: False
EMBED_DIM: 3200
NUM_HEADS: 25
MLP_RATIO: 4
INIT_VALUES: 0.1
QK_NORMALIZATION: True
DEPTH: 48
USE_FLASH_ATTN: True
PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
CLS_TARGET: 'cls_patch_concat'
TRAIN:
EMA:
ENABLE: False
DECAY: 0.998
EPOCHS: 10
WARMUP_EPOCHS: 1
WEIGHT_DECAY: 0.0
BASE_LR: 0.1 # 512
WARMUP_LR: .0
MIN_LR: .0
LR_LAYER_DECAY: false
OPTIMIZER:
NAME: 'sgd'
DATA:
IMG_ON_MEMORY: False
BATCH_SIZE: 128
DATASET: 'imagenet_a'
TRANSFORM: 'build_transform_for_linear_probe'
DATA_PATH: './data/imagenet-a'
MODEL:
TYPE: intern_vit_6b
DROP_PATH_RATE: 0.0
INTERN_VIT_6B:
FREEZE_VIT: True
PATCH_SIZE: 14
PRETRAIN_SIZE: 224
QKV_BIAS: False
EMBED_DIM: 3200
NUM_HEADS: 25
MLP_RATIO: 4
INIT_VALUES: 0.1
QK_NORMALIZATION: True
DEPTH: 48
USE_FLASH_ATTN: True
PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
CLS_TARGET: 'cls_patch_concat'
TRAIN:
EMA:
ENABLE: False
DECAY: 0.998
EPOCHS: 10
WARMUP_EPOCHS: 1
WEIGHT_DECAY: 0.0
BASE_LR: 0.1 # 512
WARMUP_LR: .0
MIN_LR: .0
LR_LAYER_DECAY: false
OPTIMIZER:
NAME: 'sgd'
DATA:
IMG_ON_MEMORY: False
BATCH_SIZE: 128
DATASET: 'imagenet_r'
TRANSFORM: 'build_transform_for_linear_probe'
DATA_PATH: './data/imagenet-r'
MODEL:
TYPE: intern_vit_6b
DROP_PATH_RATE: 0.0
INTERN_VIT_6B:
FREEZE_VIT: True
PATCH_SIZE: 14
PRETRAIN_SIZE: 224
QKV_BIAS: False
EMBED_DIM: 3200
NUM_HEADS: 25
MLP_RATIO: 4
INIT_VALUES: 0.1
QK_NORMALIZATION: True
DEPTH: 48
USE_FLASH_ATTN: True
PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
CLS_TARGET: 'cls_patch_concat'
TRAIN:
EMA:
ENABLE: False
DECAY: 0.998
EPOCHS: 10
WARMUP_EPOCHS: 1
WEIGHT_DECAY: 0.0
BASE_LR: 0.1 # 512
WARMUP_LR: .0
MIN_LR: .0
LR_LAYER_DECAY: false
OPTIMIZER:
NAME: 'sgd'
DATA:
IMG_ON_MEMORY: False
BATCH_SIZE: 128
DATASET: 'imagenet-real'
TRANSFORM: 'build_transform_for_linear_probe'
DATA_PATH: './data/imagenet-1k'
MODEL:
TYPE: intern_vit_6b
DROP_PATH_RATE: 0.0
INTERN_VIT_6B:
FREEZE_VIT: True
PATCH_SIZE: 14
PRETRAIN_SIZE: 224
QKV_BIAS: False
EMBED_DIM: 3200
NUM_HEADS: 25
MLP_RATIO: 4
INIT_VALUES: 0.1
QK_NORMALIZATION: True
DEPTH: 48
USE_FLASH_ATTN: True
PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
CLS_TARGET: 'cls_patch_concat'
TRAIN:
EMA:
ENABLE: False
DECAY: 0.998
EPOCHS: 10
WARMUP_EPOCHS: 1
WEIGHT_DECAY: 0.0
BASE_LR: 0.1 # 512
WARMUP_LR: .0
MIN_LR: .0
LR_LAYER_DECAY: false
OPTIMIZER:
NAME: 'sgd'
DATA:
IMG_ON_MEMORY: False
BATCH_SIZE: 128
DATASET: 'imagenet_sketch'
TRANSFORM: 'build_transform_for_linear_probe'
DATA_PATH: './data/imagenet-sketch'
MODEL:
TYPE: intern_vit_6b
DROP_PATH_RATE: 0.0
INTERN_VIT_6B:
FREEZE_VIT: True
PATCH_SIZE: 14
PRETRAIN_SIZE: 224
QKV_BIAS: False
EMBED_DIM: 3200
NUM_HEADS: 25
MLP_RATIO: 4
INIT_VALUES: 0.1
QK_NORMALIZATION: True
DEPTH: 48
USE_FLASH_ATTN: True
PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
CLS_TARGET: 'cls_patch_concat'
TRAIN:
EMA:
ENABLE: False
DECAY: 0.998
EPOCHS: 10
WARMUP_EPOCHS: 1
WEIGHT_DECAY: 0.0
BASE_LR: 0.1 # 512
WARMUP_LR: .0
MIN_LR: .0
LR_LAYER_DECAY: false
OPTIMIZER:
NAME: 'sgd'