ModelZoo / Megatron-DeepSpeed-ViT_pytorch
Commit bed86591, authored Aug 21, 2023 by chenzk
v1.2.2
Parent: c727dd02
Pipeline #518 failed
Showing 5 changed files with 19 additions and 62 deletions
README.md (+19 -62)
doc/attention.png (+0 -0)
doc/cases_april2021.png (+0 -0)
doc/classify.png (+0 -0)
doc/vit.png (+0 -0)
README.md
@@ -6,26 +6,27 @@
 ## Model structure
 Vision Transformer first splits the image into patches with a convolution to reduce computation, flattens each patch into a sequence, adds position encodings and a cls token to the sequence, feeds the result through a multi-layer Transformer to extract features, and finally takes the cls token and passes it through an MLP (multi-layer perceptron) for classification.
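The patch-splitting step can be illustrated with a minimal, dependency-free sketch. It is not the repository's implementation: the real model uses a learned strided convolution, which amounts to flattening each patch and applying a linear projection, whereas this sketch only splits and flattens. The names `patchify` and `build_sequence` are hypothetical, and position encodings are omitted.

```python
def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one patch row by row into a single vector.
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches


def build_sequence(image, patch, cls_token):
    """Prepend a cls token to the patch sequence (position encodings omitted)."""
    return [cls_token] + patchify(image, patch)


# A 4 x 4 single-channel toy image with 2 x 2 patches gives 4 patches + 1 cls token.
toy = [[row * 4 + col for col in range(4)] for row in range(4)]
sequence = build_sequence(toy, patch=2, cls_token=[0, 0, 0, 0])
print(len(sequence))   # 5
print(sequence[1])     # [0, 1, 4, 5]
```

The sequence length is `(H / P) * (W / P) + 1`, so larger patches shorten the sequence that the Transformer layers have to process.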


 ## Algorithm principle
 The vision model borrows the Encoder structure from the paper "Attention Is All You Need" to extract features. The core idea of the Transformer is to use the attention module to extract features.
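As a rough illustration of the attention module, the sketch below implements plain scaled dot-product attention, `softmax(Q K^T / sqrt(d)) V`, in pure Python. It is a simplified assumption-laden example: it has a single head, no learned projections, and no batching, so it does not reflect the repository's actual layers. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs


# Two tokens with 2-dimensional embeddings; Q, K, V are used unprojected here.
tokens = [[1.0, 0.0], [0.0, 1.0]]
result = attention(tokens, tokens, tokens)
print(result)
```

Each output row is a weighted mix of all value vectors, which is how every token (including the cls token) can gather information from the whole image.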


 ## Environment setup
-### Docker
+### Docker (method 1)
 ```
 docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-23.04-py38-latest
+# Replace <your IMAGE ID> with the ID of the image pulled above
 docker run --shm-size 10g --network=host --name=megatron --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/megatron-deepspeed-vit:/home/megatron-deepspeed-vit -it <your IMAGE ID> bash
 pip install -r requirements.txt
 ```
-### Dockerfile
+### Dockerfile (method 2)
 ```
 cd megatron-deepspeed-vit/docker
-docker build --no-cache -t megatron-:latest .
+docker build --no-cache -t megatron:latest .
 docker run --rm --shm-size 10g --network=host --name=megatron --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/../../megatron-deepspeed-vit:/home/megatron-deepspeed-vit -it megatron bash
 # If building the environment through the Dockerfile takes a long time, comment out the pip install step inside it and install the Python libraries after the container starts: pip install -r requirements.txt
 ```
-### Anaconda
+### Anaconda (method 3)
 1. The special deep learning libraries that this project needs for DCU cards can be downloaded and installed from the Guanghe developer community (光合开发者社区):
 https://developer.hpccube.com/tool/
 ```
@@ -45,13 +46,14 @@ pip install -r requirements.txt
 ```
 ## Dataset
-ILSVRC 2012:
-https://image-net.org/challenges/LSVRC/index.php
+`ILSVRC 2012`
+- https://image-net.org/challenges/LSVRC/index.php
 For how to extract and organize `imagenet 2012`, see:
 https://www.jianshu.com/p/a42b7d863825
-The data directory structure after organizing is as follows:
+The project already provides a mini dataset for trial training. The training data directory structure is shown below; prepare the full dataset for regular training with the same structure:
 ```
 data
 |
@@ -74,77 +76,32 @@ data
 ...
 ```
 ## Training
-Enter the main directory:
-```
-cd megatron-deepspeed-vit && mkdir logs
-```
-### 1. DeepSpeed training:
-**Multi-node, multi-card:**
-```
-sbatch examples/vit_dsp.sh
-```
-**Note**: setting up the environment for DeepSpeed through a shell script currently has problems, which can be worked around as follows:
-```
-1. vim ~/.bashrc
-2. Append the following configuration at the end:
-# Load dtk
-module purge
-module load compiler/devtoolset/7.3.1
-module load mpi/hpcx/gcc-7.3.1
-module load compiler/dtk/23.04
-# source /opt/dtk-23.04/env.sh
-source /public/home/xxx/dtk-23.04/env.sh
-# Load python
-source /public/home/xxx/anaconda3/bin/activate megatron
-# or: conda activate megatron
-export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/public/home/xxx/anaconda3/envs/megatron/lib
-3. Save .bashrc and run source ~/.bashrc to apply the configuration.
-```
-**Single node, multi-card** (an online node must be requested separately first):
+### Single node, multi-card
 ```
+cd megatron-deepspeed-vit
 sh examples/dspvit_1node.sh
 ```
-**Single node, single card** (an online node must be requested separately first):
+### Single node, single card
 ```
 sh examples/dspvit_1dcu.sh
 ```
-### 2. mpirun training
-Comment out rank and world_size in [`arguments.py`](./megatron/arguments.py):
-```
-# args.rank = int(os.getenv('RANK', '0'))
-# args.world_size = int(os.getenv("WORLD_SIZE", '1'))
-```
-**Multi-node, multi-card:**
-```
-sbatch examples/vit_mpi.sh
-```
 ## Inference
-The procedure is similar to the training steps above; just add the following two arguments when passing arguments:
+The procedure is similar to the training steps above; just add the following two arguments in [`dspvit_1node.sh`](./examples/dspvit_1node.sh):
 ```
 --eval-only True \
 --do_test True \
 ```
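To show how value-taking boolean flags like the two above are commonly handled, here is a small `argparse` sketch. It is an assumption for illustration only: the project's real parser lives in `megatron/arguments.py` and may define these options differently, and the `str2bool` helper is hypothetical.

```python
import argparse


def str2bool(text):
    """Hypothetical helper: turn 'True'/'False' style strings into booleans."""
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


# Assumption: both flags take an explicit value, as in the snippet above.
parser = argparse.ArgumentParser(description="illustrative parser, not the project's")
parser.add_argument("--eval-only", type=str2bool, default=False)
parser.add_argument("--do_test", type=str2bool, default=False)

args = parser.parse_args(["--eval-only", "True", "--do_test", "True"])
print(args.eval_only, args.do_test)  # True True
```

Note that `argparse` exposes `--eval-only` as the attribute `eval_only`, while `--do_test` keeps its underscore, so the mixed spelling of the two flags matters when typing them into the launch script.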
-### 1. DeepSpeed testing:
-**Multi-node, multi-card:**
-```
-sbatch examples/vit_dsp.sh
-```
-### 2. mpirun testing:
-**Multi-node, multi-card:**
+### Single node, multi-card
 ```
-sbatch examples/vit_mpi.sh
+sh examples/dspvit_1node.sh
 ```
 ## result


 ## Application scenarios
 ### Algorithm category
 `Image classification`
-### Application industries
+### Popular application industries
 `Manufacturing, environment, healthcare, meteorology`
-### Algorithm framework
-`pytorch`
 ## Source repository and issue feedback
 - https://developer.hpccube.com/codes/modelzoo/megatron-deepspeed-vit_pytorch
 ## References
images/attention.png → doc/attention.png (file moved)
images/cases_april2021.png → doc/cases_april2021.png (file moved)
images/classify.png → doc/classify.png (file moved)
images/vit.png → doc/vit.png (file moved)