# ViT 

## Paper
https://arxiv.org/abs/2010.11929
## Model Architecture
![img](./images/vit.png)
## Algorithm
Vision Transformer (ViT) first splits the image into patches with a convolution to reduce computation, flattens each patch into a token of a sequence, adds positional encodings and a cls token, feeds the sequence through a stack of Transformer layers to extract features, and finally passes the cls token through an MLP (multi-layer perceptron) head for classification.
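The patchify-and-embed front end described above can be sketched in a few lines of NumPy. This is an illustration with ViT-Base-style shapes (224x224 input, 16x16 patches, 768-dim embeddings), not this repo's implementation:

```python
import numpy as np

# Illustrative ViT front end: split a 224x224 image into 16x16 patches,
# flatten and project each patch, prepend a cls token, add positional encodings.
img_size, patch_size, embed_dim = 224, 16, 768
grid = img_size // patch_size                 # 14 patches per side
num_patches = grid * grid                     # 196 patches total

image = np.random.rand(3, img_size, img_size)  # (C, H, W)

# The "convolutional patchify" is equivalent to cutting non-overlapping
# patches and projecting each flattened patch with one shared linear layer.
patches = image.reshape(3, grid, patch_size, grid, patch_size)
patches = patches.transpose(1, 3, 0, 2, 4).reshape(num_patches, -1)  # (196, 3*16*16)

proj = np.random.rand(patches.shape[1], embed_dim)   # shared linear projection
tokens = patches @ proj                              # (196, 768)

cls_token = np.zeros((1, embed_dim))                 # learnable in practice
tokens = np.concatenate([cls_token, tokens], axis=0) # (197, 768)
pos_embed = np.zeros((num_patches + 1, embed_dim))   # learnable in practice
tokens = tokens + pos_embed                          # input to the Transformer stack
```

After the Transformer layers, only row 0 (the cls token) is fed to the MLP head for classification.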

The core idea of the Transformer is to extract features with the attention module:
![img](./images/attention.png)
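To make the attention formula concrete, here is a minimal NumPy sketch of scaled dot-product self-attention (illustrative only; the actual implementation lives in the Megatron transformer layers):

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V for single-head attention."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (seq, seq) similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                              # weighted sum of values

# Self-attention over the 197 ViT tokens (Q = K = V = token sequence);
# in practice Q, K, V are separate learned projections of the tokens.
x = np.random.rand(197, 64)
out = attention(x, x, x)    # same shape as the input sequence
```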
## Environment Setup
### Docker
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-23.04-py38-latest
docker run --shm-size 10g --network=host --name=megatron --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/megatron-deepspeed-vit:/home/megatron-deepspeed-vit -it <your IMAGE ID> bash
pip install -r requirements.txt
```
### Dockerfile
```
cd megatron-deepspeed-vit/docker
docker build --no-cache -t megatron:latest .
docker run --rm --shm-size 10g --network=host --name=megatron --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/../../megatron-deepspeed-vit:/home/megatron-deepspeed-vit -it megatron bash
# If installing the environment via the Dockerfile takes too long, comment out its pip install steps and install the Python libraries after the container starts: pip install -r requirements.txt
```
### Anaconda
1. The DCU-specific deep-learning libraries required by this project can be downloaded from the 光合 developer community:
https://developer.hpccube.com/tool/
```
DTK driver: dtk23.04
python: python3.8
torch: 1.10.0
torchvision: 0.10.0
torchaudio: 0.10.0
deepspeed: 0.9.2
apex: 0.1
```
`Tip: the DTK driver, python, torch, and other DCU-related tool versions above must match each other exactly.`

2. Install the remaining, non-DCU-specific libraries from requirements.txt:
```
pip install -r requirements.txt
```
## Dataset

ILSVRC 2012:
https://image-net.org/challenges/LSVRC/index.php

For how to extract and organize `imagenet 2012`, see:
https://www.jianshu.com/p/a42b7d863825

After organizing, the data directory structure looks like this:
```
data
    |
    train
        |
        n01440764
        n01806143
        ...
    val
        |
        n01440764
        n01824575
        ...
    test
        |
        images
            |
            test_x.JPEG
            test_xxx.JPEG
            ...
```
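Before launching training, it can help to sanity-check the layout. The helper below is our own illustrative snippet (`check_imagenet_layout` is not part of this repo); it counts the WordNet-ID class folders (`n########`) in each split:

```python
import os

def check_imagenet_layout(root="data"):
    """Count the WordNet-ID class subfolders (e.g. n01440764) in
    the train/ and val/ splits of the layout shown above."""
    report = {}
    for split in ("train", "val"):
        split_dir = os.path.join(root, split)
        classes = [d for d in os.listdir(split_dir)
                   if d.startswith("n") and d[1:].isdigit()]
        report[split] = len(classes)
    return report

# Usage (after organizing the dataset):
# print(check_imagenet_layout("data"))  # expect 1000 classes per split
```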
## Training
Enter the project root directory and create a log directory:
```
cd megatron-deepspeed-vit && mkdir logs
```
### 1. DeepSpeed training
**Multi-node, multi-GPU:**
```
sbatch examples/vit_dsp.sh
```
**Note**: DeepSpeed currently has issues setting up the environment via shell scripts; this can be worked around as follows:
```
1. vim ~/.bashrc
2. Append the following configuration at the end:
# load dtk
module purge
module load compiler/devtoolset/7.3.1
module load mpi/hpcx/gcc-7.3.1
module load compiler/dtk/23.04
# source /opt/dtk-23.04/env.sh
source /public/home/xxx/dtk-23.04/env.sh
# load python
source /public/home/xxx/anaconda3/bin/activate megatron
# or: conda activate megatron
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/public/home/xxx/anaconda3/envs/megatron/lib
3. Save .bashrc and run source ~/.bashrc to apply the configuration.
```
**Single-node, multi-GPU** (requires requesting a compute node separately first):
```
sh examples/dspvit_1node.sh
```
**Single-node, single-GPU** (requires requesting a compute node separately first):
```
sh examples/dspvit_1dcu.sh
```
### 2. mpirun training
Comment out rank and world_size in [`arguments.py`](./megatron/arguments.py):
```
# args.rank = int(os.getenv('RANK', '0'))
# args.world_size = int(os.getenv("WORLD_SIZE", '1'))
```
**Multi-node, multi-GPU:**
```
sbatch examples/vit_mpi.sh
```
## Inference
The procedure is the same as the training steps above; just pass two extra arguments:
```
--eval-only True \
--do_test True \
```
### 1. DeepSpeed testing

**Multi-node, multi-GPU:**
```
sbatch examples/vit_dsp.sh
```
### 2. mpirun testing

**Multi-node, multi-GPU:**
```
sbatch examples/vit_mpi.sh
```
## Result
![img](./images/classify.png)
## Application Scenarios
### Algorithm Category
`Image classification`
### Industries
`Manufacturing, environment, healthcare, meteorology`
### Framework
`pytorch`
## References
- https://github.com/bigscience-workshop/Megatron-DeepSpeed

- https://www.deepspeed.ai/getting-started/

- https://deepspeed.readthedocs.io/en/latest/index.html