"tests/pipelines/flux/__init__.py" did not exist on "3f1e95928e905daccb4ae8bce0049612c53f0737"
# ViT

## Paper

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

https://arxiv.org/abs/2010.11929

## Model Architecture

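Vision Transformer first splits the image into patches with a convolution to reduce computation, then flattens each patch so the image becomes a sequence. It adds positional encodings and a cls token to that sequence, feeds the result through multiple Transformer layers to extract features, and finally passes the cls token through an MLP (multi-layer perceptron) for classification.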
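
As a rough illustration of the sequence produced by the patching step, the sketch below computes the number of tokens that reach the Transformer. The 224x224 input resolution is an assumption (this README does not state it), the 16x16 patch size is read off the `ViT-B_16` model name, and `vit_sequence_length` is a hypothetical helper that is not part of this repository.

```python
def vit_sequence_length(image_size: int, patch_size: int) -> int:
    """Number of tokens fed to the Transformer: one per patch plus the cls token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1  # +1 for the cls token


# Assumed 224x224 input with 16x16 patches (ViT-B_16)
print(vit_sequence_length(224, 16))  # 14 * 14 + 1 = 197
```

Larger patches shorten the sequence, which is where the reduction in computation comes from.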

![img](https://developer.hpccube.com/codes/modelzoo/megatron-deepspeed-vit_pytorch/-/raw/main/doc/vit.png)

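## Algorithm Principle
Vision Transformer brings the Encoder structure from the paper "Attention Is All You Need" to the image domain for feature extraction. The core idea of the Transformer is to extract features with attention modules: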

![img](https://developer.hpccube.com/codes/modelzoo/megatron-deepspeed-vit_pytorch/-/raw/main/doc/attention.png)
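
The attention module can be sketched in a few lines. The example below is a minimal, dependency-free version of single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, written with plain Python lists for readability. It is an illustration only: it omits the learned projections and multiple heads that a real ViT uses, and the function names are chosen for this sketch rather than taken from the repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted sum of the value vectors
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Self-attention over two toy tokens: each token attends most to itself
tokens = [[1.0, 0.0], [0.0, 1.0]]
attended = scaled_dot_product_attention(tokens, tokens, tokens)
```

In self-attention the queries, keys, and values all come from the same token sequence, so every patch can draw on information from every other patch in a single layer.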

## Environment Setup

### Docker (Option 1)

```plaintext
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-22.10.1-py37-latest
# Replace <your IMAGE ID> with the ID of the Docker image pulled above
docker run --shm-size 10g --network=host --name=nit-pytorch --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/megatron-deepspeed-vit:/home/vit-pytorch -it <your IMAGE ID> bash
pip install -r requirements.txt
```

### Dockerfile (Option 2)

```plaintext
cd ViT-PyTorch/docker
# Docker repository names must be lowercase, so the image is tagged vit-pytorch
docker build --no-cache -t vit-pytorch:latest .
docker run --rm --shm-size 10g --network=host --name=megatron --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/../../ViT-PyTorch:/home/ViT-PyTorch -it vit-pytorch:latest bash
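# If setting up the environment through the Dockerfile takes too long, comment out the pip install step in it and install the Python libraries after the container starts: pip install -r requirements.txt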
```

### Anaconda (Option 3)

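1. The special deep learning libraries this project needs for DCU cards can be downloaded and installed from the 光合 (Guanghe) developer community: https://developer.hpccube.com/tool/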

```plaintext
DTK driver: dtk22.10.1
python: python3.7
torch: 1.10.0
torchvision: 0.10.0
Tips: the versions of the DTK driver, python, torch, and the other DCU-related tools listed above must match each other exactly
```
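
Because these versions have to line up exactly, it can help to check them before training. The following is a minimal sketch, not part of this repository: `find_mismatches` and `installed_versions` are hypothetical helpers, and the prefix comparison assumes that a DCU build may append a local suffix to the version string.

```python
import sys

# Versions listed above; the keys are labels chosen for this sketch
EXPECTED = {"python": "3.7", "torch": "1.10.0", "torchvision": "0.10.0"}


def find_mismatches(installed, expected=EXPECTED):
    """Return {name: (expected, found)} for every version that does not match.

    A prefix match is used because a build may append a local suffix
    to the version string; that suffix format is an assumption.
    """
    mismatches = {}
    for name, wanted in expected.items():
        found = installed.get(name, "missing")
        if not found.startswith(wanted):
            mismatches[name] = (wanted, found)
    return mismatches


def installed_versions():
    """Collect versions from the running interpreter (needs torch and torchvision)."""
    import torch
    import torchvision

    return {
        "python": "%d.%d" % sys.version_info[:2],
        "torch": torch.__version__,
        "torchvision": torchvision.__version__,
    }
```

Inside the prepared environment, `find_mismatches(installed_versions())` should return an empty dictionary. The DTK driver version is not covered by this check.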

2. Install the remaining, non-special libraries according to requirements.txt

```plaintext
pip install -r requirements.txt
```

## Dataset

CIFAR-10

Link: https://pan.baidu.com/s/1ZFMQVBGQZI6UWZKJcTYPAQ?pwd=fq3l (extraction code: fq3l)

```
├── batches.meta
├── data_batch_1
├── data_batch_2
├── data_batch_3
├── data_batch_4
├── data_batch_5
├── readme.html
└── test_batch
```
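
To confirm that the download was extracted completely, the listing above can be checked with a short script. This is a minimal sketch: `missing_cifar10_files` is a hypothetical helper, and the directory passed to it is whichever location the dataset was extracted to, which this README does not specify.

```python
from pathlib import Path

# File names taken from the directory listing above
EXPECTED_FILES = [
    "batches.meta",
    "data_batch_1",
    "data_batch_2",
    "data_batch_3",
    "data_batch_4",
    "data_batch_5",
    "readme.html",
    "test_batch",
]


def missing_cifar10_files(root):
    """Return the expected file names that are absent from `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every file in the listing is present. Any names returned point to an incomplete download or extraction.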

## Training
Download the pretrained model and place it in the checkpoint directory:

```
wget -P checkpoint https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_16.npz
```

### Single Node, Single Card

    export HIP_VISIBLE_DEVICES=0
    python3 -m torch.distributed.launch --nproc_per_node=1 train.py --name cifar10-100_500 --dataset cifar10 --model_type ViT-B_16 --pretrained_dir checkpoint/ViT-B_16.npz --train_batch_size 64 --num_steps 500

### Single Node, Multiple Cards

```
python3 -m torch.distributed.launch --nproc_per_node=8 train.py --name cifar10-100_500 --dataset cifar10 --model_type ViT-B_16 --pretrained_dir checkpoint/ViT-B_16.npz --train_batch_size 64 --num_steps 500
```
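
With `torch.distributed.launch`, one process is started per card (`--nproc_per_node`), and by default each process receives a `--local_rank` argument identifying the card it should use. The sketch below shows how a script might read that argument. It is an illustration under that assumption, not the parsing code of this repository's `train.py`, and `parse_launch_args` is a hypothetical name.

```python
import argparse


def parse_launch_args(argv=None):
    """Parse the per-process rank that the launcher appends to the command line."""
    parser = argparse.ArgumentParser()
    # -1 is used here to mean "not started by the distributed launcher"
    parser.add_argument("--local_rank", type=int, default=-1)
    # Ignore the script's other options (e.g. --name, --dataset) in this sketch
    args, _unknown = parser.parse_known_args(argv)
    return args


# Roughly what process 3 of an 8-process launch would receive
args = parse_launch_args(["--local_rank=3", "--dataset", "cifar10"])
print(args.local_rank)  # 3
```

Each process would then bind to the device with that index, so the eight processes in the command above each drive a different card.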
## Results
![1695381570003](image/README/1695381570003.png)
## Accuracy

The test data is cifar10, and the accelerator card used is the DCU Z100L.

| Cards | Accuracy |
| :------: | :------: |
| 1 | Best Accuracy=0.3051 |
## Application Scenarios

### Algorithm Category

Image classification

### Key Industries

Manufacturing, energy, transportation, cybersecurity

### Source Repository and Issue Feedback

https://developer.hpccube.com/codes/modelzoo/vit-pytorch

### References

https://github.com/jeonsworld/ViT-pytorch