# ViT 

## Paper
`An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale`
- https://arxiv.org/abs/2010.11929
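## Model Structure
Vision Transformer (ViT) first splits the image into patches with a convolution to reduce computation, then flattens each patch so the patches form a sequence. Positional encodings and a cls token are added to the sequence, which is fed through a multi-layer Transformer to extract features. Finally, the cls token is taken out and passed through an MLP (multi-layer perceptron) for classification.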
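The patch splitting and flattening step can be sketched in plain Python. This is only an illustration, not the code used in this repository: the function names are hypothetical, the 224x224 input size is an assumption (the 16x16 patch size comes from the paper title), and the real model uses a learned convolutional projection rather than a plain copy of pixel values.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Tokens fed to the Transformer: one per patch plus one cls token.

    image_size=224 is an assumed input resolution; patch_size=16 follows the paper title.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1  # +1 for the cls token


def split_into_patches(image, patch_size):
    """Split a square single-channel image (list of rows) into flattened patches.

    Patches are emitted in row-major order; each patch is flattened row by row.
    """
    size = len(image)
    patches = []
    for top in range(0, size, patch_size):
        for left in range(0, size, patch_size):
            patch = [image[top + r][left + c]
                     for r in range(patch_size)
                     for c in range(patch_size)]
            patches.append(patch)
    return patches


# Toy 4x4 "image" with pixel values 0..15, split into 2x2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)
print(len(patches), patches[0])
print(vit_sequence_length())
```

Under these assumptions a 224x224 input yields 14 x 14 = 196 patches, so the Transformer sees a sequence of 197 tokens including the cls token.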

![img](./docs/vit.png)
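## Algorithm Principle
ViT borrows the Encoder structure from the paper "Attention Is All You Need" to extract features from images. The core idea of the Transformer is to extract features with the attention module: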

![img](./docs/attention.png)
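For readers who prefer code to diagrams, here is a minimal sketch of single-head scaled dot-product attention, `softmax(Q·Kᵀ / sqrt(d_k))·V`, in pure Python. It is an illustration under simplifying assumptions (one head, no learned projections, no batching), and the function names are hypothetical; it is not the implementation that runs inside MIGraphX.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Single-head scaled dot-product attention.

    Q, K, V are lists of row vectors; K and V must have the same number of rows.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
# A zero query scores every key equally, so the output is the mean of V.
result = attention([[0.0, 0.0]], K, V)
print(result)
```

Each output row is a weighted average of the value vectors, which is how every token (including the cls token) gathers information from all image patches.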
## Environment Setup
### Docker (Method 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:decode-ffmpeg-dtk23.04
# Replace <your IMAGE ID> with the ID of the Docker image pulled above
docker run --shm-size 10g --network=host --name=vit_migraphx --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v path_to_vit_migraphx:/home/vit_migraphx -it <your IMAGE ID> bash
```
### Dockerfile (Method 2)
```
cd vit_migraphx/docker
docker build --no-cache -t vit_migraphx:test .
docker run --rm --shm-size 10g --network=host --name=vit_migraphx --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v path_to_vit_migraphx:/home/vit_migraphx -it vit_migraphx:test bash
```
## Build and Run

### Build
```
git clone https://developer.hpccube.com/codes/modelzoo/vit_migraphx.git
cd vit_migraphx
make
```

### Run

Download the inference data:
```
wget https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
tar -zxvf flower_photos.tgz
```

```
./ViT_MIGraphX
```
Follow the prompts to select the sample program to run, for example:
```
./ViT_MIGraphX --models=Models/model.onnx --input=flower_photos/daisy/
```
This runs the ViT model and classifies the daisy images.
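To classify every flower category rather than only daisy, the same command can be generated once per class directory. The sketch below only builds and prints the command lines. It assumes the `--models` and `--input` flags behave as in the example above, that the extracted dataset lives in `flower_photos/`, and that the model sits at `Models/model.onnx`; the helper name `build_commands` is hypothetical.

```python
# Class directories contained in flower_photos.tgz (same names as in the accuracy table).
FLOWER_CLASSES = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]


def build_commands(model_path="Models/model.onnx", data_root="flower_photos"):
    """Return one argv list per flower class, mirroring the daisy example."""
    return [
        ["./ViT_MIGraphX", f"--models={model_path}", f"--input={data_root}/{name}/"]
        for name in FLOWER_CLASSES
    ]


commands = build_commands()
for argv in commands:
    print(" ".join(argv))
```

Each argv list could be passed to `subprocess.run` from the directory that contains the built binary; that step is left out here so the sketch runs without a DCU or the compiled program.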


## Accuracy

The test data is [flower_photos](https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz), and the accelerator used is a DCU Z100.

| Engine | Model Path| Data | Accuracy(%) |
| :------: | :------: | :------: | :------: |
| MIGraphX | models/model.onnx | daisy | 98.4 |
| MIGraphX | models/model.onnx | dandelion | 98.1 |
| MIGraphX | models/model.onnx | roses | 91.3 |
| MIGraphX | models/model.onnx | sunflowers | 97.4 |
| MIGraphX | models/model.onnx | tulips | 94.1 |
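The README does not state how these figures are computed. A reasonable reading, assumed here, is that each row is the percentage of images in that class folder that the model labels with the folder's class. The sketch below shows that calculation on made-up predictions (the `predictions` list is hypothetical, not real model output) and also derives the unweighted mean of the five reported values, which ignores differences in class size.

```python
def accuracy_percent(predicted, expected):
    """Percentage of predicted labels that equal the expected class label."""
    if not predicted:
        raise ValueError("predicted must not be empty")
    correct = sum(1 for label in predicted if label == expected)
    return 100.0 * correct / len(predicted)


# Hypothetical predictions for four images from the daisy folder.
predictions = ["daisy", "daisy", "roses", "daisy"]
print(accuracy_percent(predictions, "daisy"))

# Per-class accuracies copied from the table above.
reported = {
    "daisy": 98.4,
    "dandelion": 98.1,
    "roses": 91.3,
    "sunflowers": 97.4,
    "tulips": 94.1,
}
# Unweighted mean over classes; not weighted by the number of images per class.
unweighted_mean = sum(reported.values()) / len(reported)
print(round(unweighted_mean, 2))
```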


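## Application Scenarios
### Algorithm Category
`Image classification`
### Key Application Industries
`Manufacturing, environment, healthcare, meteorology`
## Source Repository and Issue Feedback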
- https://developer.hpccube.com/codes/modelzoo/vit_migraphx.git
## References
- https://github.com/WZMIAOMIAO/deep-learning-for-image-processing