# ViT

## Paper

`An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale`

- https://arxiv.org/abs/2010.11929

## Model Structure

Vision Transformer first splits the image into patches with a convolution whose kernel size and stride equal the patch size, which reduces the amount of computation. It then flattens each patch so that the image becomes a sequence of tokens, prepends a cls token and adds position encodings, and feeds the sequence through multiple Transformer layers to extract features. Finally, the output at the cls token is taken out and passed through an MLP (multilayer perceptron) head for classification.

![img](./docs/vit.png)

## Algorithm Principle

ViT brings the Encoder structure of the Transformer paper "Attention Is All You Need" into the image domain to extract features. The core idea of the Transformer is to extract features with the attention module:

![img](./docs/attention.png)

## Environment Setup

### Docker (Method 1)

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:decode-ffmpeg-dtk23.04
# Replace <your IMAGE ID> with the ID of the image pulled above
docker run --shm-size 10g --network=host --name=vit_migraphx --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v path_to_vit_migraphx:/home/vit_migraphx -it <your IMAGE ID> bash
```

### Dockerfile (Method 2)

```
cd vit_migraphx/docker
docker build --no-cache -t vit_migraphx:test .
docker run --rm --shm-size 10g --network=host --name=vit_migraphx --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v path_to_vit_migraphx:/home/vit_migraphx -it vit_migraphx:test bash
```

## Build and Run

### Build

```
git clone https://developer.hpccube.com/codes/modelzoo/vit_migraphx.git
cd vit_migraphx
make
```

### Run

Download the inference data:

```
wget https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
tar -zxvf flower_photos.tgz
```

```
./ViT_MIGraphX
```

Follow the prompts to choose the example program to run, for example:

```
./ViT_MIGraphX --models=Models/model.onnx --input=flower_photos/daisy/
```

This runs the ViT model to classify the images in `flower_photos/daisy/`.

## Accuracy

Test data: [flower_photos](https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz). Accelerator: DCU Z100.

| Engine | Model Path | Class | Accuracy (%) |
| :------: | :------: | :------: | :------: |
| MIGraphX | models/model.onnx | daisy | 98.4 |
| MIGraphX | models/model.onnx | dandelion | 98.1 |
| MIGraphX | models/model.onnx | roses | 91.3 |
| MIGraphX | models/model.onnx | sunflowers | 97.4 |
| MIGraphX | models/model.onnx | tulips | 94.1 |

## Application Scenarios

### Algorithm Category

`Image Classification`

### Key Application Industries

`Manufacturing, Environment, Healthcare, Meteorology`

## Source Repository and Issue Feedback

- https://developer.hpccube.com/codes/modelzoo/vit_migraphx.git

## References

- https://github.com/WZMIAOMIAO/deep-learning-for-image-processing
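The Model Structure and Algorithm Principle sections above describe two steps: patch tokenization (patches, cls token, position encodings) and the attention computation. Both can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the MIGraphX implementation in this repository:

- It assumes a 224x224 input with 16x16 patches, the common ViT-B/16 setting. The actual input size of `Models/model.onnx` is not stated here.
- It omits the learned patch projection, the Q/K/V projection weights, multi-head splitting, LayerNorm, residuals and the MLP blocks.
- The function names (`num_tokens`, `patchify`, `add_cls_and_pos`, `scaled_dot_product_attention`) are hypothetical.
- The attention formula softmax(QK^T / sqrt(d_k))V is the scaled dot-product attention from "Attention Is All You Need".

```python
import math

# Hypothetical helpers illustrating the ViT pipeline; not part of vit_migraphx.


def num_tokens(image_size=224, patch_size=16):
    """Sequence length seen by the Transformer: one token per patch plus the cls token."""
    assert image_size % patch_size == 0, "image size must be divisible by the patch size"
    per_side = image_size // patch_size
    return per_side * per_side + 1


def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into non-overlapping patches in
    row-major order, each flattened to a vector of length patch_size * patch_size * C.
    A conv with kernel = stride = patch_size applies one shared linear projection
    to these vectors; that learned projection is omitted here."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            vec = []
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    vec.extend(image[r][c])
            patches.append(vec)
    return patches


def add_cls_and_pos(tokens, cls_token, pos_embed):
    """Prepend the cls token, then add one position embedding per sequence slot."""
    seq = [cls_token] + tokens
    assert len(pos_embed) == len(seq), "need one position embedding per token"
    return [[x + p for x, p in zip(tok, pos)] for tok, pos in zip(seq, pos_embed)]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def scaled_dot_product_attention(q, k, v):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.
    Returns (output, attention_weights); each weight row sums to 1."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    print(num_tokens())  # 197: 14 x 14 patches + 1 cls token
    # Toy 4x4 single-channel image split into 2x2 patches -> 4 tokens of length 4.
    image = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
    tokens = add_cls_and_pos(patchify(image, 2), [0.0] * 4, [[0.0] * 4] * 5)
    # Self-attention with Q = K = V = the token sequence (projection weights omitted).
    out, weights = scaled_dot_product_attention(tokens, tokens, tokens)
    print(len(out), len(out[0]))  # 5 4
```

With the assumed defaults, the Transformer processes 197 tokens rather than 224 x 224 = 50,176 pixel positions. Attention cost grows quadratically with sequence length, so this is where the patching step saves computation.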