# Vision-Transformers-cifar10
## Paper
`An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale`
- https://arxiv.org/abs/2010.11929

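## Model Architecture
Vision Transformer first splits the image into patches with a convolution, which shortens the input sequence and thus reduces computation. It then flattens each patch into a vector to form a sequence, adds positional encodings and a cls token to that sequence, and feeds it through a stack of Transformer layers to extract features. Finally, the output at the cls token position is passed through an MLP (multilayer perceptron) for classification.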

![img](./img/vit.png)
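The tokenization steps above can be sketched in NumPy as follows. This is an illustrative sketch, not this repository's implementation: the patch-embedding convolution (kernel size = stride = patch size) is written as the mathematically equivalent reshape plus matrix multiply, the weights are random, the Transformer encoder layers are omitted, and the MLP head is reduced to a single linear layer. The patch size 4 and embedding dimension 64 are hypothetical choices for 32x32 CIFAR-10 images.

```python
import numpy as np


def vit_tokens(images, patch_size, w_proj, cls_token, pos_embed):
    """Turn images (B, C, H, W) into ViT input tokens (B, 1 + N, D).

    Equivalent to Conv2d(kernel_size=stride=patch_size) followed by flattening.
    """
    b, c, h, w = images.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    # (B, C, H/p, p, W/p, p) -> (B, H/p, W/p, C, p, p)
    patches = images.reshape(b, c, h // p, p, w // p, p).transpose(0, 2, 4, 1, 3, 5)
    # Flatten each patch into one vector: (B, N, C*p*p) with N = (H/p) * (W/p)
    patches = patches.reshape(b, (h // p) * (w // p), c * p * p)
    # Linear patch embedding: (B, N, D)
    tokens = patches @ w_proj
    # Prepend the (learnable) cls token: (B, 1 + N, D)
    cls = np.broadcast_to(cls_token, (b, 1, tokens.shape[-1]))
    tokens = np.concatenate([cls, tokens], axis=1)
    # Add one positional embedding per position, including the cls position
    return tokens + pos_embed


def classify(encoded, w_head):
    """Map the cls-token output to class logits (single linear layer as a stand-in for the MLP)."""
    return encoded[:, 0] @ w_head


rng = np.random.default_rng(0)
B, C, H, W, P, D, NUM_CLASSES = 2, 3, 32, 32, 4, 64, 10  # hypothetical sizes
N = (H // P) * (W // P)  # 64 patches
images = rng.standard_normal((B, C, H, W))
w_proj = rng.standard_normal((C * P * P, D)) * 0.02
cls_token = np.zeros((1, 1, D))
pos_embed = rng.standard_normal((1, 1 + N, D)) * 0.02
w_head = rng.standard_normal((D, NUM_CLASSES)) * 0.02

tokens = vit_tokens(images, P, w_proj, cls_token, pos_embed)
print(tokens.shape)  # (2, 65, 64)
# The Transformer encoder layers would process `tokens` here; skipped in this sketch.
logits = classify(tokens, w_head)
print(logits.shape)  # (2, 10)
```

With a 32x32 input and 4x4 patches the encoder sees 64 patch tokens plus 1 cls token, so the sequence length is 65 regardless of batch size.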
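## Algorithm Principle

In the image domain, ViT borrows the Encoder structure from the paper `Attention Is All You Need` to extract features. The core idea of the Transformer is to extract features with attention modules: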

![img](./img/attention.png)
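The scaled dot-product attention at the core of the figure, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, can be sketched as single-head self-attention in NumPy. This is a simplified illustration with random projection weights: a real ViT encoder uses multi-head attention together with layer normalization, residual connections and an MLP block, all omitted here. The sizes (65 tokens, width 64, head dimension 16) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability; the result is unchanged
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a token sequence x of shape (N, D)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v   # queries, keys, values: (N, d_k) each
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)       # (N, N) scaled pairwise similarities
    weights = softmax(scores, axis=-1)    # each row is a distribution over tokens
    return weights @ v, weights           # (N, d_k) outputs, (N, N) attention map


rng = np.random.default_rng(0)
N, D, D_K = 65, 64, 16  # e.g. 64 patch tokens + 1 cls token
x = rng.standard_normal((N, D))
w_q, w_k, w_v = (rng.standard_normal((D, D_K)) * 0.1 for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (65, 16) (65, 65)
```

Each output row is a weighted average of all value vectors, which is how every patch token (and the cls token) can draw information from the whole image in a single layer.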

## Environment Setup
### Docker (Option 1)
Pull the Docker image from [SourceFind (光源)](https://www.sourcefind.cn/#/service-details) and start a container as follows:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10

docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video  -v /opt/hyhal:/opt/hyhal:ro --name docker_name imageID bash

cd /path/workspace/
pip install pandas==1.5.3
pip install numpy==1.24.3
```

### Dockerfile (Option 2)
Build an image from the provided Dockerfile and start a container:
```
cd ./docker
docker build --no-cache -t vision-transformers:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v /opt/hyhal:/opt/hyhal:ro --name docker_name vision-transformers:latest bash
pip install pandas==1.5.3
pip install numpy==1.24.3
```

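### Anaconda (Option 3)
The steps for configuring the environment locally are as follows.

The DCU-specific deep learning libraries required by this project can be downloaded from the [光合 (HPCCube)](https://developer.hpccube.com/tool/) developer community: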
```
DTK driver: dtk24.04.1
python: python3.10
torch: 2.1.0
torchvision: 0.16.0
```
`Tip: the versions of the DTK driver, python, torch, torchvision and other DCU-related tools listed above must match exactly.`

Install the other (non-deep-learning) libraries from requirements.txt:
```
pip install -r requirements.txt
pip install pandas==1.5.3
pip install numpy==1.24.3
```
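Because the versions above must match exactly, a small standard-library check like the one below can catch mismatches before training starts. This is a hypothetical helper, not part of the repository: the expected versions are copied from this README, and only the part before any `+` local-version suffix is compared, on the assumption that DCU builds of torch/torchvision may carry such a build tag.

```python
import sys
from importlib import metadata

# Pinned versions copied from this README (hypothetical helper, not part of the repo).
EXPECTED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "pandas": "1.5.3",
    "numpy": "1.24.3",
}
EXPECTED_PYTHON = (3, 10)


def release(version):
    """Drop a local version suffix such as '+build.tag' (assumed for DCU builds)."""
    return version.split("+", 1)[0]


def check_versions(expected=EXPECTED, get_version=metadata.version,
                   python=tuple(sys.version_info[:2])):
    """Return human-readable mismatch messages; an empty list means everything matches."""
    problems = []
    if python != EXPECTED_PYTHON:
        problems.append(
            f"python: expected {EXPECTED_PYTHON[0]}.{EXPECTED_PYTHON[1]}, "
            f"found {python[0]}.{python[1]}"
        )
    for name, want in expected.items():
        try:
            have = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if release(have) != want:
            problems.append(f"{name}: expected {want}, found {have}")
    return problems


if __name__ == "__main__":
    issues = check_versions()
    print("\n".join(issues) if issues else "all pinned versions match")
```

Passing `get_version` and `python` as parameters keeps the check testable without touching the installed environment. The DTK driver version is not covered, since it is not a Python package.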

## Dataset
`CIFAR-10`
- http://113.200.138.88:18080/aidatasets/project-dependency/cifar

The training data directory structure is shown below. To train on the full dataset, prepare the data following this layout:
```
├── cifar-10-batches-py
│   ├── batches.meta
│   ├── data_batch_1
│   ├── data_batch_2
│   ├── data_batch_3
│   ├── data_batch_4
│   ├── data_batch_5
│   ├── readme.html
│   └── test_batch
```
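**Tip: when setting the dataset path in the program, point it at the parent directory of cifar-10-batches-py (the data folder by default); otherwise the program will fail with a file-not-found error.**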
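The layout rule above can be checked before launching training with a short standard-library script. This is a hypothetical helper, not part of the repository. It assumes the training code reads the standard python-version CIFAR-10 batch files, for example via `torchvision.datasets.CIFAR10`, whose `root` argument is likewise the directory that contains `cifar-10-batches-py`. The batch format used by `load_batch` (a pickled dict with `b"data"` and `b"labels"`) is the one documented on the CIFAR-10 website.

```python
import pickle
from pathlib import Path

# Batch and metadata files from the tree above (readme.html is documentation only).
REQUIRED = [
    "batches.meta",
    "data_batch_1", "data_batch_2", "data_batch_3", "data_batch_4", "data_batch_5",
    "test_batch",
]


def missing_cifar_files(root):
    """Return the required files missing under <root>/cifar-10-batches-py.

    `root` must be the parent of cifar-10-batches-py (e.g. "./data"),
    not the cifar-10-batches-py directory itself.
    """
    batch_dir = Path(root) / "cifar-10-batches-py"
    return [name for name in REQUIRED if not (batch_dir / name).is_file()]


def load_batch(path):
    """Load one CIFAR-10 python batch file.

    Returns (data, labels): data holds one row of 3072 uint8 values per image
    (1024 red, then 1024 green, then 1024 blue), labels holds ints in 0..9.
    """
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")  # keys are bytes, e.g. b"data"
    return batch[b"data"], batch[b"labels"]


if __name__ == "__main__":
    missing = missing_cifar_files("./data")
    print("dataset layout OK" if not missing else f"missing files: {missing}")
```

Calling `missing_cifar_files("./data/cifar-10-batches-py")` reports every file as missing, which is exactly the mistake the tip warns about.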

## Training
### Single node, multiple cards
```
bash train.sh
# Trains on 4 cards by default; change this in train.sh
```

## Results
![accuracy](./img/accuracy.png)

### Accuracy
Test data: [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html); accelerator: DCU Z100L.

Test results:
| Cards | Precision | Accuracy |
| :------: | :------: | :------: |
| 4 | fp32 | 84.91% |

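## Application Scenarios
### Algorithm Category
Image classification

### Key Application Industries
Manufacturing, environmental protection, healthcare, meteorology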

## Source Repository and Issue Feedback
- https://developer.hpccube.com/codes/modelzoo/vision-transformers-cifar10_pytorch

## References
- https://github.com/kentaroy47/vision-transformers-cifar10