Update README.md ——dtk24.04.2

9d98e461 · dcuai · cf30388b · 9d98e461
Commit 9d98e461 authored Nov 07, 2024 by dcuai

Showing with 78 additions and 24 deletions

README.md README.md +78 -24

--- a/README.md
+++ b/README.md
 # Swin Transformer 
+## Paper
-## Model Introduction
+[Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030)
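-Swin Transformer can serve as a general-purpose backbone for computer vision. The challenges in adapting the Transformer from language to vision arise from differences between the two domains, such as the large variation in the scale of visual entities and the high resolution of pixels in images compared with words in text. To address these differences, a hierarchical Transformer is proposed whose representation is computed with shifted windows. The shifted-window scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while still allowing cross-window connections. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size. These qualities make Swin Transformer compatible with a broad range of vision tasks, including image classification (87.3% top-1 accuracy on ImageNet-1K) and dense prediction tasks such as object detection (58.7 box AP and 51.1 mask AP on COCO test-dev) and semantic segmentation (53.5 mIoU). In 2021, its performance surpassed the previous state of the art by large margins of +2.7 box AP and +2.6 mask AP on COCO and +3.2 mIoU on ADE20K, demonstrating the potential of Transformer-based models as vision backbones. The hierarchical design and the shifted-window approach also proved beneficial for all-MLP architectures.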
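 ## Model Architecture
 An overview of the Swin Transformer architecture is shown in the figure below, which illustrates the tiny version (Swin-T). It first splits the input RGB image into non-overlapping patches with a patch-partition module, as in ViT. Each patch is treated as a "token" (analogous to a word token in NLP), and its feature is set as the concatenation of the raw pixel RGB values. In our implementation we use a patch size of 4 × 4, so the feature dimension of each patch is 4 × 4 × 3 = 48. A linear embedding layer is applied to this raw-valued feature to project it to an arbitrary dimension (denoted C). A Swin Transformer block replaces the standard multi-head self-attention (MSA) module in a Transformer block with a module based on shifted windows, with the other layers kept the same. As shown in figure (b), a Swin Transformer block consists of a shifted-window-based MSA module followed by a 2-layer MLP with a GELU nonlinearity in between. A LayerNorm (LN) layer is applied before each MSA module and each MLP, and a residual connection is applied after each module.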
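
The patch arithmetic in the paragraph above (4 × 4 × 3 = 48) can be checked with a small sketch. This is an illustration in plain Python, not the repository's implementation; the function name `patch_partition` and the 8 × 8 test image are assumptions made for the example.

```python
def patch_partition(image, patch_size=4):
    """Split an H x W x C image (nested lists) into non-overlapping patches.

    Returns a list of tokens in row-major patch order. Each token is the
    concatenation of the raw channel values of every pixel in one patch.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            token = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    token.extend(image[row][col])  # append the C channel values
            tokens.append(token)
    return tokens


# Hypothetical 8 x 8 RGB image whose first two channels encode the pixel position.
image = [[[row, col, 0] for col in range(8)] for row in range(8)]
tokens = patch_partition(image)
print(len(tokens), len(tokens[0]))  # 4 tokens, each of dimension 4 * 4 * 3 = 48
```

In the model, a linear embedding layer would then project each 48-dimensional token to dimension C; that step is omitted here.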
@@ -11,42 +10,97 @@ An overview of the Swin Transformer architecture is shown in the figure below, which illustrates the tiny vers
 - (a) Architecture of the Swin Transformer (Swin-T);
 - (b) Two successive Swin Transformer blocks.
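-## Model
+## Algorithm Principle
+Compared with a standard Transformer block (for example, in ViT), the Swin Transformer block replaces the standard multi-head self-attention (MSA) module with a shifted-window-based multi-head self-attention module (W-MSA / SW-MSA) and leaves the other parts unchanged. As shown in the figure, a Swin Transformer block consists of a shifted-window-based MSA module followed by a 2-layer MLP with a GELU nonlinearity in between. A LayerNorm (LN) layer is applied before each MSA module and each MLP, and a residual connection is applied after each module.
-[Quick download link](http://113.200.138.88:18080/aimodels/swin_tiny_patch4_window7_224.ms_in1k)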
+<div align=center>
+    <img src="figures/pic.png"/>
+</div>
+## Environment Setup
+### Docker (Method 1)
+Pull the Docker image provided by [光源 (SourceFind)](https://sourcefind.cn/#/service-list):
+```bash
+docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10
+docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
+cd /your_code_path/
+pip install -r requirements.txt
+```
+Tip: the versions of the DCU-related tools above (DTK driver, Python, torch, etc.) must match one another exactly.
+### Dockerfile (Method 2)
+```bash
+docker build -t swin_transformer:latest .
+docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
+cd /your_code_path/
+pip install -r requirements.txt
+```
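+### Anaconda (Method 3)
+The special deep learning libraries that this project needs for DCU cards can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.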
+```
+DTK driver: dtk24.04.2
+python: 3.10
+torch: 2.1.0
+```
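+`Tip: the versions of the DCU-related tools above (DTK driver, Python, torch, etc.) must match one another exactly.`
+Install the other, non-deep-learning libraries as follows: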
+```bash
+pip install -r requirements.txt
+```
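 ## Dataset
 The tiny-imagenet-200 dataset can be used for this test.
+SCNet quick download link for the dataset: [tiny-imagenet-200](http://113.200.138.88:18080/aidatasets/project-dependency/tiny-imagenet-200/-/tree/main?ref_type=heads)
-To prepare the dataset yourself, refer to the official ImageNet documentation; alternatively, download it from the link below.
+## Training
+### Single Node, Single Card
-Link: https://pan.baidu.com/s/17dg8g5VhMfU5_9SUogMP7w?pwd=fy0p Extraction code: fy0p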
+```bash
+HIP_VISIBLE_DEVICES=0 python3 -m torch.distributed.launch --nproc_per_node 1 --master_port 12345  main.py --cfg configs/swin/swin_tiny_patch4_window7_224.yaml --data-path /code/Datasets/tiny-imagenet-200/ --batch-size 128 --disable_amp 
+```
-## Swin-Transformer Training
+### Single Node, Multiple Cards
-### Environment Setup
+```bash
-Docker images for training and inference can be pulled from [光源 (SourceFind)](https://www.sourcefind.cn/#/service-details):
+HIP_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node 4 --master_port 12345  main.py --cfg configs/swin/swin_tiny_patch4_window7_224.yaml --data-path /code/Datasets/tiny-imagenet-200/ --batch-size 128 --disable_amp 
-* Training image: docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-22.10.1-py37-latest
+```
-* pip install -r requirements.txt
+For the detailed parameter settings, see main.py and config.py.
-### Training
+## Inference
-Training command:
+None
+## result
-    export HIP_VISIBLE_DEVICES=0
+<div align=center>
+    <img src="figures/result.png"/>
-    python3 -m torch.distributed.launch --nproc_per_node 1 --master_port 12345  main.py --cfg configs/swin/swin_tiny_patch4_window7_224.yaml --data-path /code/Datasets/tiny-imagenet-200/ --batch-size 128 --disable_amp
+</div>
-## Accuracy Data
+### Accuracy
 Tests used the tiny-imagenet-200 dataset on a DCU Z100L accelerator card.
 | Cards | Accuracy |
 | :------: | :------: |
 | 1 | Acc@1: 63.416, Acc@5: 85.666 |
-### Source Repository and Issue Feedback
-https://developer.hpccube.com/codes/modelzoo/swin-transformer-pytorch
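+## Application Scenarios
+### Algorithm Category
+`Image classification`
+### Key Application Industries
+`Scientific research, education, government, finance`
+## Source Repository and Issue Feedback
-### References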
+- https://developer.hpccube.com/codes/modelzoo/swin-transformer-pytorch
-https://github.com/microsoft/Swin-Transformer
+### References
+- https://github.com/microsoft/Swin-Transformer
+<!--
+## Model
+[Quick download link](http://113.200.138.88:18080/aimodels/swin_tiny_patch4_window7_224.ms_in1k)
+-->
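
The "Algorithm Principle" section of the README describes replacing standard MSA with window-based and shifted-window-based attention (W-MSA / SW-MSA). The sketch below illustrates only the grouping idea: which tokens end up in the same attention window, with and without a shift. It assumes the shift is realized as a cyclic shift of the feature map; the function name `window_ids` and the 8 × 8 grid with 4 × 4 windows are hypothetical choices for illustration, not values taken from this repository's configuration.

```python
def window_ids(size, window, shift=0):
    """Return a size x size grid of window indices.

    Cell (row, col) holds the index of the window that the token at that
    position falls into after the grid is cyclically shifted by `shift`
    positions toward the top-left.
    """
    per_row = size // window
    grid = [[0] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            # Position of this token after the cyclic shift.
            r = (row - shift) % size
            c = (col - shift) % size
            grid[row][col] = (r // window) * per_row + (c // window)
    return grid


regular = window_ids(8, 4)           # W-MSA style partition
shifted = window_ids(8, 4, shift=2)  # SW-MSA style partition, shift = window // 2

# Tokens (3, 3) and (4, 4) are neighbours but sit in different regular windows;
# after the shift they share a window, which gives a cross-window connection.
print(regular[3][3], regular[4][4])
print(shifted[3][3], shifted[4][4])
```

A cyclic shift also places tokens from opposite image borders in the same window; an implementation would typically mask attention between such tokens, which this sketch does not model.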