# ResNet50

## Paper

`Deep Residual Learning for Image Recognition`

- https://arxiv.org/abs/1512.03385

## Model Architecture

The ResNet50 network contains 49 convolutional layers and 1 fully connected layer, among other components.

![ResNet50 model architecture](https://developer.sourcefind.cn/codes/modelzoo/resnet50-pytorch/-/raw/main/doc/ResNet50.png?inline=false)
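
The layer count can be reproduced from the block layout in the paper: one stem convolution, four stages of 3, 4, 6, and 3 bottleneck blocks with three convolutions each, and the final fully connected layer. The sketch below only does that arithmetic. It assumes the 1x1 projection convolutions on the shortcut paths are not counted, which is the usual convention; the function name is illustrative and not part of this repository.

```python
# Bottleneck blocks per stage (conv2_x .. conv5_x) in ResNet50, as listed in the paper.
BLOCKS_PER_STAGE = [3, 4, 6, 3]
CONVS_PER_BOTTLENECK = 3  # 1x1 -> 3x3 -> 1x1


def count_resnet50_layers():
    """Return (number of conv layers, number of fc layers), ignoring shortcut projections."""
    stem_convs = 1  # the initial 7x7 convolution
    block_convs = CONVS_PER_BOTTLENECK * sum(BLOCKS_PER_STAGE)
    return stem_convs + block_convs, 1


conv_layers, fc_layers = count_resnet50_layers()
print(f"conv layers: {conv_layers}, fc layers: {fc_layers}, total: {conv_layers + fc_layers}")
# prints "conv layers: 49, fc layers: 1, total: 50"
```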

## Algorithm

ResNet50 stacks multiple residual blocks with residual (skip) connections to address vanishing and exploding gradients, which allows the network to be made much deeper.

![Residual block](https://developer.sourcefind.cn/codes/modelzoo/resnet50-pytorch/-/raw/main/doc/Residual_Block.png?inline=false)
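
The residual connection itself is just `y = relu(F(x) + x)`. The following is a minimal sketch of that idea in plain Python, not the convolutional block used by the model: it assumes `F` is two small dense layers without batch normalization, and the helper names (`linear`, `residual_block`) are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, v):
    # weights is a list of rows; returns the matrix-vector product
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x) with F(x) = w2 @ relu(w1 @ x)."""
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
# With all-zero weights F(x) is 0, so the block passes a non-negative input through unchanged.
print(residual_block(x, zeros, zeros))  # prints [1.0, 2.0, 3.0]
```

Because the block can fall back to the identity this way, adding more blocks does not have to make the mapping harder to learn, and the skip path gives gradients a direct route to earlier layers.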

## Environment Setup

### Docker (Method 1)

```
# Pull the image
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
# Create and start the container
docker run --shm-size 16g --network=host --name=resnet50_pytorch --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v $PWD/resnet50-pytorch:/home/resnet50_pytorch -it <Your Image ID> bash
# Install dependencies
pip install -r requirements.txt
```

### Dockerfile (Method 2)

```
cd resnet50-pytorch/docker
docker build --no-cache -t resnet50_pytorch:latest .
docker run --rm --shm-size 16g --network=host --name=resnet50_pytorch --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v $PWD/../../resnet50-pytorch:/home/resnet50_pytorch -it resnet50_pytorch:latest bash
```

### Anaconda (Method 3)

1. The DCU-specific deep learning libraries required by this project can be downloaded and installed from the Guanghe (光合) developer community:
https://developer.sourcefind.cn/tool/

```
DTK driver: dtk24.04.1
python: python3.10
torch: 2.1.0
torchvision: 0.16.0
apex: 1.1
```

`Tips: the versions of the DCU-related packages listed above (DTK, python, torch, etc.) must match each other exactly.`

2. Install the remaining, non-DCU-specific libraries according to requirements.txt.
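
Since the versions have to line up exactly, a quick check after installation can save a failed training run. The sketch below compares the installed packages against the pinned list above using only the standard library. It assumes the packages are registered under the distribution names `torch`, `torchvision`, and `apex`, and that DCU builds may append a local build tag after a `+`; both are assumptions, and the helper names are hypothetical. The DTK driver version is not checked here.

```python
import sys
from importlib import metadata

# Pinned versions from the list above; python itself is checked via sys.version_info.
REQUIRED = {"torch": "2.1.0", "torchvision": "0.16.0", "apex": "1.1"}


def matches(installed, required):
    """True if the installed release equals the pin, ignoring a local tag after '+'."""
    release = installed.split("+", 1)[0]
    return release == required or release.startswith(required + ".")


def check_environment():
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    if sys.version_info[:2] != (3, 10):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"python: found {found}, expected 3.10")
    for name, required in REQUIRED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed, expected {required}")
            continue
        if not matches(installed, required):
            problems.append(f"{name}: found {installed}, expected {required}")
    return problems


for line in check_environment() or ["all pinned versions match"]:
    print(line)
```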

## Dataset

Download the ImageNet dataset: [imagenet-2012](https://image-net.org/download.php)

Process the dataset following [scrips/extract_ILSVRC.sh](https://developer.sourcefind.cn/codes/modelzoo/resnet50-pytorch/-/blob/main/scrips/extract_ILSVRC.sh). After processing, the dataset has the following structure:

```
data
    |
    train
        |
        n01440764
        n01443537
        ...
    val
        |
        n01440764
        n01443537
        ...
```
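
Before starting a long training run, it can be worth confirming that the extracted data actually has this layout. The following is a small sketch, not part of the repository: `check_layout` is a hypothetical helper that verifies `train/` and `val/` exist and contain the same class directories, and returns the number of classes.

```python
import os


def list_classes(split_dir):
    """Return the sorted class sub-directory names (e.g. n01440764) in one split."""
    return sorted(
        entry for entry in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, entry))
    )


def check_layout(data_root):
    """Check that data_root has train/ and val/ with identical class directories."""
    train = list_classes(os.path.join(data_root, "train"))
    val = list_classes(os.path.join(data_root, "val"))
    if not train:
        raise ValueError("no class directories found under train/")
    if train != val:
        differing = sorted(set(train) ^ set(val))
        raise ValueError(f"train/ and val/ differ in classes, e.g. {differing[:5]}")
    return len(train)


# Example call (the path is a placeholder to replace with your own):
# print(check_layout("/path/to/{ImageNet_pytorch_data_dir}"))
```

For the full ImageNet-2012 classification set, the returned count should be 1000.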



## Training

### Single-card training (single precision)

```
export HIP_VISIBLE_DEVICES=0
python3 train.py --batch-size=64 --arch=resnet50 -j 6 --epochs=90 --save-path=/path/to/{save_model_dir} /path/to/{ImageNet_pytorch_data_dir}/
```

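Here, replace '/path/to/{save_model_dir}' with the directory where the model should be saved, and '/path/to/{ImageNet_pytorch_data_dir}/' with the path to the ImageNet dataset. The mixed-precision and multi-card training scripts below need the same changes, so they are not described again.

### Single-card training (mixed precision)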

```
export HIP_VISIBLE_DEVICES=0
python3 train.py --batch-size=64 --arch=resnet50 -j 6 --epochs=90 --amp --opt-level O1 --loss-scale=dynamic --save-path=/path/to/{save_model_dir} /path/to/{ImageNet_pytorch_data_dir}/
```
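
The `--loss-scale=dynamic` flag requests dynamic loss scaling, assuming `train.py` forwards it to apex's amp as the flag names suggest. The idea is to multiply the loss by a large factor so small fp16 gradients do not underflow, shrink the factor whenever gradients overflow, and grow it again after a run of clean steps. The class below is a toy illustration of that policy, not apex's implementation; the class name and the constants (factor 2, growth interval) are illustrative assumptions.

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler: back off on overflow, grow after a run of clean steps."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, growth_interval=2000):
        self.scale = init_scale
        self.factor = factor
        self.growth_interval = growth_interval
        self.clean_steps = 0

    def update(self, grads):
        """Return True if the optimizer step should be applied for these scaled gradients."""
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            # Overflow: skip this step and shrink the scale.
            self.scale /= self.factor
            self.clean_steps = 0
            return False
        self.clean_steps += 1
        if self.clean_steps % self.growth_interval == 0:
            # A full interval without overflow: try a larger scale again.
            self.scale *= self.factor
        return True


scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
print(scaler.update([0.5, float("inf")]), scaler.scale)  # prints "False 4.0"
print(scaler.update([0.5, 0.25]), scaler.scale)          # prints "True 4.0"
print(scaler.update([0.1, 0.2]), scaler.scale)           # prints "True 8.0"
```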

### Single-node four-card training (single precision)

```
cd scrips
chmod +x single_process.sh
mpirun --allow-run-as-root --bind-to none -np 4 single_process.sh localhost resnet50 64
```

### Single-node four-card training (mixed precision)

```
cd scrips
chmod +x single_process_amp.sh
mpirun --allow-run-as-root --bind-to none -np 4 single_process_amp.sh localhost resnet50 64
```

## Results

![result](https://developer.sourcefind.cn/codes/modelzoo/resnet50-pytorch/-/raw/main/doc/result.png?inline=false)

## Accuracy

Accuracy was measured on the ImageNet dataset, using DCU Z100L accelerator cards.

| Cards | Learning rate | Batch size | Precision | Accuracy (%) |
| ----- | ------------- | ---------- | --------- | ------------ |
| 4     | 0.1           | 256        | fp32      | 75.94        |
| 4     | 0.1           | 256        | amp       | 75.99        |
| 8     | 0.1           | 256        | fp32      | 75.914       |
| 8     | 0.1           | 256        | amp       | 75.64        |
| 8     | 0.15          | 512        | fp32      | 75.902       |
| 8     | 0.15          | 512        | amp       | 75.90        |

## Application Scenarios

### Algorithm category

`Image classification`

### Key application industries

`Manufacturing, government, healthcare, scientific research`

## Source repository and issue feedback

https://developer.sourcefind.cn/codes/modelzoo/resnet50-pytorch

## References

https://github.com/pytorch/examples/tree/master/imagenet