# ResNet50

## Paper

`Deep Residual Learning for Image Recognition`

- https://arxiv.org/abs/1512.03385

## Model Overview

Train ResNet50 with PyTorch.

## Model Architecture

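The ResNet50 network contains 49 convolutional layers and 1 fully connected layer, along with pooling and batch-normalization layers.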

![ResNet50](doc/ResNet50.png)
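
The 49 + 1 count follows from the stage layout in the paper: one 7×7 stem convolution, then 3, 4, 6 and 3 bottleneck blocks (each with a 1×1, a 3×3 and a 1×1 convolution), then one fully connected layer. The short sketch below reproduces that arithmetic; it counts main-path layers only and leaves out the four 1×1 projection convolutions on the shortcut paths.

```python
# Bottleneck blocks per stage (conv2_x .. conv5_x) for ResNet50, per He et al.
RESNET50_BLOCKS = [3, 4, 6, 3]
CONVS_PER_BOTTLENECK = 3  # 1x1 reduce, 3x3, 1x1 expand


def count_layers(blocks_per_stage=RESNET50_BLOCKS):
    """Return (conv_layers, fc_layers) on the main path, excluding projection shortcuts."""
    stem_conv = 1  # 7x7, stride-2 conv1
    body_convs = sum(blocks_per_stage) * CONVS_PER_BOTTLENECK
    fc = 1  # final 1000-way classifier
    return stem_conv + body_convs, fc


convs, fc = count_layers()
print(convs, fc, convs + fc)  # 49 1 50
```

The same formula gives 101 weighted layers for the deeper variant with `[3, 4, 23, 3]` blocks, which is where the names ResNet101 and ResNet152 come from.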

## Algorithm Principle

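ResNet50 is built from stacked residual blocks whose shortcut connections add each block's input directly to its output. These connections mitigate vanishing and exploding gradients and allow the network to be trained at much greater depth.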

![Residual block](doc/Residual_Block.png)
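
In the paper's notation, a residual block with input $\mathbf{x}$ learns a residual mapping $\mathcal{F}$ and outputs its sum with the shortcut; a linear projection $W_s$ (a 1×1 convolution in ResNet50) replaces the identity when the dimensions change. Ignoring the ReLU applied after the addition, the identity shortcut contributes an identity term to the block's Jacobian, which is one way to see why gradients pass more easily through deep stacks:

```latex
\begin{aligned}
\mathbf{y} &= \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}
  && \text{(identity shortcut)} \\
\mathbf{y} &= \mathcal{F}(\mathbf{x}, \{W_i\}) + W_s \mathbf{x}
  && \text{(projection shortcut, dimensions change)} \\
\frac{\partial \mathcal{L}}{\partial \mathbf{x}}
  &= \frac{\partial \mathcal{L}}{\partial \mathbf{y}}
     \left( \frac{\partial \mathcal{F}}{\partial \mathbf{x}} + I \right)
  && \text{(identity case)}
\end{aligned}
```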

## Environment Setup

### Docker (Method 1)

Pull the training image from [光源 (SourceFind)](https://www.sourcefind.cn/#/service-details):

```
# Pull the image:
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-22.10.1-py37-latest
# Create and start the container:
docker run --shm-size 16g --network=host --name=resnet50_pytorch --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/resnet50-pytorch:/home/resnet50_pytorch -it <Your Image ID> bash
# Install dependencies (inside the container):
cd /home/resnet50_pytorch
pip install -r requirements.txt
```

### Dockerfile (Method 2)

```
cd resnet50-pytorch/docker
docker build --no-cache -t resnet50_pytorch:latest .
docker run --rm --shm-size 16g --network=host --name=resnet50_pytorch --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/../../resnet50-pytorch:/home/resnet50_pytorch -it resnet50_pytorch:latest bash
```

### Anaconda (Method 3)

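1. The DCU-specific deep learning libraries required by this project can be downloaded and installed from the 光合 (HPCCube) developer community: https://developer.hpccube.com/tool/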

```
DTK driver: dtk22.10.1
python: python3.7
torch: 1.10.0
torchvision: 0.10.0
apex: 0.1
```

`Tip: the versions of the DCU-related packages above (DTK, Python, torch, etc.) must match each other exactly.`

2. Install the remaining (non-DCU-specific) libraries listed in requirements.txt.
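
Because the DCU stack only works when these versions line up, a quick sanity check before training can save time. The sketch below is a hypothetical helper, not part of this repository: it compares the installed Python, torch and torchvision versions against the list above and treats local build suffixes such as `a0+git...` as matching the release number. It does not check DTK or apex.

```python
import importlib
import platform
import re

# Versions from the list above; DTK and apex are not checked by this sketch.
EXPECTED = {"python": "3.7", "torch": "1.10.0", "torchvision": "0.10.0"}


def release_tuple(version):
    """Leading numeric release components, e.g. '1.10.0a0+git1' -> (1, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def version_matches(installed, expected):
    """True if the installed release starts with the expected release components."""
    want = release_tuple(expected)
    return release_tuple(installed)[:len(want)] == want


def collect_installed():
    """Installed versions of Python and, if importable, torch and torchvision."""
    versions = {"python": platform.python_version()}
    for name in ("torch", "torchvision"):
        try:
            versions[name] = importlib.import_module(name).__version__
        except ImportError:
            pass  # a missing package is reported by find_mismatches
    return versions


def find_mismatches(installed, expected=EXPECTED):
    """Map package -> (installed, expected) for every missing or mismatched entry."""
    return {
        name: (installed.get(name), want)
        for name, want in expected.items()
        if name not in installed or not version_matches(installed[name], want)
    }
```

Running `print(find_mismatches(collect_installed()))` inside the environment should print `{}` when everything lines up; any entry it lists points at the package to reinstall.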

## Dataset

Download the ImageNet dataset from [ImageNet (image-net.org)](https://image-net.org/).

Prepare the dataset following [scrips/extract_ILSVRC.sh](https://developer.hpccube.com/codes/modelzoo/resnet50-pytorch/-/blob/main/scrips/extract_ILSVRC.sh).
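
The training commands take a single dataset root. Assuming `train.py` follows the PyTorch ImageNet example listed under References, which reads `train/` and `val/` subdirectories containing one folder per class, a quick layout check after extraction might look like the sketch below. The helper name and its checks are hypothetical, not part of this repository.

```python
from pathlib import Path


def check_imagenet_layout(root, expected_classes=1000):
    """Return a list of problems; an empty list means the layout looks usable."""
    root = Path(root)
    problems = []
    class_sets = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        # One subdirectory per class (e.g. n01440764), ImageFolder-style.
        class_sets[split] = {d.name for d in split_dir.iterdir() if d.is_dir()}
        found = len(class_sets[split])
        if found != expected_classes:
            problems.append(f"{split}: expected {expected_classes} class folders, found {found}")
    if len(class_sets) == 2 and class_sets["train"] != class_sets["val"]:
        problems.append("train/ and val/ contain different class folders")
    return problems
```

For example, `check_imagenet_layout("/path/to/{ImageNet_pytorch_data_dir}")` (with the placeholder replaced) returns an empty list for a correctly extracted ILSVRC-2012 tree.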

## Training

### Single-Card Training (FP32)

```
export HIP_VISIBLE_DEVICES=0
python3 train.py --batch-size=64 --arch=resnet50 -j 6 --epochs=90 --save-path=/path/to/{save_model_dir} /path/to/{ImageNet_pytorch_data_dir}/
```

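Here, replace '/path/to/{save_model_dir}' with the directory where the model should be saved, and '/path/to/{ImageNet_pytorch_data_dir}/' with the path to the ImageNet dataset. The mixed-precision and multi-card training scripts below need the same changes, which are not repeated.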
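
The source of `train.py` is not reproduced here. As a reading aid for the command above, the sketch below is a hypothetical `argparse` parser that accepts exactly the flags this README uses. The `--workers` long name for `-j` is borrowed from the PyTorch ImageNet example, and every default is an assumption rather than the script's real value.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of train.py's CLI from the flags in this README."""
    p = argparse.ArgumentParser(description="ResNet50 ImageNet training (sketch)")
    p.add_argument("data", help="dataset root containing train/ and val/")
    p.add_argument("--arch", default="resnet50", help="model architecture")
    p.add_argument("-j", "--workers", type=int, default=4, help="data-loading processes")
    p.add_argument("--epochs", type=int, default=90)
    p.add_argument("--batch-size", type=int, default=256, help="images per step")
    p.add_argument("--save-path", default=".", help="where checkpoints are written")
    # Mixed-precision options used by the AMP command.
    p.add_argument("--amp", action="store_true", help="enable mixed precision")
    p.add_argument("--opt-level", default="O1")
    p.add_argument("--loss-scale", default="dynamic")
    return p


example = build_parser().parse_args(
    ["--batch-size=64", "--arch=resnet50", "-j", "6", "--epochs=90",
     "--save-path=/path/to/save_model_dir", "/path/to/ImageNet_pytorch_data_dir/"]
)
print(example.batch_size, example.workers, example.epochs)  # 64 6 90
```

Note that argparse turns dashes into underscores, so `--batch-size` is read back as `args.batch_size`.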

### Single-Card Training (Mixed Precision)

```
export HIP_VISIBLE_DEVICES=0
python3 train.py --batch-size=64 --arch=resnet50 -j 6 --epochs=90 --amp --opt-level O1 --loss-scale=dynamic --save-path=/path/to/{save_model_dir} /path/to/{ImageNet_pytorch_data_dir}/
```
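
`--amp --opt-level O1 --loss-scale=dynamic` appear to be apex AMP options (apex is listed in the environment above): O1 runs selected ops in FP16 while keeping others in FP32, and a dynamic loss scale is adjusted during training instead of being fixed. The sketch below illustrates the general dynamic loss-scaling rule in plain Python. It is not apex's code; the initial scale of 2**16, growth factor 2 and window of 2000 clean steps are illustrative assumptions.

```python
import math


def has_overflow(grads):
    """True if any gradient value is inf or NaN."""
    return any(not math.isfinite(g) for g in grads)


class DynamicLossScaler:
    """Illustrative dynamic loss scaler (not apex's implementation)."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, window=2000, min_scale=1.0):
        self.scale = init_scale   # loss is multiplied by this before backward()
        self.factor = factor      # shrink/grow step
        self.window = window      # clean steps required before growing
        self.min_scale = min_scale
        self._clean_steps = 0

    def update(self, grads):
        """Inspect the scaled gradients; return True if the optimizer step should run."""
        if has_overflow(grads):
            # Overflow: skip this step and retry with a smaller scale.
            self.scale = max(self.scale / self.factor, self.min_scale)
            self._clean_steps = 0
            return False
        self._clean_steps += 1
        if self._clean_steps >= self.window:
            # A long run without overflow: grow the scale to reduce FP16 underflow.
            self.scale *= self.factor
            self._clean_steps = 0
        return True


scaler = DynamicLossScaler(init_scale=8.0, window=2)
print(scaler.update([1.0, float("inf")]), scaler.scale)  # False 4.0
print(scaler.update([0.5]), scaler.update([0.25]), scaler.scale)  # True True 8.0
```

When `update` returns True, the gradients are divided by `scale` before the optimizer step, so the scaling never changes the effective update.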

### Single-Node 4-Card Training (FP32)

```
mpirun --allow-run-as-root --bind-to none -np 4 scrips/single_process.sh localhost resnet50 64
```

### Single-Node 4-Card Training (Mixed Precision)

```
mpirun --allow-run-as-root --bind-to none -np 4 scrips/single_process_amp.sh localhost resnet50 64
```
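
Each `mpirun -np 4` command starts four copies of the script, one per DCU. How `scrips/single_process.sh` maps processes to devices is not shown here; a common pattern, assumed below, is to read Open MPI's rank environment variables (`OMPI_COMM_WORLD_RANK`, `OMPI_COMM_WORLD_SIZE`, `OMPI_COMM_WORLD_LOCAL_RANK`) and pin each process to the DCU matching its local rank. Assuming the trailing `64` is the per-process batch size, the global batch is 4 × 64 = 256, consistent with the 4-card rows of the accuracy table. `rank_config` is a hypothetical helper.

```python
import os


def rank_config(env=None, per_process_batch=64):
    """Derive per-process settings from Open MPI's environment (hypothetical helper)."""
    env = os.environ if env is None else env
    rank = int(env.get("OMPI_COMM_WORLD_RANK", "0"))
    world_size = int(env.get("OMPI_COMM_WORLD_SIZE", "1"))
    local_rank = int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    return {
        "rank": rank,
        "world_size": world_size,
        # On a single node, local rank i drives DCU i (e.g. via HIP_VISIBLE_DEVICES=i).
        "device": local_rank,
        "global_batch": per_process_batch * world_size,
    }


# Example: what the third of four processes would see under `mpirun -np 4`.
cfg = rank_config({"OMPI_COMM_WORLD_RANK": "2", "OMPI_COMM_WORLD_SIZE": "4",
                   "OMPI_COMM_WORLD_LOCAL_RANK": "2"})
print(cfg)  # {'rank': 2, 'world_size': 4, 'device': 2, 'global_batch': 256}
```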

## Accuracy

Accuracy is measured on the ImageNet dataset, using DCU Z100L accelerators.

| Cards | Learning rate | Batch size | Precision | Top-1 accuracy (%) |
| ----- | ------------- | ---------- | --------- | ------------------ |
| 4     | 0.1           | 256        | fp32      | 75.94              |
| 4     | 0.1           | 256        | amp       | 75.99              |
| 8     | 0.1           | 256        | fp32      | 75.914             |
| 8     | 0.1           | 256        | amp       | 75.64              |
| 8     | 0.15          | 512        | fp32      | 75.902             |
| 8     | 0.15          | 512        | amp       | 75.90              |

## Application Scenarios

### Algorithm Category

`Image classification`

### Key Application Industries

`Manufacturing, government, healthcare, scientific research`

# Source Repository and Issue Feedback

https://developer.hpccube.com/codes/modelzoo/resnet50-pytorch

# References

https://github.com/pytorch/examples/tree/master/imagenet