# ResNet50

## Paper

Deep Residual Learning for Image Recognition

* https://arxiv.org/abs/1512.03385

## Model Architecture

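ResNet50 is a deep neural network for image recognition. It consists of a series of convolutional layers, pooling layers, a global average pooling layer, and a fully connected layer. Its distinguishing feature is a stack of residual blocks, each made up of several convolutional layers and a skip connection.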

![img](ResNet50.png)
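
The "50" in the name can be checked with a little arithmetic. The sketch below assumes the standard configuration from the paper's architecture table: four stages with 3, 4, 6, and 3 bottleneck blocks, each block holding three convolutions. The function name and constants are illustrative and do not come from this repository.

```python
# Bottleneck blocks per stage in ResNet-50, as listed in the paper's
# architecture table (assumption: the standard configuration is used here).
BLOCKS_PER_STAGE = [3, 4, 6, 3]
CONVS_PER_BOTTLENECK = 3  # 1x1 -> 3x3 -> 1x1


def count_weighted_layers(blocks_per_stage):
    """Count the layers that give the network its name.

    Pooling layers and projection shortcuts are not counted.
    """
    stem_conv = 1        # the initial 7x7 convolution
    fully_connected = 1  # the final classifier
    residual_convs = sum(blocks_per_stage) * CONVS_PER_BOTTLENECK
    return stem_conv + residual_convs + fully_connected


print(count_weighted_layers(BLOCKS_PER_STAGE))  # 50
```

The same count with `[3, 4, 23, 3]` gives 101, which matches the deeper ResNet-101 variant described in the paper.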

## Algorithm

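In ResNet50, the input image first passes through convolutional and pooling layers for feature extraction, then through multiple residual blocks for deep feature learning. Each residual block contains several convolutional layers and a skip connection. The skip connections let information flow within and between residual blocks, which mitigates the vanishing-gradient problem in deep neural networks. Finally, a global average pooling layer maps the features to a fixed-length vector, which a fully connected layer then uses for classification, regression, or similar tasks.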

![img](Residual_Block.png)
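
The skip connection and the pooling step can be illustrated without any framework. The following is a minimal sketch in plain Python, not the code used in this repository: it replaces the real 1x1, 3x3, 1x1 convolutions with two small matrix multiplications, and all function names are hypothetical.

```python
def relu(v):
    """Element-wise max(0, x)."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F(x) = w2 @ relu(w1 @ x).

    The "+ x" term is the skip connection: even if F contributes
    nothing, the input still reaches the output unchanged.
    """
    fx = matvec(w2, relu(matvec(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])


def global_average_pool(feature_map):
    """Average each channel over all spatial positions.

    feature_map is a list of positions, each a list of channel values,
    so the result always has one value per channel.
    """
    n = len(feature_map)
    channels = len(feature_map[0])
    return [sum(pos[c] for pos in feature_map) / n for c in range(channels)]


x = [1.0, 2.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block(x, zero, zero))                  # [1.0, 2.0]
print(global_average_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

With all-zero weights the block reduces to the identity on non-negative inputs, which is the property that makes very deep stacks easier to optimize.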

## Environment Setup

**Docker (Option 1)**

A Docker image for training can be pulled from [SourceFind (光源)](https://www.sourcefind.cn/#/service-details):

    docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:mlperf-resnet50-mpirun-latest
    # <Image ID>: replace with the ID of the Docker image pulled above
    # <Host Path>: path on the host
    # <Container Path>: mount path inside the container
    docker run -it --name mlperf_resnet50 --shm-size=32G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash

**Dockerfile (Option 2)**

    # Run from the directory that contains the Dockerfile
    docker build --no-cache -t mlperf_resnet50:latest .
    docker run -it --name mlperf_resnet50 --shm-size=32G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash
    # <Image ID>: replace with the ID of the Docker image built above
    # <Host Path>: path on the host
    # <Container Path>: mount path inside the container

Image version dependencies:

* DTK driver: dtk22.10.1
* Python: python3.8.2

Test directory:

```
/root/resnet50
```

## Dataset

The ImageNet dataset is required. Download it from http://image-net.org/challenges/LSVRC/2012/2012-downloads (an account is required).

For preprocessing details, see: https://github.com/mlcommons/training/tree/master/image_classification

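The prepared pretrained model checkpoints are listed below. They are already included in the image, so no additional download is needed.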

    mlperf_resnet50
    ├── checkpoint
    ├── ckpt-0.data-00000-of-00001
    ├── ckpt-0.index
    ├── ckpt-500.data-00000-of-00001
    ├── ckpt-500.index
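
Each TensorFlow checkpoint in this listing is a pair: a `<prefix>.index` file plus one or more `<prefix>.data-*` shards. As a rough illustration, the sketch below picks the prefix with the highest step number from such a directory. The helper name is hypothetical, and the `ckpt-<step>` naming is assumed from the listing above. Where TensorFlow is available, `tf.train.latest_checkpoint` does the same job by reading the `checkpoint` file.

```python
import re
from pathlib import Path


def latest_checkpoint_prefix(ckpt_dir):
    """Return the prefix (e.g. 'ckpt-500') with the highest step, or None.

    Assumes files are named 'ckpt-<step>.index', as in the listing above.
    """
    steps = []
    for index_file in Path(ckpt_dir).glob("ckpt-*.index"):
        match = re.fullmatch(r"ckpt-(\d+)\.index", index_file.name)
        if match:
            steps.append(int(match.group(1)))
    return f"ckpt-{max(steps)}" if steps else None
```

For the directory shown above, calling `latest_checkpoint_prefix` on the `mlperf_resnet50` directory would be expected to return `ckpt-500`.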

## Training

### Single Node, Multiple Cards

Run the performance and accuracy test on a single node with 8 cards:

```
bash 8dcu_multi.sh >& output.log &
```

## Results

![result](result.png)

## Accuracy

With the input data above and 8 Z100L accelerator cards, training meets the official convergence requirement, reaching the target accuracy of 75.90% classification.

| Cards | Type            | Processes | Accuracy reached      |
| ----- | --------------- | --------- | --------------------- |
| 8     | Mixed precision | 8         | 75.90% classification |
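
For readers unfamiliar with the metric, the sketch below shows one way to compute a classification accuracy and compare it with the target. It assumes the 75.90% figure refers to top-1 accuracy on the validation set. The function names and sample values are purely illustrative and are not taken from the training scripts.

```python
TARGET_ACCURACY = 0.759  # the 75.90% target quoted above


def top1_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = 0
    for scores, label in zip(logits, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


def meets_target(accuracy, target=TARGET_ACCURACY):
    """True once the measured accuracy reaches the convergence target."""
    return accuracy >= target


# Made-up scores for four samples and two classes.
logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [1, 0, 0, 0]
acc = top1_accuracy(logits, labels)
print(acc, meets_target(acc))  # 0.75 False
```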

## Application Scenarios

### Algorithm Category

`Image classification`

### Key Application Industries

`Manufacturing, government, healthcare, scientific research`

## Source Repository and Issue Reporting

* https://developer.hpccube.com/codes/modelzoo/mlperf_resnet50_tensorflow

## References

* https://mlcommons.org/en/
* https://github.com/mlcommons
* https://github.com/mlcommons/training/tree/master/image_classification/tensorflow2