# ResNet50
## Paper
`Deep Residual Learning for Image Recognition`

- [https://arxiv.org/abs/1512.03385](https://arxiv.org/abs/1512.03385)

## Model Structure
The ResNet50 network contains 49 convolutional layers and 1 fully connected layer, among other components.
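
The count of 49 convolutional layers plus 1 fully connected layer can be checked with a little arithmetic. The sketch below assumes the standard ResNet-50 stage layout from the paper cited above (3, 4, 6, and 3 bottleneck blocks, each with three convolutions, after one stem convolution). That layout is not stated in this README, and the 1x1 projection convolutions on shortcut paths are left out, as is conventional when naming the network.

```python
# Assumed stage layout of ResNet-50 (taken from the ResNet paper, not from this README).
BLOCKS_PER_STAGE = [3, 4, 6, 3]  # bottleneck blocks in the four residual stages
CONVS_PER_BLOCK = 3              # 1x1 -> 3x3 -> 1x1 inside each bottleneck block
STEM_CONVS = 1                   # the initial 7x7 convolution
FC_LAYERS = 1                    # the final classifier


def count_layers():
    """Return (convolutional layers, convolutional + fully connected layers)."""
    conv_layers = STEM_CONVS + CONVS_PER_BLOCK * sum(BLOCKS_PER_STAGE)
    return conv_layers, conv_layers + FC_LAYERS


conv_layers, total_layers = count_layers()
print(conv_layers, total_layers)  # 49 50
```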

<img src="http://developer.hpccube.com/codes/modelzoo/resnet50_oneflow/-/raw/main/ResNet50%E6%A8%A1%E5%9E%8B%E7%BB%93%E6%9E%84.png" alt="ResNet50 model structure" style="zoom:67%;" />

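## Algorithm Principle

ResNet50 stacks multiple residual blocks, each with a residual (skip) connection, to mitigate vanishing and exploding gradients. This allows the network to be made much deeper.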
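
The idea of a residual connection can be illustrated without any deep learning framework. The following is a minimal sketch, not the OneFlow implementation: `residual_block` and `zero_branch` are hypothetical names, and the convolutional branch F(x) is replaced by an arbitrary function on a flat list of floats. It computes `relu(F(x) + x)`, so when the learned branch contributes nothing the block passes a non-negative input through unchanged.

```python
def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


def residual_block(x, transform):
    """Return relu(transform(x) + x): the learned branch plus the identity shortcut."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("transform must preserve the input size for an identity shortcut")
    return relu([f + xi for f, xi in zip(fx, x)])


def zero_branch(x):
    """Hypothetical stand-in for a convolutional branch that has learned nothing."""
    return [0.0 for _ in x]


print(residual_block([1.0, 2.0, 3.0], zero_branch))  # [1.0, 2.0, 3.0]
```

Because the shortcut adds `x` directly to the output, the gradient with respect to `x` always has a path that bypasses the learned branch, which is the property that makes very deep stacks trainable.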

<img src="http://developer.hpccube.com/codes/modelzoo/resnet50_oneflow/-/raw/main/Residual_Block.png" alt="Residual_Block.png" style="zoom:67%;" />

## Environment Setup

### Docker

```plaintext
docker pull image.sourcefind.cn:5000/dcu/admin/base/oneflow:0.9.1-centos7.6-dtk-22.10.1-py39-latest
# Replace <Your Image ID> with the ID of the Docker image pulled above
docker run --shm-size 16g --network=host --name=resnet50_oneflow --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/resnet50_oneflow:/home/resnet50_oneflow -it <Your Image ID> bash
cd /home/resnet50_oneflow
pip install -r requirements.txt
```

## Dataset

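Because the full ImageNet dataset is very large, this project uses the smaller mini-imagenet dataset so that users can quickly train and validate ResNet50 with OneFlow. To use the full dataset instead, convert it to OFRecord format by following [prepare-ofrecord-for-the-full-imagenet-dataset](https://github.com/Oneflow-Inc/models/tree/main/Vision/classification/image/resnet50#prepare-ofrecord-for-the-full-imagenet-dataset).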

    OFRECORD_PATH="./mini-imagenet/ofrecord"

The generated OFRecord data is laid out as follows:

```plaintext
ofrecord-imagenet
                | 
                train-00000-of-01024
                train-00001-of-01024
                ...
                train-01022-of-01024
                train-01023-of-01024
                validation-00000-of-00128
                validation-00001-of-00128
                ...
                validation-00126-of-00128
                validation-00127-of-00128
```
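
Before training, it can be useful to confirm that no shard is missing. The sketch below generates the file names shown in the listing and reports which ones are absent from a directory. The helper names and the `./ofrecord-imagenet` path are hypothetical, and the shard counts (1024 training, 128 validation) are taken from the listing above; a mini-imagenet conversion may use different counts, so adjust them to match your data.

```python
from pathlib import Path


def expected_shards(prefix, count):
    """Names such as train-00000-of-01024, following the listing's pattern."""
    return [f"{prefix}-{i:05d}-of-{count:05d}" for i in range(count)]


def missing_shards(directory, prefix, count):
    """Return the expected shard names that are absent from `directory`."""
    root = Path(directory)
    present = {p.name for p in root.iterdir()} if root.is_dir() else set()
    return [name for name in expected_shards(prefix, count) if name not in present]


if __name__ == "__main__":
    # Hypothetical location; point this at your own OFRecord directory.
    root = "./ofrecord-imagenet"
    for prefix, count in (("train", 1024), ("validation", 128)):
        print(prefix, "missing:", len(missing_shards(root, prefix, count)))
```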

## Training and Inference

### fp32 Training

Single-node, single-card training command:

    bash examples/train_graph_distributed_fp32.sh

For single-node, four-card training, set `DEVICE_NUM_PER_NODE=4` in `examples/train_graph_distributed_fp32.sh`, then run:

    bash examples/train_graph_distributed_fp32.sh
### fp16 Training
Single-node, single-card training command:

    bash examples/train_graph_distributed_fp16.sh
For single-node, four-card training, set `DEVICE_NUM_PER_NODE=4` in `examples/train_graph_distributed_fp16.sh`, then run:

    bash examples/train_graph_distributed_fp16.sh
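
Switching between one and four cards means editing `DEVICE_NUM_PER_NODE` in the fp32 or fp16 script by hand. If you prefer to script that change, the sketch below rewrites the assignment in a script's text. It assumes the variable is assigned on its own line as `DEVICE_NUM_PER_NODE=<n>`; the script contents are not shown in this README, so `set_shell_var` and the sample text (including `MASTER_ADDR`) are hypothetical.

```python
import re


def set_shell_var(script_text, name, value):
    """Replace a line-level `NAME=...` assignment; raise if none is found."""
    pattern = re.compile(
        rf"^([ \t]*(?:export[ \t]+)?){re.escape(name)}=.*$", re.MULTILINE
    )
    new_text, replaced = pattern.subn(
        lambda m: f"{m.group(1)}{name}={value}", script_text
    )
    if replaced == 0:
        raise ValueError(f"{name} is not assigned in the script")
    return new_text


# Hypothetical excerpt standing in for the real training script.
sample = "DEVICE_NUM_PER_NODE=1\nMASTER_ADDR=127.0.0.1\n"
print(set_shell_var(sample, "DEVICE_NUM_PER_NODE", 4))
```

To apply it to a real file, read the script with `pathlib.Path.read_text()`, pass the text through `set_shell_var`, and write the result back with `write_text()`.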

### Inference
Run inference with:

    bash examples/infer_graph.sh
### Training Accuracy
Test data: mini-imagenet. Accelerators: 4 × DCU-Z100-16G.

The table below reports accuracy for each configuration:
| Cards | Batch size | Precision | Accuracy (%) |
| :------: | :------: | :------: | -------- |
| 1 | 128 | fp32 | 76.5 (300 epochs) |
| 1 | 128 | fp16 | 76.3 (300 epochs) |
| 4 | 128 | fp32 | 76.5 (300 epochs) |
| 4 | 128 | fp16 | 76.3 (300 epochs) |

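## Application Scenarios

### Algorithm Category

`Image classification`

### Key Application Industries

`Manufacturing, government, healthcare, scientific research`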

## Source Repository and Issue Feedback

- https://developer.hpccube.com/codes/modelzoo/resnet50_oneflow

## References

* https://github.com/Oneflow-Inc/models/tree/main/Vision/classification/image/resnet50
* https://github.com/Oneflow-Inc/oneflow