# Resnet50
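## Model Introduction
Train and run inference with Resnet50 using OneFlow.
## Model Architecture
The Resnet50 network contains 49 convolutional layers and 1 fully connected layer, among other layers.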
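
The figure of 49 convolutional layers can be checked with simple arithmetic. The sketch below assumes the standard Resnet50 layout, which this README does not spell out: one stem convolution followed by four stages of 3, 4, 6, and 3 bottleneck blocks, with three convolutions per block. The 1x1 projection convolutions on the shortcut branches are left out of the count.

```shell
# Assumed standard Resnet50 layout (not stated in this README).
stem_convs=1
convs_per_block=3          # 1x1, 3x3, 1x1 in each bottleneck block
blocks=$((3 + 4 + 6 + 3))  # bottleneck blocks across the four stages

conv_layers=$((stem_convs + blocks * convs_per_block))
fc_layers=1

echo "conv layers: $conv_layers"
echo "weighted layers: $((conv_layers + fc_layers))"
```

Adding the single fully connected layer to the 49 convolutions gives the 50 weighted layers the model is named after.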
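## Dataset
Because the full imagenet dataset is large, this repository uses the smaller mini-imagenet dataset so that users can quickly verify Resnet50 training with OneFlow. To use the original dataset, follow https://github.com/Oneflow-Inc/models/tree/main/Vision/classification/image/resnet50#prepare-ofrecord-for-the-full-imagenet-dataset to convert it to ofrecord format.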

    OFRECORD_PATH="./mini-imagenet/ofrecord"
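
Before starting a run, it can help to confirm that the dataset directory is in place. The following is a minimal sketch, not part of this repository: `check_ofrecord_dir` is a hypothetical helper, and it only checks that the directory exists and is not empty, not that the files are valid ofrecord data.

```shell
# Default to the path shown above; override OFRECORD_PATH if the data lives elsewhere.
OFRECORD_PATH="${OFRECORD_PATH:-./mini-imagenet/ofrecord}"

# check_ofrecord_dir is a hypothetical helper, not part of this repository.
# It succeeds only if the directory exists and contains at least one entry.
check_ofrecord_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    if [ -z "$(ls -A "$dir")" ]; then
        echo "empty directory: $dir" >&2
        return 1
    fi
    echo "found dataset directory: $dir"
}

check_ofrecord_dir "$OFRECORD_PATH" || echo "prepare the dataset before training"
```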
## Training and Inference
### Environment Setup
A docker image can be pulled from [SourceFind (光源)](https://www.sourcefind.cn/#/service-details): image.sourcefind.cn:5000/dcu/admin/base/oneflow:0.9.1-centos7.6-dtk-22.10.1-py39-latest
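
Pulling and starting the image might look like the sketch below. Only the image name comes from this README. The `run` helper is hypothetical, the `/workspace` mount point is an arbitrary choice, and the `--device=/dev/kfd --device=/dev/dri` flags are an assumption about how DCU devices are exposed to containers on the host; adjust them to your system.

```shell
IMAGE="image.sourcefind.cn:5000/dcu/admin/base/oneflow:0.9.1-centos7.6-dtk-22.10.1-py39-latest"

# run is a hypothetical helper: it prints the command by default and
# executes it only when DRY_RUN=0, so the sketch is safe to paste as-is.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Pull the image named in this README.
run docker pull "$IMAGE"

# Start a container with the current directory mounted at /workspace.
# The --device flags are an assumption about how DCU devices are exposed.
run docker run -it --rm \
    --device=/dev/kfd --device=/dev/dri \
    -v "$PWD":/workspace -w /workspace \
    "$IMAGE" /bin/bash
```

Set `DRY_RUN=0` in the environment to execute the commands instead of printing them.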
### fp32 Training
Single-node, single-card training command:

    bash examples/train_graph_distributed_fp32.sh

Set DEVICE_NUM_PER_NODE=4 in examples/train_graph_distributed_fp32.sh, then run the single-node, four-card training command:

    bash examples/train_graph_distributed_fp32.sh
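
The DEVICE_NUM_PER_NODE change can also be scripted rather than made by hand. The sketch below is an illustration only: `set_device_num` is a hypothetical helper, and it assumes the script assigns the variable at the start of a line in the form `DEVICE_NUM_PER_NODE=<number>`, which should be checked against the actual file.

```shell
# set_device_num is a hypothetical helper, not part of this repository.
# It assumes the script assigns the variable at the start of a line,
# in the form DEVICE_NUM_PER_NODE=<number>.
set_device_num() {
    script="$1"
    count="$2"
    # Edit through a temporary file, which avoids relying on the
    # non-portable `sed -i` and keeps the script's permissions.
    tmp="$(mktemp)"
    if sed "s/^DEVICE_NUM_PER_NODE=.*/DEVICE_NUM_PER_NODE=${count}/" "$script" > "$tmp"; then
        cat "$tmp" > "$script"
    fi
    rm -f "$tmp"
}

# Usage (run from the repository root):
#   set_device_num examples/train_graph_distributed_fp32.sh 4
#   bash examples/train_graph_distributed_fp32.sh
```

The same helper applies to examples/train_graph_distributed_fp16.sh if that script uses the same assignment form.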
### fp16 Training
Single-node, single-card training command:

    bash examples/train_graph_distributed_fp16.sh
Set DEVICE_NUM_PER_NODE=4 in examples/train_graph_distributed_fp16.sh, then run the single-node, four-card training command:

    bash examples/train_graph_distributed_fp16.sh

### Inference
Run the inference command:

    bash examples/infer_graph.sh
## Training Accuracy
Test data: mini-imagenet. Accelerators used: 4 DCU-Z100-16G cards.

The table below is filled in according to the model's results:

| Cards | Batch size | Precision | Accuracy (%) |
| :------: | :------: | :------: | -------- |
| 1 | 128 | fp32 | 76.5 / 50 epochs |
| 1 | 128 | fp16 | 76.3 / 50 epochs |
| 4 | 128 | fp32 | 76.5 / 50 epochs |
| 4 | 128 | fp16 | 76.3 / 50 epochs |
## Version History

- https://developer.hpccube.com/codes/modelzoo/resnet50_oneflow

## References

* https://github.com/Oneflow-Inc/models/tree/main/Vision/classification/image/resnet50
* https://github.com/Oneflow-Inc/oneflow