
# ResNet50

## Paper

`Deep Residual Learning for Image Recognition`

- https://arxiv.org/abs/1512.03385

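## Model Architecture

The ResNet50 network contains 49 convolutional layers and 1 fully connected layer, along with other components such as pooling layers.

![ResNet50 model architecture](https://developer.sourcefind.cn/codes/modelzoo/resnet50-pytorch/-/raw/main/doc/ResNet50.png?inline=false)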
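
As a quick check on the layer count, the sketch below recomputes it from the ResNet-50 configuration in the paper: one stem convolution followed by four stages of bottleneck blocks. The constant and variable names are illustrative and do not come from this repository. The count also leaves out the 1x1 projection convolutions on shortcut connections.

```python
# Bottleneck blocks per stage (conv2_x .. conv5_x) in the ResNet-50
# configuration from the paper; each bottleneck stacks three
# convolutions (1x1, 3x3, 1x1).
BLOCKS_PER_STAGE = [3, 4, 6, 3]
CONVS_PER_BOTTLENECK = 3

stem_convs = 1  # the initial 7x7 convolution (conv1)
stage_convs = sum(BLOCKS_PER_STAGE) * CONVS_PER_BOTTLENECK
conv_layers = stem_convs + stage_convs
fc_layers = 1  # the final fully connected classifier

print(f"convolutional layers: {conv_layers}")                # 49
print(f"weighted layers in total: {conv_layers + fc_layers}")  # 50
```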

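## Algorithm Principle

ResNet50 stacks multiple residual blocks, each with a residual (shortcut) connection, to mitigate vanishing or exploding gradients and allow the network to grow deeper.

![Algorithm principle](https://developer.sourcefind.cn/codes/modelzoo/resnet50-pytorch/-/raw/main/doc/Residual_Block.png?inline=false)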
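
The idea of a residual connection can be shown with a minimal sketch in plain Python: a block outputs `F(x) + x`, so when the learned residual `F` is zero the block reduces to the identity mapping. The helper names (`residual_block`, `zero_residual`, `small_residual`) are hypothetical, and plain lists stand in for tensors. This is not the repository's implementation.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return F(x) + x, where residual_fn plays the role of F."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("an identity shortcut needs matching shapes")
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x: Vector) -> Vector:
    # A residual branch that has learned nothing yet.
    return [0.0 for _ in x]


def small_residual(x: Vector) -> Vector:
    # A toy residual branch: a small linear correction.
    return [0.1 * v for v in x]


x = [1.0, -2.0, 3.0]

# With a zero residual the block passes its input through unchanged.
identity_out = residual_block(x, zero_residual)

# Stacking blocks keeps the original signal and adds corrections to it.
stacked = x
for _ in range(3):
    stacked = residual_block(stacked, small_residual)

print(identity_out)
print(stacked)
```

Because the shortcut carries `x` forward unchanged, the gradient with respect to `x` always contains a direct term from the shortcut, which is what helps very deep stacks train.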

## Environment Setup

### Docker (Method 1)

```
# Pull the image:
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
# Create and start the container:
docker run --shm-size 16g --network=host --name=resnet50_pytorch --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v $PWD/resnet50-pytorch:/home/resnet50_pytorch -it <Your Image ID> bash
# Install dependencies:
pip install -r requirements.txt
```

### Dockerfile (Method 2)

```
cd resnet50-pytorch/docker
docker build --no-cache -t resnet50_pytorch:latest .
docker run --rm --shm-size 16g --network=host --name=resnet50_pytorch --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v $PWD/../../resnet50-pytorch:/home/resnet50_pytorch -it resnet50_pytorch:latest bash
```

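### Anaconda (Method 3)

1. The special deep learning libraries that this project needs for DCU cards can be downloaded and installed from the Guanghe (光合) Developer Community:
https://developer.sourcefind.cn/tool/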

```
DTK driver: dtk24.04.1
python: python3.10
torch: 2.1.0
torchvision: 0.16.0
apex: 1.1
```

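`Tips: the versions of the DCU-related packages above (DTK, python, torch, etc.) must match each other exactly.`

2. Install the other, non-special libraries according to requirements.txt.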
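
Since the versions have to line up exactly, a small check like the following sketch may help before training. It compares the installed packages against the versions listed above using only the standard library. Two things here are assumptions: that the distributions are named `torch`, `torchvision` and `apex`, and that a build suffix after `+` (a local version label) can be ignored.

```python
import sys
from importlib import metadata
from typing import Dict, Optional

# Versions taken from the list above.
EXPECTED = {"torch": "2.1.0", "torchvision": "0.16.0", "apex": "1.1"}
EXPECTED_PYTHON = (3, 10)


def version_matches(installed: Optional[str], expected: str) -> bool:
    """True if `installed` equals `expected`, ignoring any '+local' suffix."""
    if installed is None:
        return False
    public = installed.split("+", 1)[0]
    # Accept e.g. "1.1.0" for an expected "1.1", but not "1.10".
    return public == expected or public.startswith(expected + ".")


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected: Dict[str, str]) -> Dict[str, bool]:
    """Map each package name to whether its installed version matches."""
    return {
        name: version_matches(installed_version(name), want)
        for name, want in expected.items()
    }


python_ok = sys.version_info[:2] == EXPECTED_PYTHON
print(f"python {EXPECTED_PYTHON[0]}.{EXPECTED_PYTHON[1]}: {python_ok}")
for package, ok in check_environment(EXPECTED).items():
    print(f"{package} {EXPECTED[package]}: {ok}")
```

The DTK version is not checked here because it is not a Python package.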

## Dataset

Download the ImageNet dataset: [imagenet-2012](https://image-net.org/download.php)

Process the dataset following [scrips/extract_ILSVRC.sh](https://developer.sourcefind.cn/codes/modelzoo/resnet50-pytorch/-/blob/main/scrips/extract_ILSVRC.sh). After processing, the dataset has the following structure:

```
data
├── train
│   ├── n01440764
│   ├── n01443537
│   └── ...
└── val
    ├── n01440764
    ├── n01443537
    └── ...
```
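
To confirm that extraction produced the layout above, a sketch along these lines could count the class directories under each split. The function name `count_class_dirs` is hypothetical, and the check only looks at directory structure, not at the images themselves.

```python
from pathlib import Path
from typing import Dict


def count_class_dirs(data_root: str) -> Dict[str, int]:
    """Count class sub-directories under <data_root>/train and <data_root>/val."""
    root = Path(data_root)
    counts: Dict[str, int] = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each class is one directory named by its WordNet ID, e.g. n01440764.
        counts[split] = sum(1 for entry in split_dir.iterdir() if entry.is_dir())
    return counts
```

For example, `count_class_dirs("/path/to/{ImageNet_pytorch_data_dir}")` should return the same count for `train` and `val`. For the full ImageNet-2012 classification set, that count is 1000 classes.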


## Training

### Single-card training (single precision)

```
export HIP_VISIBLE_DEVICES=0
python3 train.py --batch-size=64 --arch=resnet50 -j 6 --epochs=90 --save-path=/path/to/{save_model_dir} /path/to/{ImageNet_pytorch_data_dir}/
```

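Here, replace `/path/to/{save_model_dir}` with the directory where the model should be saved, and `/path/to/{ImageNet_pytorch_data_dir}/` with the path to the ImageNet dataset. The mixed-precision and multi-card training scripts below need the same changes, so the explanation is not repeated.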

### Single-card training (mixed precision)

```
export HIP_VISIBLE_DEVICES=0
python3 train.py --batch-size=64 --arch=resnet50 -j 6 --epochs=90 --amp --opt-level O1 --loss-scale=dynamic --save-path=/path/to/{save_model_dir} /path/to/{ImageNet_pytorch_data_dir}/
```
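
The two single-card commands above differ only in the mixed-precision flags. The sketch below assembles either variant from the two paths that need replacing. `build_train_command` is a hypothetical helper, not part of this repository, and the example paths are placeholders. Every flag is copied from the commands above.

```python
import shlex
from typing import List


def build_train_command(save_dir: str, data_dir: str, amp: bool = False) -> List[str]:
    """Build the single-card train.py command line shown above."""
    cmd = [
        "python3", "train.py",
        "--batch-size=64", "--arch=resnet50",
        "-j", "6", "--epochs=90",
    ]
    if amp:
        # Extra flags used by the mixed-precision variant.
        cmd += ["--amp", "--opt-level", "O1", "--loss-scale=dynamic"]
    cmd += [f"--save-path={save_dir}", data_dir]
    return cmd


# Placeholder paths; substitute real locations before running.
print(shlex.join(build_train_command("/path/to/checkpoints", "/path/to/imagenet/")))
print(shlex.join(build_train_command("/path/to/checkpoints", "/path/to/imagenet/", amp=True)))
```

The returned list can be passed to `subprocess.run` with `HIP_VISIBLE_DEVICES=0` set in the environment, which mirrors the `export` line in the shell version.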

### Single-node, four-card training (single precision)

```
cd scrips
chmod +x single_process.sh
mpirun --allow-run-as-root --bind-to none -np 4 single_process.sh localhost resnet50 64
```

### Single-node, four-card training (mixed precision)

```
cd scrips
chmod +x single_process_amp.sh
mpirun --allow-run-as-root --bind-to none -np 4 single_process_amp.sh localhost resnet50 64
```

## Results

![Results](https://developer.sourcefind.cn/codes/modelzoo/resnet50-pytorch/-/raw/main/doc/result.png?inline=false)

## Accuracy

Accuracy was measured on the ImageNet dataset using DCU Z100L accelerator cards.

| Cards | Learning rate | Batch size | Precision | Accuracy (%) |
| ---- | ------------- | ---------- | ---- | ----------- |
| 4    | 0.1           | 256        | fp32 | 75.94       |
| 4    | 0.1           | 256        | amp  | 75.99       |
| 8    | 0.1           | 256        | fp32 | 75.914      |
| 8    | 0.1           | 256        | amp  | 75.64       |
| 8    | 0.15          | 512        | fp32 | 75.902      |
| 8    | 0.15          | 512        | amp  | 75.90       |
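
To make the comparison between precisions easier to read, the following sketch computes the AMP minus FP32 accuracy gap for each configuration. The numbers are copied from the table above. The variable names are illustrative.

```python
# (cards, learning rate, batch size, fp32 accuracy %, amp accuracy %),
# copied from the accuracy table.
RESULTS = [
    (4, 0.1, 256, 75.94, 75.99),
    (8, 0.1, 256, 75.914, 75.64),
    (8, 0.15, 512, 75.902, 75.90),
]

deltas = []
for cards, lr, batch, fp32_acc, amp_acc in RESULTS:
    delta = amp_acc - fp32_acc
    deltas.append(delta)
    print(f"{cards} cards, lr={lr}, batch={batch}: AMP - FP32 = {delta:+.3f}")
```

In these runs the gap stays within about 0.27 percentage points in either direction.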

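## Application Scenarios

### Algorithm Category

`Image classification`

### Key Application Industries

`Manufacturing, government, healthcare, scientific research`

# Source Repository and Issue Feedback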

https://developer.sourcefind.cn/codes/modelzoo/resnet50-pytorch

# References

https://github.com/pytorch/examples/tree/master/imagenet