Commit 584fbdd5 authored by Sugon_ldc (parent e129194a): modify readme
# Convolutional Network for Image Classification in PyTorch
In this repository you will find implementations of various image classification models.
Detailed information on each model can be found here:
## Table Of Contents
* [Models](#models)
* [Validation accuracy results](#validation-accuracy-results)
* [Training performance results](#training-performance-results)
* [Training performance: NVIDIA DGX A100 (8x A100 80GB)](#training-performance-nvidia-dgx-a100-8x-a100-80gb)
* [Training performance: NVIDIA DGX-1 16GB (8x V100 16GB)](#training-performance-nvidia-dgx-1-16gb-8x-v100-16gb)
* [Model comparison](#model-comparison)
* [Accuracy vs FLOPS](#accuracy-vs-flops)
* [Latency vs Throughput on different batch sizes](#latency-vs-throughput-on-different-batch-sizes)
## Models
The following table provides links to where you can find additional information on each model:
| **Model** | **Link**|
|:-:|:-:|
| resnet50 | [README](./resnet50v1.5/README.md) |
| resnext101-32x4d | [README](./resnext101-32x4d/README.md) |
| se-resnext101-32x4d | [README](./se-resnext101-32x4d/README.md) |
| EfficientNet | [README](./efficientnet/README.md) |
## Validation accuracy results
Our results were obtained by running the applicable
training scripts in the 20.12 PyTorch NGC container
on NVIDIA DGX-1 with (8x V100 16GB) GPUs.
The specific training script that was run is documented
in the corresponding model's README.
The following table shows the validation accuracy results of the
classification models side-by-side.
| **Model** | **Mixed Precision Top1** | **Mixed Precision Top5** | **32 bit Top1** | **32 bit Top5** |
|:----------------------:|:------------------------:|:------------------------:|:---------------:|:---------------:|
| efficientnet-b0 | 77.63 | 93.82 | 77.31 | 93.76 |
| efficientnet-b4 | 82.98 | 96.44 | 82.92 | 96.43 |
| efficientnet-widese-b0 | 77.89 | 94.00 | 77.97 | 94.05 |
| efficientnet-widese-b4 | 83.28 | 96.45 | 83.30 | 96.47 |
| resnet50 | 78.60 | 94.19 | 78.69 | 94.16 |
| resnext101-32x4d | 80.43 | 95.06 | 80.40 | 95.04 |
| se-resnext101-32x4d | 81.00 | 95.48 | 81.09 | 95.45 |
## Training performance results
### Training performance: NVIDIA DGX A100 (8x A100 80GB)
Our results were obtained by running the applicable
training scripts in the 21.03 PyTorch NGC container
on NVIDIA DGX A100 with (8x A100 80GB) GPUs.
Performance numbers (in images per second)
were averaged over an entire training epoch.
The specific training script that was run is documented
in the corresponding model's README.
The following table shows the training performance results of
all the classification models side-by-side.
| **Model** | **Mixed Precision** | **TF32** | **Mixed Precision Speedup** |
|:----------------------:|:-------------------:|:----------:|:---------------------------:|
| efficientnet-b0 | 16652 img/s | 8193 img/s | 2.03 x |
| efficientnet-b4 | 2570 img/s | 1223 img/s | 2.1 x |
| efficientnet-widese-b0 | 16368 img/s | 8244 img/s | 1.98 x |
| efficientnet-widese-b4 | 2585 img/s | 1223 img/s | 2.11 x |
| resnet50 | 16621 img/s | 7248 img/s | 2.29 x |
| resnext101-32x4d | 7925 img/s | 3471 img/s | 2.28 x |
| se-resnext101-32x4d | 5779 img/s | 2991 img/s | 1.93 x |
### Training performance: NVIDIA DGX-1 16GB (8x V100 16GB)
Our results were obtained by running the applicable
training scripts in the 21.03 PyTorch NGC container
on NVIDIA DGX-1 with (8x V100 16GB) GPUs.
Performance numbers (in images per second)
were averaged over an entire training epoch.
The specific training script that was run is documented
in the corresponding model's README.
The following table shows the training performance results of all the
classification models side-by-side.
| **Model** | **Mixed Precision** | **FP32** | **Mixed Precision Speedup** |
|:----------------------:|:-------------------:|:----------:|:---------------------------:|
| efficientnet-b0 | 7789 img/s | 4672 img/s | 1.66 x |
| efficientnet-b4 | 1366 img/s | 616 img/s | 2.21 x |
| efficientnet-widese-b0 | 7875 img/s | 4592 img/s | 1.71 x |
| efficientnet-widese-b4 | 1356 img/s | 612 img/s | 2.21 x |
| resnet50 | 8322 img/s | 2855 img/s | 2.91 x |
| resnext101-32x4d | 4065 img/s | 1133 img/s | 3.58 x |
| se-resnext101-32x4d | 2971 img/s | 1004 img/s | 2.95 x |
## Model Comparison
### Accuracy vs FLOPS
![ACCvsFLOPS](./img/ACCvsFLOPS.png)
The plot shows the relationship between model accuracy and the
floating-point operations needed to compute a forward pass
on a 224 x 224 image, for the implemented models.
Dot size indicates the number of trainable parameters.
### Latency vs Throughput on different batch sizes
![LATvsTHR](./img/LATvsTHR.png)
The plot shows the relationship between
inference latency, throughput, and batch size
for the implemented models.
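The trade-off shown in this plot follows from the basic relation throughput = batch size / latency; a toy sketch (the timings below are made up for illustration, not measured):

```python
def throughput(batch_size, latency_s):
    """Images processed per second when one inference over a whole
    batch takes latency_s seconds."""
    return batch_size / latency_s

# Illustrative (made-up) timings: larger batches raise throughput,
# but every request in the batch waits for the whole batch.
for bs, lat in [(1, 0.004), (32, 0.040), (128, 0.130)]:
    print(bs, round(throughput(bs, lat), 1))
```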
# ResNet50v1.5
## Paper
This model is a modified version of the ResNet50 v1 model. The difference from the earlier version is in the bottleneck blocks that downsample: v1 uses stride=2 in the first 1x1 convolution, whereas v1.5 moves the stride=2 to the 3x3 convolution.
The v1 paper is:
`Deep Residual Learning for Image Recognition`
- [https://arxiv.org/pdf/1512.03385.pdf](https://arxiv.org/pdf/1512.03385.pdf)
The model's weights are initialized as described in:
`Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification`
- [https://arxiv.org/pdf/1502.01852.pdf](https://arxiv.org/pdf/1502.01852.pdf)
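The stride placement described above can be sketched as a minimal PyTorch bottleneck; the helper name, channel widths, and the omission of the shortcut projection are simplifications for illustration, not code from this repository:

```python
import torch
import torch.nn as nn

def bottleneck_downsample(in_ch, mid_ch, out_ch, v1_5=True):
    # v1 puts the stride-2 on the first 1x1 conv (skipping 3/4 of the
    # input outright); v1.5 moves it to the 3x3 conv, which still sees
    # every input pixel. Both variants halve the spatial size.
    s1, s3 = (1, 2) if v1_5 else (2, 1)
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, 1, stride=s1, bias=False),
        nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, mid_ch, 3, stride=s3, padding=1, bias=False),
        nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
    )

x = torch.randn(1, 256, 56, 56)
y_v1 = bottleneck_downsample(256, 128, 512, v1_5=False)(x)
y_v15 = bottleneck_downsample(256, 128, 512, v1_5=True)(x)
print(y_v1.shape, y_v15.shape)  # both are (1, 512, 28, 28)
```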
## Model Structure
ResNet50v1.5 is a deep residual neural network with 50 weighted layers; it uses residual connections and batch normalization layers to extract deeper features and keep training stable.
![ResNet50](ResNet50.png)
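The "50" in the name can be accounted for with simple arithmetic (a sanity check, not repository code):

```python
# One stem convolution, then bottleneck blocks of 3 convolutions each
# arranged in stages of [3, 4, 6, 3] blocks, plus the final fully
# connected classifier.
stages = [3, 4, 6, 3]
convs = 1 + 3 * sum(stages)  # 1 stem + 48 block convolutions = 49
total = convs + 1            # + 1 fully connected layer = 50
print(convs, total)
```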
## Algorithm Principle
ResNet50v1.5 builds a deeper network out of residual connections and stacked convolutional layers; the skip connections counteract vanishing gradients and model degradation, enabling better feature extraction and easier optimization.
![Residual_Block](Residual_Block.png)
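A tiny autograd experiment illustrates why the skip connection helps: even when the convolutional branch contributes nothing and back-propagates a zero gradient (an artificial extreme case below), the identity path still carries a unit gradient through the block:

```python
import torch

# Residual connection: y = x + F(x). If F's gradient vanishes, the
# skip path still passes gradients through unchanged, which is what
# keeps very deep residual networks trainable.
x = torch.ones(3, requires_grad=True)

def branch(t):
    # Stand-in for the conv branch: contributes nothing and
    # back-propagates a zero gradient (extreme case).
    return 0.0 * t

y = (x + branch(x)).sum()
y.backward()
print(x.grad)  # the skip path alone yields a gradient of all ones
```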
## Environment Setup
### Docker (Option 1)
The address and usage steps for pulling the Docker image from [光源](https://www.sourcefind.cn/#/service-details) are given here:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py38-latest
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
```
### Dockerfile (Option 2)
How to use the Dockerfile:
```
cd ./docker
docker build --no-cache -t resnet:v1.5 .
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
```
### Anaconda (Option 3)
Detailed steps for configuring and building locally, for example:
The special deep learning libraries this project requires for DCU accelerator cards can be downloaded from the [光合](https://developer.hpccube.com/tool/) developer community.
```
DTK driver: dtk23.04.1
python: python3.8
torch: 1.10
torchvision: 0.10
```
`Tips: the versions of the DTK driver, python, torch, and the other DCU-related tools above must match exactly, one-to-one`
## Dataset
ImageNet
- https://image-net.org/
Use [scrips/extract_ILSVRC.sh](https://developer.hpccube.com/codes/modelzoo/resnet50-pytorch/-/blob/main/scrips/extract_ILSVRC.sh) to prepare the dataset; after processing, the dataset structure is as follows:
```
data
├── train
│   ├── n01440764
│   ├── n01443537
│   └── ...
└── val
    ├── n01440764
    ├── n01443537
    └── ...
```
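A quick stand-alone sanity check of this layout can be sketched as follows; the helper name and the mock directory are illustrative, and in practice you would point it at your real `data` directory:

```python
import tempfile
from pathlib import Path

def list_classes(data_dir: Path, split: str):
    """Sorted WordNet-ID class folders under data_dir/split."""
    return sorted(p.name for p in (data_dir / split).iterdir() if p.is_dir())

# Demo on a mock layout mirroring the tree above; a full ImageNet-1k
# preparation should show 1000 class folders under each split.
root = Path(tempfile.mkdtemp())
for split in ("train", "val"):
    for wnid in ("n01440764", "n01443537"):
        (root / split / wnid).mkdir(parents=True)

print(list_classes(root, "train"))  # ['n01440764', 'n01443537']
```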
## Training
### Single node, single card (FP16)
```
bash train_single_fp16.sh
```
### Single node, multiple cards (FP16)
```
bash train_multi_fp16.sh
```
### Single node, single card (FP32)
```
bash train_single_fp32.sh
```
### Single node, multiple cards (FP32)
```
bash train_multi_fp32.sh
```
## Result
![result](result.png)
### Accuracy
Test dataset: ImageNet; accelerator card used: Z100L.
Measured results:
| Cards | Top1 | Top5 |
| :------: | :------: | :------: |
| 4 | 78.6% | 94.12% |
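The Top1/Top5 figures above are instances of top-k accuracy: the fraction of validation samples whose true label is among the k highest-scoring classes. A pure-Python sketch with toy scores (not real model outputs):

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k
    highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)

# Toy scores for 3 samples over 3 classes (illustrative only).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
labels = [1, 2, 0]
print(topk_accuracy(scores, labels, 1))  # only the first sample is a top-1 hit
print(topk_accuracy(scores, labels, 3))  # with k = #classes every sample hits
```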
## Application Scenarios
### Algorithm Category
`Image Classification`
### Key Application Industries
`Manufacturing, Government, Scientific Research`
## Source Repository & Issue Feedback
- [https://developer.hpccube.com/codes/modelzoo/resnet50v1.5_pytorch](https://developer.hpccube.com/codes/modelzoo/resnet50v1.5_pytorch)
## References
- [https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/resnet50v1.5](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/resnet50v1.5)
## Training Script Contents
The four launch scripts used in the Training section, matched by precision flag and visible devices:

`train_multi_fp16.sh`:
```
export HSA_FORCE_FINE_GRAIN_PCIE=1
export USE_MIOPEN_BATCHNORM=1
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export DATA_DIR=/data/imagenet2012 # dataset path
python ./multiproc.py --nproc_per_node 8 ./launch.py --model resnet50 --precision AMP --mode convergence --platform Z100L ${DATA_DIR} --data-backend pytorch --epochs 100 --batch-size 128 --workspace ${1:-./run} --raport-file raport.json 2>&1 | tee resnet50_multi_`date +%Y%m%d%H%M%S`.log
```

`train_multi_fp32.sh`:
```
export HSA_FORCE_FINE_GRAIN_PCIE=1
export USE_MIOPEN_BATCHNORM=1
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export DATA_DIR=/data/imagenet2012 # dataset path
python ./multiproc.py --nproc_per_node 8 ./launch.py --model resnet50 --precision FP32 --mode convergence --platform Z100L ${DATA_DIR} --data-backend pytorch --epochs 100 --batch-size 128 --workspace ${1:-./run} --raport-file raport.json 2>&1 | tee resnet50_multi_`date +%Y%m%d%H%M%S`.log
```

`train_single_fp16.sh`:
```
export USE_MIOPEN_BATCHNORM=1
export HIP_VISIBLE_DEVICES=0
export DATA_DIR=/data/imagenet2012 # dataset path
python ./multiproc.py --nproc_per_node 1 ./launch.py --model resnet50 --precision AMP --mode convergence --platform Z100L ${DATA_DIR} --data-backend pytorch --epochs 100 --batch-size 128 --workspace ${1:-./run} --raport-file raport.json 2>&1 | tee resnet50_single_`date +%Y%m%d%H%M%S`.log
```

`train_single_fp32.sh`:
```
export USE_MIOPEN_BATCHNORM=1
export HIP_VISIBLE_DEVICES=0
export DATA_DIR=/data/imagenet2012 # dataset path
python ./multiproc.py --nproc_per_node 1 ./launch.py --model resnet50 --precision FP32 --mode convergence --platform Z100L ${DATA_DIR} --data-backend pytorch --epochs 100 --batch-size 128 --workspace ${1:-./run} --raport-file raport.json 2>&1 | tee resnet50_single_`date +%Y%m%d%H%M%S`.log
```