Commit c71d1885 to ModelZoo / ResNet50_tensorflow, authored Mar 30, 2023 by qianyj: "update README" (parent 3d61d6b3). 1 changed file, README.md, with 48 additions and 30 deletions.
# Model Name
## Model Introduction
Train ResNet50 with TensorFlow 2.
## Model Architecture
...

To convert the ImageNet dataset to TFRecord format, refer to the following [script](https://githu
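Before training, it can help to confirm that `--data_dir` actually holds the converted shards. The sketch below assumes the TensorFlow Model Garden ImageNet naming convention of 1024 `train-%05d-of-01024` files and 128 `validation-%05d-of-00128` files; `expected_shards` and `missing_shards` are hypothetical helpers, so adjust the names if your conversion script writes something else.

```python
import os

# Assumed shard layout (TensorFlow Model Garden ImageNet convention).
NUM_TRAIN_SHARDS = 1024
NUM_VAL_SHARDS = 128


def expected_shards(data_dir: str, is_training: bool) -> list:
    """Return the TFRecord paths the training script is assumed to read."""
    if is_training:
        return [os.path.join(data_dir, "train-%05d-of-01024" % i)
                for i in range(NUM_TRAIN_SHARDS)]
    return [os.path.join(data_dir, "validation-%05d-of-00128" % i)
            for i in range(NUM_VAL_SHARDS)]


def missing_shards(data_dir: str) -> list:
    """List expected training and validation shards that are not on disk."""
    paths = expected_shards(data_dir, True) + expected_shards(data_dir, False)
    return [p for p in paths if not os.path.isfile(p)]
```

For example, `missing_shards("/path/to/ImageNet-tensorflow_data_dir")` returning an empty list means every assumed shard is present.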
## Training
### Environment Setup
Pull the training Docker image from [光源](https://www.sourcefind.cn/#/service-details):

* Training image: `docker pull image.sourcefind.cn:5000/dcu/admin/base/tensorflow:2.7.0-centos7.6-dtk-22.10.1-py37-latest`

Install the Python dependencies:

    pip install -r requirement.txt
### FP32 Training
#### Single-node, single-card training

Without XLA:

    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
    python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=128 --num_gpus=1 --use_synthetic_data=false --dtype=fp32

With XLA:

    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
    TF_XLA_FLAGS="--tf_xla_auto_jit=2" python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=128 --num_gpus=1 --use_synthetic_data=false --dtype=fp32
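The single-card commands differ only in batch size, card count, dtype, and whether `TF_XLA_FLAGS` is set. As a sketch, they can be assembled from Python and launched with `subprocess`; `build_train_command` is a hypothetical helper, not part of this repository, and it assumes the command runs from the repository root so the relative script path resolves.

```python
import os

SCRIPT = "official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py"


def build_train_command(repo_root, data_dir, model_dir, num_gpus=1,
                        batch_size=128, dtype="fp32", xla=False,
                        base_env=None):
    """Return (env, argv) mirroring the README's training commands."""
    env = dict(os.environ if base_env is None else base_env)
    # Prepend the repo root, as `export PYTHONPATH=...:$PYTHONPATH` does
    # (no trailing separator is added when PYTHONPATH was empty).
    old = env.get("PYTHONPATH", "")
    env["PYTHONPATH"] = repo_root + (os.pathsep + old if old else "")
    if xla:
        # Same effect as prefixing the command with TF_XLA_FLAGS=...
        env["TF_XLA_FLAGS"] = "--tf_xla_auto_jit=2"
    argv = [
        "python3", SCRIPT,
        f"--data_dir={data_dir}",
        f"--model_dir={model_dir}",
        f"--batch_size={batch_size}",
        f"--num_gpus={num_gpus}",
        "--use_synthetic_data=false",
        f"--dtype={dtype}",
    ]
    return env, argv
```

For example, `env, argv = build_train_command(root, data_dir, model_dir, xla=True)` followed by `subprocess.run(argv, env=env, cwd=root, check=True)` reproduces the single-card XLA command.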
#### Single-node, four-card training

Without XLA:

    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
    python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4 --use_synthetic_data=false --dtype=fp32

With XLA:

    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
    TF_XLA_FLAGS="--tf_xla_auto_jit=2" python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4 --use_synthetic_data=false --dtype=fp32
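The four-card commands raise `--batch_size` from 128 to 512. Assuming, as in the TensorFlow Model Garden version of `resnet_ctl_imagenet_main.py`, that `--batch_size` is the global batch split evenly across cards and that steps per epoch use integer division over the 1,281,167 ImageNet training images, each card still processes 128 images per step while an epoch drops from 10009 to 2502 steps. A small sketch of that arithmetic:

```python
IMAGENET_TRAIN_IMAGES = 1_281_167  # ILSVRC-2012 training images


def per_replica_batch(global_batch: int, num_replicas: int) -> int:
    """Images each card processes per step when the global batch is split evenly."""
    if global_batch % num_replicas:
        raise ValueError("global batch size must be divisible by the number of cards")
    return global_batch // num_replicas


def steps_per_epoch(global_batch: int, num_images: int = IMAGENET_TRAIN_IMAGES) -> int:
    """Optimizer steps per epoch; integer division drops the final partial batch."""
    return num_images // global_batch
```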
#### Multi-node, multi-card training (example: four processes on a single four-card node)

Run the following sed command only once. It inserts the contents of `configfile` after line 100 of the training script, adding the code needed for multi-card runs; running it again would insert a second copy.

    sed -i '100 r configfile' models-master/official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py
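`sed -i '100 r configfile' FILE` reads `configfile` and writes its contents after line 100 of `FILE`, editing the file in place. Because every run adds another copy, the step must not be repeated. Below is a Python sketch of the same edit with a guard against duplicates; `insert_file_after_line` is a hypothetical helper, and unlike sed (which silently skips a missing `configfile`) it raises if the snippet file does not exist.

```python
from pathlib import Path


def insert_file_after_line(target: str, snippet: str, line_no: int = 100) -> bool:
    """Insert the text of `snippet` after line `line_no` of `target`.

    Mirrors `sed -i '<line_no> r <snippet>' <target>`, but returns False and
    leaves `target` untouched when the snippet is empty, when its text is
    already present (so a second run does not duplicate it), or when
    `target` has fewer than `line_no` lines (sed inserts nothing then).
    """
    text = Path(target).read_text()
    extra = Path(snippet).read_text()
    if not extra or extra in text:
        return False
    lines = text.splitlines(keepends=True)
    if len(lines) < line_no:
        return False
    # Keep line boundaries intact on both sides of the inserted block.
    if not lines[line_no - 1].endswith("\n"):
        lines[line_no - 1] += "\n"
    if not extra.endswith("\n"):
        extra += "\n"
    lines.insert(line_no, extra)
    Path(target).write_text("".join(lines))
    return True
```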
Without XLA:

    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
    mpirun -np 4 --hostfile hostfile -mca btl self,tcp --allow-run-as-root --bind-to none scripts-run/single_process.sh

With XLA:

    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
    mpirun -np 4 --hostfile hostfile -mca btl self,tcp --allow-run-as-root --bind-to none scripts-run/single_process_xla.sh
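`mpirun -np 4 --hostfile hostfile` takes the hosts and their slot counts from `hostfile`. The repository's `hostfile` is not shown here; for one node with four cards, an Open MPI hostfile would typically be a single line such as `localhost slots=4`. The sketch below renders that format, checks that `-np` does not exceed the listed slots (Open MPI refuses to oversubscribe by default), and reads `OMPI_COMM_WORLD_LOCAL_RANK`, which Open MPI sets for each process it launches. How `scripts-run/single_process.sh` actually maps processes to cards is an assumption here, and all three helpers are hypothetical.

```python
import os


def hostfile_text(slots_per_host: dict) -> str:
    """Render an Open MPI hostfile: one '<host> slots=<n>' line per host."""
    return "".join(f"{host} slots={n}\n" for host, n in slots_per_host.items())


def check_np(num_procs: int, slots_per_host: dict) -> None:
    """Fail early if `mpirun -np` would need more slots than the hostfile lists."""
    total = sum(slots_per_host.values())
    if num_procs > total:
        raise ValueError(f"-np {num_procs} exceeds {total} available slots")


def local_rank(env=None) -> int:
    """Index of this process among the processes on the same node.

    A launcher script could use this to give each process its own card
    (hypothetical: the repository's scripts may do this differently).
    """
    env = os.environ if env is None else env
    return int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
```

For example, writing `hostfile_text({"localhost": 4})` to a file named `hostfile` gives a single-node, four-slot layout.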
### FP16 Training
#### Single-node, single-card training

Without XLA:

    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
    python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=128 --num_gpus=1 --use_synthetic_data=false --dtype=fp16

With XLA:

    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
    TF_XLA_FLAGS="--tf_xla_auto_jit=2" python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=128 --num_gpus=1 --use_synthetic_data=false --dtype=fp16
#### Single-node, four-card training

Without XLA:

    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
    python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4 --use_synthetic_data=false --dtype=fp16

With XLA:

    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
    TF_XLA_FLAGS="--tf_xla_auto_jit=2" python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4 --use_synthetic_data=false --dtype=fp16
#### Multi-node, multi-card training (example: four processes on a single four-card node)

Run the following sed command only once (skip it if you already ran it for FP32 training). It inserts the contents of `configfile` after line 100 of the training script, adding the code needed for multi-card runs.

    sed -i '100 r configfile' models-master/official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py

Then set `--dtype=fp16` in `scripts-run/single_process.sh` and `scripts-run/single_process_xla.sh`.
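Assuming the two launcher scripts pass the flag literally as `--dtype=<value>`, this hypothetical helper rewrites it in place, which is equivalent to editing the files by hand. It returns the number of replacements, so a result of 0 signals that the flag was not found and the script needs a manual look.

```python
import re
from pathlib import Path


def set_dtype(script: str, dtype: str = "fp16") -> int:
    """Replace every `--dtype=<value>` in `script` with `--dtype=<dtype>`."""
    path = Path(script)
    new_text, count = re.subn(r"--dtype=\w+", f"--dtype={dtype}", path.read_text())
    if count:
        path.write_text(new_text)
    return count
```

For example: `for s in ("scripts-run/single_process.sh", "scripts-run/single_process_xla.sh"): set_dtype(s)`.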
Without XLA:

    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
    mpirun -np 4 --hostfile hostfile -mca btl self,tcp --allow-run-as-root --bind-to none scripts-run/single_process.sh

With XLA:

    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
    mpirun -np 4 --hostfile hostfile -mca btl self,tcp --allow-run-as-root --bind-to none scripts-run/single_process_xla.sh
## Performance and Accuracy

Test data: [ImageNet test set](https://image-net.org/ "ImageNet official website"). Accelerator: DCU-Z00-16G.

Fill in the table according to the model:

| Cards | Batch size | Precision | Throughput | Accuracy | XLA enabled | Processes |
| :------: | :------: | :------: | :------: | :------: | :------: | :------: |
| 4 | 512 | fp32 | 843 examples/second | 0.7628 | No | Single process |
| 4 | 512 | fp16 | - | 0.7616 | No | Single process |
| 4 | 512 | fp32 | - | 0.7608 | No | Four processes |
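To put the throughput figure in context: assuming 843 examples/second is the aggregate rate across all four cards, one pass over the 1,281,167-image ImageNet training set takes roughly 1,520 s (about 25 minutes), or about 211 examples/second per card, excluding evaluation and startup time. A small calculation sketch:

```python
IMAGENET_TRAIN_IMAGES = 1_281_167  # ILSVRC-2012 training images


def seconds_per_epoch(examples_per_second: float,
                      num_images: int = IMAGENET_TRAIN_IMAGES) -> float:
    """Time for one pass over the training set at a steady throughput."""
    return num_images / examples_per_second


def per_card_throughput(total_examples_per_second: float, cards: int) -> float:
    """Average throughput per card, assuming the total is the aggregate rate."""
    return total_examples_per_second / cards
```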
...