Update README

Commit 5ab49e19 authored Aug 25, 2023 by “qianyj”

Showing with 49 additions and 34 deletions

README.md +49 -34

--- a/README.md
+++ b/README.md
@@ -16,14 +16,14 @@ ResNet50 uses multiple residual blocks with residual connections to address vanishing gradients or grad
 ```
 docker pull image.sourcefind.cn:5000/dcu/admin/base/tensorflow:2.7.0-centos7.6-dtk-22.10.1-py38-latest
 # Replace <Your Image ID> with the ID of the Docker image pulled above
-docker run --shm-size 16g --network=host --name=ResNet50-TensorFlow2x --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/ResNet50-TensorFlow2x:/home/ResNet50-TensorFlow2x -it <Your Image ID> bash
+docker run --shm-size 16g --network=host --name=resnet50_tensorFlow --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/resnet50_tensorflow:/home/resnet50_tensorflow -it <Your Image ID> bash
 pip install -r requirements.txt
 ```
 ### Dockerfile (Method 2)
 ```
-cd ResNet50-TensorFlow2x/docker
+cd resnet50_tensorflow/docker
-docker build --no-cache -t ResNet50-TensorFlow2x:latest .
+docker build --no-cache -t resnet50_tensorflow:latest .
-docker run --rm --shm-size 16g --network=host --name=ResNet50-TensorFlow2x --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/ResNet50-TensorFlow2x:/home/ResNet50-TensorFlow2x -it ResNet50-TensorFlow2x:latest bash
+docker run --rm --shm-size 16g --network=host --name=resnet50_tensorflow --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/../../resnet50_tensorflow:/home/resnet50_tensorflow -it resnet50_tensorflow:latest bash
 ```
 ### Anaconda (Method 3)
@@ -40,60 +40,69 @@ tensorboard: 2.7
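 `Tips: the versions of the DCU-related tools above (dtk, python, tensorflow, etc.) must match each other exactly`
 2. Install the other, non-special libraries according to requirements.txt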
 ```
-pip3 install -r requirements.txt
+pip3 install -r requirements.txt  --no-deps
 ```
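 ## Dataset
+1. Real data
 Use the ImageNet dataset, which must be converted to TFRecord format
 The ImageNet dataset can be downloaded from the [official website](https://image-net.org/ "ImageNet dataset official website"), found via a Baidu search, or obtained by contacting us
 To convert the ImageNet dataset to TFRecord format, refer to the following [script](https://github.com/tensorflow/tpu/blob/master/tools/datasets/imagenet_to_gcs.py) and [README](https://github.com/tensorflow/tpu/tree/master/tools/datasets#imagenet_to_gcspy)
+The resulting TFRecord data is laid out as follows: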
+tfrecord-imagenet
+                | 
+                train-00000-of-01024
+                train-00001-of-01024
+                ...
+                train-01023-of-01024
+                validation-00000-of-00128
+                validation-00001-of-00128
+                ...
+                validation-00127-of-00128
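+2. Synthetic data
+Training on randomly generated synthetic data does not require downloading the ImageNet dataset; when running training, just set --use_synthetic_data to true in the command
 ## Training
-### Environment setup
-Pull the training Docker image from [SourceFind (光源)](https://www.sourcefind.cn/#/service-details):
-Training image: docker pull image.sourcefind.cn:5000/dcu/admin/base/tensorflow:2.7.0-centos7.6-dtk-22.10.1-py37-latest
-Install the Python dependencies: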
-    pip3 install -r requirements.txt
 ### fp32 training
 #### Single-node, single-card training commands:
 Without XLA:
-    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH  
+    export PYTHONPATH=/path/to/resnet50_tensorflow:$PYTHONPATH
-    python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=128 --num_gpus=1  --use_synthetic_data=false --dtype=fp32
+    python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=128 --num_gpus=1  --use_synthetic_data=false  --train_epochs=90  --dtype=fp32
 With XLA:
-    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
+    export PYTHONPATH=/path/to/resnet50_tensorflow:$PYTHONPATH
-    TF_XLA_FLAGS="--tf_xla_auto_jit=2" python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=128 --num_gpus=1  --use_synthetic_data=false --dtype=fp32
+    TF_XLA_FLAGS="--tf_xla_auto_jit=2" python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=128 --num_gpus=1  --use_synthetic_data=false  --train_epochs=90  --dtype=fp32
 #### Single-node, four-card training commands:
 Without XLA:
-    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
+    export PYTHONPATH=/path/to/resnet50_tensorflow:$PYTHONPATH
-    python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4  --use_synthetic_data=false --dtype=fp32
+    python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4  --use_synthetic_data=false  --train_epochs=90  --dtype=fp32
 With XLA:
-    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
+    export PYTHONPATH=/path/to/resnet50_tensorflow:$PYTHONPATH
-    TF_XLA_FLAGS="--tf_xla_auto_jit=2" python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4  --use_synthetic_data=false --dtype=fp32
+    TF_XLA_FLAGS="--tf_xla_auto_jit=2" python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4  --train_epochs=90  --use_synthetic_data=false --dtype=fp32
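 #### Multi-node, multi-card training commands (example: a single node with four cards simulating four cards with four processes):
 The sed command only needs to be run once; it adds the code that supports multi-card runs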
-    sed -i '100 r configfile' models-master/official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py
+    sed -i '100 r configfile' official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py
 Without XLA:
-    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
+    export PYTHONPATH=/path/to/resnet50_tensorflow:$PYTHONPATH
    mpirun -np 4 --hostfile hostfile  -mca btl self,tcp  --allow-run-as-root  --bind-to none scripts-run/single_process.sh
 With XLA:
-    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
+    export PYTHONPATH=/path/to/resnet50_tensorflow:$PYTHONPATH
    mpirun -np 4 --hostfile hostfile  -mca btl self,tcp  --allow-run-as-root  --bind-to none scripts-run/single_process_xla.sh
 ### fp16 training
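The TFRecord layout added to the dataset section in the hunk above follows a fixed naming pattern: 1024 training shards and 128 validation shards, each named `<split>-<index>-of-<count>` with five-digit zero padding. A minimal sketch of how that naming could be checked before training is shown below. The helper names `expected_shards` and `missing_shards` are hypothetical and not part of the repository; only the shard counts and the naming pattern come from the README.

```python
from typing import Iterable, List


def expected_shards(prefix: str, count: int) -> List[str]:
    """Return shard names such as train-00000-of-01024 for one split."""
    return [f"{prefix}-{i:05d}-of-{count:05d}" for i in range(count)]


def missing_shards(present_names: Iterable[str], prefix: str, count: int) -> List[str]:
    """Return the expected shard names that are not in present_names."""
    present = set(present_names)
    return [name for name in expected_shards(prefix, count) if name not in present]


# Hypothetical usage against a real directory (the path is a placeholder):
#   import os
#   names = os.listdir("/path/to/tfrecord-imagenet")
#   print(missing_shards(names, "train", 1024))
#   print(missing_shards(names, "validation", 128))
```

An empty result for both splits means every shard named in the README layout is present; it says nothing about whether the shard contents are valid.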
@@ -101,42 +110,42 @@ The sed command only needs to be run once; it adds the code that supports multi-card runs
 Without XLA:
-    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
+    export PYTHONPATH=/path/to/resnet50_tensorflow:$PYTHONPATH
-    python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=128 --num_gpus=1  --use_synthetic_data=false --dtype=fp16
+    python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=128 --num_gpus=1  --use_synthetic_data=false --train_epochs=90  --dtype=fp16
 With XLA:
-    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
+    export PYTHONPATH=/path/to/resnet50_tensorflow:$PYTHONPATH
-    TF_XLA_FLAGS="--tf_xla_auto_jit=2" python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=128 --num_gpus=1  --use_synthetic_data=false --dtype=fp16
+    TF_XLA_FLAGS="--tf_xla_auto_jit=2" python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=128 --num_gpus=1  --train_epochs=90  --use_synthetic_data=false --dtype=fp16
 #### Single-node, four-card training commands
 Without XLA:
-    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
+    export PYTHONPATH=/path/to/resnet50_tensorflow:$PYTHONPATH
-    python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4  --use_synthetic_data=false --dtype=fp16
+    python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4  --train_epochs=90  --use_synthetic_data=false --dtype=fp16
 With XLA:
-    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
+    export PYTHONPATH=/path/to/resnet50_tensorflow:$PYTHONPATH
-    TF_XLA_FLAGS="--tf_xla_auto_jit=2" python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4  --use_synthetic_data=false --dtype=fp16
+    TF_XLA_FLAGS="--tf_xla_auto_jit=2" python3 official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4  --train_epochs=90  --use_synthetic_data=false --dtype=fp16
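 #### Multi-node, multi-card training commands (example: a single node with four cards simulating four cards with four processes)
 The sed command only needs to be run once; it adds the code that supports multi-card runs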
-    sed -i '100 r configfile' models-master/official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py
+    sed -i '100 r configfile' official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py
 Change --dtype to fp16 in scripts-run/single_process.sh and scripts-run/single_process_xla.sh
 Without XLA:
-    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
+    export PYTHONPATH=/path/to/resnet50_tensorflow:$PYTHONPATH
    mpirun -np 4 --hostfile hostfile  -mca btl self,tcp  --allow-run-as-root  --bind-to none scripts-run/single_process.sh
 With XLA:
-    export PYTHONPATH=/path/to/ResNet50_TensorFlow2:$PYTHONPATH
+    export PYTHONPATH=/path/to/resnet50_tensorflow:$PYTHONPATH
    mpirun -np 4 --hostfile hostfile  -mca btl self,tcp  --allow-run-as-root  --bind-to none scripts-run/single_process_xla.sh
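The README states that the `sed -i '100 r configfile'` step must be run only once. The reason is that sed's `r` command appends the contents of the named file after the addressed line every time it runs, so a second run inserts the same code again. The sketch below mimics that behaviour on in-memory lists of lines to make the effect visible. The function names are hypothetical, the contents of `configfile` are not shown in this diff, and `insert_once` is an illustrative guard, not something the repository provides.

```python
from typing import List


def insert_after_line(target: List[str], snippet: List[str], line_no: int) -> List[str]:
    """Mimic `sed 'N r file'`: place snippet after the 1-based line line_no."""
    if line_no < 1 or line_no > len(target):
        # The address never matches, so nothing is inserted.
        return list(target)
    return target[:line_no] + snippet + target[line_no:]


def insert_once(target: List[str], snippet: List[str], line_no: int) -> List[str]:
    """Insert snippet after line_no unless it is already sitting there."""
    already_there = target[line_no:line_no + len(snippet)] == snippet
    return list(target) if already_there else insert_after_line(target, snippet, line_no)


if __name__ == "__main__":
    source = [f"line {i}" for i in range(1, 6)]
    snippet = ["cfg a", "cfg b"]
    once = insert_after_line(source, snippet, 3)
    twice = insert_after_line(once, snippet, 3)
    print(len(once), len(twice))  # the second run adds the snippet again
```

With the guard, a repeated call leaves the already patched lines unchanged, which is the property the one-time instruction in the README relies on the user to maintain by hand.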
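@@ -150,6 +159,12 @@ The sed command only needs to be run once; it adds the code that supports multi-card runs
 | 4 | 512 | fp32 |  0.7608 | No | four processes |
 | 4 | 512 | fp16 |  0.7615 | No | four processes |
+## Application scenarios
+### Algorithm category
+`Image classification`
+### Key application industries
+`Manufacturing, government, healthcare, scientific research`
 ## Source repository and issue feedback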
 * https://developer.hpccube.com/codes/modelzoo/resnet50_tensorflow
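The training commands in this diff differ only in a few values: card count, batch size, precision, and whether `TF_XLA_FLAGS` is set. A minimal sketch of a helper that assembles them is shown below. `build_command` is a hypothetical name, not part of the repository. The flag names and the script path are taken from the README; the assumption that the batch size scales as 128 per card is inferred from the README using 128 for one card and 512 for four.

```python
from typing import Dict, List, Tuple

SCRIPT = "official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py"


def build_command(
    data_dir: str,
    model_dir: str,
    num_gpus: int,
    dtype: str = "fp32",
    per_gpu_batch: int = 128,  # assumption: README uses 128 (1 card) and 512 (4 cards)
    train_epochs: int = 90,
    synthetic: bool = False,
    xla: bool = False,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, extra_env) for one training run described in the README."""
    if dtype not in ("fp32", "fp16"):
        raise ValueError(f"unsupported dtype: {dtype}")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    argv = [
        "python3",
        SCRIPT,
        f"--data_dir={data_dir}",
        f"--model_dir={model_dir}",
        f"--batch_size={per_gpu_batch * num_gpus}",
        f"--num_gpus={num_gpus}",
        f"--use_synthetic_data={'true' if synthetic else 'false'}",
        f"--train_epochs={train_epochs}",
        f"--dtype={dtype}",
    ]
    # XLA is switched on through an environment variable, not a script flag.
    extra_env = {"TF_XLA_FLAGS": "--tf_xla_auto_jit=2"} if xla else {}
    return argv, extra_env


# Hypothetical usage (paths are placeholders):
#   import os, subprocess
#   argv, extra_env = build_command("/path/to/data", "/path/to/model", num_gpus=4, xla=True)
#   subprocess.run(argv, env={**os.environ, **extra_env}, check=True)
```

Returning the environment separately keeps the XLA switch out of the argument list, matching how the README prefixes the command with `TF_XLA_FLAGS` instead of passing a flag.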