TensorFlow code for training image-classification networks, based on TensorFlow's official benchmark program (tf_cnn_benchmarks). The dataset used is ImageNet.

# Running the tests

- The test code has two parts: a basic performance benchmark and a large-scale performance test.

## Basic benchmark

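- After setting up the TensorFlow runtime environment, take the resnet50 network as an example and measure its training performance at different precisions (fp32 and fp16) with batch_size=32 and num_gpu=1. Note that the commands below pass `--batch_size=128`.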
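
The commands below read two shell variables, `$data_dir_path` and `$save_checkpoint_path`, that this README does not define. Here is a minimal sketch for setting them. It assumes ImageNet has already been converted to the TFRecord files that tf_cnn_benchmarks reads. Both default paths are hypothetical placeholders.

```shell
# Source this file (e.g. `. ./env.sh`, a hypothetical name) so the variables
# stay set in the current shell. The defaults below are hypothetical
# placeholders; set the variables beforehand to override them.
data_dir_path="${data_dir_path:-/path/to/imagenet_tfrecords}"
save_checkpoint_path="${save_checkpoint_path:-./checkpoints/resnet50}"

# Create the checkpoint directory passed to --train_dir.
mkdir -p "$save_checkpoint_path"

# Warn early if the dataset directory passed to --data_dir is missing.
if [ ! -d "$data_dir_path" ]; then
  echo "warning: data_dir_path '$data_dir_path' does not exist" >&2
fi

# Export so that local child processes also see the values.
export data_dir_path save_checkpoint_path
```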

### fp32 training

     python3 ./benchmarks-master/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --data_format=NCHW --batch_size=128 --model=resnet50 --save_model_steps=10020 --optimizer=momentum --variable_update=parameter_server  --print_training_accuracy=true  --eval_during_training_every_n_epochs=1  --nodistortions --num_gpus=1 --num_epochs=90 --weight_decay=1e-4 --data_dir=$data_dir_path   --use_fp16=False --data_name=imagenet --train_dir=$save_checkpoint_path

### fp16 training

    python3  ./benchmarks-master/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --data_format=NCHW --batch_size=128 --model=resnet50 --save_model_steps=10020 --optimizer=momentum --variable_update=parameter_server  --print_training_accuracy=true  --eval_during_training_every_n_epochs=1  --nodistortions --num_gpus=1 --num_epochs=90 --weight_decay=1e-4 --data_dir=$data_dir_path   --use_fp16=True --data_name=imagenet --train_dir=$save_checkpoint_path
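
The fp32 and fp16 commands differ only in `--use_fp16`, so a comparison can be scripted back to back. Below is a minimal sketch. It assumes the benchmark ends its output with a `total images/sec: <value>` line. The log file names, the per-precision checkpoint subdirectories and the `RUN_BENCH` switch are hypothetical conventions, not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch: run the fp32 and fp16 benchmarks one after the other and report the
# throughput of each. Paths are relative to the repository root, as above.
BENCH=./benchmarks-master/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py

# Print the value from the last "total images/sec: <value>" line of a log.
parse_throughput() {
  grep 'total images/sec:' "$1" | tail -n 1 | awk '{print $3}'
}

# Run one precision; $1 is the value for --use_fp16 (True or False).
# Separate --train_dir subdirectories keep the two runs from sharing checkpoints.
run_precision() {
  local fp16="$1"
  local log="resnet50_use_fp16_${fp16}.log"
  python3 "$BENCH" --data_format=NCHW --batch_size=128 --model=resnet50 \
    --save_model_steps=10020 --optimizer=momentum \
    --variable_update=parameter_server --print_training_accuracy=true \
    --eval_during_training_every_n_epochs=1 --nodistortions --num_gpus=1 \
    --num_epochs=90 --weight_decay=1e-4 --data_dir="$data_dir_path" \
    --use_fp16="$fp16" --data_name=imagenet \
    --train_dir="$save_checkpoint_path/use_fp16_${fp16}" 2>&1 | tee "$log"
  echo "use_fp16=${fp16}: $(parse_throughput "$log") images/sec"
}

# Launch only when asked to, e.g. RUN_BENCH=1 bash compare_precision.sh
if [ "${RUN_BENCH:-0}" = "1" ]; then
  for fp16 in False True; do
    run_precision "$fp16"
  done
fi
```

With `--num_epochs=90` each run is a full training job. For a quick throughput check you may want a shorter run.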

## Large-scale test

### Single GPU

    HIP_VISIBLE_DEVICES=0  python3 ./benchmarks-master/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --data_format=NCHW --batch_size=128 --model=resnet50 --save_model_steps=10020 --optimizer=momentum --variable_update=parameter_server  --print_training_accuracy=true  --eval_during_training_every_n_epochs=1  --nodistortions --num_gpus=1 --num_epochs=90 --weight_decay=1e-4 --data_dir=$data_dir_path   --use_fp16=True --data_name=imagenet --train_dir=$save_checkpoint_path
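
`HIP_VISIBLE_DEVICES`, the ROCm counterpart of `CUDA_VISIBLE_DEVICES`, limits the process to the listed device, so the run still uses `--num_gpus=1`. The sketch below repeats the single-card test on each card in turn, for example to spot a slow device. `NUM_DEVICES`, `DRY_RUN`, `RUN_ALL` and the log names are hypothetical settings introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: repeat the single-card test on each device in turn.
# NUM_DEVICES, DRY_RUN, RUN_ALL and the log names are hypothetical.
BENCH=./benchmarks-master/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py
NUM_DEVICES="${NUM_DEVICES:-4}"

# Run (or, with DRY_RUN=1, only print) the single-card test on device $1.
run_on_device() {
  local dev="$1"
  # Same flags as the single-card command; each device gets its own --train_dir.
  local cmd=(python3 "$BENCH" --data_format=NCHW --batch_size=128 --model=resnet50
    --save_model_steps=10020 --optimizer=momentum
    --variable_update=parameter_server --print_training_accuracy=true
    --eval_during_training_every_n_epochs=1 --nodistortions --num_gpus=1
    --num_epochs=90 --weight_decay=1e-4 --data_dir="$data_dir_path"
    --use_fp16=True --data_name=imagenet
    --train_dir="$save_checkpoint_path/dev${dev}")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "HIP_VISIBLE_DEVICES=${dev} ${cmd[*]}"
  else
    # Only card $dev is visible, so the process sees exactly one device.
    HIP_VISIBLE_DEVICES="$dev" "${cmd[@]}" > "single_card_dev${dev}.log" 2>&1
  fi
}

# Loop over all devices only when asked to, e.g. RUN_ALL=1 bash per_device.sh
if [ "${RUN_ALL:-0}" = "1" ]; then
  for ((dev = 0; dev < NUM_DEVICES; dev++)); do
    run_on_device "$dev"
  done
fi
```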

### Multiple GPUs

    mpirun -np ${num_gpu} --hostfile hostfile --bind-to none scripts-run/single_process.sh 
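
`mpirun` starts `${num_gpu}` copies of `scripts-run/single_process.sh` across the hosts listed in `hostfile`. That script is not reproduced in this README, so the sketch below is a hypothetical stand-in showing the usual Horovod pattern. Each rank runs tf_cnn_benchmarks with `--variable_update=horovod` and `--num_gpus=1` and writes its own log. The sketch does not set `HIP_VISIBLE_DEVICES`. It assumes, as in the upstream tf_cnn_benchmarks code, that the benchmark pins each Horovod process to the device matching its local rank. It also assumes Open MPI and Horovod are installed. The log naming is a hypothetical convention.

```shell
#!/usr/bin/env bash
# Hypothetical per-rank wrapper in the spirit of scripts-run/single_process.sh;
# the repository's actual script may differ.
BENCH=./benchmarks-master/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py

# Open MPI exports OMPI_COMM_WORLD_RANK (the global rank) to every process it
# starts; use it to give each rank its own log file.
rank_log_name() {
  echo "train_rank${OMPI_COMM_WORLD_RANK:-0}.log"
}

main() {
  # One process per device; Horovod averages gradients across all ranks, so
  # each process runs with --num_gpus=1 and --variable_update=horovod.
  python3 "$BENCH" --data_format=NCHW --batch_size=128 --model=resnet50 \
    --optimizer=momentum --variable_update=horovod --nodistortions \
    --num_gpus=1 --num_epochs=90 --weight_decay=1e-4 --use_fp16=True \
    --data_name=imagenet --data_dir="$data_dir_path" \
    > "$(rank_log_name)" 2>&1
}

# Train only when started by mpirun, which sets OMPI_COMM_WORLD_RANK.
if [ -n "${OMPI_COMM_WORLD_RANK:-}" ]; then
  main
fi
```

With Open MPI, each `hostfile` line names a host and its process slots, for example `node1 slots=4`. Variables such as `data_dir_path` reach processes on remote hosts only if they are exported with `-x`, for example `mpirun -np ${num_gpu} --hostfile hostfile --bind-to none -x data_dir_path scripts-run/single_process.sh`.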

# References

- [tf_cnn_benchmarks](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks)
- [Horovod](https://github.com/horovod/horovod)