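# Introduction

This test case benchmarks the performance of PyTorch classification models.

*  The script supports PyTorch distributed training over either the NCCL or the Gloo communication backend.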
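Below is a minimal sketch of how a script like this might choose between the two backends and join the process group. It assumes the common convention of NCCL for GPU tensors and Gloo for CPU. The helper names `pick_backend` and `init_distributed` are hypothetical and do not come from `main_bench.py`. `torch.distributed.init_process_group` is the real PyTorch call.

```python
def pick_backend(use_gpu: bool) -> str:
    """Return the torch.distributed backend name for the given device type."""
    # NCCL only supports CUDA tensors; Gloo also runs on CPU.
    return "nccl" if use_gpu else "gloo"


def init_distributed(dist_url: str, world_size: int, rank: int) -> str:
    """Join the default process group (hypothetical helper, not from main_bench.py)."""
    import torch  # imported lazily so pick_backend works without PyTorch installed
    import torch.distributed as dist

    backend = pick_backend(torch.cuda.is_available())
    dist.init_process_group(
        backend=backend,
        init_method=dist_url,  # e.g. "tcp://<master-host>:<port>"
        world_size=world_size,
        rank=rank,
    )
    return backend
```

Every rank must pass the same `init_method` and `world_size` and its own unique `rank`, or the rendezvous will not complete.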

# Usage

## Single GPU

    python3 `pwd`/main_bench.py --batch-size=64 --a=resnet50 -j 24 --epochs=1 --synthetic /path/to/any/existing/folder
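The single-GPU flags appear to follow the command-line interface of the PyTorch ImageNet example listed under References. The sketch below assumes that `main_bench.py` keeps that interface (`-a/--arch`, `-j/--workers`, `-b/--batch-size`, a positional data directory) and adds its own `--synthetic` switch. It shows why `--a=resnet50` works: `argparse` accepts unambiguous prefixes of long options, so `--a` resolves to `--arch`. The parser is a reconstruction for illustration, not the script's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the options main_bench.py is assumed to accept.
    parser = argparse.ArgumentParser(description="PyTorch classification benchmark (sketch)")
    parser.add_argument("data", metavar="DIR",
                        help="dataset path; the README passes any existing folder with --synthetic")
    parser.add_argument("-a", "--arch", default="resnet18")
    parser.add_argument("-j", "--workers", type=int, default=4)
    parser.add_argument("--epochs", type=int, default=90)
    parser.add_argument("-b", "--batch-size", type=int, default=256)
    parser.add_argument("--synthetic", action="store_true",
                        help="assumed: use generated tensors instead of reading images")
    return parser


# The exact arguments from the single-GPU command above.
argv = ["--batch-size=64", "--a=resnet50", "-j", "24", "--epochs=1",
        "--synthetic", "/path/to/any/existing/folder"]
args = build_parser().parse_args(argv)
print(args.arch, args.batch_size, args.workers, args.epochs, args.synthetic)
```

Spelling out `--arch=resnet50` is safer. If another long option starting with `--a` were ever added, the prefix would become ambiguous and `argparse` would reject it.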
  
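## Single node, multiple GPUs

    mpirun -np 4 --bind-to none `pwd`/single_process.sh localhost inception_v3 64

The arguments passed to `single_process.sh` are the host address, the model name and the batch size.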
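`mpirun` starts one copy of `single_process.sh` per rank, so each copy has to work out its own rank and GPU. The script's contents are not shown here. A common approach, sketched below as an assumption, is to read the environment variables that Open MPI sets for every launched process (`OMPI_COMM_WORLD_RANK`, `OMPI_COMM_WORLD_SIZE`, `OMPI_COMM_WORLD_LOCAL_RANK`) and to use the local rank as the GPU index. `mpi_rank_info` and `RankInfo` are hypothetical names.

```python
import os
from typing import Mapping, NamedTuple, Optional


class RankInfo(NamedTuple):
    rank: int        # global rank across all nodes
    world_size: int  # total number of processes (the -np value)
    local_rank: int  # rank within this node; used as the GPU index here


def mpi_rank_info(env: Optional[Mapping[str, str]] = None) -> RankInfo:
    """Read Open MPI's per-process variables, defaulting to a single process."""
    env = os.environ if env is None else env
    return RankInfo(
        rank=int(env.get("OMPI_COMM_WORLD_RANK", "0")),
        world_size=int(env.get("OMPI_COMM_WORLD_SIZE", "1")),
        local_rank=int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0")),
    )


# What the third of the four ranks started by `mpirun -np 4` would see.
info = mpi_rank_info({
    "OMPI_COMM_WORLD_RANK": "2",
    "OMPI_COMM_WORLD_SIZE": "4",
    "OMPI_COMM_WORLD_LOCAL_RANK": "2",
})
print(info)
```

On a single node, the global and local ranks coincide. Across several nodes they differ, which is why the GPU index should come from the local rank.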

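## Multi-node, multiple GPUs

    mpirun -np $np --hostfile hostfile --bind-to none `pwd`/single_process.sh $dist_url resnet50 64

Here `$np` is the total number of processes across all nodes. `$dist_url` takes the place of the `localhost` argument used in the single-node case, so it must be reachable from every node.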

Example hostfile:

    node1 slots=4
    node2 slots=4
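In an Open MPI hostfile, `slots=N` is the number of processes that may run on that node. With one process per GPU, `$np` is therefore normally the sum of the slots: 8 for the example above. The small sketch below assumes the simple `host slots=N` format shown. `total_slots` is a hypothetical helper that ignores other hostfile keywords. Counting a host listed without `slots=` as one slot is a simplifying assumption, not necessarily Open MPI's default.

```python
def total_slots(hostfile_text: str) -> int:
    """Sum the slots=N entries of a simple Open MPI hostfile (hypothetical helper)."""
    total = 0
    for line in hostfile_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        slots = 1  # simplifying assumption for hosts listed without slots=
        for field in line.split()[1:]:
            key, _, value = field.partition("=")
            if key == "slots":
                slots = int(value)
        total += slots
    return total


hostfile = """
node1 slots=4
node2 slots=4
"""
np_total = total_slots(hostfile)
print(f"mpirun -np {np_total} --hostfile hostfile ...")
```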

# References

[https://github.com/pytorch/examples/tree/master/imagenet](https://github.com/pytorch/examples/tree/master/imagenet)