    Benchmark - Fix torch.dist init issue with multiple models (#495) · 644b5395
    Yifan Xiong authored
    Fix a potential barrier timeout in init_process_group caused by a race
    condition when the same port is reused. Use a different port for each
    model when running multiple models sequentially in one process.
    For example, when running vgg11/13/16/19, ports 29501~29504 are used
    respectively.
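
The port assignment described above can be sketched roughly as follows. This is an illustration only, not the code in pytorch_base.py: the helper names `port_for_run` and `set_master_port`, the base value 29500, and the use of the `MASTER_PORT` environment variable (which torch.distributed reads for `env://` initialization) are assumptions chosen so that the result matches the 29501~29504 example.

```python
import os

# Assumption: ports are offset from a base of 29500, so that the first
# model gets 29501 as in the commit message example.
BASE_PORT = 29500


def port_for_run(run_index: int, base_port: int = BASE_PORT) -> int:
    """Return a distinct port for the run_index-th model (0-based)."""
    return base_port + run_index + 1


def set_master_port(run_index: int) -> int:
    """Hypothetical helper: export the port before the process group is initialized."""
    port = port_for_run(run_index)
    os.environ['MASTER_PORT'] = str(port)
    return port


models = ['vgg11', 'vgg13', 'vgg16', 'vgg19']
ports = {name: port_for_run(i) for i, name in enumerate(models)}
```

Because each model run gets its own port, a later run does not have to bind a port that an earlier run in the same process may still be holding.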
pytorch_base.py 11.7 KB