    move distributed_init after get_batch_iterator · 34028c63
    Haoran Li authored
    Summary: Multi-node jobs constantly hit wait timeouts, and even setting copylocallytempdir:/ doesn't help, e.g. f105637629. Moving distributed_init after get_batch_iterator seems to fix it, e.g. f106520580.
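
    A minimal sketch of the reordering, not the actual train.py code. The helper run_setup and the two fake_* functions are hypothetical stand-ins for get_batch_iterator and distributed_init. One plausible reading, which the commit does not confirm, is that slow per-node data loading made workers miss the rendezvous window when distributed_init ran first.

```python
from typing import Callable, Iterator, List, Tuple


def run_setup(
    load_data: Callable[[], Iterator],
    init_distributed: Callable[[], int],
    distributed: bool,
) -> Tuple[Iterator, int]:
    """Build the batch iterator first, then join the process group."""
    # Node-local (and possibly slow) work happens before any rendezvous.
    batches = load_data()
    # Only now do the workers wait on each other.
    rank = init_distributed() if distributed else 0
    return batches, rank


# Hypothetical stand-ins that only record the order in which they are called.
calls: List[str] = []


def fake_get_batch_iterator() -> Iterator:
    calls.append("get_batch_iterator")
    return iter([[1, 2], [3, 4]])


def fake_distributed_init() -> int:
    calls.append("distributed_init")
    return 0  # pretend this process is rank 0


batches, rank = run_setup(
    fake_get_batch_iterator, fake_distributed_init, distributed=True
)
print(calls)
```

    Passing the two steps in as callables keeps the ordering in one place, so it can be checked without a real multi-node launch.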
    
    Reviewed By: myleott
    
    Differential Revision: D14817769
    
    fbshipit-source-id: edbb101a28d8082241c7bdd8c5500c9dad27647c
train.py 15.3 KB