    Fix apex distributed training (#1124) · c187c2b1
    Vinh Nguyen authored
    * adding mixed precision training with Apex
    
    * fix APEX default optimization level
    
    * adding python version check for apex
    
    * fix lint errors and raise an exception if Apex is not available
    
    * fixing apex distributed training
    
    * fix throughput calculation: include forward pass
    
    * remove torch.cuda.set_device(args.gpu) as it's already called in init_distributed_mode
    
    * fix linter: new line
    
    * move Apex initialization code back to the beginning of main
    
    * move Apex initialization to before lr_scheduler for peace of mind, though initializing Apex after lr_scheduler seems to work fine as well
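Two of the changes listed above, failing loudly when Apex is unavailable and timing the forward pass as part of throughput, can be illustrated with a minimal standard-library sketch. This is not the code from train.py: the helper names `require_apex` and `images_per_second`, the `min_python` default, and the error messages are all hypothetical.

```python
import importlib
import sys
import time


def require_apex(min_python=(3, 0)):
    """Return the apex.amp module, or raise RuntimeError if it cannot be used.

    The min_python default is a placeholder assumption, not the version
    that train.py actually checks for.
    """
    if sys.version_info < min_python:
        raise RuntimeError("apex mixed precision needs a newer Python version")
    try:
        return importlib.import_module("apex.amp")
    except ImportError as exc:
        # Fail early with a clear message instead of silently training in fp32.
        raise RuntimeError(
            "Failed to import apex; install it from https://github.com/NVIDIA/apex"
        ) from exc


def images_per_second(step_fn, batch, batch_size):
    """Time one full training step and convert it to a throughput figure.

    The timer starts before step_fn runs, so the forward pass is included
    along with whatever else step_fn does (backward pass, optimizer update).
    On a GPU the caller would also need to synchronize the device before
    reading the clock, because CUDA kernels launch asynchronously.
    """
    start = time.perf_counter()
    step_fn(batch)
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against zero
    return batch_size / elapsed
```

In a training script, the ordering described in the last two bullets would look roughly like the following, with the optimization level and scheduler settings chosen purely for illustration:

```python
# amp = require_apex()
# model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
# lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
```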
train.py 11.7 KB