references/classification/train.py · c187c2b12d86c3909e59a40dbe49555d85b98703 · OpenDAS / vision

Fix apex distributed training (#1124) · c187c2b1

Vinh Nguyen authored Jul 19, 2019

* adding mixed precision training with Apex

* fix APEX default optimization level

* adding python version check for apex

* fix LINT errors and raise exceptions if apex not available

* fixing apex distributed training

* fix throughput calculation: include forward pass

* remove torch.cuda.set_device(args.gpu) as it's already called in init_distributed_mode

* fix linter: new line

* move Apex initialization code back to the beginning of main

* move apex initialization to before lr_scheduler for peace of mind, though doing apex initialization after lr_scheduler seems to work fine as well
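
Two of the changes listed above (the Python version check with an explicit exception when apex is unavailable, and the throughput fix that includes the forward pass) can be illustrated with a minimal sketch. This is not the code from train.py: the helper names `require_apex` and `timed_step` and the wording of the error messages are hypothetical, chosen only for illustration. The only external assumption is that Apex exposes its mixed-precision module as `apex.amp`.

```python
import sys
import time


def require_apex(use_apex):
    """Return the apex.amp module if requested, or None if apex is not requested.

    Hypothetical helper: raises instead of silently falling back,
    mirroring the "raise exceptions if apex not available" item.
    """
    if not use_apex:
        return None
    # Python version check before touching apex.
    if sys.version_info < (3, 0):
        raise RuntimeError("Apex requires Python 3.")
    try:
        from apex import amp  # only importable when NVIDIA Apex is installed
    except ImportError:
        raise RuntimeError("Failed to import apex; install it to enable mixed precision.")
    return amp


def timed_step(step_fn, batch_size):
    """Run one training step and return (result, images per second).

    The timer starts before step_fn is called, so the measured time
    covers the forward pass as well as whatever follows it.
    """
    start = time.perf_counter()
    result = step_fn()  # forward + backward + optimizer step would go here
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against zero
    return result, batch_size / elapsed
```

In a real training loop, `step_fn` would wrap the model forward pass, loss computation, backward pass and optimizer step for one batch, so the reported images per second reflect the whole iteration rather than only the part after the forward pass.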


train.py 11.7 KB
