Avoid unnecessary DDP synchronization when gradient_accumulation_steps > 1 (#7742)
* use DDP no_sync when possible
* fix is_nlp_available addition mistake
* reformat trainer.py
* reformat trainer.py
* drop support for pytorch < 1.2
* return support for pytorch < 1.2
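The idea behind this change can be sketched as follows: when accumulating gradients over several micro-steps, DDP's gradient all-reduce is only needed on the last micro-step of each accumulation window, so the intermediate backward passes can run under `model.no_sync()`. This is a minimal illustrative sketch, not the actual `trainer.py` code; the helper names `should_sync` and `grad_sync_context` are invented for illustration, and `no_sync()` is assumed to be available on a `DistributedDataParallel`-wrapped model.

```python
from contextlib import nullcontext

def should_sync(step: int, accumulation_steps: int) -> bool:
    # Sync gradients only on the last micro-step of each accumulation window.
    return (step + 1) % accumulation_steps == 0

def grad_sync_context(model, step: int, accumulation_steps: int):
    # Skip DDP's gradient all-reduce on intermediate micro-steps.
    # model.no_sync() exists only on DistributedDataParallel-wrapped models,
    # so callers with a plain model should pass accumulation_steps == 1.
    if accumulation_steps > 1 and not should_sync(step, accumulation_steps):
        return model.no_sync()
    return nullcontext()

# Usage in a training loop (pseudocode):
#   for step, batch in enumerate(dataloader):
#       with grad_sync_context(model, step, accumulation_steps):
#           loss = model(batch)
#           loss.backward()
#       if should_sync(step, accumulation_steps):
#           optimizer.step()
#           optimizer.zero_grad()
```

Without `no_sync()`, DDP all-reduces gradients on every `backward()`, wasting communication on micro-steps whose gradients are only accumulated locally.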