Thorsten Kurth authored
* updating amp to new torch.amp
* using amp autocast to FP32 for disco convolution kernels
* implemented reduce_scatter routines but disabled them because of memory fluctuations that can cause OOM on big networks
78365cb9