Tkurth/distributed memory reduction (#43)
* Update AMP usage to the new torch.amp API
* Use AMP autocast to FP32 for the DISCO convolution kernels
* Implement reduce_scatter routines, but leave them disabled because they cause memory fluctuations that can lead to OOM on large networks
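
The FP32 casting described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: `fp32_kernel` is a hypothetical stand-in for a DISCO convolution kernel, and a plain matrix multiply on CPU is assumed in place of the real contraction. It only shows one way to force a block to run in FP32 while the surrounding code runs under `torch.amp.autocast`.

```python
import torch


def fp32_kernel(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a precision-sensitive kernel."""
    # Disable autocast locally and cast the inputs up explicitly, so the
    # computation runs in FP32 even when called inside an autocast region.
    with torch.amp.autocast(device_type=x.device.type, enabled=False):
        return torch.matmul(x.float(), w.float())


x = torch.randn(4, 8)
w = torch.randn(8, 3)

# Outer mixed-precision region (CPU with bfloat16 is assumed here so the
# sketch runs without a GPU).
with torch.amp.autocast(device_type="cpu", dtype=torch.bfloat16):
    y_mixed = torch.matmul(x, w)   # eligible for reduced precision
    y_fp32 = fp32_kernel(x, w)     # forced to FP32

print(y_mixed.dtype, y_fp32.dtype)
```

Disabling autocast inside the kernel keeps the precision choice local to the kernel, so callers do not need to know which operations are sensitive to reduced precision.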