    Tkurth/distributed memory reduction (#43) · 78365cb9
    Thorsten Kurth authored
    * updating amp to new torch.amp
    * using amp autocast to FP32 for disco convolution kernels
    * implemented reduce_scatter routines but disabled them because of memory fluctuations that can cause OOM on big networks
convolution.py 22.5 KB