Flag to call empty_cache() each iteration, to reduce fragmentation

See merge request ADLR/megatron-lm!306
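
A minimal sketch of what such a flag could look like, assuming a PyTorch training loop. The flag name `--empty-cache-each-iter` and the helpers `parse_args`, `maybe_empty_cache`, and `train` are hypothetical and are not taken from the merge request. Only `torch.cuda.empty_cache()` and `torch.cuda.is_available()` are real PyTorch calls.

```python
import argparse

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag name, chosen for illustration only.
    parser.add_argument(
        "--empty-cache-each-iter",
        action="store_true",
        help="call torch.cuda.empty_cache() after every training iteration",
    )
    return parser.parse_args(argv)


def maybe_empty_cache(enabled):
    """Release cached, unused GPU blocks; return True if the call was made."""
    if enabled and torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
        return True
    return False


def train(num_iters, args, step_fn=lambda i: None):
    for i in range(num_iters):
        step_fn(i)  # forward, backward, and optimizer step would go here
        maybe_empty_cache(args.empty_cache_each_iter)
```

Keeping the call behind an opt-in flag seems the safer default: emptying the cache returns unused cached blocks to the driver, which may reduce fragmentation, but it also forces later allocations to go back to the driver and can slow each iteration.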