* Multiply lr scheduler steps by `num_processes`. * Stop multiplying steps by gradient accumulation.
Attach a file by drag & drop or click to upload