do not scale the initial global step by gradient accumulation steps when loading from checkpoint (#3506)