Smp grad accum (#10488)
* Fix gradient accumulation for SM Model Parallelism
* Style and divide loss by grad accum steps
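
Why the loss is divided by the number of accumulation steps, as a minimal sketch. This is not the code from this commit: the toy model and the names `mse_grad` and `accumulated_grad` are hypothetical, and equal-sized micro-batches are assumed. Scaling each micro-batch loss by `1 / accum_steps` makes the summed gradient match the gradient of one large batch.

```python
# Hypothetical toy example: 1-D linear model y = w * x with mean squared error.
# Not the code from this commit; names and data are illustrative only.

def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, accum_steps):
    """Sum gradients over equal-sized micro-batches, scaling each
    micro-batch loss (and hence its gradient) by 1 / accum_steps."""
    assert len(xs) % accum_steps == 0, "sketch assumes equal micro-batches"
    size = len(xs) // accum_steps
    total = 0.0
    for step in range(accum_steps):
        lo, hi = step * size, (step + 1) * size
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = mse_grad(w, xs, ys)                           # one batch of 4
accum = accumulated_grad(w, xs, ys, accum_steps=2)   # two micro-batches of 2
print(full, accum)
```

Without the division, the accumulated gradient would be `accum_steps` times larger than the full-batch gradient, which effectively multiplies the learning rate by the same factor.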