Merge branch 't5_pipeline_parallelism_grad_norm_fix' into 't5_pipeline_parallelism'
Fix grad norm computation

See merge request ADLR/megatron-lm!296