Merge branch 'lmcafee/byte-buffer' into 'main'
Perform distributed optimizer's all-gather in param dtype (instead of grad dtype)

See merge request ADLR/megatron-lm!448
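To make the motivation concrete, here is a minimal sketch of how the element dtype affects the size of the gathered buffer. The merge request itself does not say which dtypes are involved; the sketch assumes fp32 gradients and bf16 parameters, a common mixed-precision setup. The helper name `all_gather_payload_bytes` and the model size are hypothetical and not taken from the Megatron-LM code.

```python
# Illustrative only: the dtypes and model size below are assumptions,
# not values taken from the merge request.
DTYPE_BYTES = {"float32": 4, "bfloat16": 2, "float16": 2}


def all_gather_payload_bytes(num_elements: int, dtype: str) -> int:
    """Size in bytes of a fully gathered buffer of num_elements of dtype."""
    return num_elements * DTYPE_BYTES[dtype]


num_params = 1_000_000_000  # hypothetical parameter count

# Gathering through a buffer typed like the gradients (assumed fp32).
grad_dtype_bytes = all_gather_payload_bytes(num_params, "float32")

# Gathering in the parameters' own dtype (assumed bf16).
param_dtype_bytes = all_gather_payload_bytes(num_params, "bfloat16")

print(f"grad dtype:  {grad_dtype_bytes} bytes")
print(f"param dtype: {param_dtype_bytes} bytes")
```

Under these assumptions the gathered buffer is half the size when the all-gather uses the param dtype. If gradients and parameters share a dtype, the sizes are the same.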