Merge branch 'ckpt_transpose' into 'main'
Rework handling of older checkpoints' attention weight/bias ordering. See merge request ADLR/megatron-lm!219