Merge branch 'scale_qkt_exp_lr' into 'master'

Scale Q*K (query times key) by 1/layer-number and add exponential decay option

See merge request ADLR/megatron-lm!27
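
The Q*K scaling named in the description can be illustrated with a minimal sketch. This is not the code from the merge request: the function name `scaled_qk_scores`, the plain-list representation, and the choice to combine the 1/layer-number factor with the usual 1/sqrt(head_dim) attention scaling are all assumptions made for illustration.

```python
import math


def scaled_qk_scores(queries, keys, layer_number):
    """Return Q*K^T scaled by 1 / (sqrt(head_dim) * layer_number).

    Hypothetical helper: `queries` and `keys` are lists of equal-length
    vectors, and `layer_number` is the 1-based index of the layer.
    """
    if layer_number < 1:
        raise ValueError("layer_number must be >= 1")
    head_dim = len(queries[0])
    # Assumption: the per-layer factor multiplies the standard sqrt(d) norm.
    scale = 1.0 / (math.sqrt(head_dim) * layer_number)
    return [
        [scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in keys]
        for q in queries
    ]


if __name__ == "__main__":
    q = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
    k = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
    print(scaled_qk_scores(q, k, layer_number=1))
    print(scaled_qk_scores(q, k, layer_number=2))
```

Under these assumptions, deeper layers produce proportionally smaller raw scores before the softmax, which is one plausible motivation for such a change (for example, keeping half-precision values in range).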