"...git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "c41f2bad69923da6f23d76e47639ad350206d757"
Exclude the load balancing loss of padding tokens in Mixtral-8x7B (#28517)
* fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask
* format code using black and ruff
* skip computing mask if attention_mask=None
* add tests for load balancing loss Mixtral-Moe
* fix assert loss is different in mixtral_test
* fix pad_leng
* use assertNotAlmostEqual and print to debug
* remove print for debug
* minor updates
* reduce rtol and atol
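The idea behind the change is to weight the Switch-style auxiliary loss by the attention mask so that padding tokens do not skew either the per-expert dispatch fractions or the mean router probabilities. The sketch below is a minimal, illustrative version of that masking step, not the actual `load_balancing_loss_func` from transformers; the helper name `masked_aux_loss` and its exact arguments are assumptions for the example.

```python
import torch
import torch.nn.functional as F


def masked_aux_loss(router_logits, num_experts, top_k, attention_mask=None):
    """Switch-style load balancing loss that ignores padding tokens.

    router_logits: (batch_size * sequence_length, num_experts)
    attention_mask: (batch_size, sequence_length), 1 for real tokens, 0 for padding
    """
    routing_weights = F.softmax(router_logits, dim=-1)                # (tokens, num_experts)
    _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)  # (tokens, top_k)
    expert_mask = F.one_hot(selected_experts, num_experts)            # (tokens, top_k, num_experts)

    if attention_mask is None:
        # No padding information: average over every token (the original behavior).
        tokens_per_expert = expert_mask.float().mean(dim=0)           # (top_k, num_experts)
        router_prob_per_expert = routing_weights.mean(dim=0)          # (num_experts,)
    else:
        # Flatten the mask to align with the token dimension, then take masked
        # means so padding tokens contribute nothing to either term.
        mask = attention_mask.reshape(-1).to(routing_weights.dtype)   # (tokens,)
        denom = mask.sum()
        tokens_per_expert = (expert_mask.float() * mask[:, None, None]).sum(dim=0) / denom
        router_prob_per_expert = (routing_weights * mask[:, None]).sum(dim=0) / denom

    # Product of dispatch fraction and mean router probability, summed over experts.
    return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert.unsqueeze(0))


# Example: batch of 2 sequences of length 5, 8 experts, top-2 routing,
# where the first sequence has two padding positions.
logits = torch.randn(2 * 5, 8)
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
loss = masked_aux_loss(logits, num_experts=8, top_k=2, attention_mask=mask)
```

With the mask applied, a padded and an unpadded batch with the same real tokens produce the same auxiliary loss, which is what the added tests check by asserting the losses differ only when padding is (incorrectly) counted.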