Fix load balancing loss func for mixtral (#28256)
* Correct the implementation of the auxiliary loss of Mixtral
* Correct the implementation of the auxiliary loss of Mixtral
* Implement a simpler calculation method
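
For context, a minimal sketch of a Switch Transformer-style load balancing (auxiliary) loss, the kind of loss this change touches. It is not the code from this commit: the function names, the pure-Python list representation, and the normalization (the loss equals 1.0 under perfectly uniform routing) are assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one token's router logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(gate_logits, top_k=2):
    """Hypothetical helper: num_experts * sum_i f_i * P_i.

    gate_logits: list of per-token lists, one router logit per expert.
    f_i: fraction of top-k routing slots assigned to expert i.
    P_i: mean router probability assigned to expert i.
    """
    num_tokens = len(gate_logits)
    num_experts = len(gate_logits[0])
    probs = [softmax(row) for row in gate_logits]

    # Count how often each expert appears among a token's top-k choices.
    counts = [0] * num_experts
    for p in probs:
        ranked = sorted(range(num_experts), key=lambda i: p[i], reverse=True)
        for i in ranked[:top_k]:
            counts[i] += 1
    f = [c / (num_tokens * top_k) for c in counts]

    # Average router probability per expert across all tokens.
    P = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]

    return num_experts * sum(fi * Pi for fi, Pi in zip(f, P))
```

With this normalization the loss is 1.0 when the router probabilities are uniform and grows as routing concentrates on a few experts, which is what the auxiliary term is meant to penalize.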
---------
Co-authored-by: zhangliangxu3 <zhangliangxu3@jd.com>