Bug: apply final_hidden_states*=self.routed_scaling_factor at MoE lay… (#8511)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>