-
bhargav-patel-29 authored
[Bugfix] Fix mismatch between global and local attention heads in tensor-parallel mode for param2moe model (#39707) Signed-off-by:
bhargav-patel-29 <bhargav.patel@tihiitb.org> Co-authored-by:
gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
f7e62e3d