Support TensorRT-LLM MoE FP4 for low-latency (#21331)
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: XIn Li <xinli@nvidia.com>
Co-authored-by: XIn Li <xinli@nvidia.com>