Unverified commit de509ae8, authored by Kaixi Hou, committed by GitHub

[NVIDIA] Explicitly disable shuffled weights for flashinfer blockscale moe fp8 kernels (#21411)


Signed-off-by: kaixih <kaixih@nvidia.com>
parent e7c4f9ee
@@ -1127,6 +1127,7 @@ def flashinfer_fused_moe_blockscale_fp8(
         tile_tokens_dim=_get_tile_tokens_dim(x.shape[0], top_k,
                                              global_num_experts),
         routing_method_type=2,  # DeepSeek-styled routing method
+        use_shuffled_weight=False,
     )
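
The commit itself does not state why the flag is now passed explicitly. As a general illustration only, the sketch below shows what pinning a keyword argument guarantees: the call site keeps the same behavior even if a callee's default differs between releases. The names `kernel_old_default`, `kernel_new_default`, `call_implicit`, and `call_explicit` are hypothetical stand-ins, not FlashInfer or vLLM APIs, and the sketch makes no claim about what FlashInfer's actual default is.

```python
# Hypothetical stand-ins for two releases of a library kernel that differ
# only in the default value of `use_shuffled_weight`. These are NOT real
# FlashInfer functions; they exist purely to illustrate the pattern.
def kernel_old_default(x, *, routing_method_type, use_shuffled_weight=False):
    return {"routing": routing_method_type, "shuffled": use_shuffled_weight}


def kernel_new_default(x, *, routing_method_type, use_shuffled_weight=True):
    return {"routing": routing_method_type, "shuffled": use_shuffled_weight}


def call_implicit(kernel, x):
    # Relies on whatever default the installed kernel happens to ship with.
    return kernel(x, routing_method_type=2)


def call_explicit(kernel, x):
    # Pins the flag, mirroring the shape of the change in this commit.
    return kernel(x, routing_method_type=2, use_shuffled_weight=False)


if __name__ == "__main__":
    for kernel in (kernel_old_default, kernel_new_default):
        print(kernel.__name__,
              "implicit:", call_implicit(kernel, None)["shuffled"],
              "explicit:", call_explicit(kernel, None)["shuffled"])
```

With the explicit form, the caller's weight-layout assumption is visible at the call site and does not depend on the callee's default.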