[NVIDIA] Explicitly disable shuffled weights for flashinfer blockscale moe fp8 kernels (#21411)

Signed-off-by: kaixih <kaixih@nvidia.com>

Commit de509ae8 (unverified), authored by Kaixi Hou on Jul 26, 2025; committed by GitHub on Jul 26, 2025

Showing 1 changed file with 1 addition and 0 deletions

vllm/model_executor/layers/fused_moe/fused_moe.py +1 -0

--- a/vllm/model_executor/layers/fused_moe/fused_moe.py
+++ b/vllm/model_executor/layers/fused_moe/fused_moe.py
@@ -1127,6 +1127,7 @@ def flashinfer_fused_moe_blockscale_fp8(
        tile_tokens_dim=_get_tile_tokens_dim(x.shape[0], top_k,
                                             global_num_experts),
        routing_method_type=2,  # DeepSeek-styled routing method
+        use_shuffled_weight=False,
    )