[Bug] Fix routing bias dtype for trtllm per-block fp8 moe (#38989)

Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Commit 92fbec39, authored Apr 08, 2026 by Wei Zhao; committed by GitHub Apr 08, 2026

Showing 1 changed file with 5 additions and 0 deletions

vllm/model_executor/layers/fused_moe/experts/trtllm_fp8_moe.py +5 -0

--- a/vllm/model_executor/layers/fused_moe/experts/trtllm_fp8_moe.py
+++ b/vllm/model_executor/layers/fused_moe/experts/trtllm_fp8_moe.py
@@ -358,6 +358,11 @@ class TrtLlmFp8ExpertsMonolithic(TrtLlmFp8ExpertsBase, mk.FusedMoEExpertsMonolit
        if self.routing_method_type == RoutingMethodType.DeepSeekV3:
            router_logits = router_logits.to(torch.float32)
+        # Currently FI requires bfloat16 routing bias.
+        # https://github.com/flashinfer-ai/flashinfer/issues/2909
+        if e_score_correction_bias is not None:
+            e_score_correction_bias = e_score_correction_bias.to(torch.bfloat16)
        is_mxfp8 = self.quant_config.block_shape == [1, 32]
        if is_mxfp8:
            fp8_quant_type = Fp8QuantizationType.MxFp8
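
Note on the cast added above: converting the routing bias to bfloat16 keeps only 8 significant bits, so bias values are rounded before they reach the kernel. The following is a minimal standard-library sketch of that rounding, not vLLM or FlashInfer code. The helper names `to_bfloat16` and `cast_optional_bias` are hypothetical, the emulation assumes round-to-nearest-even on finite values, and a plain list stands in for the tensor.

```python
import struct
from typing import List, Optional


def to_bfloat16(value: float) -> float:
    """Round a finite float to the nearest bfloat16-representable value.

    Hypothetical helper for illustration: bfloat16 is the upper 16 bits of
    an IEEE float32, so we round away the lower 16 bits (nearest-even).
    """
    # Reinterpret the value as its 32-bit float32 pattern.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Add 0x7FFF plus the lowest kept bit so ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding_bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def cast_optional_bias(bias: Optional[List[float]]) -> Optional[List[float]]:
    """Mirror the None guard in the diff: cast only when a bias is present."""
    if bias is None:
        return None
    return [to_bfloat16(b) for b in bias]


if __name__ == "__main__":
    # Near 1.0 the spacing between bfloat16 values is 2**-7 = 0.0078125.
    print(cast_optional_bias([1.0, 1.001, 1.004, -1.001]))
    print(cast_optional_bias(None))
```

Near 1.0, adjacent bfloat16 values are 0.0078125 apart, so bias values that differ by less than that can collapse to the same number after the cast.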