Unverified Commit d31d48b3 authored by b8zhong, committed by GitHub

update usage of `trtllm_fp8_per_tensor_scale_moe` (#12569)

parent 88342607
@@ -21,7 +21,7 @@ The support matrix is split into two parts: MHA (standard attention) and MLA (mu
| **Triton** | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| **Torch Native (SDPA)** | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| **FlexAttention (PyTorch)** | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| **TRTLLM MHA** | 16, 32 or 64 | ✅ | ✅ | ❌ | | ❌ |
| **Dual Chunk FlashAttention** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| **AITER (ROCm)** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| **Wave (ROCm)** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
@@ -689,7 +689,7 @@ class ModelOptFp8MoEMethod(FusedMoEMethodBase):
else 1.0
),
use_routing_scales_on_input=use_routing_scales_on_input,
-            tile_tokens_dim=8,  # TODO(brayden): use the FI tile calculation
+            tile_tokens_dim=None,
routing_method_type=routing_method_type,
)
sm.tag(output)
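The functional change is in the second hunk: the call into FlashInfer's `trtllm_fp8_per_tensor_scale_moe` kernel previously hardcoded `tile_tokens_dim=8`, with a TODO to use the FlashInfer ("FI") tile calculation instead; passing `None` defers tile-size selection to the library. Below is a minimal sketch of that "None means auto" convention; the helper name, the power-of-two heuristic, and the `[8, 64]` bounds are illustrative assumptions, not FlashInfer's actual tile calculation.

```python
# Hypothetical sketch of the convention this commit adopts: when
# tile_tokens_dim is None, the callee derives a tile size from the
# workload instead of trusting a hardcoded value. Illustration only;
# FlashInfer's real heuristic lives inside the library.
from typing import Optional


def resolve_tile_tokens_dim(num_tokens: int,
                            tile_tokens_dim: Optional[int] = None) -> int:
    """Pick a power-of-two tile size when the caller passes None."""
    if tile_tokens_dim is not None:
        return tile_tokens_dim  # explicit caller override, e.g. the old 8
    # Grow a power-of-two tile toward the token count, clamped to [8, 64];
    # these bounds are assumptions made for this sketch.
    tile = 8
    while tile < 64 and tile < num_tokens:
        tile *= 2
    return tile


if __name__ == "__main__":
    print(resolve_tile_tokens_dim(5))        # 8: auto, small batch
    print(resolve_tile_tokens_dim(1000))     # 64: auto, clamped at the cap
    print(resolve_tile_tokens_dim(1000, 8))  # 8: explicit override wins
```

With `tile_tokens_dim=None` at the call site, future improvements to FlashInfer's internal tile heuristic are picked up automatically, without further changes to `ModelOptFp8MoEMethod`.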