Unverified Commit d31d48b3 authored by b8zhong, committed by GitHub

update usage of `trtllm_fp8_per_tensor_scale_moe` (#12569)

parent 88342607
@@ -21,7 +21,7 @@ The support matrix is split into two parts: MHA (standard attention) and MLA (mu
| **Triton** | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| **Torch Native (SDPA)** | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| **FlexAttention (PyTorch)** | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| **TRTLLM MHA** | 16, 32 or 64 | ✅ | ✅ | ❌ | | ❌ |
| **Dual Chunk FlashAttention** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| **AITER (ROCm)** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| **Wave (ROCm)** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
@@ -689,7 +689,7 @@ class ModelOptFp8MoEMethod(FusedMoEMethodBase):
else 1.0
),
use_routing_scales_on_input=use_routing_scales_on_input,
-            tile_tokens_dim=8,  # TODO(brayden): use the FI tile calculation
+            tile_tokens_dim=None,
routing_method_type=routing_method_type,
)
sm.tag(output)
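The functional change is in the second hunk: the call into FlashInfer's `trtllm_fp8_per_tensor_scale_moe` kernel previously hardcoded `tile_tokens_dim=8`, with a TODO to use the FlashInfer ("FI") tile calculation instead; passing `None` defers tile-size selection to the library. Below is a minimal sketch of that "None means auto" convention; the helper name, the power-of-two heuristic, and the `[8, 64]` bounds are illustrative assumptions, not FlashInfer's actual tile calculation.

```python
# Hypothetical sketch of the convention this commit adopts: when
# tile_tokens_dim is None, the callee derives a tile size from the
# workload instead of trusting a hardcoded value. Illustration only;
# FlashInfer's real heuristic lives inside the library.
from typing import Optional


def resolve_tile_tokens_dim(num_tokens: int,
                            tile_tokens_dim: Optional[int] = None) -> int:
    """Pick a power-of-two tile size when the caller passes None."""
    if tile_tokens_dim is not None:
        return tile_tokens_dim  # explicit caller override, e.g. the old 8
    # Grow a power-of-two tile toward the token count, clamped to [8, 64];
    # these bounds are assumptions made for this sketch.
    tile = 8
    while tile < 64 and tile < num_tokens:
        tile *= 2
    return tile


if __name__ == "__main__":
    print(resolve_tile_tokens_dim(5))        # 8: auto, small batch
    print(resolve_tile_tokens_dim(1000))     # 64: auto, clamped at the cap
    print(resolve_tile_tokens_dim(1000, 8))  # 8: explicit override wins
```

With `tile_tokens_dim=None` at the call site, future improvements to FlashInfer's internal tile heuristic are picked up automatically, without further changes to `ModelOptFp8MoEMethod`.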