Commit d31d48b3 (sglang, unverified)
Authored Nov 03, 2025 by b8zhong; committed by GitHub, Nov 03, 2025
update usage of `trtllm_fp8_per_tensor_scale_moe` (#12569)
Parent: 88342607
Showing 2 changed files with 2 additions and 2 deletions:
docs/advanced_features/attention_backend.md (+1, -1)
python/sglang/srt/layers/quantization/modelopt_quant.py (+1, -1)
docs/advanced_features/attention_backend.md
@@ -21,7 +21,7 @@ The support matrix is split into two parts: MHA (standard attention) and MLA (mu
 | **Triton** | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
 | **Torch Native (SDPA)** | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
 | **FlexAttention (PyTorch)** | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| **TRTLLM MHA** | 16, 32 or 64 | ✅ | ✅ | ❌ | ❌ | ❌ |
+| **TRTLLM MHA** | 16, 32 or 64 | ✅ | ✅ | ❌ | ✅ | ❌ |
 | **Dual Chunk FlashAttention** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
 | **AITER (ROCm)** | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
 | **Wave (ROCm)** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
python/sglang/srt/layers/quantization/modelopt_quant.py
@@ -689,7 +689,7 @@ class ModelOptFp8MoEMethod(FusedMoEMethodBase):
                 else 1.0
             ),
             use_routing_scales_on_input=use_routing_scales_on_input,
-            tile_tokens_dim=8,  # TODO(brayden): use the FI tile calculation
+            tile_tokens_dim=None,
             routing_method_type=routing_method_type,
         )
         sm.tag(output)
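For context on the removed TODO ("use the FI tile calculation"), here is a rough sketch of what such a tile-size heuristic could look like. It is an assumption for illustration only: the function names, the power-of-two rounding, and the 8 to 64 clamp are hypothetical and are not taken from this commit or checked against FlashInfer's implementation. The commit itself simply passes `tile_tokens_dim=None`.

```python
def next_power_of_2(n: int) -> int:
    """Return the smallest power of two >= n (1 for n <= 1)."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()


def estimate_tile_tokens_dim(
    num_tokens: int,
    top_k: int,
    num_experts: int,
    min_tile: int = 8,   # hypothetical lower bound (matches the old hard-coded 8)
    max_tile: int = 64,  # hypothetical upper bound
) -> int:
    """Hypothetical heuristic: size the tile to the average per-expert load.

    Assumes tokens are spread evenly across experts, rounds that load up to
    a power of two, then clamps the result to [min_tile, max_tile].
    """
    tokens_per_expert = (num_tokens * top_k) // num_experts
    tile = next_power_of_2(tokens_per_expert)
    return min(max(tile, min_tile), max_tile)


if __name__ == "__main__":
    # Small decode batch: the heuristic falls back to the lower bound.
    print(estimate_tile_tokens_dim(num_tokens=1, top_k=8, num_experts=256))
    # Larger prefill batch: the tile grows with per-expert load.
    print(estimate_tile_tokens_dim(num_tokens=1024, top_k=8, num_experts=256))
```

Under this sketch a fixed value of 8 corresponds to the small-batch case only, which is one plausible reason to stop hard-coding it at the call site.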