change / sglang · Commit 198b9056 (Unverified)
Authored May 14, 2025 by Hubert Lu; committed by GitHub on May 14, 2025

[AMD] Fix Llama 4 Scout and Maverick accuracy issues on MI300X (#6274)
parent 73eb67c0

Changes: 1 changed file, 13 additions and 0 deletions (+13 −0)

python/sglang/srt/layers/moe/fused_moe_triton/layer.py
@@ -186,6 +186,19 @@ class UnquantizedFusedMoEMethod(FusedMoEMethodBase, CustomOp):
         if _is_hip and get_bool_env_var("SGLANG_AITER_MOE"):
             assert not no_combine, "unsupported"
+            if apply_router_weight_on_input:
+                assert (
+                    topk_weights.dim() == 2
+                ), "`topk_weights` should be in shape (num_tokens, topk)"
+                _, topk = topk_weights.shape
+                assert (
+                    topk == 1
+                ), "Only support topk=1 when `apply_router_weight_on_input` is True"
+                x = x * topk_weights.to(x.dtype)
+                topk_weights = torch.ones_like(
+                    topk_weights,
+                    dtype=torch.float32,
+                )  # topk_weights must be FP32 (float32)
             return ck_moe_2stages(
                 x,
                 layer.w13_weight,
                 ...
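The added branch handles `apply_router_weight_on_input` for the AITER MoE path: instead of weighting expert outputs during the combine step, it pre-scales the activations by the router weights (cast to the activation dtype) and then replaces `topk_weights` with FP32 ones, so the downstream weighted combine becomes a no-op. The sketch below is a hypothetical NumPy illustration of that rewrite (not sglang code; `combine` and the identity "expert" are invented for the demo) showing the two paths agree for `topk == 1`:

```python
import numpy as np

def combine(expert_out, topk_weights):
    # Weighted combine of per-token expert outputs; with topk == 1 the
    # weights broadcast over the hidden dimension.
    return expert_out * topk_weights

num_tokens, hidden = 4, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((num_tokens, hidden)).astype(np.float16)
topk_weights = np.full((num_tokens, 1), 0.5, dtype=np.float32)

# Path A: router weight applied on the expert *output* (the usual combine);
# an identity "expert" stands in for the real MoE kernel.
out_a = combine(x, topk_weights)

# Path B: the commit's rewrite -- fold the weight into the *input*, then
# reset topk_weights to FP32 ones so the combine step leaves it unchanged.
x_scaled = x * topk_weights.astype(x.dtype)
ones_weights = np.ones_like(topk_weights, dtype=np.float32)
out_b = combine(x_scaled, ones_weights)

assert np.allclose(out_a.astype(np.float32),
                   out_b.astype(np.float32), atol=1e-3)
```

The `topk == 1` assertion in the commit matters here: with multiple experts per token, each expert would need its own pre-scaled copy of `x`, so folding the weight into a single shared input is only valid when exactly one expert sees each token.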