OpenDAS / vllm_cscc
Commit 43287082 (Unverified)
Authored Jul 06, 2025 by Lucia Fang
Committed by GitHub, Jul 06, 2025
[Bugfix] Fix missing per_act_token parameter in compressed_tensors_moe (#20509)
Signed-off-by: Lu Fang <fanglu@fb.com>
parent f73d02aa
Showing 1 changed file with 4 additions and 1 deletion
vllm/model_executor/layers/fused_moe/cutlass_moe.py
@@ -322,7 +322,7 @@ def cutlass_moe_fp8(
     topk_ids: torch.Tensor,
     w1_scale: torch.Tensor,
     w2_scale: torch.Tensor,
-    per_act_token: bool,
+    per_act_token: Optional[bool] = None,
     activation: str = "silu",
     a1_scale: Optional[torch.Tensor] = None,
     a2_scale: Optional[torch.Tensor] = None,
@@ -366,6 +366,9 @@ def cutlass_moe_fp8(
     Returns:
     - torch.Tensor: The fp16 output tensor after applying the MoE layer.
     """
+    if per_act_token is None:
+        per_act_token = a1_scale.numel() != 1 if a1_scale is not None else (
+            a2_scale.numel() != 1 if a2_scale is not None else False)
     per_out_ch = w1_scale.numel() != w1_q.size(0)
     num_experts = global_num_experts if global_num_experts != -1 else w1_q.size(
...
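The fallback added in the second hunk packs three cases into one nested conditional expression, so it may help to see it unrolled. The sketch below is illustrative only and is not part of the commit: `infer_per_act_token` and `FakeScale` are hypothetical names, and `FakeScale` models nothing but the `numel()` method of a `torch.Tensor` so the example runs without PyTorch.

```python
from typing import Optional


class FakeScale:
    """Hypothetical stand-in for a scale tensor; only numel() is modeled."""

    def __init__(self, n: int) -> None:
        self._n = n

    def numel(self) -> int:
        return self._n


def infer_per_act_token(
    per_act_token: Optional[bool],
    a1_scale: Optional[FakeScale],
    a2_scale: Optional[FakeScale],
) -> bool:
    """Unrolled equivalent of the nested conditional in the diff."""
    # An explicit caller-supplied value always wins.
    if per_act_token is not None:
        return per_act_token
    # Otherwise, a scale with more than one element is treated as
    # per-token; a single-element scale is treated as per-tensor.
    if a1_scale is not None:
        return a1_scale.numel() != 1
    # a2_scale is consulted only when a1_scale is absent.
    if a2_scale is not None:
        return a2_scale.numel() != 1
    # No activation scales at all: default to False.
    return False
```

Note that when `a1_scale` is provided, `a2_scale` is never inspected, so the two scales are implicitly assumed to use the same granularity.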