OpenDAS / vllm_cscc
Commit d6fc629f (Unverified)
Authored Apr 04, 2025 by bnellnm; committed by GitHub, Apr 04, 2025

[Kernel][Minor] Re-fuse triton moe weight application (#16071)

Signed-off-by: Bill Nell <bnell@redhat.com>

Parent: af51d80f

Showing 1 changed file with 18 additions and 24 deletions

vllm/model_executor/layers/fused_moe/fused_moe.py (+18 -24)

@@ -1297,30 +1297,24 @@ def fused_experts_impl(hidden_states: torch.Tensor,
             qintermediate_cache2 = intermediate_cache2
             a2q_scale = a2_scale

-        invoke_fused_moe_kernel(qintermediate_cache2,
-                                w2,
-                                intermediate_cache3,
-                                a2q_scale,
-                                w2_scale,
-                                w2_zp,
-                                curr_topk_weights,
-                                sorted_token_ids,
-                                expert_ids,
-                                num_tokens_post_padded,
-                                False,  #True,
-                                1,
-                                config,
-                                compute_type=compute_type,
-                                use_fp8_w8a8=use_fp8_w8a8,
-                                use_int8_w8a16=use_int8_w8a16,
-                                use_int4_w4a16=use_int4_w4a16,
-                                block_shape=block_shape)
-
-        if True:
-            intermediate_cache3 = intermediate_cache3.view(-1, top_k_num, K)
-            intermediate_cache3.mul_(
-                curr_topk_weights.view(tokens_in_chunk, -1, 1))
-
+        invoke_fused_moe_kernel(qintermediate_cache2,
+                                w2,
+                                intermediate_cache3,
+                                a2q_scale,
+                                w2_scale,
+                                w2_zp,
+                                curr_topk_weights,
+                                sorted_token_ids,
+                                expert_ids,
+                                num_tokens_post_padded,
+                                True,
+                                1,
+                                config,
+                                compute_type=compute_type,
+                                use_fp8_w8a8=use_fp8_w8a8,
+                                use_int8_w8a16=use_int8_w8a16,
+                                use_int4_w4a16=use_int4_w4a16,
+                                block_shape=block_shape)

         ops.moe_sum(intermediate_cache3.view(*intermediate_cache3.shape),
                     out_hidden_states[begin_chunk_idx:end_chunk_idx])
...
...
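
Reviewer note: the hunk flips one positional boolean in the second invoke_fused_moe_kernel call from False to True and drops the separate mul_ by curr_topk_weights. This assumes the flag controls whether the kernel multiplies each expert output by its routing weight. The sketch below illustrates, under that assumption, why scaling inside the matmul pass and scaling afterwards give the same sum. It uses plain Python lists and hypothetical helper names (expert_matmul, moe_sum_rows); it is not the Triton kernel or any vLLM API, and a real kernel may round slightly differently depending on where the multiply occurs relative to the output cast.

```python
# Toy illustration only: plain lists stand in for tensors, and the helper
# names below are hypothetical, not vLLM APIs.

def expert_matmul(x, w, routed_weight=None):
    """Compute x @ w; if routed_weight is given, scale the result in the same pass."""
    out = [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]
    if routed_weight is not None:
        out = [routed_weight * v for v in out]  # fused weight application
    return out


def moe_sum_rows(rows):
    """Sum the per-expert outputs for one token (the role ops.moe_sum plays in the diff)."""
    return [sum(col) for col in zip(*rows)]


# One token routed to top_k = 2 experts; each expert maps 2 inputs to 3 outputs.
x_per_expert = [[1.0, 2.0], [0.5, -1.0]]
w_per_expert = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]],
    [[2.0, 2.0, 0.0], [1.0, 0.0, 4.0]],
]
topk_weights = [0.75, 0.25]

# Removed path: unweighted matmul, then a separate multiply by the routing weights.
unfused = []
for x, w, tw in zip(x_per_expert, w_per_expert, topk_weights):
    y = expert_matmul(x, w)
    unfused.append([tw * v for v in y])

# Added path: routing weight applied inside the matmul helper.
fused = [
    expert_matmul(x, w, routed_weight=tw)
    for x, w, tw in zip(x_per_expert, w_per_expert, topk_weights)
]

out_unfused = moe_sum_rows(unfused)
out_fused = moe_sum_rows(fused)
print(out_unfused, out_fused)
```

Both paths reduce to the same per-token sum here; the fused form avoids the extra view and in-place multiply over intermediate_cache3 that the removed lines performed.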