OpenDAS / vllm_cscc
Commit 4470ee2f (Unverified)
Authored Dec 04, 2025 by Alexander Matveev
Committed by GitHub, Dec 05, 2025
[Perf] Enable separate shared_experts stream only for CUDA (#30085)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
parent 690cc3ef
Showing 1 changed file with 2 additions and 1 deletion

vllm/model_executor/layers/fused_moe/layer.py (+2 -1)
...
@@ -863,7 +863,8 @@ class FusedMoE(CustomOp):
         use_chunked_impl: bool,
     ) -> tuple[bool, torch.Tensor | None]:
         use_shared_experts_stream = (
-            has_separate_shared_experts
+            current_platform.is_cuda()
+            and has_separate_shared_experts
             and not use_chunked_impl
             and self.shared_experts_stream is not None
             and (
...
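The effect of the added `current_platform.is_cuda()` check can be illustrated with a minimal standalone sketch. The function name `should_use_shared_experts_stream` and its parameters are hypothetical and not part of vLLM; the platform check is passed in as a plain boolean instead of calling `current_platform.is_cuda()`, and the final `and (` clause that the diff elides is left out because its contents are not shown.

```python
from typing import Optional


def should_use_shared_experts_stream(
    is_cuda: bool,
    has_separate_shared_experts: bool,
    use_chunked_impl: bool,
    shared_experts_stream: Optional[object],
) -> bool:
    """Hypothetical stand-in for the visible part of the gate in the diff.

    Assumption: the trailing `and (...)` clause from the real code is
    omitted here because the diff does not show it.
    """
    return bool(
        is_cuda  # new in this commit: only CUDA platforms qualify
        and has_separate_shared_experts
        and not use_chunked_impl
        and shared_experts_stream is not None
    )


if __name__ == "__main__":
    stream = object()  # placeholder for a real stream object
    # On a CUDA platform the other conditions still decide the outcome.
    print(should_use_shared_experts_stream(True, True, False, stream))
    # On a non-CUDA platform the gate is closed regardless of the rest.
    print(should_use_shared_experts_stream(False, True, False, stream))
```

Placing the platform check first means that, under short-circuit evaluation, the remaining conditions are not evaluated at all on non-CUDA platforms.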