[Perf] Enable separate shared_experts stream only for CUDA (#30085)

Signed-off-by: Alexander Matveev <amatveev@redhat.com>

Commit 4470ee2f (unverified) authored Dec 04, 2025 by Alexander Matveev; committed by GitHub on Dec 05, 2025.
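The change adds one more condition to the check that decides whether `FusedMoE` runs shared experts on a separate stream: the platform must be CUDA. The plain-Python sketch below shows only the part of that predicate visible in the diff. `StreamGateInputs` and `should_use_shared_experts_stream` are hypothetical names, not vLLM APIs. The `is_cuda` field stands in for `current_platform.is_cuda()`. The diff's trailing `and (...)` clause is truncated in this view, so the sketch leaves it out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamGateInputs:
    """Hypothetical bundle of the conditions visible in the diff."""
    is_cuda: bool  # stands in for current_platform.is_cuda()
    has_separate_shared_experts: bool
    use_chunked_impl: bool
    shared_experts_stream: Optional[object]  # None when no side stream exists


def should_use_shared_experts_stream(inp: StreamGateInputs) -> bool:
    # Mirrors only the visible conditions. The truncated "and (...)"
    # clause from the real predicate is intentionally omitted.
    return (
        inp.is_cuda  # added by this commit: non-CUDA platforms never use the side stream
        and inp.has_separate_shared_experts
        and not inp.use_chunked_impl
        and inp.shared_experts_stream is not None
    )


cuda = StreamGateInputs(True, True, False, object())
non_cuda = StreamGateInputs(False, True, False, object())
print(should_use_shared_experts_stream(cuda))      # True
print(should_use_shared_experts_stream(non_cuda))  # False
```

The `is_cuda` check comes first, so on any other platform the stream path is rejected before any other condition is evaluated. Before this commit, the remaining checks alone could enable the separate stream on non-CUDA platforms.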

1 file changed, 2 additions and 1 deletion:

vllm/model_executor/layers/fused_moe/layer.py +2 -1

--- a/vllm/model_executor/layers/fused_moe/layer.py
+++ b/vllm/model_executor/layers/fused_moe/layer.py
@@ -863,7 +863,8 @@ class FusedMoE(CustomOp):
        use_chunked_impl: bool,
    ) -> tuple[bool, torch.Tensor | None]:
        use_shared_experts_stream = (
-            has_separate_shared_experts
+            current_platform.is_cuda()
+            and has_separate_shared_experts
            and not use_chunked_impl
            and self.shared_experts_stream is not None
            and (