Update note comment for flashinfer attention warmup (#30711)

Signed-off-by: mgoin <mgoin64@gmail.com>

Commit d4d27517, authored Dec 17, 2025 by Michael Goin, committed by GitHub Dec 16, 2025

Showing 1 changed file with 3 additions and 4 deletions

vllm/model_executor/warmup/kernel_warmup.py +3 -4
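
The new NOTE in the diff below is about `all` versus `any` over the attention groups. The following is a minimal, standalone sketch of that distinction, not vLLM code: `Group`, `is_flashinfer`, and `should_warm_up` are hypothetical stand-ins, and it assumes `attn_groups` is a nested list, as the `for groups in ...` loop in the diff suggests.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical stand-in for an attention group; only `backend` matters here."""
    backend: str


def is_flashinfer(backend: str) -> bool:
    """Hypothetical stand-in for `_is_flashinfer_backend`."""
    return backend == "flashinfer"


def should_warm_up(attn_groups, require=all) -> bool:
    """Decide whether to run the warmup, using `all` (current) or `any` (proposed)."""
    # `all()` over an empty iterable is True, so the emptiness check must come first.
    return bool(attn_groups) and require(
        is_flashinfer(group.backend)
        for groups in attn_groups
        for group in groups
    )


uniform = [[Group("flashinfer")], [Group("flashinfer")]]
hybrid = [[Group("flashinfer")], [Group("other")]]

print(should_warm_up(uniform))              # True: every group is FlashInfer
print(should_warm_up(hybrid))               # False: `all` skips hybrid setups
print(should_warm_up(hybrid, require=any))  # True: `any` would include them
print(should_warm_up([]))                   # False: empty groups never warm up
```

With `all`, a hybrid model that mixes FlashInfer with another backend skips the warmup entirely. According to the NOTE, that is the intended interim behavior until `build_for_cudagraph_capture` is removed.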

--- a/vllm/model_executor/warmup/kernel_warmup.py
+++ b/vllm/model_executor/warmup/kernel_warmup.py
@@ -49,13 +49,12 @@ def kernel_warmup(worker: "Worker"):
        except NotImplementedError:
            return False
-    # NOTE: we add check for empty attn_groups to avoid errors when
-    # deploying models such as E instances and encoder-only models.
-    # As for those models, worker.model_runner.attn_groups is empty.
-    # This change is made during EPD feature development.
    if (
        not worker.model_runner.is_pooling_model
        and worker.model_runner.attn_groups
+        # NOTE: This should be `any` instead of `all` but other hybrid attention
+        # backends don't support this dummy run. Once we remove
+        # `build_for_cudagraph_capture`, we can change it to `any`.
        and all(
            _is_flashinfer_backend(group.backend)
            for groups in worker.model_runner.attn_groups