[Attention] add `_cudagraph_support` for linear attention (#28934)

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

Commit 81db702e, authored Nov 25, 2025 by Jiangyun Zhu; committed by GitHub Nov 25, 2025

Showing with 3 additions and 0 deletions

vllm/v1/attention/backends/linear_attn.py +3 -0

--- a/vllm/v1/attention/backends/linear_attn.py
+++ b/vllm/v1/attention/backends/linear_attn.py
@@ -7,6 +7,7 @@ import torch
 from vllm.attention.backends.abstract import AttentionBackend
 from vllm.config import VllmConfig
 from vllm.v1.attention.backends.utils import (
+    AttentionCGSupport,
     AttentionMetadataBuilder,
     CommonAttentionMetadata,
     split_decodes_and_prefills,
@@ -35,6 +36,8 @@ class LinearAttentionMetadata:
 class LinearAttentionMetadataBuilder(AttentionMetadataBuilder[LinearAttentionMetadata]):
     reorder_batch_threshold: int = 1
 
+    _cudagraph_support = AttentionCGSupport.UNIFORM_SINGLE_TOKEN_DECODE
+
     def __init__(
         self,
         kv_cache_spec: AttentionSpec,