[CI Failure] fix models/language/pooling/test_auto_prefix_cache_support.py (#24636)

Signed-off-by: wang.yuqi <noooop@126.com>

Commit 25bb9e8c (unverified), authored Sep 11, 2025 by wang.yuqi, committed by GitHub on Sep 11, 2025

Showing 1 changed file with 4 additions and 0 deletions

vllm/config/__init__.py +4 -0

--- a/vllm/config/__init__.py
+++ b/vllm/config/__init__.py
@@ -3558,6 +3558,10 @@ class VllmConfig:
                    disable_chunked_prefill_reasons.append(
                        "Only \"last\" pooling supports chunked "
                        "prefill and prefix caching; disabling both.")
+                if not getattr(self.model_config.hf_config, "is_causal", True):
+                    disable_chunked_prefill_reasons.append(
+                        "Only models using causal attention supports chunked "
+                        "prefill and prefix caching; disabling both.")
            elif self.model_config.is_encoder_decoder:
                self.scheduler_config.max_num_encoder_input_tokens = \
                    MULTIMODAL_REGISTRY.get_encdec_max_encoder_len(self.model_config)
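
The added check can be exercised in isolation. The following is a minimal sketch, not vLLM code: `collect_disable_reasons` is a hypothetical helper, and `SimpleNamespace` objects stand in for `self.model_config.hf_config`. It only illustrates how `getattr(..., "is_causal", True)` behaves: a config that does not define `is_causal` is treated as causal, so only an explicit `is_causal=False` adds a reason to disable chunked prefill and prefix caching.

```python
from types import SimpleNamespace


def collect_disable_reasons(hf_config):
    """Hypothetical stand-in for the branch added in this commit."""
    reasons = []
    # A missing `is_causal` attribute defaults to True (treated as causal).
    if not getattr(hf_config, "is_causal", True):
        # Message text copied verbatim from the diff.
        reasons.append(
            "Only models using causal attention supports chunked "
            "prefill and prefix caching; disabling both.")
    return reasons


# Stand-ins for hf_config objects (assumed shapes, for illustration only).
no_attribute = SimpleNamespace()                  # is_causal not defined
explicit_causal = SimpleNamespace(is_causal=True)
bidirectional = SimpleNamespace(is_causal=False)

for name, cfg in [("no_attribute", no_attribute),
                  ("explicit_causal", explicit_causal),
                  ("bidirectional", bidirectional)]:
    print(name, len(collect_disable_reasons(cfg)))
```

Under these assumptions, only the `bidirectional` config produces a reason, which matches the intent of the change: pooling models with non-causal attention fall back to running without chunked prefill and prefix caching.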