[BugFix] Add block_size validation for mamba cache align mode (#34445)
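
When `mamba_cache_mode` is `"align"`, config validation now asserts that `block_size` does not exceed `max_num_batched_tokens`, so an invalid combination fails at config time with a message naming both values. A minimal standalone sketch of the check follows. It assumes two hypothetical stand-in dataclasses (`CacheConfigSketch`, `SchedulerConfigSketch`) and a hypothetical helper `check_align_mode_block_size`; none of these are vLLM classes or functions, and only the field names and the message text are taken from the diff.

```python
from dataclasses import dataclass


@dataclass
class CacheConfigSketch:
    # Hypothetical stand-in: only the fields the new check reads.
    block_size: int
    mamba_cache_mode: str


@dataclass
class SchedulerConfigSketch:
    # Hypothetical stand-in: only the field the new check reads.
    max_num_batched_tokens: int


def check_align_mode_block_size(
    cache: CacheConfigSketch, scheduler: SchedulerConfigSketch
) -> None:
    """Mirror the assertion added in this commit, outside of VllmConfig."""
    if cache.mamba_cache_mode == "align":
        assert cache.block_size <= scheduler.max_num_batched_tokens, (
            "In Mamba cache align mode, block_size "
            f"({cache.block_size}) must be <= "
            "max_num_batched_tokens "
            f"({scheduler.max_num_batched_tokens})."
        )


# A block_size within the token budget passes silently.
check_align_mode_block_size(
    CacheConfigSketch(block_size=16, mamba_cache_mode="align"),
    SchedulerConfigSketch(max_num_batched_tokens=2048),
)
```

Because the check is an `assert`, it is skipped when Python runs with `-O`; the sketch keeps that behavior so it matches the diff.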

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

Commit 6f019e6e, authored Feb 13, 2026 by Harry Huang; committed by GitHub Feb 12, 2026

Showing with 9 additions and 0 deletions

vllm/config/vllm.py +9 -0

--- a/vllm/config/vllm.py
+++ b/vllm/config/vllm.py
@@ -1110,6 +1110,15 @@ class VllmConfig:
            self.scheduler_config.disable_hybrid_kv_cache_manager = False
        if self.cache_config.mamba_cache_mode == "align":
+            assert (
+                self.cache_config.block_size
+                <= self.scheduler_config.max_num_batched_tokens
+            ), (
+                "In Mamba cache align mode, block_size "
+                f"({self.cache_config.block_size}) must be <= "
+                "max_num_batched_tokens "
+                f"({self.scheduler_config.max_num_batched_tokens})."
+            )
            if self.scheduler_config.long_prefill_token_threshold > 0:
                assert (
                    self.scheduler_config.long_prefill_token_threshold