[Model Runner V2] Do not error on attention backends (#32820)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Commit 5e00b561 (unverified), authored Jan 21, 2026 by Woosuk Kwon, committed by GitHub Jan 21, 2026

Showing 1 changed file with 0 additions and 10 deletions

vllm/v1/worker/gpu/model_runner.py +0 -10

--- a/vllm/v1/worker/gpu/model_runner.py
+++ b/vllm/v1/worker/gpu/model_runner.py
@@ -247,16 +247,6 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
                self.block_tables,
            )

-        # TODO(woosuk): Support other backends.
-        supported_backends = ("FLASH_ATTN", "FLASHINFER", "FLASHINFER_MLA")
-        for backend in self.attn_backends.values():
-            backend_name = backend.get_name()
-            if backend_name not in supported_backends:
-                raise NotImplementedError(
-                    f"The {backend_name} attention backend is not supported yet. "
-                    f"Supported backends are: {supported_backends}."
-                )
-
        self.kv_caches: list[torch.Tensor] = []
        init_kv_cache(
            self.kv_caches,
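
For reference, the behavior of the deleted allow-list check can be reproduced in isolation. This is a minimal sketch, not vLLM code: `FakeBackend`, `check_backends`, and `rejected_before_commit` are hypothetical names introduced for illustration, and only the tuple of backend names and the error message are taken from the removed lines above.

```python
from dataclasses import dataclass

# Tuple copied from the lines deleted by this commit.
SUPPORTED_BACKENDS = ("FLASH_ATTN", "FLASHINFER", "FLASHINFER_MLA")


@dataclass
class FakeBackend:
    """Hypothetical stand-in for an attention backend; not part of vLLM."""

    name: str

    def get_name(self) -> str:
        return self.name


def check_backends(attn_backends: dict[str, FakeBackend]) -> None:
    """Mirror the removed loop: raise for any backend outside the allow-list."""
    for backend in attn_backends.values():
        backend_name = backend.get_name()
        if backend_name not in SUPPORTED_BACKENDS:
            raise NotImplementedError(
                f"The {backend_name} attention backend is not supported yet. "
                f"Supported backends are: {SUPPORTED_BACKENDS}."
            )


def rejected_before_commit(attn_backends: dict[str, FakeBackend]) -> bool:
    """Return True if the removed check would have raised for these backends."""
    try:
        check_backends(attn_backends)
    except NotImplementedError:
        return True
    return False
```

With the check removed, a configuration such as `{"layer.0": FakeBackend("SOME_OTHER_BACKEND")}` (a made-up backend name) no longer fails at this point in initialization. Whether such a backend then works correctly with Model Runner V2 is not something this diff establishes.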