Unverified commit 5e00b561, authored by Woosuk Kwon, committed by GitHub

[Model Runner V2] Do not error on attention backends (#32820)


Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
parent 408195ec
@@ -247,16 +247,6 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
             self.block_tables,
         )
-        # TODO(woosuk): Support other backends.
-        supported_backends = ("FLASH_ATTN", "FLASHINFER", "FLASHINFER_MLA")
-        for backend in self.attn_backends.values():
-            backend_name = backend.get_name()
-            if backend_name not in supported_backends:
-                raise NotImplementedError(
-                    f"The {backend_name} attention backend is not supported yet. "
-                    f"Supported backends are: {supported_backends}."
-                )
         self.kv_caches: list[torch.Tensor] = []
         init_kv_cache(
             self.kv_caches,
...
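For reference, the behavior of the removed guard can be reproduced in isolation. The following is a minimal sketch, not code from the repository: `FakeBackend`, `check_backends`, the layer keys, and the backend name `"OTHER_BACKEND"` are hypothetical stand-ins introduced here for illustration, and only the allowlist tuple and the error message are taken from the deleted lines.

```python
# Allowlist copied from the deleted lines of the diff.
SUPPORTED_BACKENDS = ("FLASH_ATTN", "FLASHINFER", "FLASHINFER_MLA")


class FakeBackend:
    """Hypothetical stand-in for an attention backend; not a vLLM class."""

    def __init__(self, name: str) -> None:
        self._name = name

    def get_name(self) -> str:
        return self._name


def check_backends(attn_backends: dict[str, FakeBackend]) -> None:
    """Hypothetical helper mirroring the removed guard.

    Raises NotImplementedError for the first backend whose name is not
    in the allowlist; returns None if every backend is allowed.
    """
    for backend in attn_backends.values():
        backend_name = backend.get_name()
        if backend_name not in SUPPORTED_BACKENDS:
            raise NotImplementedError(
                f"The {backend_name} attention backend is not supported yet. "
                f"Supported backends are: {SUPPORTED_BACKENDS}."
            )


if __name__ == "__main__":
    # An allowlisted backend passes silently.
    check_backends({"layer.0": FakeBackend("FLASH_ATTN")})

    # Any other name (hypothetical here) triggered the hard error
    # before this commit.
    try:
        check_backends({"layer.0": FakeBackend("OTHER_BACKEND")})
    except NotImplementedError as exc:
        print(exc)
```

With this commit the guard is deleted outright, so initialization proceeds to `init_kv_cache` regardless of the backend name.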