Fix FP8 KV-cache condition (#2611)

Update kv_cache.py

Commit 0da4df4b, authored Oct 07, 2024 by Florian Zimmermeister, committed by GitHub Oct 07, 2024

Showing with 2 additions and 2 deletions

server/text_generation_server/layers/attention/kv_cache.py +2 -2

--- a/server/text_generation_server/layers/attention/kv_cache.py
+++ b/server/text_generation_server/layers/attention/kv_cache.py
@@ -26,8 +26,8 @@ class KVCache:
        if (
            dtype == torch.float8_e5m2
-            and ATTENTION != "flashinfer"
+            and (ATTENTION != "flashinfer"
-            and SYSTEM != "cuda"
+            or SYSTEM != "cuda")
        ):
            raise ValueError(
                "float8_e5m2 KV cache is currently only supported for flashinfer on CUDA"
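
Why the parenthesized `or` matters can be checked with a small truth table. The sketch below is not part of the commit: it models the two versions of the guard as plain boolean functions so it runs without torch, and the values "paged" and "rocm" are only illustrative stand-ins for "any backend other than flashinfer" and "any system other than cuda".

```python
from itertools import product


def rejects_before(is_fp8_e5m2: bool, attention: str, system: str) -> bool:
    """Guard as written before the fix: every clause joined with `and`."""
    return is_fp8_e5m2 and attention != "flashinfer" and system != "cuda"


def rejects_after(is_fp8_e5m2: bool, attention: str, system: str) -> bool:
    """Guard as written after the fix: reject unless flashinfer AND cuda."""
    return is_fp8_e5m2 and (attention != "flashinfer" or system != "cuda")


# Illustrative stand-in values (assumptions, not an exhaustive list).
ATTENTION_VALUES = ("flashinfer", "paged")
SYSTEM_VALUES = ("cuda", "rocm")

if __name__ == "__main__":
    # Print both verdicts for an FP8 (e5m2) cache under each combination.
    for attention, system in product(ATTENTION_VALUES, SYSTEM_VALUES):
        before = rejects_before(True, attention, system)
        after = rejects_after(True, attention, system)
        print(f"{attention:>10} on {system:<4}  before={before!s:<5}  after={after}")
```

Under these assumptions, the old guard raised only when both the backend and the system were unsupported, so a non-flashinfer backend on CUDA (or flashinfer on a non-CUDA system) slipped through. The new guard rejects every combination except flashinfer on CUDA, which matches the error message.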