[Bug Fix] Fix the support check for FP8 CUTLASS (#5352)

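Bug description: With torch 2.4.0.dev20240603+cu121, cutlass_fp8_supported returns False, and the (capability, version) pair before the comparison is (90, 11111111112). The version is garbled because torch.version.cuda is a string, so indexing and multiplying it repeats characters instead of doing arithmetic; the capability thresholds (900, 890) were also ten times too large for a value computed as major * 10 + minor. This PR fixes the support check for FP8 CUTLASS (cutlass_fp8_supported), which was introduced in https://github.com/vllm-project/vllm/pull/5183.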
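
The reported value can be reproduced without a GPU. The following is a minimal sketch, assuming torch.version.cuda is the string "12.1" for the cu121 build; the helper names old_version_parse and new_version_parse are hypothetical and exist only for illustration.

```python
def old_version_parse(cuda_version: str) -> str:
    # Pre-fix logic: indexing a string yields single characters, so "* 10"
    # repeats the first character ten times and "+" appends the second one.
    return cuda_version[0] * 10 + cuda_version[1]


def new_version_parse(cuda_version: str) -> int:
    # Post-fix logic from this PR: split on "." and use integer arithmetic.
    major, minor = cuda_version.split(".")
    return int(major) * 10 + int(minor)


print(old_version_parse("12.1"))  # 11111111112 (a string, not a number)
print(new_version_parse("12.1"))  # 121
```

Because the old capability value (90) never reached the 900 or 890 thresholds, the garbled version string was never compared and the function simply returned False.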

Commit e69ded7d authored Jun 07, 2024 by Cheng Li; committed by GitHub Jun 08, 2024

Showing 7 additions and 7 deletions

vllm/model_executor/layers/quantization/fp8.py +7 -7

--- a/vllm/model_executor/layers/quantization/fp8.py
+++ b/vllm/model_executor/layers/quantization/fp8.py
@@ -20,16 +20,16 @@ logger = init_logger(__name__)
 def cutlass_fp8_supported() -> bool:
    capability = torch.cuda.get_device_capability()
    capability = capability[0] * 10 + capability[1]
-    version = torch.version.cuda
-    version = version[0] * 10 + version[1]
+    major, minor = torch.version.cuda.split(".")
+    version = int(major) * 10 + int(minor)

    # CUTLASS FP8 kernels need at least
    #   CUDA 12.0 on SM90 systems (Hopper)
    #   CUDA 12.4 on SM89 systems (Lovelace)
    gpu_is_supported = False
-    if capability >= 900:
+    if capability >= 90:
        gpu_is_supported = version > 120
-    elif capability >= 890:
+    elif capability >= 89:
        gpu_is_supported = version > 124

    return gpu_is_supported
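
The patched logic can be exercised on a machine without CUDA by passing the inputs in explicitly. This is a sketch only: cutlass_fp8_supported_for is a hypothetical helper that is not part of vLLM, and it mirrors the comparisons in the diff above rather than changing them.

```python
from typing import Tuple


def cutlass_fp8_supported_for(capability: Tuple[int, int], cuda_version: str) -> bool:
    # Hypothetical helper: same logic as the patched function, but the device
    # capability and CUDA version are arguments instead of being read from torch.
    cap = capability[0] * 10 + capability[1]
    major, minor = cuda_version.split(".")
    version = int(major) * 10 + int(minor)

    gpu_is_supported = False
    if cap >= 90:
        gpu_is_supported = version > 120
    elif cap >= 89:
        gpu_is_supported = version > 124
    return gpu_is_supported


print(cutlass_fp8_supported_for((9, 0), "12.1"))  # True: the case from the bug report
print(cutlass_fp8_supported_for((8, 9), "12.1"))  # False
print(cutlass_fp8_supported_for((8, 0), "12.4"))  # False: below SM89
print(cutlass_fp8_supported_for((9, 0), "12.0"))  # False: the comparison is strict
```

The last case shows that, as written, the checks use `>` and so exclude CUDA 12.0 on SM90 and CUDA 12.4 on SM89 themselves, although the comment reads "at least"; whether that boundary is intended is not stated in this commit.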