[Kernel] Enable FP8 Cutlass for Ada Lovelace (#6950)

Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

Unverified Commit 93548eb3 authored Jul 31, 2024 by Varun Sundar Rabindranath Committed by GitHub Jul 31, 2024

Showing with 1 addition and 7 deletions

csrc/quantization/cutlass_w8a8/scaled_mm_entry.cu +1 -7

--- a/csrc/quantization/cutlass_w8a8/scaled_mm_entry.cu
+++ b/csrc/quantization/cutlass_w8a8/scaled_mm_entry.cu
@@ -38,13 +38,7 @@ bool cutlass_scaled_mm_supports_fp8(int64_t cuda_device_capability) {
  if (cuda_device_capability >= 90) {
    return CUDA_VERSION >= 12000;
  } else if (cuda_device_capability >= 89) {
-    // CUTLASS Kernels have not been tuned for Ada Lovelace systems
-    // and are slower than torch.mm. Return false unconditionally in this case.
-    return false;
-    // Once the CUTLASS kernels have been optimized for Lovelace systems,
-    // use the following check:
-    // return CUDA_VERSION >= 12040;
+    return CUDA_VERSION >= 12040;
  }
 #endif
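
For reviewers who want to check the resulting behavior without a CUDA toolchain, the gating logic after this change can be sketched as a standalone function. This is an illustration only, not the code in `scaled_mm_entry.cu`: the name `supports_fp8_sketch` and the `cuda_version` parameter are hypothetical stand-ins for the real function and the compile-time `CUDA_VERSION` macro, and the final `return false` for older architectures is an assumption, since that part of the function lies outside the hunk shown above.

```cpp
#include <cstdint>

// Hypothetical mirror of the capability check after this commit.
// `cuda_version` replaces the compile-time CUDA_VERSION macro so the logic
// can be exercised on a host without the CUDA toolkit installed.
// CUDA_VERSION is encoded as major * 1000 + minor * 10 (12040 is CUDA 12.4).
bool supports_fp8_sketch(int64_t cuda_device_capability, int cuda_version) {
  if (cuda_device_capability >= 90) {
    // Hopper and newer: FP8 CUTLASS kernels need CUDA 12.0 or later.
    return cuda_version >= 12000;
  } else if (cuda_device_capability >= 89) {
    // Ada Lovelace: enabled by this commit, needs CUDA 12.4 or later.
    return cuda_version >= 12040;
  }
  // Assumption: anything below compute capability 8.9 reports no FP8 support.
  return false;
}
```

Under these assumptions, a compute capability 8.9 device built against CUDA 12.4 now reports FP8 support, whereas before this commit the same branch returned `false` unconditionally.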