[CI/Build] Conditionally register cutlass_fp4_group_mm to fix building on Hopper (#26138)

Signed-off-by: mgoin <mgoin64@gmail.com>

Commit 5d5146ee (unverified), authored Oct 02, 2025 by Michael Goin, committed by GitHub Oct 02, 2025

Showing with 7 additions and 1 deletion

csrc/quantization/fp4/nvfp4_blockwise_moe_kernel.cu +6 -0

csrc/torch_bindings.cpp +1 -1

--- a/csrc/quantization/fp4/nvfp4_blockwise_moe_kernel.cu
+++ b/csrc/quantization/fp4/nvfp4_blockwise_moe_kernel.cu
@@ -14,6 +14,8 @@
 * limitations under the License.
 */

+#include "core/registration.h"
+
 #include <torch/all.h>
 #include <cutlass/arch/arch.h>

@@ -418,3 +420,7 @@ void cutlass_fp4_group_mm(
      "12.8 or above.");
 #endif
 }
+
+TORCH_LIBRARY_IMPL_EXPAND(TORCH_EXTENSION_NAME, CUDA, m) {
+  m.impl("cutlass_fp4_group_mm", &cutlass_fp4_group_mm);
+}
--- a/csrc/torch_bindings.cpp
+++ b/csrc/torch_bindings.cpp
@@ -397,7 +397,7 @@ TORCH_LIBRARY_EXPAND(TORCH_EXTENSION_NAME, ops) {
      " Tensor a_blockscale, Tensor b_blockscales, Tensor alphas,"
      " Tensor problem_sizes, Tensor expert_offsets, Tensor sf_offsets) -> ()",
      {stride_tag});
-  ops.impl("cutlass_fp4_group_mm", torch::kCUDA, &cutlass_fp4_group_mm);
+  // conditionally compiled so impl registration is in source file

  // CUTLASS w8a8 GEMM, supporting symmetric per-tensor or per-row/column
  // quantization, as well as bias
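
Note on the pattern in this diff: the code comment says the kernel source file is conditionally compiled, so the impl registration moves out of the always-built bindings file and into that source file. Presumably the bindings file otherwise references a symbol that does not exist when the kernel file is skipped, which would explain the Hopper build failure named in the title. The sketch below illustrates the idea in plain C++ without PyTorch. The registry, `Registrar`, `has_impl`, `dense_mm`, and the `ENABLE_FP4_GROUP_MM` flag are all hypothetical stand-ins and are not part of vLLM or the PyTorch dispatcher.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an op registry; this is NOT the PyTorch dispatcher.
using OpFn = std::function<int(int)>;

std::map<std::string, OpFn>& registry() {
  static std::map<std::string, OpFn> impls;  // constructed on first use
  return impls;
}

// Registers an implementation when a static instance is constructed.
struct Registrar {
  Registrar(const std::string& name, OpFn fn) {
    registry()[name] = std::move(fn);
  }
};

// --- Plays the role of the conditionally compiled kernel source file. ---
// ENABLE_FP4_GROUP_MM is a hypothetical flag standing in for the build
// system deciding whether to compile this file at all.
#ifdef ENABLE_FP4_GROUP_MM
int fp4_group_mm(int x) { return 2 * x; }
static Registrar fp4_registrar("fp4_group_mm", &fp4_group_mm);
#endif

// --- Plays the role of a kernel file that is always built. ---
int dense_mm(int x) { return x + 1; }
static Registrar dense_registrar("dense_mm", &dense_mm);

// --- Plays the role of the bindings file: it never names fp4_group_mm,
// so it compiles and links whether or not the guarded block is built. ---
bool has_impl(const std::string& name) {
  return registry().count(name) > 0;
}
```

Built without the flag, `has_impl("fp4_group_mm")` is false and nothing refers to the missing function; built with `-DENABLE_FP4_GROUP_MM`, the kernel registers itself during static initialization. Had the bindings section taken `&fp4_group_mm` directly, a build without the flag would fail on the undeclared symbol.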