Benchmarks: micro benchmarks - change cublasLtMatmulDescCreate scaleType from CUDA_R_32F to CUDA_R_16F in FP16 dist inference (#732)

**Description**

Change the `scaleType` argument of `cublasLtMatmulDescCreate` from `CUDA_R_32F` to `CUDA_R_16F` in the FP16 distributed inference benchmark to fix a cuBLASLt error. Both matmul descriptors use `CUBLAS_COMPUTE_16F`, and cuBLASLt supports that compute type only with an FP16 (`CUDA_R_16F`) scale type.
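With `computeType` set to `CUBLAS_COMPUTE_16F`, the scale type determines the type of `alpha` and `beta`. This means the pointers later passed to `cublasLtMatmul` should point to `__half` values, not `float`. The sketch below is a minimal, hypothetical FP16 GEMM helper that pairs the types this way. `Fp16MatmulSketch` is an illustrative name and is not the benchmark's `TestModel` code. It assumes column-major matrices whose leading dimension equals their row count. It also lets cuBLASLt choose the algorithm (`algo = nullptr`) and provides no workspace.

```cuda
#include <cassert>
#include <cstdint>
#include <cublasLt.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical helper (not SuperBench code): computes C = alpha * A * B + beta * C
// in place for column-major FP16 matrices. A is m x k, B is k x n, C is m x n,
// each with leading dimension equal to its row count (an assumption of this sketch).
cublasStatus_t Fp16MatmulSketch(cublasLtHandle_t handle, int64_t m, int64_t n, int64_t k,
                                const __half *dA, const __half *dB, __half *dC,
                                float alphaF, float betaF, cudaStream_t stream) {
    cublasLtMatmulDesc_t desc = nullptr;
    cublasLtMatrixLayout_t layA = nullptr, layB = nullptr, layC = nullptr;

    // FP16 compute type paired with an FP16 scale type, as in the fixed benchmark.
    cublasStatus_t st = cublasLtMatmulDescCreate(&desc, CUBLAS_COMPUTE_16F, CUDA_R_16F);
    if (st == CUBLAS_STATUS_SUCCESS) st = cublasLtMatrixLayoutCreate(&layA, CUDA_R_16F, m, k, m);
    if (st == CUBLAS_STATUS_SUCCESS) st = cublasLtMatrixLayoutCreate(&layB, CUDA_R_16F, k, n, k);
    if (st == CUBLAS_STATUS_SUCCESS) st = cublasLtMatrixLayoutCreate(&layC, CUDA_R_16F, m, n, m);

    if (st == CUBLAS_STATUS_SUCCESS) {
        // scaleType is CUDA_R_16F, so alpha and beta are passed as __half values.
        const __half alpha = __float2half(alphaF);
        const __half beta = __float2half(betaF);
        // C and D share a pointer and a layout (in-place); algo == nullptr lets
        // cuBLASLt pick an algorithm heuristically; no workspace is supplied.
        st = cublasLtMatmul(handle, desc, &alpha, dA, layA, dB, layB, &beta,
                            dC, layC, dC, layC, nullptr, nullptr, 0, stream);
    }

    // Release only the objects that were actually created.
    if (layC) cublasLtMatrixLayoutDestroy(layC);
    if (layB) cublasLtMatrixLayoutDestroy(layB);
    if (layA) cublasLtMatrixLayoutDestroy(layA);
    if (desc) cublasLtMatmulDescDestroy(desc);
    return st;
}
```

For example, with `m = n = k = 2`, `alpha = 1`, `beta = 0`, and `B` set to the identity, `C` should end up equal to `A`. Converting `alpha` and `beta` inside the helper keeps a `float` interface for callers, like the `float alpha, float beta` parameters in the `TestModel` signature.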

Unverified commit a7c4ed92, authored by Yuting Jiang on Sep 20, 2025 and committed by GitHub on Sep 19, 2025.

Showing 1 changed file with 2 additions and 2 deletions:

`superbench/benchmarks/micro_benchmarks/dist_inference_cpp/dist_inference.cu` (+2 -2)

```diff
--- a/superbench/benchmarks/micro_benchmarks/dist_inference_cpp/dist_inference.cu
+++ b/superbench/benchmarks/micro_benchmarks/dist_inference_cpp/dist_inference.cu
@@ -416,8 +416,8 @@ void TestModel(int64_t m, int64_t n, int64_t k, float alpha, float beta, int32_t
     CHECK_CUBLASLT_ERROR(cublasLtMatrixLayoutCreate(&matF, CUDA_R_16F, k, n, k));
     CHECK_CUBLASLT_ERROR(cublasLtMatrixLayoutCreate(&matG, CUDA_R_16F, k, n, k));
 
-    CHECK_CUBLASLT_ERROR(cublasLtMatmulDescCreate(&matmul1, CUBLAS_COMPUTE_16F, CUDA_R_32F));
-    CHECK_CUBLASLT_ERROR(cublasLtMatmulDescCreate(&matmul2, CUBLAS_COMPUTE_16F, CUDA_R_32F));
+    CHECK_CUBLASLT_ERROR(cublasLtMatmulDescCreate(&matmul1, CUBLAS_COMPUTE_16F, CUDA_R_16F));
+    CHECK_CUBLASLT_ERROR(cublasLtMatmulDescCreate(&matmul2, CUBLAS_COMPUTE_16F, CUDA_R_16F));
 
     cublasOperation_t trans = CUBLAS_OP_N;
     CHECK_CUBLASLT_ERROR(cublasLtMatmulDescSetAttribute(matmul1, CUBLASLT_MATMUL_DESC_TRANSA, &trans, sizeof(int32_t)));
```
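One way to confirm at runtime which types a descriptor such as `matmul1` or `matmul2` in the diff actually carries is to read them back as descriptor attributes. This is a minimal sketch that relies only on the public `cublasLtMatmulDescGetAttribute` call. `QueryMatmulTypes` is a hypothetical helper name and is not SuperBench code.

```cuda
#include <cassert>
#include <cstdint>
#include <cublasLt.h>

// Hypothetical helper (not SuperBench code): reads the compute type and scale type
// stored in an existing matmul descriptor. Both attributes are stored as int32_t.
// Returns false if either query fails or writes an unexpected number of bytes.
bool QueryMatmulTypes(cublasLtMatmulDesc_t desc, int32_t *computeType, int32_t *scaleType) {
    size_t written = 0;
    if (cublasLtMatmulDescGetAttribute(desc, CUBLASLT_MATMUL_DESC_COMPUTE_TYPE, computeType,
                                       sizeof(*computeType), &written) != CUBLAS_STATUS_SUCCESS ||
        written != sizeof(*computeType)) {
        return false;
    }
    if (cublasLtMatmulDescGetAttribute(desc, CUBLASLT_MATMUL_DESC_SCALE_TYPE, scaleType,
                                       sizeof(*scaleType), &written) != CUBLAS_STATUS_SUCCESS ||
        written != sizeof(*scaleType)) {
        return false;
    }
    return true;
}
```

With this change applied, querying either descriptor should report `CUBLAS_COMPUTE_16F` as the compute type and `CUDA_R_16F` as the scale type.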