    Benchmark - Support autotuning in cublaslt gemm (#706) · 60b13256
    Babak Hejazi authored
    **Description**
    Enable autotuning as an opt-in mode when benchmarking cublasLt via
    `cublaslt_gemm`
    
    The implementation is based on
    https://github.com/NVIDIA/CUDALibrarySamples/blob/master/cuBLASLt/LtSgemmSimpleAutoTuning/sample_cublasLt_LtSgemmSimpleAutoTuning.cu
    
    The behavior of the original benchmark command remains unchanged, e.g.:
    - `cublaslt_gemm -m 2048 -n 12288 -k 1536 -w10000 -i 1000 -t fp8e4m3`
    
    The new opt-in options are `-a` (enable autotune), `-I` (autotune
    iterations; default 50, matching the default for `-i`), and `-W` (autotune
    warmups; default 20, matching the default for `-w`), e.g.:
    - `cublaslt_gemm -m 2048 -n 12288 -k 1536 -w 10000 -i 1000 -t fp8e4m3
    -a`
    - `cublaslt_gemm -m 2048 -n 12288 -k 1536 -w 10000 -i 1000 -t fp8e4m3 -a
    -I 10 -W 10`
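
    The referenced cuBLASLt sample autotunes by warming up each heuristic-suggested algorithm, timing it over a fixed number of iterations, and keeping the fastest. A minimal, library-free sketch of that selection logic (the names `autotune` and `run_gemm` are illustrative, not the benchmark's actual API; the real code iterates over algorithms returned by `cublasLtMatmulAlgoGetHeuristic`):

    ```python
    import time

    def autotune(candidate_algos, run_gemm, warmup=20, iters=50):
        """Pick the fastest candidate, mirroring the autotuning loop:
        warm up each candidate (the -W count), then average its runtime
        over the timed iterations (the -I count) and keep the fastest."""
        best_algo, best_time = None, float("inf")
        for algo in candidate_algos:
            for _ in range(warmup):      # warmup runs, excluded from timing
                run_gemm(algo)
            start = time.perf_counter()
            for _ in range(iters):       # timed runs
                run_gemm(algo)
            elapsed = (time.perf_counter() - start) / iters
            if elapsed < best_time:
                best_algo, best_time = algo, elapsed
        return best_algo, best_time
    ```

    The selected algorithm is then reused for the main benchmark loop, so the `-w`/`-i` measurement itself runs with a single, fixed algorithm.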
    
    **Note:** This PR also changes the default `gemm_compute_type` for BF16
    and FP16 to `CUBLAS_COMPUTE_32F`.
    
    **Further observations:** 
    1. The support matrix of the `cublaslt_gemm` could be furt...