1. 24 Jun, 2025 1 commit
  2. 20 Jun, 2025 1 commit
      Benchmark - Support autotuning in cublaslt gemm (#706) · 60b13256
      Babak Hejazi authored
      **Description**
      Enable autotuning as an opt-in mode when benchmarking cublasLt via
      `cublaslt_gemm`
      
      The implementation is based on
      https://github.com/NVIDIA/CUDALibrarySamples/blob/master/cuBLASLt/LtSgemmSimpleAutoTuning/sample_cublasLt_LtSgemmSimpleAutoTuning.cu
      
The behavior of the original benchmark command remains unchanged, e.g.:
      - `cublaslt_gemm -m 2048 -n 12288 -k 1536 -w10000 -i 1000 -t fp8e4m3`
      
The new opt-in options are `-a` (enable autotune), `-I` (autotune
iterations, default 50, same as the default for `-i`), and `-W` (autotune
warmups, default 20, same as the default for `-w`), e.g.:
      - `cublaslt_gemm -m 2048 -n 12288 -k 1536 -w 10000 -i 1000 -t fp8e4m3
      -a`
      - `cublaslt_gemm -m 2048 -n 12288 -k 1536 -w 10000 -i 1000 -t fp8e4m3 -a
      -I 10 -W 10`
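
The autotune selection follows the pattern in the NVIDIA sample: for each candidate algorithm returned by the heuristic query, run `-W` untimed warmup calls and `-I` timed calls, then keep the fastest. A minimal CPU-side sketch of that selection loop (here `candidates` and `run` are hypothetical stand-ins, not the actual `cublasLtMatmulAlgoGetHeuristic` results or `cublasLtMatmul` call):

```python
import time

def autotune(candidates, run, warmups=20, iters=50):
    """Pick the fastest candidate algorithm.

    `candidates` is an iterable of algorithm handles and `run(algo)` executes
    one GEMM with that algorithm -- both stand in for the real cuBLASLt calls.
    `warmups` / `iters` correspond to the benchmark's -W / -I options.
    """
    best_algo, best_time = None, float('inf')
    for algo in candidates:
        for _ in range(warmups):      # untimed warmup runs (-W)
            run(algo)
        start = time.perf_counter()
        for _ in range(iters):        # timed iterations (-I)
            run(algo)
        elapsed = (time.perf_counter() - start) / iters
        if elapsed < best_time:
            best_algo, best_time = algo, elapsed
    return best_algo, best_time
```

In the real benchmark the timed runs would be measured with CUDA events around the matmul call rather than a host timer.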
      
      **Note:** This PR also changes the default `gemm_compute_type` for BF16
      and FP16 to `CUBLAS_COMPUTE_32F`.
      
      **Further observations:** 
1. The support matrix of `cublaslt_gemm` could be extended in the future
to also support non-FP16 output for FP8 inputs.
2. Currently, the input matrices are initialized with constant values of
1.0 and 2.0, which makes the workload less demanding in terms of power.
Another future extension could be an additional fill mode, e.g. uniform
random numbers between -1 and 1.
      3. cuBLAS workspace recommendations are listed under
      https://docs.nvidia.com/cuda/cublas/#cublassetworkspace
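
The uniform fill mode suggested in observation 2 could look like the following host-side sketch (`fill_uniform` is a hypothetical name; the actual benchmark would fill device buffers in the matrix data type):

```python
import random

def fill_uniform(m, n, low=-1.0, high=1.0, seed=0):
    """Fill an m x n matrix with uniform random values in [low, high],
    instead of the constant 1.0 / 2.0 initialization, for a more
    power-demanding input pattern."""
    rng = random.Random(seed)  # seeded for reproducible benchmark inputs
    return [[rng.uniform(low, high) for _ in range(n)] for _ in range(m)]
```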
      
      
      
Update (June 10, 2025): verified using the higher-level test driver with
these commands:
      
      1. inline:
      ```
python3 -c "
      from superbench.benchmarks import BenchmarkRegistry, Platform
      from superbench.common.utils import logger
      
      parameters = (
          '--num_warmup 10 --num_steps 50 '
          '--shapes 512,512,512 1024,1024,1024 --in_types fp16 fp32 '
          '--enable_autotune --num_warmup_autotune 20 --num_steps_autotune 50'
      )
      context = BenchmarkRegistry.create_benchmark_context(
          'cublaslt-gemm', platform=Platform.CUDA, parameters=parameters
      )
      benchmark = BenchmarkRegistry.launch_benchmark(context)
      logger.info('Result: {}'.format(benchmark.result))
      "
      ```
      
      2. newly added script: 
      `python3 examples/benchmarks/cublaslt_function.py`
      
      ---------
Co-authored-by: Babak Hejazi <babakh@nvidia.com>
  3. 20 Nov, 2023 1 commit
  4. 14 Apr, 2023 1 commit
      Release - SuperBench v0.8.0 (#517) · 51761b3a
      Yifan Xiong authored
      
      
      **Description**
      
      Cherry-pick bug fixes from v0.8.0 to main.
      
      **Major Revisions**
      
      * Monitor - Fix the cgroup version checking logic (#502)
      * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
      * Fix wrong torch usage in communication wrapper for Distributed
      Inference Benchmark (#505)
* Analyzer - Fix bug in python3.8 due to pandas API change (#504)
      * Bug - Fix bug to get metric from cmd when error happens (#506)
      * Monitor - Collect realtime GPU power when benchmarking (#507)
      * Add num_workers argument in model benchmark (#511)
      * Remove unreachable condition when write host list (#512)
      * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
      * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
      * Docs - Upgrade version and release note (#508)
Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
  5. 20 Mar, 2023 1 commit
  6. 03 Jan, 2023 1 commit