    Benchmarks: Micro benchmark - add ncu profile support in cublaslt-gemm (#740) · f6e65a98
    Yuting Jiang authored
    **Description**
    This PR adds NCU (NVIDIA Nsight Compute) profiling support to the
    cublaslt-gemm micro benchmark, enabling detailed kernel analysis
    including DRAM throughput, compute throughput, and launch arguments.
    
    **Major Revision**
    - Add the --enable_ncu_profiling and --profiling_metrics options for NCU
    profiling
    - Modify command execution to run the benchmark under NCU when profiling
    is enabled
    - Update result parsing to handle both the standard and the NCU-profiled
    output formats
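
As a rough illustration of the command-execution change listed above, the sketch below wraps a benchmark command line with `ncu` only when profiling is requested. It is not the code from this PR: the function name `build_command`, the `DEFAULT_METRICS` list, and the example `cublaslt_gemm` arguments are hypothetical, and the metric names are assumed to be valid for the installed Nsight Compute version.

```python
import shlex

# Hypothetical defaults (an assumption, not taken from the PR); the names
# follow Nsight Compute's metric naming for DRAM and SM throughput.
DEFAULT_METRICS = [
    "dram__throughput.avg.pct_of_peak_sustained_elapsed",
    "sm__throughput.avg.pct_of_peak_sustained_elapsed",
]


def build_command(base_command, enable_ncu_profiling=False, profiling_metrics=None):
    """Return the command to execute, prefixed with ncu when profiling is on."""
    if not enable_ncu_profiling:
        # Standard path: run the benchmark binary unchanged.
        return base_command
    # ncu expects a comma-separated metric list after --metrics.
    metrics = ",".join(profiling_metrics or DEFAULT_METRICS)
    return f"ncu --metrics {shlex.quote(metrics)} {base_command}"


if __name__ == "__main__":
    # Hypothetical benchmark invocation, used only for illustration.
    print(build_command("cublaslt_gemm -m 4096 -n 4096 -k 4096", enable_ncu_profiling=True))
```

Keeping the non-profiling path untouched means existing runs produce the same command as before, so only the profiled output needs a separate parsing branch.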
cublaslt_function.py 7.13 KB