    Add SplitK support into Batched GEMM V3 (#1729) · 4d8fce33
    Bartłomiej Kocot authored
    
    
    * add bmm api
    
    * add bf16 multi_d
    
    * add ckProfiler for bf16
    
    * add ckProfiler files
    
    * add more instances; fixed 64-bit index issue
    
    * fixed naming
    
    * enabled batched Ds
    
    * use long_index for ds offsets
    
    * clean
    
    * add bmm fp8 ckProfiler
    
    * Update example/24_batched_gemm/batched_gemm_xdl_bf16_v3.cpp
    Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
    
    * Update example/24_batched_gemm/batched_gemm_xdl_fp8_rowwise_v3.cpp
    Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
    
    * Update example/24_batched_gemm/run_batched_gemm_example_rowwise.inc
    Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
    
    * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn.hpp
    Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
    
    * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v1_default_instance.cpp
    Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
    
    * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v2_default_instance.cpp
    Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
    
    * Update profiler/src/profile_gemm_universal_batched.cpp
    Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
    
    * Update profiler/include/profiler/profile_gemm_universal_batched_impl.hpp
    Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
    
    * clean
    
    * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
    
    * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
    
    * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_comp_default_instance.cpp
    
    * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
    
    * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
    
    * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
    
    * refactor batch offset func
    
    * add splitk support into bmm_v3
    
    * clean
    
    * clean
    
    * format
    
    * fixed
    
    * fix
    
    ---------
    Co-authored-by: Jing Zhang <jizhan@fb.com>
    Co-authored-by: zjing14 <zhangjing14@gmail.com>
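For readers unfamiliar with split-K, the following is a minimal CPU-side C++ sketch, not the Composable Kernel implementation, of a batched GEMM in which each batch's K dimension is cut into `k_batch` slices whose partial products are summed into C. It also computes per-batch offsets in a 64-bit type, which is presumably the kind of overflow the "64-bit index" and "long_index for ds offsets" items address. All names (`batched_gemm_splitk`, `batch_offset`, `BatchStrides`, `index_t`, `long_index_t`) and the layouts (A as M x K row-major, B as N x K row-major, C as M x N row-major, chosen to mirror the `mk_nk_mn` suffix in the instance file names) are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for this sketch: 32-bit indices inside one matrix,
// 64-bit offsets between batches (batch * stride can exceed 2^31 elements).
using index_t = std::int32_t;
using long_index_t = std::int64_t;

// Hypothetical: element strides between consecutive batches of A, B and C.
struct BatchStrides {
    long_index_t a, b, c;
};

// Offset of batch g, widened to 64 bits before multiplying to avoid overflow.
long_index_t batch_offset(index_t g, long_index_t stride) {
    return static_cast<long_index_t>(g) * stride;
}

// C[g] = A[g] * B[g]^T for every batch g, with K split into k_batch slices.
// Each (batch, split) pair plays the role of one GPU work-group: it computes a
// partial product over its K slice and adds it into C. On a GPU the splits run
// concurrently and would need atomic adds or a separate reduction; this
// sequential sketch can simply use +=.
void batched_gemm_splitk(const std::vector<float>& A, const std::vector<float>& B,
                         std::vector<float>& C, index_t batch, index_t M, index_t N,
                         index_t K, index_t k_batch, BatchStrides s) {
    assert(k_batch > 0);
    std::fill(C.begin(), C.end(), 0.0f);
    const index_t k_per_split = (K + k_batch - 1) / k_batch; // ceil(K / k_batch)
    for (index_t g = 0; g < batch; ++g) {
        const float* a = A.data() + batch_offset(g, s.a);
        const float* b = B.data() + batch_offset(g, s.b);
        float* c = C.data() + batch_offset(g, s.c);
        for (index_t split = 0; split < k_batch; ++split) {
            const index_t k0 = split * k_per_split;
            const index_t k1 = std::min(K, k0 + k_per_split); // empty if k0 >= K
            for (index_t m = 0; m < M; ++m) {
                for (index_t n = 0; n < N; ++n) {
                    float acc = 0.0f;
                    for (index_t k = k0; k < k1; ++k) {
                        acc += a[m * K + k] * b[n * K + k];
                    }
                    c[m * N + n] += acc;
                }
            }
        }
    }
}
```

With small integer-valued inputs, calling `batched_gemm_splitk` with `k_batch = 1` and with `k_batch = 3` yields identical C, since split-K only regroups the sum over K. With real data the results can differ in the last bits, because floating-point addition is not associative.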
profile_gemm_universal_batched.cpp