  1. 17 Dec, 2024 18 commits
  2. 16 Dec, 2024 5 commits
  3. 15 Dec, 2024 1 commit
  4. 14 Dec, 2024 2 commits
  5. 13 Dec, 2024 2 commits
    • Add SplitK support into Batched GEMM V3 (#1729) · 4d8fce33
      Bartłomiej Kocot authored
      
      
      * add bmm api
      
      * add bf16 multi_d
      
      * add ckProfiler for bf16
      
      * add ckProfiler files
      
      * add more instance; fixed 64bit index issue
      
      * fixed naming
      
      * enabled batched Ds
      
      * use long_index for ds offsets
      
      * clean
      
      * add bmm fp8 ckProfiler
      
      * Update example/24_batched_gemm/batched_gemm_xdl_bf16_v3.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update example/24_batched_gemm/batched_gemm_xdl_fp8_rowwise_v3.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update example/24_batched_gemm/run_batched_gemm_example_rowwise.inc
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn.hpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v1_default_instance.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v2_default_instance.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update profiler/src/profile_gemm_universal_batched.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update profiler/include/profiler/profile_gemm_universal_batched_impl.hpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * clean
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_comp_default_instance.cpp
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * refactor batch offset func
      
      * add splitk support into bmm_v3
      
      * clean
      
      * clean
      
      * format
      
      * fixed
      
      * fix
      
      ---------
      Co-authored-by: Jing Zhang <jizhan@fb.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
    • Ck tile/smoothquant out stride (#1742) · 4e731776
      chenjun authored
      * add ck_tile/smoothquant out stride parameter
      
      * Remove the default stride value
      
      ---------
      
      Co-authored-by: so <a.com>
  6. 12 Dec, 2024 1 commit
    • [CK_TILE] naive attn (#1708) · 77a38e02
      carlushuang authored
      * add reference attention fwd
      
      * refactor addresser
      
      * update
      
      * paged, and i8 reflect-quant
      
      * lets call it forward-quant
      
      * fix error in decode variation
      
      * update naive-attn
      
      * fix page table
      
      * fix build err
  7. 10 Dec, 2024 4 commits
  8. 09 Dec, 2024 3 commits
  9. 06 Dec, 2024 4 commits