Commits · 457217932519e8eaf53db2b6da7feb55af43bfc1 · gaoqiong / composable_kernel_ROCM

17 Dec, 2024 18 commits
- Merge branch 'feature/use-larger-tile-size-for-chunk-prefill' into feature/add-splitkv-instance · 45721793
  Po Yen Chen authored Dec 17, 2024
- Update num_splits heuristic · 6ff7fa94
  Po Yen Chen authored Dec 17, 2024
- Move num_splits_heuristic() to fmha_fwd.hpp for reusability · 337f073d
  Po Yen Chen authored Nov 28, 2024
- Use simpler type hint for backward compatibility · 2da4b185
  Po Yen Chen authored Nov 28, 2024
- Update num_splits heuristic for prefill phase · 59138975
  Po Yen Chen authored Nov 28, 2024
- Update num_splits heuristic · 708f9d47
  Po Yen Chen authored Nov 28, 2024
- Update num_splits heuristic for decode phase · 1d68af2a
  Po Yen Chen authored Nov 28, 2024
- Generate prefill combine instances only for group mode · 4a0998ac
  Po Yen Chen authored Nov 28, 2024
- Merge branch 'feature/use-larger-tile-size-for-chunk-prefill' into feature/add-splitkv-instance · b9dc91cc
  Po Yen Chen authored Dec 17, 2024
- Use larger tile size for chunk prefill · 9d8d4c61
  Po Yen Chen authored Dec 17, 2024
- Update num_splits heuristic · ff8d3c96
  Po Yen Chen authored Dec 17, 2024
- Fix wrong trait template arg · ed634ea4
  Po Yen Chen authored Dec 17, 2024
- Update MakeKargs() arguments · 47e523ef
  Po Yen Chen authored Dec 17, 2024
- Only launch splitkv kernel if num_splits == 1 · 6c7a3bf4
  Po Yen Chen authored Dec 17, 2024
- Workaround epilogue store issue · fa34e87c
  Po Yen Chen authored Dec 16, 2024
- Add instances to enable vector load on hdim_q/hdim_v · 7520e32d
  Po Yen Chen authored Dec 04, 2024
- Merge branch 'develop' into feature/use-larger-tile-size-for-chunk-prefill · 401e643e
  Po Yen Chen authored Dec 17, 2024
- Enhance printing functionality (#1751) · d46196f2
  Adam Osewski authored Dec 17, 2024
```
* Added object print with all template parameters

* fix clang format

---------
Co-authored-by: ravil-mobile <ravil.aviva.com@gmail.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
```
16 Dec, 2024 5 commits
- clarify release notes bullet point · 0fd6978d
  Max Podkorytov authored Dec 10, 2024
- add contributing placeholder · 1b75c77d
  Max Podkorytov authored Dec 10, 2024
- add pull request template placeholder · 30a37cac
  Max Podkorytov authored Dec 10, 2024
- add template placeholders · a8ad7fcc
  Max Podkorytov authored Dec 10, 2024
- upgrade sqlalchemy version (#1748) · fdfe2102
  Illia Silin authored Dec 15, 2024
```
* upgrade sqlalchemy version

* replace the connection with engine in to_sql call

* change the hipTensor ctest syntax
```
15 Dec, 2024 1 commit

- added moe interleaving pipeline (#1712) · f57d720c
  Xu, Shengnan authored Dec 15, 2024

* added moe interleaving pipeline

* remove redundant code

* formatter

---------
Co-authored-by: root <root@hjbog-srdc-14.amd.com>

14 Dec, 2024 2 commits
- upgrade pandas package (#1746) · d68974a5
  Illia Silin authored Dec 13, 2024
- Add zstd lib for building hipTensor. (#1745) · 41ebf117
  Illia Silin authored Dec 13, 2024
```
* add zstd library to CI docker

* fix the libzstd name
```
13 Dec, 2024 2 commits

- Add SplitK support into Batched GEMM V3 (#1729) · 4d8fce33
  Bartłomiej Kocot authored Dec 13, 2024

* add bmm api

* add bf16 multi_d

* add ckProfiler for bf16

* add ckProfiler files

* add more instance; fixed 64bit index issue

* fixed naming

* enabled batched Ds

* use long_index for ds offsets

* clean

* add bmm fp8 ckProfiler

* Update example/24_batched_gemm/batched_gemm_xdl_bf16_v3.cpp
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

* Update example/24_batched_gemm/batched_gemm_xdl_fp8_rowwise_v3.cpp
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

* Update example/24_batched_gemm/run_batched_gemm_example_rowwise.inc
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

* Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn.hpp
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

* Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v1_default_instance.cpp
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

* Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v2_default_instance.cpp
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

* Update profiler/src/profile_gemm_universal_batched.cpp
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

* Update profiler/include/profiler/profile_gemm_universal_batched_impl.hpp
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

* clean

* Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp

* Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp

* Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_comp_default_instance.cpp

* Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp

* Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp

* Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp

* refactor batch offset func

* add splitk support into bmm_v3

* clean

* clean

* format

* fixed

* fix

---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

- Ck tile/smoothquant out stride (#1742) · 4e731776
  chenjun authored Dec 13, 2024

* add ck_tile/smoothquant out stride parameter

* Remove the default stride value

---------

Co-authored-by: so <a.com>

12 Dec, 2024 1 commit

- [CK_TILE] naive attn (#1708) · 77a38e02
  carlushuang authored Dec 12, 2024

* add reference attention fwd

* refactor addresser

* update

* paged, and i8 reflect-quant

* let's call it forward-quant

* fix error in decode variation

* update naive-attn

* fix page table

* fix build err

10 Dec, 2024 4 commits
- add missing stdexcept header (#1740) · 357a0b1c
  Illia Silin authored Dec 10, 2024
- Upgrade to Ubuntu22.04 as default OS. (#1738) · 90d8410d
  Illia Silin authored Dec 10, 2024
```
* upgrade to ubuntu 22.04

* try adding -u root docker options for ubuntu 22
```
- Make sure we call __hneg with half to remove ambiguous error (#1736) · 67497a04
  Jatin Chaudhary authored Dec 10, 2024
- [CK TILE] Use config name instead of data type in FmhaFwdTypeConfig<config> (#1731) · 94ae7113
  rocking authored Dec 10, 2024
```
* Add data type config, Prepare to add mix precision in the future

* Fix compile error
```
09 Dec, 2024 3 commits
- build CI for gfx12 by default (#1734) · 23cf2026
  Illia Silin authored Dec 09, 2024
- update CI timeout limits (#1733) · 2f088b87
  Illia Silin authored Dec 09, 2024
- remove unnecessary file (#1732) · c773cc25
  Illia Silin authored Dec 09, 2024
06 Dec, 2024 4 commits

- Refactor CI performance tests. (#1726) · 355893cd
  Illia Silin authored Dec 06, 2024

* merge the build and performance tests CI stages together

* add gemm performance test on gfx11/gfx12

* add suffixes to distinguish gemm performance logs from different archs

* use smaller gemm set in CI for gfx10/gfx11/gfx12

* disable performance tests on gfx1030

* fix the stashing logic

* fix finding python3 for mha instances

- Add copy assignment op test (#1718) · 5e6bd75a
  Rostyslav Geyyer authored Dec 06, 2024
```
* Add copy assignment op test

* Add a deep copy testing
```
- Support large batch tensors in grouped conv bwd data (#1711) · 261f1759
  Bartłomiej Kocot authored Dec 06, 2024
```
* Support large batch tensors in grouped conv bwd data

* Fix multiD

* fixes

* fixes

* fixes
```
- Undo padding-flag changes in fmha_fwd_kernel.hpp (#1725) · 58e7f37f
  Po Yen Chen authored Dec 06, 2024