Commits · 988478d452fbddd1140b196049ece4a813864795 · gaoqiong / composable_kernel_ROCM

26 Dec, 2024 1 commit
- edit fp8 ab scale for Scale_Block_M=1 · 988478d4
  chenjun authored Dec 26, 2024
  
25 Dec, 2024 1 commit
- Modify the a_thread offset since the A data load is different from B. · f728087c
  mtgu0705 authored Dec 25, 2024
  
23 Dec, 2024 1 commit
- Enable multiply_multiply for Scale_Block_M = 1 for deepseek · 1fcd3329
  mtgu0705 authored Dec 23, 2024
  
20 Dec, 2024 3 commits
- Comment the first one · e5bc56a4
  mtgu0705 authored Dec 20, 2024
  
- Added two kernels for M=32 problem · f2948084
  mtgu0705 authored Dec 20, 2024
  
- fix profiler_grouped_gemm (#1766) · 2944c508
  Illia Silin authored Dec 19, 2024
  
19 Dec, 2024 1 commit

Apply Ck-tile argument parser for vectors [I/O] (#1758) · e758d006

Mateusz Ozga authored Dec 19, 2024

* Parser for a vector was added. Additionally, we validate the correctness of numbers

* Remove unnecessary comments

* Review part 1

* Review part 2

* Add const to variadic lambda

* Rename C->K


18 Dec, 2024 3 commits

[CK TILE] Refactor GemmKernel to be reused by other GEMM related operators (#1730) · 453ca373

aledudek authored Dec 18, 2024

* Gemm Kernel Refactor part1

* Gemm Kernel Refactor common gemm pipeline part2

* [CK TILE] Refactor batched gemm to reuse GemmKernel

* [CK TILE] Refactor GemmKernel - review changes part1

* [CK TILE] Refactor GemmKernel - references fix

* [CK TILE] Refactor GemmKernel - naming changes, add problem

* [CK_TILE] Refactor GemmKernel - update tests

* [CK_TILE] Refactor GemmKernel - review changes

* [CK_TILE] Refactor GemmKernel - update test

* [CK_TILE] Refactor GemmKernel - constness fixes

* [CK_TILE] Refactor GemmKernel - update tests


Disambiguate bit_cast (#1749) · 1c1b3363

Xiaodong Wang authored Dec 18, 2024



Adding namespace to disambiguate with std::bit_cast
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>


[CK_TILE] Move hipmalloc/memcpy calls out of gpu reference gemm (#1743) · f6c4d614

aledudek authored Dec 18, 2024

* [CK_TILE] Move hipmalloc/memcpy calls out of gpu reference gemm

* [CK_TILE] Move hipmalloc/memcpy calls out of gpu reference gemm - review changes

* [CK_TILE] Move hipmalloc/memcpy calls out of gpu reference gemm - review fix


17 Dec, 2024 6 commits

updated fp16 instances to be on parity with universal gemm instances (#1754) · d9e37c68
Harisankar Sadasivan authored Dec 17, 2024
```
* updated fp16 instances to be on parity with universal gemm instances

* corrected instance name to streamk instance
```
Pass build flags to config.h (#1760) · 689a5ae4
Illia Silin authored Dec 17, 2024
```
* pass the build flags to config.h

* fix clang format
```
refactor conditional usage; fix build on rocm6.1 where the reference didn't exist · 6ef8d3c2
Max Podkorytov authored Dec 12, 2024


Bump rocm-docs-core from 1.11.0 to 1.12.0 in /docs/sphinx (#1753) · 0e54d7ae

dependabot[bot] authored Dec 17, 2024

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.11.0 to 1.12.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.11.0...v1.12.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


Added unit tests for CK Tile compute bound gemm pipeline (#1728) · 627a27bd
jakpiase authored Dec 17, 2024


Enhance printing functionality (#1751) · d46196f2

Adam Osewski authored Dec 17, 2024



* Added object print with all template parameters

* fix clang format

---------
Co-authored-by: ravil-mobile <ravil.aviva.com@gmail.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>


16 Dec, 2024 5 commits
- clarify release notes bullet point · 0fd6978d
  Max Podkorytov authored Dec 10, 2024
  
- add contributing placeholder · 1b75c77d
  Max Podkorytov authored Dec 10, 2024
  
- add pull request template placeholder · 30a37cac
  Max Podkorytov authored Dec 10, 2024
  
- add template placeholders · a8ad7fcc
  Max Podkorytov authored Dec 10, 2024
  
- upgrade sqlalchemy version (#1748) · fdfe2102
  Illia Silin authored Dec 15, 2024
```
* upgrade sqlalchemy version

* replace the connection with engine in to_sql call

* change the hipTensor ctest syntax
```
15 Dec, 2024 1 commit

added moe interleaving pipeline (#1712) · f57d720c

Xu, Shengnan authored Dec 15, 2024



* added moe interleaving pipeline

* remove redundant code

* formatter

---------
Co-authored-by: root <root@hjbog-srdc-14.amd.com>


14 Dec, 2024 2 commits
- upgrade pandas package (#1746) · d68974a5
  Illia Silin authored Dec 13, 2024
  
- Add zstd lib for building hipTensor. (#1745) · 41ebf117
  Illia Silin authored Dec 13, 2024
```
* add zstd library to CI docker

* fix the libzstd name
```
13 Dec, 2024 2 commits

Add SplitK support into Batched GEMM V3 (#1729) · 4d8fce33

Bartłomiej Kocot authored Dec 13, 2024



* add bmm api

* add bf16 multi_d

* add ckProfiler for bf16

* add ckProfiler files

* add more instance; fixed 64bit index issue

* fixed naming

* enabled batched Ds

* use long_index for ds offsets

* clean

* add bmm fp8 ckProfiler

* Update example/24_batched_gemm/batched_gemm_xdl_bf16_v3.cpp
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

* Update example/24_batched_gemm/batched_gemm_xdl_fp8_rowwise_v3.cpp
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

* Update example/24_batched_gemm/run_batched_gemm_example_rowwise.inc
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

* Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn.hpp
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

* Update library/src/tensor_operation_instance/g...


Ck tile/smoothquant out stride (#1742) · 4e731776

chenjun authored Dec 13, 2024

* add ck_tile/smoothquant out stride parameter

* Remove the default stride value

---------

Co-authored-by: so <a.com>


12 Dec, 2024 1 commit

[CK_TILE] naive attn (#1708) · 77a38e02

carlushuang authored Dec 12, 2024

* add reference attention fwd

* refactor addresser

* update

* paged, and i8 reflect-quant

* let's call it forward-quant

* fix error in decode variation

* update naive-attn

* fix page table

* fix build err


10 Dec, 2024 4 commits
- add missing stdexcept header (#1740) · 357a0b1c
  Illia Silin authored Dec 10, 2024
  
- Upgrade to Ubuntu22.04 as default OS. (#1738) · 90d8410d
  Illia Silin authored Dec 10, 2024
```
* upgrade to ubuntu 22.04

* try adding -u root docker options for ubuntu 22
```
- Make sure we call __hneg with half to remove the ambiguity error (#1736) · 67497a04
  Jatin Chaudhary authored Dec 10, 2024
  
- [CK TILE] Use config name instead of data type in FmhaFwdTypeConfig<config> (#1731) · 94ae7113
  rocking authored Dec 10, 2024
```
* Add data type config; prepare to add mixed precision in the future

* Fix compile error
```
09 Dec, 2024 3 commits
- build CI for gfx12 by default (#1734) · 23cf2026
  Illia Silin authored Dec 09, 2024
  
- update CI timeout limits (#1733) · 2f088b87
  Illia Silin authored Dec 09, 2024
  
- remove unnecessary file (#1732) · c773cc25
  Illia Silin authored Dec 09, 2024
  
06 Dec, 2024 5 commits

Refactor CI performance tests. (#1726) · 355893cd

Illia Silin authored Dec 06, 2024

* merge the build and performance tests CI stages together

* add gemm performance test on gfx11/gfx12

* add suffixes to distinguish gemm performance logs from different archs

* use smaller gemm set in CI for gfx10/gfx11/gfx12

* disable performance tests on gfx1030

* fix the shashing logic

* fix finding python3 for mha instances


Add copy assignment op test (#1718) · 5e6bd75a
Rostyslav Geyyer authored Dec 06, 2024
```
* Add copy assignment op test

* Add a deep copy test
```
Support large batch tensors in grouped conv bwd data (#1711) · 261f1759
Bartłomiej Kocot authored Dec 06, 2024
```
* Support large batch tensors in grouped conv bwd data

* Fix multiD

* fixes

* fixes

* fixes
```
Undo padding-flag changes in fmha_fwd_kernel.hpp (#1725) · 58e7f37f
Po Yen Chen authored Dec 06, 2024


Upgrade default compiler to ROCm6.3 (#1723) · 86990558

Illia Silin authored Dec 05, 2024



* upgrade to rocm6.3 compiler

* Proposed solution to convnd test failures in ROCm 6.3

---------
Co-authored-by: Andriy Roshchenko <andriy.roshchenko@amd.com>


05 Dec, 2024 1 commit

Add IsSupportedArgument() to gemm_kernel (#1698) · feb9a2bd

jakpiase authored Dec 05, 2024

* add IsSupportedArgument to gemm_kernel

* add ut and do some refactoring

* switched to ck_tile's integral_constant
