Commits · 049cacff7685f816a3e6f3e54ec7bdd54ab0610b · gaoqiong / composable_kernel_ROCM

14 Nov, 2024 1 commit
- start · 049cacff
  letaoqin authored Nov 14, 2024
  
  049cacff
13 Nov, 2024 3 commits
- update first gemm ok · 572865a6
  carlushuang authored Nov 14, 2024
  
  572865a6
- Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant · 9ec4e3f7
  carlushuang authored Nov 13, 2024
  
  9ec4e3f7
- update · 7ccdbe16
  carlushuang authored Nov 13, 2024
  
  7ccdbe16
12 Nov, 2024 3 commits
- test rocm6.3 rc1 build 20 (#1659) · 489c78d0
  Illia Silin authored Nov 12, 2024
  
  489c78d0
- Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant · e2a318bc
  carlushuang authored Nov 12, 2024
  
  e2a318bc
- [CK Tile] Improve the Layout, Padding, and Alignment features of CK Tile GEMM (#1651) · 2b6458dd
  Thomas Ning authored Nov 12, 2024
```
* Finished the feature

* Modified the test file

* Test case update

* address comment

* Addressed the review comment

* Fixed the CI error
```
  2b6458dd
11 Nov, 2024 6 commits
- restore collecting performance of mixed prec gemms (#1648) · 5fb150db
  Illia Silin authored Nov 11, 2024
  
  5fb150db
- [CK_TILE] add more stride for layernorm to support un-continuous Tensor (#1650) · 8ef8a994
  valarLip authored Nov 11, 2024
```
* [CK_TILE] add more stride for layernorm to support un-continuous Tensor

* align CK coding style

* extend strides to layernorm example

* clang-format...
```
  8ef8a994
- update · d0405504
  carlushuang authored Nov 11, 2024
  
  d0405504
- Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant · 9d3cdd21
  carlushuang authored Nov 11, 2024
  
  9d3cdd21
- block-asm · 06914eed
  carlushuang authored Nov 11, 2024
  
  06914eed
- Return nullptr when block index is invalid (#1649) · 13332998
  Po Yen Chen authored Nov 11, 2024
  
  13332998
09 Nov, 2024 2 commits

dummycoderfe authored Nov 09, 2024

* add moe_sorting & check ok

* fix comments & typo

* Run remod.py under include/ck_tile & example/ck_tile directories

* format codes

* fix output ci check bug

* fix moe sorting readme and error commit file

* use magic div to accelerate compute

* add a loop unroll for moe lds ops

* add extblocksnel to set zeros for moebufs

* [Ck_tile] moe set zero run ok, add size check and fix ref check

* [Ck_tile] fix moe_sorting fuse set_zero remod

* [Ck_tile] change name style, fix zero buffer size err, change folder

* [Ck_tile] moe_sorting: fix name style

* [Ck_tile] moe_sorting, remove useless params in traits

* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc

---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>

bec6fbc6
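
The "magic div" bullet in the commit above names a technique but the log does not show its code. As a rough illustration only, the following sketch shows the general idea of magic-number division: an integer division by a fixed divisor is replaced with a multiply, an add, and a shift. The function names are hypothetical and are not taken from the repository; the arithmetic assumes 32-bit unsigned dividends.

```python
def magic_division_params(divisor):
    """Precompute (multiplier, shift) for a fixed divisor (hypothetical helper)."""
    if not 1 <= divisor < 2**31:
        raise ValueError("divisor must be in [1, 2**31)")
    # shift = ceil(log2(divisor))
    shift = (divisor - 1).bit_length()
    multiplier = ((1 << 32) * ((1 << shift) - divisor)) // divisor + 1
    return multiplier, shift


def magic_divide(dividend, multiplier, shift):
    """Compute dividend // divisor without a division instruction."""
    # High 32 bits of the 64-bit product, as a GPU "mulhi" would return.
    high = (dividend * multiplier) >> 32
    return (high + dividend) >> shift


if __name__ == "__main__":
    multiplier, shift = magic_division_params(7)
    for n in (0, 6, 7, 100, 2**32 - 1):
        assert magic_divide(n, multiplier, shift) == n // 7
```

The parameters are computed once per divisor (typically on the host), so the per-element cost inside a kernel is one multiply-high, one add, and one shift. In fixed-width hardware arithmetic the intermediate sum can overflow for large dividends; Python integers do not overflow, so this sketch ignores that detail.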

- Fix 'sh' command compatibility of smoke_test_fwd.sh (#1553) · af9546d9
  Po Yen Chen authored Nov 09, 2024
  
  af9546d9

08 Nov, 2024 2 commits

- Add generic instances for two stage conv bwd wei (#1643) · ea3640fd
  Bartłomiej Kocot authored Nov 08, 2024
```
* Add generic instances for two stage conv bwd wei

* Update layout prefix
```
  ea3640fd

- [Ck tile] layernorm2d fwd optimize (#1637) · 686a58a9
  dummycoderfe authored Nov 08, 2024

* optimize small N case using vec io and using rcp div

* [Ck_tile] layernorm, add param to control fastdiv; change generate codes and test pass

* [Ck_tile] fix blockSize compute in Generic2dBlockShape

* [Ck_tile] fix kfastfdiv template style

* [Ck_tile] layernorm, fix style in review

---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>

  686a58a9

07 Nov, 2024 4 commits
- enable compilation for generic navi targets (#1645) · 75c5bfa3
  Illia Silin authored Nov 07, 2024
  
  75c5bfa3
- rename to ex pipeline · b0dd570a
  carlushuang authored Nov 07, 2024
  
  b0dd570a
- Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant · 7977f89d
  carlushuang authored Nov 07, 2024
  
  7977f89d
- update pipeline · 45131629
  carlushuang authored Nov 07, 2024
  
  45131629
06 Nov, 2024 6 commits
- Fix F16 type (#1583) · 3599418a
  rocking authored Nov 07, 2024
  
  3599418a
- compiler ok · f09dc1f3
  carlushuang authored Nov 07, 2024
  
  f09dc1f3
- update pipeline_gemm0 · 3bb718ad
  valarLip authored Nov 06, 2024
  
  3bb718ad
- Generic threshold calculation after merge fixes (#1618) · dcafb1de
  aledudek authored Nov 06, 2024
```
* Generic threshold calculation add passing num of accums

* Generic threshold - after merge fixes

* Fix cmakelists

---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
```
  dcafb1de
- update cpu reference · c6c3c142
  carlushuang authored Nov 06, 2024
  
  c6c3c142
- update · a288c57c
  valarLip authored Nov 06, 2024
  
  a288c57c
05 Nov, 2024 10 commits
- Prevent instantiation of undefined FP8 operators. (#1639) · 365f39ae
  Andriy Roshchenko authored Nov 05, 2024
  
  365f39ae
- remove gfx940;gfx941 from default target lists (#1640) · 54440cf5
  Illia Silin authored Nov 05, 2024
  
  54440cf5
- Statically Cast Pointer Offset (#1631) · d0e3a70a
  darren-amd authored Nov 05, 2024
```
* explicit cast ptr offset

* formatting change
```
  d0e3a70a
- Make sure cmake can handle the xnack+/xnack- targets. (#1633) · b6e74be1
  Illia Silin authored Nov 05, 2024
```
* make sure cmake can handle xnack targets

* don't build xdl instances for gfx906:xnack-

* don't build xdl tests for gfx906:xnack-
```
  b6e74be1
- compile OK · cf646183
  carlushuang authored Nov 06, 2024
  
  cf646183
- [generate.py] Override blob list if it already exists (#1635) · 464abd23
  Juan Manuel Martinez Caamaño authored Nov 05, 2024
```
Before, generate.py appended the list at the end of the output file.
When running the cmake configuration steps multiple times on the
examples, the blob list (such as fwd_blob_list.txt) would grow at every
configuration.
`library/src/tensor_operation_instance/gpu/mha/CMakeLists.txt` worked around
this issue by removing the output file if it exists.

Now, generate.py overrides the content of the output file.
There is no need for the workaround in the CMakeLists.txt;
and the issue is solved for the example projects too.
```
  464abd23
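
The behaviour described in the commit message above (a list file that grows on every configure run when appended to, and stays stable when overwritten) can be reproduced with a small sketch. This is not the code from generate.py; `write_blob_list`, `count_lines_after_reconfigure`, and the blob names are hypothetical, and only the file name `fwd_blob_list.txt` comes from the commit message.

```python
import os
import tempfile


def write_blob_list(path, blobs, overwrite=True):
    """Write one blob name per line (hypothetical stand-in for the list-writing step)."""
    mode = "w" if overwrite else "a"  # "w" truncates the file, "a" appends to it
    with open(path, mode) as f:
        for blob in blobs:
            f.write(blob + "\n")


def count_lines_after_reconfigure(overwrite, runs=3):
    """Simulate `runs` configure steps and return the final number of lines."""
    blobs = ["fwd_a.cpp", "fwd_b.cpp"]  # made-up blob names for illustration
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fwd_blob_list.txt")
        for _ in range(runs):
            write_blob_list(path, blobs, overwrite=overwrite)
        with open(path) as f:
            return len(f.readlines())


if __name__ == "__main__":
    print(count_lines_after_reconfigure(overwrite=False))  # grows with each run
    print(count_lines_after_reconfigure(overwrite=True))   # stays at len(blobs)
```

With append mode the list holds one copy of every entry per configure run, which is why the earlier CMakeLists.txt workaround deleted the file first; with overwrite mode the output depends only on the latest run.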
- update code · 70fa98ad
  carlushuang authored Nov 05, 2024
  
  70fa98ad
- Merge remote-tracking branch 'origin/develop' into ck_tile/moe_quant · 7c81aee8
  carlushuang authored Nov 05, 2024
  
  7c81aee8
- moe pipeline · 49c39b51
  carlushuang authored Nov 05, 2024
  
  49c39b51
- Linsun/convint8 fwd instances (#1626) · 0c9012fb
  Lin Sun authored Nov 04, 2024
```
Add instances for int8 grouped conv2d fwd
---------
Co-authored-by: root <root@dell300x-pla-t28-03.pla.dcgpu>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
```
  0c9012fb
04 Nov, 2024 1 commit
- Temporary disable part of dynamic op conv instances (#1630) · 4f1fdbb6
  Bartłomiej Kocot authored Nov 04, 2024
```
* Temporary disable part of dynamic op conv instances

* fix
```
  4f1fdbb6
02 Nov, 2024 1 commit

- [CK_TILE] layernorm have more accurate residual (#1623) · cb6c5d39
  carlushuang authored Nov 02, 2024

* more accurate residual

* modify comment

* Fix literal case in README.md

---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

  cb6c5d39

01 Nov, 2024 1 commit

- Reduce build time. (#1621) · 03c6448b
  Illia Silin authored Oct 31, 2024

* disable fp8 gemm_universal on gfx90a and gfx908 by default

* fix cmake syntax

* fix clang format

* add ifdefs in amd_xdlops

* disable fp8 gemm instances on gfx90a by default

* update readme

  03c6448b