Commits · 8209d54cc89f92cf8bb71056d1df1fe35bd92aae · gaoqiong / composable_kernel_ROCM

14 Nov, 2024 1 commit
- Fix gfx1101 build · 8209d54c
  Andriy Roshchenko authored Nov 14, 2024
  
  8209d54c
12 Nov, 2024 2 commits
- Skip on gemm_universal tests. · d1cff7ad
  Andriy Roshchenko authored Nov 12, 2024
```
The tests take too long to complete on the emulator.
Need to see if it is possible to reduce the scope of the testing to just FP8 data types.
```
  d1cff7ad
- Upgrade to NPI 573 build docker. · 3520c19d
  Andriy Roshchenko authored Nov 12, 2024
  
  3520c19d
08 Nov, 2024 2 commits
- Fix data types and improve testing verbosity. · 61b20afa
  Andriy Roshchenko authored Nov 08, 2024
  
  61b20afa
- Verify more tests on floating point data · 51b9abb9
  Andriy Roshchenko authored Nov 08, 2024
  
  51b9abb9
07 Nov, 2024 7 commits
- Verify 38_grouped_conv_bwd_data_multiple_d on floating point numbers · 646b8e5c
  Andriy Roshchenko authored Nov 07, 2024
  
  646b8e5c
- Verify 20_grouped_conv_bwd_weight on floating point numbers · 405fdaec
  Andriy Roshchenko authored Nov 07, 2024
  
  405fdaec
- Verify 04_gemm_add_add_fastgelu on floating point numbers · ff6bbf40
  Andriy Roshchenko authored Nov 07, 2024
  
  ff6bbf40
- Verify 35_splitk_gemm on floating point numbers. · 52cd7ade
  Andriy Roshchenko authored Nov 07, 2024
```
splitk gemm appears to be losing precision VS reference implementation when FP numbers are involved.
```
  52cd7ade
- Enable instances built for gfx94 to be built on gfx950 · e942e568
  Andriy Roshchenko authored Nov 07, 2024
  
  e942e568
- Merge remote-tracking branch origin/gfx950 into andriy/lwpck-2430 · 8c547ea3
  Andriy Roshchenko authored Nov 07, 2024
  
  8c547ea3
- Introduce two new tensor generators · 1fb3bb8d
  Andriy Roshchenko authored Nov 07, 2024
  
  1fb3bb8d
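
Commit 52cd7ade above notes that split-K GEMM appears to lose precision against the reference implementation on floating point data. The log does not state the cause. One general reason a split reduction can disagree with a sequential reference is that floating-point addition is not associative, so the order of accumulation changes the rounded result. The sketch below only illustrates that effect; it is not Composable Kernel code. The helpers `f32`, `dot_sequential`, and `dot_split_k` are hypothetical names, and float32 rounding is emulated with Python's `struct` module.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_sequential(a, b):
    """Reference-style dot product: one float32 accumulator over the whole K range."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = f32(acc + f32(x * y))
    return acc


def dot_split_k(a, b, splits):
    """Split-K-style dot product: reduce each K chunk separately, then sum the partials."""
    k = len(a)
    chunk = (k + splits - 1) // splits  # ceil(k / splits)
    partials = [
        dot_sequential(a[i:i + chunk], b[i:i + chunk]) for i in range(0, k, chunk)
    ]
    acc = 0.0
    for p in partials:
        acc = f32(acc + p)
    return acc


# 2**24 is the point where float32 can no longer represent every integer,
# so adding 1.0 to it one at a time is rounded away.
a = [16777216.0, 1.0, 1.0, 1.0]
b = [1.0, 1.0, 1.0, 1.0]
print(dot_sequential(a, b))   # one accumulator
print(dot_split_k(a, b, 2))   # two partial sums, then combined
```

The two results differ even though the inputs are identical, which is why comparisons between a split reduction and a sequential reference on floating point data usually need a tolerance rather than exact equality.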
06 Nov, 2024 6 commits
- Facilitate testing of FP8 data types on the emulator · 3dea7cc8
  Andriy Roshchenko authored Nov 06, 2024
  
  3dea7cc8
- Make sure all tests and examples are built for gfx950 · 2eb1ba44
  Andriy Roshchenko authored Nov 06, 2024
  
  2eb1ba44
- sync from public repo · 261d76c4
  illsilin authored Nov 06, 2024
  
  261d76c4
- sync from public repo · a4522ae3
  illsilin authored Nov 06, 2024
  
  a4522ae3
- Change default verification method to CPU. · 360dd17a
  Andriy Roshchenko authored Nov 06, 2024
```
GPU verification takes too much time to complete on the emulator.
```
  360dd17a
- Merge pull request #214 from ROCm/merge_from_public · e0594d08
  Illia Silin authored Nov 06, 2024
```
Merge from public
```
  e0594d08
05 Nov, 2024 10 commits
- merge from public repo · 667cd6ab
  illsilin authored Nov 05, 2024
  
  667cd6ab
- Prevent instantiation of undefined FP8 operators. (#1639) · 365f39ae
  Andriy Roshchenko authored Nov 05, 2024
  
  365f39ae
- remove gfx940;gfx941 from default target lists (#1640) · 54440cf5
  Illia Silin authored Nov 05, 2024
  
  54440cf5
- Fix test success reporting logic · 7b8e2cf6
  Andriy Roshchenko authored Nov 05, 2024
  
  7b8e2cf6
- Statically Cast Pointer Offset (#1631) · d0e3a70a
  darren-amd authored Nov 05, 2024
```
* explicit cast ptr offset

* formatting change
```
  d0e3a70a
- Make sure cmake can handle the xnack+/xnack- targets. (#1633) · b6e74be1
  Illia Silin authored Nov 05, 2024
```
* make sure cmake can handle xnack targets

* dont build xdl instances for gfx906:xnack-

* dont build xdl tests for gfx906:xnack-
```
  b6e74be1
- [generate.py] Override blob list if it already exists (#1635) · 464abd23
  Juan Manuel Martinez Caamaño authored Nov 05, 2024
```
Before, generate.py appended the list at the end of the output file.
When running the cmake configuration steps multiple times on the
examples, the blob list (such as fwd_blob_list.txt) would grow at every
configuration.
`library/src/tensor_operation_instance/gpu/mha/CMakeLists.txt` worked around
this issue by removing the output file if it exists.

Now, generate.py overrides the content of the output file.
There is no need for the workaround in the CMakeLists.txt;
and the issue is solved for the example projects too.
```
  464abd23
- Prevent sccache server from shutting down during build · 7d43f3f4
  Andriy Roshchenko authored Nov 05, 2024
  
  7d43f3f4
- Linsun/convint8 fwd instances (#1626) · 0c9012fb
  Lin Sun authored Nov 04, 2024
```
Add instances for int8 grouped conv2d fwd
---------
Co-authored-by: root <root@dell300x-pla-t28-03.pla.dcgpu>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
```
  0c9012fb
- Add FP8 type selection into client_example CMakeLists.txt · e34c2ee3
  Andriy Roshchenko authored Nov 05, 2024
  
  e34c2ee3
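
The generate.py change in 464abd23 above comes down to append versus overwrite semantics when the blob list is written. The following is a minimal Python sketch of that difference, not the actual generate.py code: `write_blob_list`, `simulate_configures`, and the blob file names are hypothetical and exist only for illustration.

```python
import os
import tempfile


def write_blob_list(path, blobs, mode):
    """Write one blob name per line. mode is "a" (append) or "w" (overwrite)."""
    with open(path, mode) as f:
        for name in blobs:
            f.write(name + "\n")


def count_lines(path):
    """Return the number of lines currently in the file."""
    with open(path) as f:
        return sum(1 for _ in f)


def simulate_configures(mode, runs=3):
    """Repeat the list-generation step `runs` times, as repeated cmake
    configuration would, and return the final length of the list."""
    blobs = ["blob_a.cpp", "blob_b.cpp"]  # made-up blob names
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "fwd_blob_list.txt")
        for _ in range(runs):
            write_blob_list(path, blobs, mode)
        return count_lines(path)


print(simulate_configures("a"))  # list grows with every run
print(simulate_configures("w"))  # list is rewritten on every run
```

With append mode the list length scales with the number of configuration runs, which matches the growth described in the commit message; with overwrite mode the file always holds exactly one copy of the list, so no cleanup step is needed beforehand.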
04 Nov, 2024 3 commits
- Prevent instantiation of operators that are not supported by FP8 data types · 1ccb8112
  Andriy Roshchenko authored Nov 04, 2024
  
  1ccb8112
- Provide single point of truth for FP8 INF and NAN checks · 97a5cca9
  Andriy Roshchenko authored Nov 04, 2024
  
  97a5cca9
- Temporarily disable part of dynamic op conv instances (#1630) · 4f1fdbb6
  Bartłomiej Kocot authored Nov 04, 2024
```
* Temporarily disable part of dynamic op conv instances

* fix
```
  4f1fdbb6
02 Nov, 2024 1 commit
- [CK_TILE] layernorm have more accurate residual (#1623) · cb6c5d39
  carlushuang authored Nov 02, 2024
  * more accurate residual
  * modify comment
  * Fix literal case in README.md
  ---------
  Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
  cb6c5d39
01 Nov, 2024 7 commits
- Merge branch gfx950 into andriy/lwpck-2413 · e941f59f
  Andriy Roshchenko authored Nov 01, 2024
  
  e941f59f
- Merge remote-tracking branch 'origin/develop' into gfx950 · 7da48908
  Andriy Roshchenko authored Nov 01, 2024
  
  7da48908
- Reduce build time. (#1621) · 03c6448b
  Illia Silin authored Oct 31, 2024
  * disable fp8 gemm_universal on gfx90a and gfx908 by default
  * fix cmake syntax
  * fix clang format
  * add ifdefs in amd_xdlops
  * disable fp8 gemm instances on gfx90a by default
  * update readme
  03c6448b
- [Ck_tile] smoothquant (#1617) · fbd65454
  rocking authored Nov 01, 2024
  * fix compile error
  * fix typo of padding
  * Add smoothquant op
  * Add smoothquant instance library
  * refine type
  * add test script
  * Re-generate smoothquant.hpp
  * Always use 'current year' in copyright
  * use Generic2dBlockShape instead
  * Add vector = 8 instance back
  * Find exe path automatically
  * Simplify the api condition
  * Remove debugging code
  * update year
  * Add blank line between function declaration
  * explicitly cast return value to dim3
  * refine return value
  * Fix default warmup and repeat value
  * Add comment
  * refactor smoothquant cmake
  * Add README
  * Fix typo
  ---------
  Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
  fbd65454
- Fix dependencies. · fe9d9812
  Andriy Roshchenko authored Nov 01, 2024
  
  fe9d9812
- [layernorm] hot fix (#1620) · 550248de
  carlushuang authored Nov 01, 2024
```
* hot fix ln

* some rename
```
  550248de
- Merge pull request #209 from ROCm/andriy/merge_from_public · 7d50244e
  Illia Silin authored Oct 31, 2024
```
Update develop branch from public repository
```
  7d50244e
31 Oct, 2024 1 commit
- Merge remote-tracking branch 'ck_public/develop' into andriy/merge_from_public · d51701d4
  Andriy Roshchenko authored Oct 31, 2024
  
  d51701d4