Commits · bdc1dd6f6655559e0828304ad3bd90168bab5dc5 · gaoqiong / composable_kernel_ROCM

22 Nov, 2024 2 commits
- Fix typo · bdc1dd6f
  Rostyslav Geyyer authored Nov 22, 2024
  
  bdc1dd6f
- Fix client examples build · 26d32773
  Rostyslav Geyyer authored Nov 22, 2024
  
  26d32773
21 Nov, 2024 8 commits
- Fix even more gfx950 conversions · 1e55b6f6
  Rostyslav Geyyer authored Nov 21, 2024
  
  1e55b6f6
- Fix more gfx950 conversions · 33f4f75b
  Rostyslav Geyyer authored Nov 21, 2024
  
  33f4f75b
- Clean up · af4e2bd2
  Rostyslav Geyyer authored Nov 21, 2024
  
  af4e2bd2
- Fix gfx950 conversions · e137941c
  Rostyslav Geyyer authored Nov 21, 2024
  
  e137941c
- Clean up · 0ad5d7f7
  Rostyslav Geyyer authored Nov 21, 2024
  
  0ad5d7f7
- Fix linker error · 883d9c6c
  Rostyslav Geyyer authored Nov 21, 2024
  
  883d9c6c
- Fix · c7345e52
  Rostyslav Geyyer authored Nov 21, 2024
  
  c7345e52
- Add vector conversions · a8cd34d6
  Rostyslav Geyyer authored Nov 21, 2024
  
  a8cd34d6
19 Nov, 2024 1 commit
- Fix · b5ac2abd
  Rostyslav Geyyer authored Nov 19, 2024
  
  b5ac2abd
18 Nov, 2024 2 commits
- Add tensor generators · 1475cb44
  Rostyslav Geyyer authored Nov 18, 2024
  
  1475cb44
- Add check_err function · 35a32da2
  Rostyslav Geyyer authored Nov 18, 2024
  
  35a32da2
15 Nov, 2024 1 commit
- Add vector types and tests · 37072aac
  Rostyslav Geyyer authored Nov 15, 2024
  
  37072aac
08 Nov, 2024 2 commits
- Add debug tests · 1bc375e9
  Rostyslav Geyyer authored Nov 08, 2024
  
  1bc375e9
- Add fp4 vectors · aa1920da
  Rostyslav Geyyer authored Nov 08, 2024
  
  aa1920da
07 Nov, 2024 2 commits
- Format · 9433306a
  Rostyslav Geyyer authored Nov 07, 2024
  
  9433306a
- Merge branch 'gfx950' into lwpck-2390 · 630042d8
  Rostyslav Geyyer authored Nov 07, 2024
  
  630042d8
06 Nov, 2024 6 commits
- Add device conversions · 5f1a24a8
  Rostyslav Geyyer authored Nov 06, 2024
  
  5f1a24a8
- Add scaled conversions with tests · 1bca7134
  Rostyslav Geyyer authored Nov 06, 2024
  
  1bca7134
- Add scale <-> float conversions · 0bb6e25f
  Rostyslav Geyyer authored Nov 06, 2024
  
  0bb6e25f
- sync from public repo · 261d76c4
  illsilin authored Nov 06, 2024
  
  261d76c4
- sync from public repo · a4522ae3
  illsilin authored Nov 06, 2024
  
  a4522ae3
- Merge pull request #214 from ROCm/merge_from_public · e0594d08
  Illia Silin authored Nov 06, 2024
```
Merge from public
```
  e0594d08
05 Nov, 2024 7 commits
- merge from public repo · 667cd6ab
  illsilin authored Nov 05, 2024
  
  667cd6ab
- Prevent instantiation of undefined FP8 operators. (#1639) · 365f39ae
  Andriy Roshchenko authored Nov 05, 2024
  
  365f39ae
- remove gfx940;gfx941 from default target lists (#1640) · 54440cf5
  Illia Silin authored Nov 05, 2024
  
  54440cf5
- Statically Cast Pointer Offset (#1631) · d0e3a70a
  darren-amd authored Nov 05, 2024
```
* explicit cast ptr offset

* formatting change
```
  d0e3a70a
- Make sure cmake can handle the xnack+/xnack- targets. (#1633) · b6e74be1
  Illia Silin authored Nov 05, 2024
  
  * make sure cmake can handle xnack targets
  * dont build xdl instances for gfx906:xnack-
  * dont build xdl tests for gfx906:xnack-
  
  b6e74be1
- [generate.py] Override blob list if it already exists (#1635) · 464abd23
  Juan Manuel Martinez Caamaño authored Nov 05, 2024
  
  Before, generate.py appended the list at the end of the output file.
  When running the cmake configuration steps multiple times on the
  examples, the blob list (such as fwd_blob_list.txt) would grow at every
  configuration.
  `library/src/tensor_operation_instance/gpu/mha/CMakeLists.txt` worked around
  this issue by removing the output file if it exists.
  
  Now, generate.py overrides the content of the output file.
  There is no need for the workaround in the CMakeLists.txt;
  and the issue is solved for the example projects too.
  
  464abd23
- Linsun/convint8 fwd instances (#1626) · 0c9012fb
  Lin Sun authored Nov 04, 2024
  
  Add instances for int8 grouped conv2d fwd
  ---------
  Co-authored-by: root <root@dell300x-pla-t28-03.pla.dcgpu>
  Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
  
  0c9012fb
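
The append-versus-overwrite behaviour described in the generate.py commit (#1635) above can be illustrated with a minimal Python sketch. This is not the code from generate.py: the helper names `write_blob_list` and `line_count_after_two_runs`, their parameters, and the blob names are hypothetical, and only the file name fwd_blob_list.txt is taken from the commit message. The sketch assumes the blob list is a plain text file with one entry per line.

```python
import tempfile
from pathlib import Path


def write_blob_list(path: Path, blobs: list[str], overwrite: bool) -> None:
    """Hypothetical helper; not the actual generate.py implementation."""
    # "w" truncates an existing file, "a" keeps old content and appends to it.
    mode = "w" if overwrite else "a"
    with open(path, mode) as f:
        for blob in blobs:
            f.write(blob + "\n")


def line_count_after_two_runs(overwrite: bool) -> int:
    """Simulate two configuration runs and count the resulting list entries."""
    blobs = ["blob_a.cpp", "blob_b.cpp"]  # made-up blob names
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "fwd_blob_list.txt"
        for _ in range(2):
            write_blob_list(path, blobs, overwrite)
        return len(path.read_text().splitlines())


if __name__ == "__main__":
    print(line_count_after_two_runs(overwrite=False))  # list grows on each run
    print(line_count_after_two_runs(overwrite=True))   # list stays the same size
```

With append mode the list doubles after the second run, which matches the growth the commit message reports; with write mode the second run replaces the first, so no separate file-removal step is needed beforehand.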

04 Nov, 2024 2 commits
- Temporary disable part of dynamic op conv instances (#1630) · 4f1fdbb6
  Bartłomiej Kocot authored Nov 04, 2024
```
* Temporary disable part of dynamic op conv instances

* fix
```
  4f1fdbb6
- Add stochastic rounding tests · 4c47048f
  Rostyslav Geyyer authored Nov 04, 2024
  
  4c47048f
02 Nov, 2024 1 commit
- [CK_TILE] layernorm have more accurate residual (#1623) · cb6c5d39
  carlushuang authored Nov 02, 2024
  
  * more accurate residual
  * modify comment
  * Fix literal case in README.md
  
  ---------
  Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
  
  cb6c5d39
01 Nov, 2024 5 commits
- Merge remote-tracking branch 'origin/develop' into gfx950 · 7da48908
  Andriy Roshchenko authored Nov 01, 2024
  
  7da48908
- Reduce build time. (#1621) · 03c6448b
  Illia Silin authored Oct 31, 2024
  
  * disable fp8 gemm_universal on gfx90a and gfx908 by default
  * fix cmake syntax
  * fix clang format
  * add ifdefs in amd_xdlops
  * disable fp8 gemm instances on gfx90a by default
  * update readme
  
  03c6448b
- [Ck_tile] smoothquant (#1617) · fbd65454
  rocking authored Nov 01, 2024
  
  * fix compile error
  * fix typo of padding
  * Add smoothquant op
  * Add smoothquant instance library
  * refine type
  * add test script
  * Re-generate smoothquant.hpp
  * Always use 'current year' in copyright
  * use Generic2dBlockShape instead
  * Add vector = 8 instance back
  * Find exe path automatically
  * Simplify the api condition
  * Remove debugging code
  * update year
  * Add blank line between function declaration
  * explicitly cast return value to dim3
  * refine return value
  * Fix default warmup and repeat value
  * Add comment
  * refactor smoothquant cmake
  * Add README
  * Fix typo
  
  ---------
  Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
  
  fbd65454
- [layernorm] hot fix (#1620) · 550248de
  carlushuang authored Nov 01, 2024
```
* hot fix ln

* some rename
```
  550248de
- Merge pull request #209 from ROCm/andriy/merge_from_public · 7d50244e
  Illia Silin authored Oct 31, 2024
```
Update develop branch from public repository
```
  7d50244e
31 Oct, 2024 1 commit
- Merge remote-tracking branch 'ck_public/develop' into andriy/merge_from_public · d51701d4
  Andriy Roshchenko authored Oct 31, 2024
  
  d51701d4