Commits · fe6b185b97e9f9875ef470884e9f9fba17be02d5 · gaoqiong / composable_kernel_ROCM

27 Nov, 2024 2 commits

move utility headers from library/include to include path (#1697) · fe6b185b
Illia Silin authored Nov 27, 2024

fe6b185b

Polished Grouped GEMM APIs and new BF16 instances (#1600) · 061ac064

Adam Osewski authored Nov 27, 2024

* A few small fixes.

* New GroupedGemm instances (BF16)

* Unify and refactor GroupedGEMM device API.

* Adapt changes to new API.

* Adapt grouped gemm profiler.

* Accept multiple kbatches for grouped gemm profiler.

- delete the obsolete two-stage version, as it is now covered by grouped gemm

* Update unit test for grouped gemm.

* Fix thresholds for BF16 and F8. Unblock tests.

* Fix a few instances.

* Multiple small fixes.

* Adapt to new API, check dynamic casting.

* Uncomment a few data types in grouped gemm profiler.

* Fix call to SetDeviceArgs.

* Fix profile grouped gemm multiply tile loop.

* Fix grouped gemm tile loop kernel args in client examples.

* Review comments.

061ac064

26 Nov, 2024 2 commits

Change block gemm pipeline local prefill loop order. (#1692) · bfe983a1
Adam Osewski authored Nov 26, 2024
```
* Fix loop order.

* Fix loop order in pipeline v4
```
bfe983a1

Add check for bf16 splitk support for grouped gemm splitk (#1673) · b70f367f

jakpiase authored Nov 26, 2024

* add check for bf16 splitk support for grouped gemm splitk

* Update if condition

---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

b70f367f

21 Nov, 2024 1 commit

universal streamk fp8 changes (#1665) · d6d4c278

Harisankar Sadasivan authored Nov 21, 2024

* universal streamk fp8 changes & ckprofiler instances

* revert strides to -1 and verification options

* fp8 exclusion on pre-gfx94 for universal_streamk

* PR review based revisions: permissions reverted, removed hip err checks

---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

d6d4c278

18 Nov, 2024 2 commits

Add bf16 and int8 wmma gemms for Navi3x and Navi4x. (#1671) · 8aba2724

Illia Silin authored Nov 18, 2024

* add bf16 gemms for gfx11/gfx12

* reduce the input values in test_gemm

* add int8 wmma gemm instances for gfx11/gfx12

* add example gemm_wmma_int8

* fix bug in gemm_wmma_int8 test

* increase bf16 gemm test tolerance

* update the dates and clean-up commented-out instances

8aba2724

Batched GEMM Multiple D based on Universal GEMM (#1655) · 754adc70

Bartłomiej Kocot authored Nov 18, 2024

* Batched GEMM Multiple D based on Universal GEMM
Co-authored-by: Jing Zhang <jizhan@fb.com>

* CI fixes
Co-authored-by: Jing Zhang <jizhan@fb.com>

---------
Co-authored-by: Jing Zhang <jizhan@fb.com>

754adc70

13 Nov, 2024 2 commits
- fix clang format (#1662) · efd92615
  Illia Silin authored Nov 13, 2024
  
  efd92615
- Move checks for compatibility from Argument() to IsSupportedArgument() (#1653) · 73f02a10
  Taylor Ding authored Nov 13, 2024
  
  73f02a10
07 Nov, 2024 1 commit
- enable compilation for generic navi targets (#1645) · 75c5bfa3
  Illia Silin authored Nov 07, 2024
  
  75c5bfa3
05 Nov, 2024 1 commit
- Statically Cast Pointer Offset (#1631) · d0e3a70a
  darren-amd authored Nov 05, 2024
```
* explicit cast ptr offset

* formatting change
```
  d0e3a70a
30 Oct, 2024 1 commit
- Remove virtual destructors from unary ops (#1610) · 9a8a5213
  Bartłomiej Kocot authored Oct 30, 2024
```
* Remove virtual destructors from unary ops

* Fixes

* Fixes

* clang format fixes
```
  9a8a5213
29 Oct, 2024 1 commit
- fix compilation errors for gfx12 with clang20 (#1606) · 922e42a0
  Illia Silin authored Oct 28, 2024
  
  922e42a0
26 Oct, 2024 2 commits

Add dynamic elementwise op (#1426) · 31bf253a

Bartłomiej Kocot authored Oct 26, 2024

* Add dynamic elementwise op
Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>

* CI issues fix

* Custom parameter value for dynamic functions - Comments addressed

---------
Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>

31bf253a

add int8 gemm multiply multiply a8w8 (#1591) · 37f7afed

valarLip authored Oct 26, 2024

* add int8 gemm multiply multiply a8w8

* uncomment

* clang-format-12

* Add example_gemm_multiply_multiply_xdl_int8

* Remove shell scripts

* update preprocess number for mi308; bring back printout in ckprofiler

* format

---------
Co-authored-by: chenjun <junchen2@amd.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>

37f7afed

25 Oct, 2024 1 commit

Generic threshold calculation (#1546) · 9385caa3

aledudek authored Oct 25, 2024

* Calculate generic relative threshold pool3dfwd

* Calculate absolute error threshold pool3d fwd

* Generic threshold calculation takes max input for relative error pool3dfwd

* Remove max possible value for error calculation at runtime

* Remove debug print in pool3dfwd

* Pool3d fwd adjusted types in generic threshold calculation

* Generic threshold calculation takes into account number of accumulations and accdatatype

* Generic threshold fix final error formula

* Generic threshold calculation - num of accs fix

* Generic threshold calculation - adjust absolute error

* Generic threshold calculation - OutDataType in absolute error

9385caa3

22 Oct, 2024 1 commit
- Explicit cast values to half (#1593) · 4d5248e2
  Jatin Chaudhary authored Oct 22, 2024
```
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
```
  4d5248e2
14 Oct, 2024 2 commits

Add custom type vector support (#1333) · 4cf70b36

Rostyslav Geyyer authored Oct 14, 2024

* Add non_native_vector_type

* Add a test

* Add non-native vector type

* Fix CTOR

* Fix non-native vector type of 1

* Fix CTORs

* Use vector_type to cover non-native implementation as well

* Update the test

* Format

* Format

* Fix copyright years

* Remove BoolVecT for now

* Add AsType test cases

* Update assert error message

* Remove redundant type

* Update naming

* Add complex half type with tests

* Add tests for vector reshaping

* Add missing alignas

* Update test/data_type/test_custom_type.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Compare custom types to built-in types

* Add default constructor test

* Add an alignment test

---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

4cf70b36

Add transpose scale amax example (#1547) · f21cda25
Bartłomiej Kocot authored Oct 14, 2024
```
* Add transpose scale amax example

* fixes

* Tune reduce instance
```
f21cda25

12 Oct, 2024 1 commit
- Implement GetWorkSpaceSize from BaseOperator. (#1564) · 29d384d0
  Adam Osewski authored Oct 12, 2024
  
  29d384d0
09 Oct, 2024 1 commit
- Fixes small memory leak from missing hipEventDestroy (#1554) · ceaed8e0
  Christopher Millette authored Oct 09, 2024
  
  ceaed8e0
07 Oct, 2024 1 commit

Fix build logic using GPU_ARCHS. (#1536) · 7d8ea5f0

Illia Silin authored Oct 07, 2024

* update build logic with GPU_ARCHS

* fix the GPU_ARCHS build for codegen

* unset GPU_TARGETS when GPU_ARCHS are set

7d8ea5f0

04 Oct, 2024 1 commit
- Fix grouped gemm check to avoid overflow (#1545) · 6b54d2fa
  Bartłomiej Kocot authored Oct 04, 2024
  
  6b54d2fa
02 Oct, 2024 1 commit

Fix compilation errors generated by forthcoming Clang changes (#1544) · aeb7c91f

macurtis-amd authored Oct 02, 2024

Without this change, the following diagnostic is generated:
  a template argument list is expected after a name prefixed by the template
  keyword [-Wmissing-template-arg-list-after-template-kw]

See C++17 spec [temp.names] p5.

aeb7c91f
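The diagnostic quoted in #1544 fires when the `template` disambiguator is not followed by a template argument list. The sketch below is a minimal, hypothetical illustration of that rule and the two usual fixes; `Doubler`, `run_op`, and `run_op_deduced` are made-up names, not code from composable_kernel or from this commit.

```cpp
// Hypothetical operator type with a static member function template.
struct Doubler {
    template <typename T>
    static constexpr T apply(T x) { return x + x; }
};

// Ill-formed per [temp.names] p5, and diagnosed by newer Clang:
//     return Op::template apply(x);   // `template` with no <...> after the name

// Fix 1: keep the `template` keyword and supply the argument list explicitly.
template <typename Op>
constexpr int run_op(int x) {
    return Op::template apply<int>(x);
}

// Fix 2: drop the `template` keyword and let the argument be deduced at the call.
template <typename Op>
constexpr int run_op_deduced(int x) {
    return Op::apply(x);
}
```

Both fixes compile cleanly; which one fits depends on whether the template arguments can be deduced from the call arguments.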

25 Sep, 2024 1 commit
- Fix compilation errors with Clang20.0. (#1533) · 42e6dcea
  Illia Silin authored Sep 25, 2024
```
* fix clang20 compilation errors for gfx90a

* fix clang20 compilation errors for gfx11 targets
```
  42e6dcea
20 Sep, 2024 2 commits
- Add support for NGCHW in grouped conv fwd (#1499) · 4ba52b35
  Bartłomiej Kocot authored Sep 20, 2024
```
* Support NGCHW in grouped conv fwd

* Remove not needed variable

* Fixes
```
  4ba52b35
- Remove unsupported (fp8) type from Add memory operation. (#1521) · 0c39954d
  Adam Osewski authored Sep 20, 2024
```
The dynamic buffer does not support fp8 in its `Update` operation, so fp8 cannot be used with `InMemoryDataOperation::Add`.
```
  0c39954d
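For context on the NGCHW support added in #1499, the sketch below shows how a fully packed NGCHW layout orders elements compared with NHWGC, a channels-last grouped layout. It is a hypothetical illustration with made-up names (`Dims`, `offset_ngchw`, `offset_nhwgc`); it assumes packed tensors without custom strides or padding and is not CK's tensor-descriptor code.

```cpp
#include <cstddef>

// Logical extents of a grouped 2D convolution input tensor (hypothetical helper).
struct Dims {
    std::size_t N, G, C, H, W;
};

// Packed NGCHW: W varies fastest, then H, C, G, and N slowest.
constexpr std::size_t offset_ngchw(Dims d, std::size_t n, std::size_t g,
                                   std::size_t c, std::size_t h, std::size_t w) {
    return (((n * d.G + g) * d.C + c) * d.H + h) * d.W + w;
}

// Packed NHWGC: C varies fastest, then G, W, H, and N slowest.
constexpr std::size_t offset_nhwgc(Dims d, std::size_t n, std::size_t g,
                                   std::size_t c, std::size_t h, std::size_t w) {
    return (((n * d.H + h) * d.W + w) * d.G + g) * d.C + c;
}
```

In NGCHW, neighbouring channels of one pixel sit `H * W` elements apart, whereas in NHWGC they are adjacent; this difference in access pattern is what a kernel has to account for when it supports both layouts.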
13 Sep, 2024 1 commit

Customize filesystem in CK for legacy systems (#1509) · 81bc1496

Jun Liu authored Sep 13, 2024

* Legacy support: customized filesystem

* Update cmakefile for python alternative path

* fix build issues

* CK has no boost dependency

* More fixes to issues found on legacy systems

* fix clang format issue

* Check if blob is correctly generated in cmake

* fix the python issues

* add a compiler flag for codegen when using alternative python

* use target_link_options instead of target_compile_options

---------
Co-authored-by: illsilin <Illia.Silin@amd.com>

81bc1496

12 Sep, 2024 1 commit

Pool2d max/avg kernel in the BWD version (#1494) · 448c0f56

Mateusz Ozga authored Sep 12, 2024

* Add pool2d instance BWD AVG

* Add pool2d instance BWD MAX

* Fix: avg review

* Fix review: part2

* Fix - enable test when type is compiled

* Fix review part3

448c0f56

11 Sep, 2024 2 commits

Rewrite pool2d fwd (#1462) · e8d2887c

jakpiase authored Sep 11, 2024

* added pool2d fwd

* add tests

* add reviewers changes

* Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new"

This reverts commit 6b2ba7ff8960b0a6ddbe30d8dac53eeb55a8597e, reversing
changes made to 22c82bea0caf3e0f29399100c1bb67b8003fc042.

* Revert "add reviewers changes"

This reverts commit 22c82bea0caf3e0f29399100c1bb67b8003fc042.

* added reviewers comments

* revert some old files

* add reviewers requests

---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

e8d2887c

Added structural sparsity blockwise gemm (#1435) · 2a261afc

jakpiase authored Sep 11, 2024

* Implemented smfmac xdlops

* Added smfmac blockwise xdlops

* fixes

* add reviewers suggestions

---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

2a261afc

05 Sep, 2024 2 commits

Modification to fix the issue "threadwise_tensor_slice_transfer_v5r1 issue #1279" (#1492) · 83788553

M.Emin Ozturk authored Sep 04, 2024

* issue fix, one line changed for tmp

* clang

---------
Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu>
Co-authored-by: Harisankar Sadasivan <135730918+hsadasiv@users.noreply.github.com>

83788553

Add gemm universal bf16 instances (#1484) · 5b10dae6

Haocong WANG authored Sep 05, 2024

* revert ckprofiler change

* temp save

* Add test and test pass

* test pass

* Fix bug inside rotating buffer when tensor is not packed

* bug fix

* clang format

---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

5b10dae6

03 Sep, 2024 1 commit
- Add support for NGCHW in grouped conv bwd wei (#1491) · 73b67f29
  Bartłomiej Kocot authored Sep 03, 2024
```
* Add support for NGCHW in grouped conv bwd wei

* Comments fixes

* navi fixes

* Update function names
```
  73b67f29
02 Sep, 2024 1 commit

Revert "Revert "Revert Revert Support access per groups and filter2x3 in... · a9b170b5

Bartłomiej Kocot authored Sep 02, 2024

Revert "Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)" (#1490)

This reverts commit 5ff8eeeb.

a9b170b5

21 Aug, 2024 2 commits

Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473) · c3515f27

Andriy Roshchenko authored Aug 21, 2024

* Enable CMakePresets build

* Verify Convolution, Scaling and ReLU algorithms.

* Add tensor element-wise scale and type cast operation.

* Reduction implemented but does not work.

* Exploration of Reduction functionality.

* Completed example for Convolution scaled with ReLU activation and AMAX reduction.

* WIP: Add required instances for convolution.

* WIP: Create client example. Implement convolution stage.

* Add elementwise instances.

* Add elementwise scale + convert example.

* Add reduction instances.

* WIP: Client example for AMAX reduction.

* WIP: Add instances for multistage reduction.

* WIP: Implementation of multistage reduction.

* Refactoring.

* Clean up.

* Add CMakePresets.json

* Guard off FP8 instances when the data type is not available.

* Add example for Scaled FP8 Convolution with AMAX reduction.

* Refactor CombConvScaleRelu instances.

* Add CombConvScale instances.

* Add client example for Scaled FP8 Convolution with AMAX reduction.

* Cleanup.

c3515f27

Set RNE fp8 conversion as a default (#1458) · e20f20ef

Rostyslav Geyyer authored Aug 21, 2024

* Set RNE fp8 conversion as a default

* Update f8 tests

* Disable failing test on gfx11

* Update bf8 tests

* Add a flag

* Fix the flag

* Raise flag for gfx10 as well

* Temp commit for tolerance testing

* Update tolerances

e20f20ef
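The commit does not describe the conversion itself. As a hedged illustration of what round-to-nearest-even (RNE) means compared with plain truncation, the sketch below drops the low bits of a mantissa. It is not CK's fp8 conversion: the names `round_shift_rne` and `round_shift_trunc` are hypothetical, and exponent handling, carry into the next exponent, and saturation are deliberately left out.

```cpp
#include <cstdint>

// Drop the low `shift` bits of `mantissa` (shift in [0, 31]),
// rounding to nearest with ties resolved to the even result.
constexpr std::uint32_t round_shift_rne(std::uint32_t mantissa, int shift) {
    if (shift == 0) return mantissa;
    const std::uint32_t kept = mantissa >> shift;
    const std::uint32_t dropped = mantissa & ((1u << shift) - 1u);
    const std::uint32_t half = 1u << (shift - 1);
    if (dropped > half) return kept + 1u;  // above halfway: round up
    if (dropped < half) return kept;       // below halfway: round down
    return kept + (kept & 1u);             // exactly halfway: pick the even neighbour
}

// Truncation (round toward zero) for comparison.
constexpr std::uint32_t round_shift_trunc(std::uint32_t mantissa, int shift) {
    return mantissa >> shift;
}
```

Truncation always discards the dropped bits, so its error is biased toward zero; RNE keeps the rounding error centred and splits ties evenly, which is one commonly cited reason to prefer it as a default.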

14 Aug, 2024 1 commit

[GEMM] gemm_universal related optimization (#1453) · 3049b546

Haocong WANG authored Aug 14, 2024

* replace buffer_atomic with global_atomic

* fixed global_atomic_add

* added bf16 atomic_add

* format

* clang-format-12

* clean

* clean

* add guards

* Update gtest.cmake

* enabled splitk_gemm_multi_d

* format

* add ckProfiler

* format

* fixed naming

* format

* clean

* clean

* add guards

* fix clang format

* format

* add kbatch printout

* clean

* Add rocm6.2 related gemm optimization

* Limit bf16 atomic usage

* remove redundant RCR gemm_universal instance

* Add RRR fp8 gemm universal instance

* Bug fix

* Add GPU_TARGET guard to FP8/BF8 target

* bug fix

* update cmake

* remove all fp8/bf8 example if arch not support

* Enable fp8 RRR support in ckProfiler

* limit greedy-reverse flag to gemm_universal in ckProfiler

---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

3049b546

13 Aug, 2024 1 commit
- Support large: 12d tensor size for reduction kernel (#1465) · 0606e549
  Mateusz Ozga authored Aug 13, 2024
  
  0606e549
10 Aug, 2024 1 commit
- Fix bug with n block id calculation in DeviceGroupedConvXdlCShuffle (#1457) · 4a870942
  Bartłomiej Kocot authored Aug 10, 2024
```
* Fix typo in TransformConvFwdToGemm

* Fix bug in n offset calculation
```
  4a870942