Commits · 0ad5d7f765ace544b7c4410e047e06ee7e8c0442 · gaoqiong / composable_kernel_ROCM

21 Nov, 2024 2 commits
- Clean up · 0ad5d7f7
  Rostyslav Geyyer authored Nov 21, 2024
  
- Add vector conversions · a8cd34d6
  Rostyslav Geyyer authored Nov 21, 2024
  
15 Nov, 2024 1 commit
- Add vector types and tests · 37072aac
  Rostyslav Geyyer authored Nov 15, 2024
  
08 Nov, 2024 1 commit
- Add fp4 vectors · aa1920da
  Rostyslav Geyyer authored Nov 08, 2024
  
07 Nov, 2024 1 commit
- Format · 9433306a
  Rostyslav Geyyer authored Nov 07, 2024
  
06 Nov, 2024 3 commits
- Add device conversions · 5f1a24a8
  Rostyslav Geyyer authored Nov 06, 2024
  
- Add scaled conversions with tests · 1bca7134
  Rostyslav Geyyer authored Nov 06, 2024
  
- Add scale <-> float conversions · 0bb6e25f
  Rostyslav Geyyer authored Nov 06, 2024
  
05 Nov, 2024 1 commit
- Statically Cast Pointer Offset (#1631) · d0e3a70a
  darren-amd authored Nov 05, 2024
```
* explicit cast ptr offset

* formatting change
```
04 Nov, 2024 1 commit
- Add stochastic rounding tests · 4c47048f
  Rostyslav Geyyer authored Nov 04, 2024
  
30 Oct, 2024 3 commits
- Fix typo · b73f83fd
  Rostyslav Geyyer authored Oct 30, 2024
  
- Update conversions · cf7e20a8
  Rostyslav Geyyer authored Oct 30, 2024
  
- Remove virtual destructors from unary ops (#1610) · 9a8a5213
  Bartłomiej Kocot authored Oct 30, 2024
```
* Remove virtual destructors from unary ops

* Fixes

* Fixes

* clang format fixes
```
29 Oct, 2024 2 commits
- Add scale type and mxfp conversions · f90f5da6
  Rostyslav Geyyer authored Oct 29, 2024
  
- fix compilation errors for gfx12 with clang20 (#1606) · 922e42a0
  Illia Silin authored Oct 28, 2024
  
26 Oct, 2024 2 commits

Add dynamic elementwise op (#1426) · 31bf253a

Bartłomiej Kocot authored Oct 26, 2024



* Add dynamic elementwise op
Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>

* CI issues fix

* Custom parameter value for dynamic functions - Comments addressed

---------
Co-authored-by: ThruptiRajLakshmanaGowda <thruptiraj.lakshmanagowda@amd.com>
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>


add int8 gemm multiply multiply a8w8 (#1591) · 37f7afed

valarLip authored Oct 26, 2024



* add int8 gemm multiply multiply a8w8

* uncomment

* clang-format-12

* Add example_gemm_multiply_multiply_xdl_int8

* Remove shell scripts

* update preprocess number for mi308; bring back printout in ckprofiler

* format

---------
Co-authored-by: chenjun <junchen2@amd.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>


25 Oct, 2024 1 commit

Generic threshold calculation (#1546) · 9385caa3

aledudek authored Oct 25, 2024

* Calculate generic relative threshold pool3dfwd

* Calculate absolute error threshold pool3d fwd

* Generic threshold calculation take max input for relative error pool3dfwd

* Remove max possible value for error calculation at runtime

* Remove debug print in pool3dfwd

* Pool3d fwd adjusted types in generic threshold calculation

* Generic threshold calculation take into account number of accumulations and accdatatype

* Generic threshold fix final error formula

* Generic threshold calculation - num of accs fix

* Generic threshold calculation - adjust absolute error

* Generic threshold calculation - OutDataType in absolute error


22 Oct, 2024 1 commit
- Explicit cast values to half (#1593) · 4d5248e2
  Jatin Chaudhary authored Oct 22, 2024
```
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
```
18 Oct, 2024 1 commit
- Add fp4 type with constants · 47f161dc
  Rostyslav Geyyer authored Oct 18, 2024
  
14 Oct, 2024 3 commits

enable bf16 atomic add on gfx950 · ca15fa77
illsilin authored Oct 14, 2024


Add custom type vector support (#1333) · 4cf70b36

Rostyslav Geyyer authored Oct 14, 2024



* Add non_native_vector_type

* Add a test

* Add non-native vector type

* Fix CTOR

* Fix non-native vector type of 1

* Fix CTORs

* Use vector_type to cover non-native implementation as well

* Update the test

* Format

* Format

* Fix copyright years

* Remove BoolVecT so far

* Add AsType test cases

* Update assert error message

* Remove redundant type

* Update naming

* Add complex half type with tests

* Add tests for vector reshaping

* Add missing alignas

* Update test/data_type/test_custom_type.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Compare custom types to built-in types

* Add default constructor test

* Add an alignment test

---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>


Add transpose scale amax example (#1547) · f21cda25
Bartłomiej Kocot authored Oct 14, 2024
```
* Add transpose scale amax example

* fixes

* Tune reduce instance
```

12 Oct, 2024 1 commit
- Implement GetWorkSpaceSize from BaseOperator. (#1564) · 29d384d0
  Adam Osewski authored Oct 12, 2024
  
09 Oct, 2024 1 commit
- Fixes small memory leak from missing hipEventDestroy (#1554) · ceaed8e0
  Christopher Millette authored Oct 09, 2024
  
07 Oct, 2024 1 commit

Fix build logic using GPU_ARCHS. (#1536) · 7d8ea5f0

Illia Silin authored Oct 07, 2024

* update build logic with GPU_ARCHS

* fix the GPU_ARCHS build for codegen

* unset GPU_TARGETS when GPU_ARCHS are set


04 Oct, 2024 1 commit
- Fix grouped gemm check to avoid overflow (#1545) · 6b54d2fa
  Bartłomiej Kocot authored Oct 04, 2024
  
02 Oct, 2024 1 commit

Fix compilation errors generated by forthcoming Clang changes (#1544) · aeb7c91f

macurtis-amd authored Oct 02, 2024

Without this change, the following diagnostic is generated:
  a template argument list is expected after a name prefixed by the template
  keyword [-Wmissing-template-arg-list-after-template-kw]

See C++17 spec [temp.names] p5.
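
The diagnostic quoted above can be illustrated with a minimal sketch. The names `Scaler`, `twice`, and `apply` are hypothetical and do not come from the commit; the actual change in the repository may differ. The sketch shows the pattern newer Clang rejects (the `template` keyword before a member name with no template argument list after it) and two ways to write the call so that it is accepted.

```cpp
// Hypothetical example, not code from the repository.
struct Scaler
{
    // Member function template whose argument can be deduced at the call site.
    template <typename T>
    constexpr T twice(T x) const
    {
        return x + x;
    }
};

template <typename Op>
constexpr int apply(const Op& op, int x)
{
    // Rejected by newer Clang: `template` is not followed by a template
    // argument list (see [temp.names]).
    //   return op.template twice(x);

    // Option 1: drop the keyword and rely on argument deduction.
    //   return op.twice(x);

    // Option 2: keep the keyword and spell out the argument list.
    return op.template twice<int>(x);
}

// Compile-time check that the accepted form behaves as expected.
static_assert(apply(Scaler{}, 21) == 42, "apply should double its argument");
```

Dropping the keyword is usually the smaller change when the template arguments are deduced. Keeping it is only necessary when an explicit argument list follows a dependent name.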


25 Sep, 2024 1 commit
- Fix compilation errors with Clang20.0. (#1533) · 42e6dcea
  Illia Silin authored Sep 25, 2024
```
* fix clang20 compilation errors for gfx90a

* fix clang20 compilation errors for gfx11 targets
```
20 Sep, 2024 2 commits
- Add support for NGCHW in grouped conv fwd (#1499) · 4ba52b35
  Bartłomiej Kocot authored Sep 20, 2024
```
* Support NGCHW in grouped conv fwd

* Remove not needed variable

* Fixes
```
- Remove unsupported (fp8) type from Add memory operation. (#1521) · 0c39954d
  Adam Osewski authored Sep 20, 2024
```
The dynamic buffer does not support fp8 in the `Update` operation, so fp8 cannot be used with `InMemoryDataOperation::Add`.
```
13 Sep, 2024 1 commit

Customize filesystem in CK for legacy systems (#1509) · 81bc1496

Jun Liu authored Sep 13, 2024



* Legacy support: customized filesystem

* Update cmakefile for python alternative path

* fix build issues

* CK has no boost dependency

* More fixes to issues found on legacy systems

* fix clang format issue

* Check if blob is correctly generated in cmake

* fix the python issues

* add a compiler flag for codegen when using alternative python

* use target_link_options instead of target_compile_options

---------
Co-authored-by: illsilin <Illia.Silin@amd.com>


12 Sep, 2024 1 commit

Pool2d max/avg kernel in the BWD version (#1494) · 448c0f56

Mateusz Ozga authored Sep 12, 2024

* Add pool2d instance BWD AVG

* Add pool2d instance BWD MAX

* Fix: avg review

* Fix review: part2

* Fix - enable test when type is compiled

* Fix review part3


11 Sep, 2024 2 commits

Rewrite pool2d fwd (#1462) · e8d2887c

jakpiase authored Sep 11, 2024



* added pool2d fwd

* add tests

* add reviewers changes

* Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new"

This reverts commit 6b2ba7ff8960b0a6ddbe30d8dac53eeb55a8597e, reversing
changes made to 22c82bea0caf3e0f29399100c1bb67b8003fc042.

* Revert "add reviewers changes"

This reverts commit 22c82bea0caf3e0f29399100c1bb67b8003fc042.

* added reviewers comments

* revert some old files

* add reviewers requests

---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>


Added structural sparsity blockwise gemm (#1435) · 2a261afc

jakpiase authored Sep 11, 2024



* Implemented smfmac xdlops

* Added smfmac blockwise xdlops

* fixes

* add reviewers suggestions

---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>


05 Sep, 2024 2 commits

Modification to fix this issue "threadwise_tensor_slice_transfer_v5r1 issue #1279" (#1492) · 83788553

M.Emin Ozturk authored Sep 04, 2024



* issue fix, one line changed for tmp

* clang

---------
Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu>
Co-authored-by: Harisankar Sadasivan <135730918+hsadasiv@users.noreply.github.com>


Add gemm universal bf16 instances (#1484) · 5b10dae6

Haocong WANG authored Sep 05, 2024



* revert ckprofiler change

* temp save

* Add test and test pass

* test pass

* Fix bug inside rotating buffer when tensor is not packed

* bug fix

* clang format

---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>


03 Sep, 2024 1 commit
- Add support for NGCHW in grouped conv bwd wei (#1491) · 73b67f29
  Bartłomiej Kocot authored Sep 03, 2024
```
* Add support for NGCHW in grouped conv bwd wei

* Comments fixes

* navi fixes

* Update function names
```
02 Sep, 2024 1 commit

Revert "Revert "Revert Revert Support access per groups and filter2x3 in... · a9b170b5

Bartłomiej Kocot authored Sep 02, 2024

Revert "Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)" (#1490)

This reverts commit 5ff8eeeb.


21 Aug, 2024 1 commit

Adding Instances and Examples for FP8-based Scaled Convolution and AMAX Reduction. (#1473) · c3515f27

Andriy Roshchenko authored Aug 21, 2024

* Enable CMakePresets build

* Verify Convolution, Scaling and ReLU algorithms.

* Add tensor element-wise scale and type cast operation.

* Reduction implemented but does not work.

* Exploration of Reduction functionality.

* Completed example for Convolution scaled with ReLu activation and AMAX reduction.

* WIP: Add required instances for convolution.

* WIP: Create client example. Implement convolution stage.

* Add elementwise instances.

* Add elementwise scale + convert example.

* Add reduction instances.

* WIP: Client example for AMAX reduction.

* WIP: Add instances for multistage reduction.

* WIP: Implementation of multistage reduction.

* Refactoring.

* Clean up.

* Add CMakePresets.json

* Guard off FP8 instances when the data type is not available.

* Add example for Scaled FP8 Convolution with AMAX reduction.

* Refactor CombConvScaleRelu instances.

* Add CombConvScale instances.

* Add client example for Scaled FP8 Convolution with AMAX reduction.

* Cleanup.
