Commits · 7e4eb4b800b7bec8adb9a1a766f7aba1557e8aa2 · gaoqiong / composable_kernel_ROCM

19 Jan, 2024 2 commits

Add optimized copy to ck wrapper (#1126) · 7e4eb4b8

Bartłomiej Kocot authored Jan 19, 2024



* Add optimized copy to ck wrapper

* Example optimizations

* Fixes

* Move img2col test to client example

* Refactor example

* Fix docs

* Fixes

* Fix

* Fixes

* Fixes

* Fixes

* Fixes

* Fixes

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>


add Adam to code owners (#1136) · 38882d8a
Illia Silin authored Jan 18, 2024


16 Jan, 2024 2 commits

Randyh docfix (#1130) · 402a930a

randyh62 authored Jan 16, 2024



* Update LICENSE

update to 2024

* Update index.rst

change license.md to license.html

* fix syntax

---------
Co-authored-by: illsilin <Illia.Silin@amd.com>


add code owners (#1132) · c1b5b581
Illia Silin authored Jan 16, 2024


15 Jan, 2024 2 commits

Add cppcheck to CK CI. (#1125) · e6d099c8

Illia Silin authored Jan 15, 2024

* add cppcheck to the CK CI

* fix the path to CK source for cppcheck

* fix the path to CK source for cppcheck one more time

* fix the path to CK source for cppcheck third time

* change the path to ck_cppcheck.log

* install latest cppcheck from source

* fix bug in ck.hpp and use 20 threads for cppcheck

* create a switch to turn cppcheck on and off in CI


Bump rocm-docs-core from 0.30.3 to 0.31.0 in /docs/sphinx (#1131) · 636a3101

dependabot[bot] authored Jan 15, 2024

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.3 to 0.31.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.3...v0.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


11 Jan, 2024 1 commit

Bump sphinxcontrib-bibtex from 2.6.1 to 2.6.2 in /docs/sphinx (#1129) · 0ce41726

dependabot[bot] authored Jan 11, 2024

Bumps [sphinxcontrib-bibtex](https://github.com/mcmtroffaes/sphinxcontrib-bibtex) from 2.6.1 to 2.6.2.
- [Changelog](https://github.com/mcmtroffaes/sphinxcontrib-bibtex/blob/develop/CHANGELOG.rst)
- [Commits](https://github.com/mcmtroffaes/sphinxcontrib-bibtex/compare/2.6.1...2.6.2)

---
updated-dependencies:
- dependency-name: sphinxcontrib-bibtex
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


09 Jan, 2024 2 commits

Add an option to change the number of warm-up cycles and iterations. (#1124) · 886d9eeb
Illia Silin authored Jan 09, 2024
```
* allow setting the number of warmup cycles and iterations for profiler

* fix the gemm_splitk and grouped_gemm examples
```
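
The idea of separate warm-up cycles and timed iterations can be sketched on the host side as follows. This is only an illustration under stated assumptions: `average_time_ms` is a hypothetical helper, not the CK profiler's interface, and real GPU timing would normally rely on device events rather than `std::chrono`.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not part of CK): runs `fn` `n_warmup` times without
// timing, then returns the average wall-clock time in milliseconds over
// `n_iter` timed runs.
double average_time_ms(const std::function<void()>& fn, int n_warmup, int n_iter)
{
    // Warm-up runs: executed so that one-time costs do not skew the average.
    for(int i = 0; i < n_warmup; ++i)
        fn();

    if(n_iter <= 0)
        return 0.0; // nothing to time

    const auto start = std::chrono::steady_clock::now();
    for(int i = 0; i < n_iter; ++i)
        fn();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / n_iter;
}
```

Calling `average_time_ms(work, 3, 10)` invokes `work` 13 times in total and averages only the last 10 calls.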

SWDEV-439954 - Use hard coded filename rather than using the macro __FILE__... · e699dbd8

raramakr authored Jan 09, 2024


SWDEV-439954 - Use hard coded filename rather than using the macro __FILE__ for debug prints. (#1123)

* SWDEV-439954 - Use hard coded filename rather than using the macro __FILE__ for debug prints.

The hipTensor library uses header files from CK. Because a header file used the macro __FILE__, a hard coded ROCm path was being embedded into the hipTensor library. Replace the macro with the filename.

* fix syntax

---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
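
The difference between `__FILE__` and a hard-coded file name can be sketched as below. The macro name `EXAMPLE_DEBUG_FILE`, the file name `example_header.hpp`, and the helper functions are all hypothetical; the commit message does not show the actual identifiers that were changed.

```cpp
#include <string>

// Hypothetical name for illustration only. A fixed string literal replaces
// __FILE__, so no build-tree or install path ends up in the binary.
#define EXAMPLE_DEBUG_FILE "example_header.hpp"

// Formats a "file:line" prefix for a debug print.
inline std::string debug_prefix(const char* file, int line)
{
    return std::string(file) + ":" + std::to_string(line);
}

// With __FILE__, the prefix carries whatever path the compiler was given,
// which may be an absolute path on the build machine.
inline std::string prefix_with_macro() { return debug_prefix(__FILE__, __LINE__); }

// With the hard-coded name, the prefix is identical on every machine.
inline std::string prefix_hard_coded() { return debug_prefix(EXAMPLE_DEBUG_FILE, __LINE__); }
```

Because the header is compiled into every library that includes it, the string produced by `__FILE__` travels with that library, while the literal does not depend on where the header was installed.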


05 Jan, 2024 4 commits

fix dockerfile syntax for test compilers (#1120) · 22db1e08
Illia Silin authored Jan 05, 2024


doc reorg and edits (#1112) · a3916381

randyh62 authored Jan 05, 2024



* doc reorg and edits

* Update wrapper.rst with changes from PR #1098

* Update docs/dockerhub.rst
Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com>

* Update docs/index.rst
Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com>

* Update docs/what-is-ck.rst
Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com>

* Update docs/what-is-ck.rst

Restored to 4 bullets, with additional text for wrapper.
Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com>

* Update docs/Contributors_Guide.rst
Co-authored-by: Lisa <lisajdelaney@gmail.com>

* Update API_Reference_Guide.rst

using sentence case for title

* updated index structure per Lisa

* separate docker hub and tutorial

---------
Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com>
Co-authored-by: Lisa <lisajdelaney@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>


Update the recommended version of ROCm in docs (#1110) · 61545bda
Bartlomiej Wroblewski authored Jan 05, 2024


Add a docker for testing CK with rocm6.0.1 RC1. (#1119) · d8970020

Illia Silin authored Jan 05, 2024

* add docker for rocm6.0.1 rc1

* modify the path to clang for test compilers in CI

* fix the hipcc/clang path for test compilers in CI

* fix the dockerfile for older rocm versions


04 Jan, 2024 2 commits

Add missing copyrights in elementwise_permute examples (#1118) · 11e27522
Bartłomiej Kocot authored Jan 04, 2024


Transpose profiler fix (#1114) · aa3e2d79

arai713 authored Jan 04, 2024



* added working example for 5D input using 1D kernel

* example with 5D input tensor and 2d kernel - not working: issues with arguments

* added updated version of 3d device op - changed descriptors/dims

* added example file to check kernel

* fixed descriptor and isSupportedArgument stride problem

* added and modified kernel for 3d - updated tids/loop

* adding some more 5d example files

* fixed some issues

* changes made for testing

* working version: fixed error in stride for A, still a bit inefficient

* cleaned up formatting/comments

* updating formatting

* more formatting fixes

* fixing cmake, adding back gpu targets in cmake script

* adding client example

* added instances for client example

* fixed errors in client example

* implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp

* removed extra files

* minor formatting and naming fixes

* adding test files and profiler

* fixing minor error

* minor fix

* removed unnecessary comments, renamed files

* updated instance list for client example, added different layout example

* removing instances

* fixed error in instance generation

* remove comments

* update profiler and client example tensor layouts

* fixed errors in test/profiler

* updated vector dim access to enable vector load

* updated test/profiler files

* updated example with 1d kernel

* updating profiler

* renamed files

* disabled device op for MI300

* skip elementwise_permute_2d on gfx94x

* Update CMakeLists.txt

* fixing CMake - disabling some GPU targets

* added transpose profiler to CMake

* fixed transpose profiler errors

* fixed instances for tests/profiler

* cleaned up code in transpose profiler source code

* added some comments, updated copyright

* made function arguments const where possible

---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>


03 Jan, 2024 2 commits
- fix the cmake option syntax (#1117) · fbf31a2e
  Illia Silin authored Jan 03, 2024
- Add tensor partition and generic copy for ck wrapper (#1108) · 4234b3a6
  Bartłomiej Kocot authored Jan 03, 2024
```
* Add tensor partition and generic copy for ck wrapper

* Update changelog

* Stylistic fixes

* Change shape/strides logic to descriptor transforms

* Fixes

* Fix client example

* Fix comments
```
02 Jan, 2024 3 commits
- adding -Wno-switch-default compiler flag (#1115) · b268f273
  Illia Silin authored Jan 02, 2024
- change the googletest cmake syntax for older cmake versions (#1116) · 0e07dfde
  Illia Silin authored Jan 02, 2024
- Revert "[SWDEV-435347] disable instances failed with mainlien compiler (#1077)" (#1101) · a35e466c
  Bartłomiej Kocot authored Jan 02, 2024
```
This reverts commit ff24b537.
```
23 Dec, 2023 1 commit
- Fix results verify in test_tensor (#1109) · 20b1ae7c
  Bartłomiej Kocot authored Dec 23, 2023
20 Dec, 2023 2 commits

Bump rocm-docs-core from 0.30.2 to 0.30.3 in /docs/sphinx (#1107) · 78eb3f0b

dependabot[bot] authored Dec 20, 2023

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.2 to 0.30.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.2...v0.30.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


enable compilation of INSTANCES_ONLY for Windows (#1082) · fb5bd51b

Artur Wojcik authored Dec 20, 2023



* enable compilation of INSTANCES_ONLY for Windows

* suppress ROCMChecks warnings on GoogleTests

* suppress -Wfloat-equal warning on GoogleTests

---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>


19 Dec, 2023 5 commits

Remove index tensor in avgpool (#1093) · b305a29e

rocking authored Dec 19, 2023



* Remove index tensor

* fix syntax

---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>


Bump rocm-docs-core from 0.30.1 to 0.30.2 in /docs/sphinx (#1104) · a167e3c7

dependabot[bot] authored Dec 19, 2023

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.1 to 0.30.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.1...v0.30.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


ROCm 6.0 replaces all __HIP_PLATFORM_HCC__ with __HIP_PLATFORM_AMD__ (#1106) · 3ab1838f

Jun Liu authored Dec 19, 2023

* ROCm 6.0 replaces all __HIP_PLATFORM_HCC__ with __HIP_PLATFORM_AMD__

* make it backward compatible

* Update .clang-tidy

* Update ClangTidy.cmake
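
One way to keep such a macro rename backward compatible is to accept either platform macro. The sketch below assumes that approach; `EXAMPLE_ON_AMD_PLATFORM` and `on_amd_platform` are hypothetical names, and the commit message does not show the exact guard that was used.

```cpp
// Accept both the newer and the older platform macro, so the same source
// builds with toolchains that define only one of them.
// EXAMPLE_ON_AMD_PLATFORM is a hypothetical name used only for illustration.
#if defined(__HIP_PLATFORM_AMD__) || defined(__HIP_PLATFORM_HCC__)
#define EXAMPLE_ON_AMD_PLATFORM 1
#else
#define EXAMPLE_ON_AMD_PLATFORM 0
#endif

// Compile-time query usable in ordinary C++ code instead of repeating the #if.
constexpr bool on_amd_platform() { return EXAMPLE_ON_AMD_PLATFORM == 1; }
```

With a plain host compiler neither macro is defined, so `on_amd_platform()` evaluates to `false`.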


add -Wno-pass-failed compiler flag (#1105) · 3726a173
Illia Silin authored Dec 19, 2023


Hip tensor permute unit test (#1068) · 12a8883c

arai713 authored Dec 18, 2023

* adding files for F32 example

* adding functioning implementation with scalar multiplication and unary operator support

* added fp 16 type check in unary square

* updating scalar multiplication as an operator

* functioning version with scalar operator

* changing strides for col major

* updated column major implementation

* working column major implementation

* cleaned up comments, rearranged/renamed files

* small edits to 3d transpose profiler

* adding test/profiler/instance files for hipTensor permute unit test

* added more test instances

* cleaned up errors, randomized input tensor, added more instances

* turned off time printouts

* removed conflicting transpose profiler

* rearranged some files


18 Dec, 2023 2 commits

layernorm and groupnorm backward data (#1083) · a69aa2a1

rocking authored Dec 19, 2023

* rename folder

* Add type string

* Remove typo

* Add deviceOp to backward x

* Add comment to describe the behavior of backward normalization

* Add kernel function, prepare to implement

* implement generic kernel

* Check vector size

* Add sweep once pipeline for small reduce size

* Fix bug of KRaw_ error

* Fix bug of dx stride

* sanity check for mean and rstd

* backward x for groupnorm

* Add bwd x instance

* add layernorm 2d bwd gamma beta instances

* Change save mean var type from f32 to f16 in f16 mode

* Change the example to f16

* Add groupnorm bwd gamma beta instance

* Add groupnorm bwd x instance

* Fix naming

* Add layernorm bwd x ckprofiler

* Add groupnorm bwd x profiler

* clang format

* Rename bwd x to bwd data

* Fix bug of verification in profiler

* Add test of layernorm and groupnorm bwd data

* Add missing cmake

* Add layernorm2d bwd data

* rename fwd example

* Add groupnorm client example

* Fix typo. replace Invarient with Invariant

* Add checking before running the best instance


Optimize fp16 direct load GEMM instances (#1086) · ad0a8e4c

Bartlomiej Wroblewski authored Dec 18, 2023

This PR optimizes fp16 instances of direct load GEMM kernel introduced in #999 and #1052.

Measured the performance of new instances on CDNA2 GPU and compared it against the performance of the best non-direct-load GEMM instances. Used 76 different GEMM problems.
On average, this change improves the performance of the tested problems by 47%. For cases known as latency-bound, the speedup is around 126%.


16 Dec, 2023 1 commit

Upgrade the default compiler to ROCm6.0 release. (#1103) · dcedf363

Illia Silin authored Dec 16, 2023

* upgrade to rocm6.0 compiler

* move rocm6.0 from private to public repo

* switch to testing hipTensor mainline in CI


15 Dec, 2023 3 commits

Adding Issue Template (#1094) · 3246d1f6

abhimeda authored Dec 15, 2023



* Add files via upload

* fixed extra space typo

* add mi300 GPU architectures and rocm versions 5.6.1 and 6.0.0

---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>


Add tensor structure to wrapper (#1098) · 07092d68
Bartłomiej Kocot authored Dec 15, 2023
```
* Add tensor structure to wrapper

* update changelog

* Fix names

* Comment fixes
```

cmake: Add CK_PARALLEL_LINK_JOBS and CK_PARALLEL_COMPILE_JOBS options (#1063) · efaf3106

trixirt authored Dec 14, 2023



Copied from the llvm-project LLVM_PARALLEL_*_JOBS

Concurrent linking can break the build, as can having too many
compile jobs for the available memory.  These options allow the user
to fine tune the build to fit within their machine's memory
constraints.

An example use on Linux is:
COMPILE_JOBS=`cat /proc/cpuinfo | grep -m 1 'cpu cores' | awk '{ print $4 }'`
if [ ${COMPILE_JOBS}x = x ]; then
  COMPILE_JOBS=1
fi
BUILD_MEM=4
MEM_KB=0
MEM_KB=`cat /proc/meminfo | grep MemTotal | awk '{ print $2 }'`
MEM_MB=`eval "expr ${MEM_KB} / 1024"`
MEM_GB=`eval "expr ${MEM_MB} / 1024"`
COMPILE_JOBS_MEM=`eval "expr 1 + ${MEM_GB} / ${BUILD_MEM}"`
if [ "$COMPILE_JOBS_MEM" -lt "$COMPILE_JOBS" ]; then
  COMPILE_JOBS=$COMPILE_JOBS_MEM
fi
LINK_MEM=32
LINK_JOBS=`eval "expr 1 + ${MEM_GB} / ${LINK_MEM}"`

cmake -G Ninja -DCK_PARALLEL_LINK_JOBS=$LINK_JOBS \
               -DCK_PARALLEL_COMPILE_JOBS=$COMPILE_JOBS
Signed-off-by: Tom Rix <trix@redhat.com>
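
The arithmetic in the shell example above can be restated compactly as follows. This is a sketch of the same calculation only; the function names are hypothetical, and the 4 GB and 32 GB budgets are the values taken from the example script, not project requirements.

```cpp
#include <algorithm>

// One job, plus one more for every `gb_per_job` gigabytes of RAM
// (integer division, as in the shell `expr` calls).
constexpr int jobs_for_memory(int mem_gb, int gb_per_job)
{
    return 1 + mem_gb / gb_per_job;
}

// Compile jobs: limited by both the core count and the memory budget.
// A core count below 1 is treated as 1, matching the script's fallback.
constexpr int compile_jobs(int cpu_cores, int mem_gb, int gb_per_compile_job = 4)
{
    return std::min(std::max(cpu_cores, 1), jobs_for_memory(mem_gb, gb_per_compile_job));
}

// Link jobs: limited by memory only; the example budgets 32 GB per job.
constexpr int link_jobs(int mem_gb, int gb_per_link_job = 32)
{
    return jobs_for_memory(mem_gb, gb_per_link_job);
}
```

For a machine with 16 cores and 32 GB of RAM this gives 9 compile jobs and 2 link jobs, which would then be passed as `-DCK_PARALLEL_COMPILE_JOBS` and `-DCK_PARALLEL_LINK_JOBS`.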


14 Dec, 2023 1 commit

fix typo (#1067) · 281f8369

Lisa authored Dec 14, 2023


Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>


13 Dec, 2023 2 commits
- [Doc][Werror] Fix security alerts and sync with MIOpen (#1085) · 3a3b98ef
  Jun Liu authored Dec 13, 2023
```
* fix Werror unused-parameter

* sync doc requirements

* fix blank space format

* fix dependency issue
```
- Fix the bugs (#1099) · 6891e4d1
  Rostyslav Geyyer authored Dec 13, 2023
12 Dec, 2023 1 commit

disabling some fp8 gemm instances to reduce build time (#1084) · c004e0d9

Illia Silin authored Dec 11, 2023

* disabling some fp8 gemm instances to reduce build time

* disable fp8 gemm instances to reduce build time

* remove the unused variable

* build fp8 gemm default and padded instances separately

* fix include paths


11 Dec, 2023 1 commit

Fix IsSupported check in the contraction op (#1066) · 89ee4746

Bartlomiej Wroblewski authored Dec 11, 2023

The current implementation of the IsSupported method in contraction ops misses many cases in which ScalarPerVector cannot actually be used to read A, B or D, or to write E.

This PR extends both the regular and multiABD contraction ops with improved checks and also adds new instances with smaller values of ScalarPerVector to cover cases that the other instances do not support.
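
The kind of condition such a check has to cover can be sketched as below. This is not CK's IsSupported implementation; `is_vector_access_valid` is a hypothetical helper, and it assumes the common rule that a vectorized access needs a contiguous dimension whose length is a multiple of the vector width.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical check (not CK code): a vector access of `scalar_per_vector`
// elements along `vector_dim` is treated as valid when that dimension is
// contiguous (stride 1) and its length is a multiple of the vector width.
inline bool is_vector_access_valid(const std::vector<std::size_t>& lengths,
                                   const std::vector<std::size_t>& strides,
                                   std::size_t vector_dim,
                                   std::size_t scalar_per_vector)
{
    if(scalar_per_vector == 0 || lengths.size() != strides.size() ||
       vector_dim >= lengths.size())
        return false; // malformed description

    if(scalar_per_vector == 1)
        return true; // element-wise access has no alignment requirement here

    return strides[vector_dim] == 1 && lengths[vector_dim] % scalar_per_vector == 0;
}
```

Under this rule, a tensor with lengths {4, 6} and strides {6, 1} accepts a width of 2 along the last dimension but rejects a width of 4, which is why instances with smaller ScalarPerVector values widen the set of problems that can be handled.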


08 Dec, 2023 1 commit
- fix clang format (#1095) · f199035b
  Illia Silin authored Dec 08, 2023