Commits · 9a7fa123fdade18ffa125b8d1647d24dda6f889d · yangql / composable_kernel

17 May, 2022 2 commits
- support gcc with cpu only compile · 9a7fa123
  carlushuang authored May 17, 2022
  
- add kyxck8 · ad09ebdb
  carlushuang authored May 17, 2022
  
16 May, 2022 3 commits
- refactor Run to use slice length as block size. Fix a bug in general input copy · d6d37ea9
  carlushuang authored May 16, 2022
  
- refactor length/index setting in gridwise gemm · 2e414b7c
  carlushuang authored May 16, 2022
  
- Merge remote-tracking branch 'origin/develop' into cpu_avx2 · b134b7d6
  carlushuang authored May 16, 2022
  
15 May, 2022 1 commit
- add elementwise fusion support · 090ba885
  carlushuang authored May 15, 2022
  
13 May, 2022 1 commit

Validate examples in CI (#233) · 9f71ff48

Anthony Chang authored May 14, 2022



* validate examples in ctest runs

* format

* fix usage of check_err

* amend

* add example codes to custom target 'check'
Co-authored-by: Chao Liu <chao.liu2@amd.com>


12 May, 2022 2 commits

Add host API (#220) · cec69bc3

JD authored May 12, 2022



* Add host API

* manually rebase on develop

* clean

* manually rebase on develop

* exclude tests from all target

* address review comments

* update client app name

* fix missing lib name

* clang-format update

* refactor

* refactor

* refactor

* refactor

* refactor

* fix test issue

* refactor

* refactor

* refactor

* update cmake and readme
Co-authored-by: Chao Liu <chao.liu2@amd.com>


enable convnd bwd data test (#234) · 0f912e20
ltqin authored May 12, 2022


11 May, 2022 1 commit

Manual control of MAC cluster for improved interwave performance (#184) · 76764d8c

Anthony Chang authored May 11, 2022

* manual control of MAC cluster for improved 2-wave performance

ensure setprio's order; ensure inner loop size >= local read size

synchronize when single mac cluster

* format

* use value field from ck::integral_constant

* roll out inter-wave loop scheduler to c-shuffle gemm variants

will gradually roll out to other applicable device ops when occasional reg spill is resolved

* additional comments

* format

* fix mismatch between inter-wave pipeline and interwave blockwise gemm

* address review feedback

* amend


10 May, 2022 1 commit

Post PR183 review fixes. (#224) · 712e464c

Adam Osewski authored May 10, 2022



* Suppress additional warnings for googletest.

* Rename file conv_fwd_util to conv_util.

* Update includes and ConvParams member access.

* Formatting.

* Change conv_fwd_util target to conv_util

* Fix compiler errors.

* Fix leftovers.
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>


09 May, 2022 3 commits

Resolution of issue #153: Add compiler warning on comparing int and size_t (#212) · f03a1738

myamlak authored May 09, 2022



* Turning compare warnings on

* Cleaning part I

* Cleaning part II

* Explicit static_cast to ck::type_convert

* Resolving large tensor size issue.

* format

* revert change to tensor descriptor; promote ElementSpaceSize to 64bit

* use integer value for GEMM test

* Review remarks

* Review remarks + issues with (un)signed arithmetic

* Format fix

* Format

* Clang-format.

* fix 2gb limit issue
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>


Update README.md (#228) · 968bd932
Wen-Heng (Jack) Chung authored May 09, 2022


Code refactor (#175) · ec7c2e91

Chao Liu authored May 09, 2022

* format

* improving pipeline

* fix typo

* format

* adding thread group

* adding thread group

* adding thread group

* adding gemm pipeline

* tweak

* refactor

* refactor

* add missing type convert

* refactor

* refactor

* refactor

* clean

* fix build

* refactor

* format

* clean up

* use remove_cvref_t

* clean

* clean up

* clean up

* clean up


08 May, 2022 1 commit

Add Benchmark test into CI (#226) · a3c910ac

Illia Silin authored May 08, 2022



* add performance test to jenkins pipeline

* fix typo

* fix the syntax in conv_fwd_util.cpp

* fix the error message syntax spacing

* fix the error message syntax spacing again

* run profile_gemm and archive results

* fix typo

* try to figure out the paths

* try to figure out the paths one more time

* skip the copying step

* build ckProfiler release only once

* change directory using dir

* fix dir syntax

* change the gemm parameters

* do not pipe script output to file

* try running ckProfiler directly

* fix typo

* use set +e

* run profile_gemm.sh || true

* run multiple gemms and parse results

* fix typo in jenkinsfile

* fix syntax

* add new gemm sizes, update scripts

* put all jenkins steps in original order
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>


01 May, 2022 1 commit
- remove useless comment, add several new config for multi thread · 8ce9fe57
  carlushuang authored May 01, 2022
  
30 Apr, 2022 3 commits

Introduce GoogleTest framework. (#204) · 8eca05a6

Adam Osewski authored Apr 30, 2022



* Use googletest for tests. Add conv2d_fwd UT.

* Add conv1D/3D to gtest UT.

* Fix: not duplicate test with CTest.

* Convert more tests to googletests.

* Fix: GIT_SHALLOW is not allowed for git commit hash.

* Clang-format

* use integer value for GEMM test
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>


use integer value for GEMM test (#219) · 8a2c69ee
Chao Liu authored Apr 30, 2022

support multi-thread · b8ba0239
carlushuang authored Apr 30, 2022


29 Apr, 2022 3 commits

Update to gemm_reduce and batched_gemm_reduce (#213) · c77ae65d

Qianfeng authored Apr 30, 2022

* [Experimental] Change to gemm+reduce and batched-gemm+reduce

* Use threadwise-reduce function to improve the gridwise_gemm_reduce_xdl_cshuffle kernel

* Tiny fix in device_batched_gemm_xdl.hpp

* clang-format library/src/utility/conv_fwd_util.cpp


Add gfx90a CI stage for tests (#208) · 97d8c504
JD authored Apr 29, 2022
```
* Add gfx90a CI stage

* upgrade to ROCm 5.1 and fix formatting
```
Hotfix for gemm test (#214) · 95e93430
Anthony Chang authored Apr 29, 2022
```
* pass by ref to avoid throwing away initialization results

* EOL CRLF -> LF
```

28 Apr, 2022 1 commit
- add more tile, fix a bug in 6x16 kernel · e06b9871
  carlushuang authored Apr 28, 2022
  
27 Apr, 2022 1 commit
- fix a bug in general index calculation · 5771a040
  carlushuang authored Apr 27, 2022
  
26 Apr, 2022 1 commit
- Merge remote-tracking branch 'origin/develop' into cpu_avx2 · 5e6cca6f
  carlushuang authored Apr 26, 2022
  
25 Apr, 2022 1 commit

add comments to batched_gemm (#186) · 3956085d

Jianfeng Yan authored Apr 25, 2022

* add comments to batched_gemm

* formatting

* fix a typo in batched_gemm_documentation

* fix naming


24 Apr, 2022 1 commit
- avx2 gemm now works for single thread · afc7d431
  carlushuang authored Apr 24, 2022
  
22 Apr, 2022 3 commits
- profiler: fix fp32 c-shuffle gemm tuning parameter (#194) · 7c0b1498
  Anthony Chang authored Apr 23, 2022
  
- Clang-format only modified files. (#181) · 31d869ad
  Adam Osewski authored Apr 22, 2022
  
- use inline asm for 4x4 int8 transposition (#187) · 08a979f1
  Anthony Chang authored Apr 23, 2022
  
21 Apr, 2022 4 commits

Convolution FWD profiler refactor. (#183) · 1a0cd5d1

Adam Osewski authored Apr 22, 2022

* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensions

* Rename conv nd example directory to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example to prevent ...


Fix `clang-format` (#189) · 7353ec0c
JD authored Apr 21, 2022
```
* Fix clang-format filepath

* update docker and fix format
```
removed unused lds loads (#196) · 860e291c
zjing14 authored Apr 20, 2022


Use ck::half_t for Host Reduction (#195) · c1ef7319

Qianfeng authored Apr 21, 2022

* Add math functions for host

* Change host reduction to use ck::math

* Remove the use of half_float::half and half.hpp from reduction example/profiler/ctest


15 Apr, 2022 1 commit

Compile CK for all targets (#188) · 4221505d

Illia Silin authored Apr 15, 2022



* compile ck for all targets

* update the target criteria

* change the target condition

* fixed some typos

* fixed missed file

* revert changes in README

* revert device_conv3d_fwd_xdl_...

* update device_conv3d_fwd_xdl_...

* update device_batched_gemm_reduce...

* test the unused arguments fix

* test the warning suppression

* try suppress warnings in device_batched_gemm_reduce_xdl...

* fix the last warnings

* replace UNUSED with std::ignore

* fix a typo

* replaced std::ignore with ignore

* add ignore header to common_header

* refactor atomicAdd
Co-authored-by: Chao Liu <chao.liu2@amd.com>


14 Apr, 2022 3 commits
- fix compile error after merge develop · 07af8343
  carlushuang authored Apr 14, 2022
  
- Merge remote-tracking branch 'origin/develop' into cpu_avx2 · 07a673c6
  carlushuang authored Apr 14, 2022
  
- add test threadwise transfer. currently static_ford in threadwise transfer can... · c0f698d5
  carlushuang authored Apr 14, 2022
```
add test threadwise transfer. currently static_ford in threadwise transfer can not support large MC*KC tile size
```
07 Apr, 2022 1 commit
- Fix typo in batched gemm profiler (#176) · ac0d8066
  Jianfeng Yan authored Apr 07, 2022
```
* forgot passing BatchedCount in some profiler_batched_gemm

* delete default BatchCount
```
05 Apr, 2022 1 commit

Common forward convolution utility refactor. (#141) · abf4bdb9

Adam Osewski authored Apr 05, 2022

* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensions

* Rename conv nd example directory to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example t...
