Commits · compiler_wrong_result · gaoqiong / composable_kernel

11 Jul, 2022 1 commit
- Reproduce compiler wrong result · 90f368bb
  rocking authored Jul 11, 2022
  
08 Jul, 2022 3 commits
- Merge branch 'standalone-layernorm' of... · e48ddb6a
  rocking authored Jul 08, 2022
```
Merge branch 'standalone-layernorm' of https://github.com/ROCmSoftwarePlatform/composable_kernel into standalone-layernorm
```
- Add test for fp32 and fp16 · a7359664
  rocking authored Jul 08, 2022
  
- Gamma and beta can share the VGPR. · 39313533
  rocking authored Jul 08, 2022
  
07 Jul, 2022 2 commits

Merge remote-tracking branch 'origin/develop' into standalone-layernorm · 6f393c3d
Chao Liu authored Jul 07, 2022


N-D Tensor Contraction example, instance, and client example (#270) · 4fe9c393

Chao Liu authored Jul 07, 2022

* adding contraction

* add contraction example

* update example

* update example

* format

* update readme

* clean header

* clean header

* contraction with multiple D

* rename

* fix naming issue; add instances for contraction+bilinear

* change assumed virtual layout of contraction; add client example

* update example

* update

* contraction+scale

* use type_convert

* rename


06 Jul, 2022 7 commits
- Merge commit '334361cb' into standalone-layernorm · af055f4b
  rocking authored Jul 06, 2022
  
- Rename XSrcVectorDim to XYSrcVectorDim because we use the same parameter in deviceOp · fe9b4d9a
  rocking authored Jul 06, 2022
  
- Batched Gemm with C Permute (#305) · 334361cb
  zjing14 authored Jul 06, 2022
```
* init commit

* add c_permute

* add mnk padding

* fixed comments

* Fixed comments
Co-authored-by: Chao Liu <chao.liu2@amd.com>
```
- Support different YVectorDim in GridwiseLayernorm · cfce1f11
  rocking authored Jul 06, 2022
  
- Extract layernorm host code · 53607c7d
  rocking authored Jul 06, 2022
  
- Add accElementwiseOp · 3e38e358
  rocking authored Jul 06, 2022
  
- Use 1d descriptor for gamma and beta · 6ed9ab3a
  rocking authored Jul 05, 2022
  
05 Jul, 2022 2 commits
- We only use one block in K dimension. · 3bb0cbe7
  rocking authored Jul 05, 2022
```
Hence, we can simplify the indexing of global R/W.
```
- [What] Get length from upper length. · 6d3ad8cd
  rocking authored Jul 05, 2022
```
[Why] if we get length directly, we may get length after padding.
```
04 Jul, 2022 2 commits
- Support sweep once mode if we can put k dimension data inside one block · 0a2a25e3
  rocking authored Jul 04, 2022
  
- Sync the naming · eb6405ee
  rocking authored Jul 04, 2022
  
02 Jul, 2022 3 commits
- clean · e9a41755
  Chao Liu authored Jul 02, 2022
  
- Merge remote-tracking branch 'origin/develop' into standalone-layernorm · 28d60825
  Chao Liu authored Jul 02, 2022
  
- Gemm+Bilinear (#316) · 9e4429f9
  Chao Liu authored Jul 02, 2022
```
* refactor

* update example

* update example

* gemm bilinear

* clean

* update
```
01 Jul, 2022 9 commits

Merge branch 'develop' into standalone-layernorm · c6891e12
rocking authored Jul 01, 2022

1. Separate gamma and beta from affine · f591ad27
rocking authored Jul 01, 2022
```
2. Check if argument is valid
```
verify gpu kernel with host code · 8e2d0ae7
rocking authored Jul 01, 2022

Implement layernorm kernel and deviceOp · 8b7aeb35
rocking authored Jun 29, 2022


modified grouped gemm addressing method (#307) · 8e374781

guangzlu authored Jul 01, 2022



* modified grouped gemm addressing method

* modified addressing method in device_grouped_gemm_xdl.hpp
Co-authored-by: root <root@dc-smc-13.amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>


Single-kernel GEMM + layernorm (#263) · 63fd5da6

Anthony Chang authored Jul 01, 2022



* dump lds content in appropriate precision type

* add squared add reduction op; allows sq sum

* initial stub from regular gemm impl

* layernorm example code & host verification

* initial layernorm implementation

* tidy up

* make C0 precision type consistent with C

* clang-tidy and additional comments

* tighten up example code

* account for extra flops/bytes from normalization

* clang-format

* c0 bias/beta/gamma now have their own precision type

* AccElemOp for gemm outputs prior to feeding to layernorm

* update workgroup mapping

* rename kernel template param to reflect its dual use

* use LDS mem pool for reduction workspace

* change cshuffle precision type to f16; clean up

* clang-format

* correct naming

* explicit cast

* fully implemented gemm + bias + activation + add + norm

* activation in correct order

* reflect reduction API's recent change

* amend

* clean up; add comment

* keep up with recent changes in reduction API

* format

* resolve merge conflicts
Co-authored-by: Chao Liu <chao.liu2@amd.com>


add batch_stride into batched gemm (#314) · 1c8126a4
zjing14 authored Jul 01, 2022
```
* add batch_stride

* fixed test
Co-authored-by: Chao Liu <chao.liu2@amd.com>
```

Improve external interface for GEMM and GEMM+add+add+fastgelu (#311) · 0dcb3496

Chao Liu authored Jun 30, 2022

* interface for GEMM and GEMM+add+add+fastgelu

* rename namespace

* instance factory

* fix build

* fix build; add GEMM client example

* clean


Gemm + bias + c_permute (#312) · fa9a0a5c
zjing14 authored Jun 30, 2022
```
* init commit

* add desc

* finished c permute

* fixed vector lens
```

30 Jun, 2022 3 commits

Grouped Gemm ckProfiler hotfix (#313) · ab6c82c9
zjing14 authored Jun 30, 2022
```
* add setWorkspace in profiler

* fix
```

Standalone sweep once softmax kernel w/ ckProfiler (#295) · 93c99f3d

Anthony Chang authored Jul 01, 2022

* use 'sweep once' softmax kernel where applicable

* threadwise copy's dst buffer can specify invalid element value

* add int8 in/out float compute softmax support

give a bit of leeway for int absolute tolerance as there's a single data point of all test cases showing off-by-1 error

* format

* softmax inherits DeviceNormalization

* softmax profiler stub

* tighten up reference softmax interface

* example prints tensor dimension

* add fp32 to softmax profiler

* rename header

* hook with ckProfiler

* format

* resolve merge conflict

* resolve merge conflicts

* update normalization profiler help string

* resolve conflict

* typo

* remove residual

* softmax profiler: address feedback

* test for mixed precision input/output

* fully qualify ck::math::isnan

* add comment for device normalization interface

* revise wording

* constness for alpha/beta scalar pointer


Remove incorrect old packaging statement (#308) · eccf8773
Liam Wrubleski authored Jun 30, 2022


27 Jun, 2022 2 commits

external api for gemm + layernorm (#285) · 12235112

rocking5566 authored Jun 28, 2022

* Extract base class for elementwise

* Refactor interface of DeviceGemmReduce. Do not use tuple in interface

* [What] Rename d into reduce in gemm + reduction related code
[Why] Prepare to add d term for add

* Unify base class of gemm + reduce and gemm + bias + add + reduce

* 1. Rename gemm_bias_add_reduce for external api
 2. Refine cmake

* Add normalize device operation

* [What] Reorder the argument
[Why] Because d0 is also the input of c.

* Add type string

* Add example of gemm_bias_add_layernorm  via external api

* Refactor example code

* clang-format

* Fix compile error

* clang-format

* Add external api for gemm_add_add_layernorm and normalize

* Add client example

* clang-format


External Interface (#304) · aebd211c

Chao Liu authored Jun 26, 2022

* add client example

* clean

* clean

* reorg

* clean up profiler

* reorg

* clean

* fix profiler

* function for getinstances

* update client example

* update client example

* update client example

* update

* update example

* update Jenkins file

* update cmake

* update Jenkins


25 Jun, 2022 3 commits

Switch to standard ROCm packaging (#301) · b653c5eb

Liam Wrubleski authored Jun 25, 2022



* Switch to standard ROCm packaging

* Revert .gitignore changes

* install new rocm-cmake version

* update readme
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>


add license in file (#303) · d3051d75
Chao Liu authored Jun 24, 2022


Absolute include path (#281) · d1db6a0c

Chao Liu authored Jun 24, 2022

* add gelu and fast_gelu

* added GeLU and fast GeLU

* clean up

* add gemm+fastgelu example

* add gemm+gelu instances

* update profiler

* clean up

* clean up

* adding gemm+bias+activation

* clean

* adding bias

* clean

* adding gemm multiple d

* debugging

* add gemm bias add fastgelu

* rename, clean

* refactoring; add readme

* refactor

* refactor

* refactor

* refactor

* refactor

* refactor

* fix

* fix

* update example

* update example

* rename

* update example

* add ckProfiler

* clean

* clean

* clean

* clean

* add client app example

* update readme

* delete obsolete files

* remove old client app

* delete old file

* cleaning

* clean

* remove half

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path for all examples

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* revert client app example

* clean build

* fix build

* temporarily disable client test on Jenkins

* clean

* clean

* clean


23 Jun, 2022 2 commits

update license (#297) · a49115b9

Chao Liu authored Jun 23, 2022

* update license

* update license

* update license

* update license


Testing all fwd convolution specializations. (#259) · a2edd7d8

Adam Osewski authored Jun 23, 2022



* UniforFill with integer values.

* Log tested instance type string.

* Add UT for all convolution specializations.

* debugging conv

* Fix dangling reference bug.

* Small refinements.

* Fix call to error checking function.

* Small refinements to tests.

* Configure error tolerance
* Change problem size.
* Remove OddC case from types that do not support it.

* Add helper traits for AccumulatorDataType.

* Print first 5 errs in check_err for integral types.

* Rename FillUniform to FillUniformDistribution

* Refactor

* Do not use typed tests.
* Instead use plain fixture class with templatized member functions.
* Initialize tensors with integer values.

* Refine test instances.

* Properly set accumulator data type.
* Add another "big" instance.

* Refactor convolution tests.

* Revert "debugging conv"

This reverts commit b109516455631ff8fd6dce99cf7c14bf8e323ebb.

* Add pragma once + format + small refinement.

* Fix some unwanted changes.

* Clang-format

* Fix profile_convnd to use renamed tensor initializer.

* Add instances for ConvFWDND kernel case 2D

* Helpers to get ConvNDFwd 2D instances.

* Refactoring.

* Remove "small block" instance as it was generating compiler errors.
* Remove default template parameters values.

* Refine and fix test.

* Fix problem with default template parameter types.
* Adjust error thresholds for floating point values test.
* Use integer values initialization for instances test.
* Add tests for ConvNDFwd 2D case.

* Remove AccumulatorDataType type trait.

* Update unit-tests.

* Remove operator<< overload.

* Unlock conv1d/3d nd fwd instances.

* Enable skipping calculating reference using flag.

* Fix number of channels for first ResNet50 layer.

* Clang-format.
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>


21 Jun, 2022 1 commit
- fix Issue 291 (#294) · 4634b120
  Shaojie WANG authored Jun 22, 2022
```
* rename for typeconvert functor

* refine code
```