Commits · 976815e539cf6e67bd742e2521aca7ddd60f6024 · gaoqiong / composable_kernel

28 Apr, 2022 1 commit
- Prevent compile error when the user passes an rvalue, e.g. {3, 4} · 976815e5
  rocking authored Apr 28, 2022
  
  976815e5
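Note on 976815e5: the commit title does not say how the error was avoided. A common cause of this kind of failure is a parameter declared as a non-const lvalue reference, which cannot bind to a temporary such as `{3, 4}`. The sketch below illustrates that general C++ rule only; `element_count` is a hypothetical helper and is not taken from composable_kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not CK code): multiplies the lengths of a tensor shape.
// The parameter is a const reference, so a braced temporary like {3, 4} binds to it.
// Declaring it as `std::vector<int>&` instead would make `element_count({3, 4})`
// fail to compile, because a non-const lvalue reference cannot bind to an rvalue.
inline std::size_t element_count(const std::vector<int>& lengths)
{
    std::size_t count = 1;
    for(int len : lengths)
    {
        assert(len >= 0); // a negative length is a caller bug
        count *= static_cast<std::size_t>(len);
    }
    return count;
}
```

With this signature, both `element_count({3, 4})` and a call with a named vector compile.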
26 Apr, 2022 1 commit
- Move threadPerBlock to argument · f919809d
  rocking authored Apr 26, 2022
  
  f919809d
25 Apr, 2022 1 commit

rocking authored Apr 25, 2022

2. Use DeviceGemm_Xdl_CShuffle instead of deprecated DeviceGemmXdl_C_Shuffle

a41f5481

22 Apr, 2022 1 commit
- Fix the meaning of broadcast dim parameter · 680cfaa7
  rocking authored Apr 22, 2022
  
  680cfaa7
21 Apr, 2022 4 commits
- Rewrite the elementwise operation. · 5d36f7a2
  rocking authored Apr 21, 2022
```
Let memory accesses coalesce between blocks
```
  5d36f7a2
- Merge remote-tracking branch 'origin/develop' into gemm_softmax · 88d621ac
  rocking authored Apr 21, 2022
  
  88d621ac
- removed unused lds loads (#196) · 860e291c
  zjing14 authored Apr 20, 2022
  
  860e291c
- Use ck::half_t for Host Reduction (#195) · c1ef7319
  Qianfeng authored Apr 21, 2022
```
* Add math functions for host

* Change host reduction to use ck::math

* Remove the use of half_float::half and half.hpp from the reduction example/profiler/ctest
```
  c1ef7319
20 Apr, 2022 5 commits
- Fix the padding · d7112d37
  rocking authored Apr 20, 2022
  
  d7112d37
- Rename elementwise op to binary elementwise · 0e6bf342
  rocking authored Apr 20, 2022
  
  0e6bf342
- Add padding · 5fa209af
  rocking authored Apr 20, 2022
  
  5fa209af
- [What] Add ComputeDataType to the eltwise kernel · 0f421d6f
  rocking authored Apr 20, 2022
```
[Why] Similar to the acc datatype, it increases precision
```
  0f421d6f
- [What] Use F32 as the acc of reduce sum · cf326690
  rocking authored Apr 20, 2022
```
[Why] Prevent loss of precision
```
  cf326690
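Note on cf326690 and 0f421d6f: both commits widen the type used for intermediate arithmetic. The sketch below shows the same effect on the host with `float` data and a `float` versus `double` accumulator, as an analogy for half-precision data with an F32 accumulator. `reduce_sum` is a hypothetical helper, not the CK kernel, and the behaviour described assumes IEEE-754 arithmetic with round-to-nearest and no fast-math flags.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host helper (not CK code): sums `x` using accumulator type AccT.
// A wider AccT keeps low-order bits that a narrow accumulator rounds away.
template <typename AccT>
AccT reduce_sum(const std::vector<float>& x)
{
    AccT acc = 0;
    for(float v : x)
        acc += static_cast<AccT>(v);
    return acc;
}

// Input chosen so the narrow accumulator stalls: 2^24 is the point where
// float can no longer represent every integer, so adding 1.0f has no effect.
inline std::vector<float> stalling_input()
{
    return {16777216.0f, 1.0f, 1.0f, 1.0f, 1.0f};
}
```

Under those assumptions, `reduce_sum<float>(stalling_input())` stays at 16777216 while `reduce_sum<double>(stalling_input())` gives 16777220.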
18 Apr, 2022 3 commits
- Merge remote-tracking branch 'origin/develop' into gemm_softmax · c16f789d
  rocking authored Apr 18, 2022
  
  c16f789d
- [What] Sync input of each host kernel and device kernel · 21802fda
  rocking authored Apr 18, 2022
```
[Why] Prevent error propagation
```
  21802fda
- [What] Use half_float::half instead of ck::half_t for host reduction · e83b22e0
  rocking authored Apr 18, 2022
```
[Why] std::numeric_limits<_Float16>::lowest() will return zero
```
  e83b22e0
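Note on e83b22e0: a zero result from `lowest()` is what the primary `std::numeric_limits` template returns for any type the standard library has not specialized. Whether `_Float16` is specialized depends on the toolchain, so the sketch below uses a stand-in type instead. `FakeHalf` and `checked_lowest` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for a 16-bit float type that the standard library
// does not know about, i.e. one with no std::numeric_limits specialization.
struct FakeHalf
{
    float value;
};

// For an unspecialized type the primary template returns a value-initialized T,
// so this yields 0 rather than the most negative finite value.
inline float lowest_of_fake_half()
{
    return std::numeric_limits<FakeHalf>::lowest().value;
}

// Guarded variant: refuses to compile for types without a specialization,
// which turns a silently wrong initial value for a max-reduction into a build error.
template <typename T>
constexpr T checked_lowest()
{
    static_assert(std::numeric_limits<T>::is_specialized,
                  "numeric_limits is not specialized for this type");
    return std::numeric_limits<T>::lowest();
}
```

A max-reduction seeded with the zero from the unguarded call would give wrong results for all-negative input, which is one plausible reading of why the host reduction switched types.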
15 Apr, 2022 2 commits

Compile CK for all targets (#188) · 4221505d

Illia Silin authored Apr 15, 2022



* compile ck for all targets

* update the target criteria

* change the target condition

* fixed some typos

* fixed missed file

* revert changes in README

* revert device_conv3d_fwd_xdl_...

* update device_conv3d_fwd_xdl_...

* update device_batched_gemm_reduce...

* test the unused arguments fix

* test the warning suppression

* try suppress warnings in device_batched_gemm_reduce_xdl...

* fix the last warnings

* replace UNUSED with std::ignore

* fix a typo

* replaced std::ignore with ignore

* add ignore header to common_header

* refactor atomicAdd
Co-authored-by: Chao Liu <chao.liu2@amd.com>

4221505d

Add verification of softmax · fe659502
rocking authored Apr 15, 2022

fe659502

14 Apr, 2022 1 commit
- Rewrite the gridwise_elementwise_ · dba65b1c
  rocking authored Apr 14, 2022
```
2d as 1d version
```
  dba65b1c
13 Apr, 2022 5 commits
- Add broadcast div, the final step of softmax · 6a781e51
  rocking authored Apr 13, 2022
  
  6a781e51
- Add reduce sum for denominator of softmax · b05a594e
  rocking authored Apr 13, 2022
  
  b05a594e
- [What] Refine naming · 30348daa
  rocking authored Apr 13, 2022
```
[Why] Prepare to add reduceSum
```
  30348daa
- Add exponential · f2540aa5
  rocking authored Apr 13, 2022
  
  f2540aa5
- Add global write · c8b4ac22
  rocking authored Apr 13, 2022
  
  c8b4ac22
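Note on the softmax commits (d6e053a3, f2540aa5, b05a594e, 6a781e51): taken together, the titles describe a reduce max, an exponential, a reduce sum for the denominator, and a final broadcast divide. A host-side reference for those four steps might look like the sketch below. It assumes a row-major matrix reduced along each row and assumes the maximum is subtracted before the exponential; neither detail is stated in the commit titles, and `softmax_rows` is a hypothetical helper, not the CK device code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host reference (not CK code): row-wise softmax of a
// row-major `rows x cols` matrix stored in `x`.
inline std::vector<float>
softmax_rows(const std::vector<float>& x, std::size_t rows, std::size_t cols)
{
    assert(cols > 0 && x.size() == rows * cols);
    std::vector<float> y(x.size());

    for(std::size_t r = 0; r < rows; ++r)
    {
        const float* in = x.data() + r * cols;
        float* out      = y.data() + r * cols;

        // Step 1: reduce max over the row (keeps exp() from overflowing).
        const float row_max = *std::max_element(in, in + cols);

        // Step 2: elementwise exponential of the shifted input.
        // Step 3: reduce sum of the exponentials (the softmax denominator).
        float row_sum = 0.0f;
        for(std::size_t c = 0; c < cols; ++c)
        {
            out[c] = std::exp(in[c] - row_max);
            row_sum += out[c];
        }

        // Step 4: broadcast divide, one denominator shared by the whole row.
        for(std::size_t c = 0; c < cols; ++c)
            out[c] /= row_sum;
    }
    return y;
}
```

For example, `softmax_rows({0, 0, 1000, 1000}, 2, 2)` yields 0.5 in every position, including the row of large values that would overflow without the max subtraction.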
12 Apr, 2022 1 commit
- A kernel of elementwise_2d (except global store) · a760a732
  rocking authored Apr 12, 2022
  
  a760a732
11 Apr, 2022 2 commits
- Merge remote-tracking branch 'origin/develop' into gemm_softmax · cb1c4731
  rocking authored Apr 11, 2022
  
  cb1c4731
- Add gridwise_elementwise_2d api · e3a09b57
  rocking authored Apr 11, 2022
  
  e3a09b57
10 Apr, 2022 3 commits
- Fix compile error · 6818b58c
  rocking authored Apr 10, 2022
  
  6818b58c
- Merge branch 'develop' into gemm_softmax · 0277c89e
  rocking authored Apr 10, 2022
  
  0277c89e
- Add device op for elementwise 2d · 3e811ccf
  rocking authored Apr 10, 2022
  
  3e811ccf
07 Apr, 2022 1 commit
- Fix typo in batched gemm profiler (#176) · ac0d8066
  Jianfeng Yan authored Apr 07, 2022
```
* forgot passing BatchedCount in some profiler_batched_gemm

* delete default BatchCount
```
  ac0d8066
06 Apr, 2022 1 commit
- Refine the comment · cbbc7e52
  rocking authored Apr 06, 2022
  
  cbbc7e52
05 Apr, 2022 4 commits

Common forward convolution utility refactor. (#141) · abf4bdb9

Adam Osewski authored Apr 05, 2022



* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensions

* Rename conv nd example directory to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example to prevent conflicts.

* Split convnd instance into separate files for 1d/2d

* Address review comments.

* Fix data type for flops/gbytes calculations.

* Assign example number 11.

* 3D cases for convolution utility functions.

* 3D reference convolution.

* Add support for 3D convolution.

* Check for inputs bigger than 2GB.

* Formatting

* Support for bf16/f16/f32/i8 - conv instances + UT.

* Use check_err from test_util.hpp.

* Split convnd test into separate files for each dim.

* Fix data generation and use proper instances.

* Formatting

* Skip tensor initialization if not necessary.

* Fix CMakefiles.

* Remove redundant conv2d_fwd test.

* Lower problem size for conv3D UT.

* 3D case for convnd example.

* Remove leftovers after merge.

* Add Conv Specialization string to GetTypeString

* Skip instance causing numerical errors.

* Small fixes.

* Remove redundant includes.

* Fix namespace name error.

* Script for automatic testing and logging convolution fwd UTs

* Comment out numactl cmd.

* Refine weights initialization and relax rtol for fp16

* Move test_util.hpp to check_err.hpp

* Refine weights initialization and relax rtol for fp16

* Refactor common part of test conv utils.

* Move utility function to single common place.

* Add additional common functions to utility.

* Refactor convnd_fwd_xdl examples.

* Remove redundant files.
* Unify structure.

* Add constructor to ConvParams.

* And add input parameters validation.

* Modify conv examples to use single utility file.

* Remove check_error from host_tensor.hpp

* Get rid of check_indices function.

* Remove bf16_to_f32 function overload for scalars.

* Fix namespace.

* Add half_float::half for check_err.

* Fix conv params size in UT.

* Fix weights initialization for int8.

* Fix weights initialization for int8.

* Add type_convert when store output in ref conv 1D.

* Get back old conv2d_fwd_xdl operation.

* Silence conv debug print.

* format

* clean

* clean

* Fix merge.

* Fix namespace for check_err

* Formatting.

* Fix merge artifacts.

* Remove deleted header.

* Fix some includes and use ck::utils::check_err.

* Remove unused check_indices restored by previous merge.

* Fix namespaces after merge.

* Fix compilation error.

* Small fixes.

* Use common functions.
* Fix filename
* Fix namespaces.

* Fix merge artifact - restore function removed by accident.

* Fix ConvForwardSpecialization.

* Adhere to coding style rules.

* Fix merge artifacts.
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

abf4bdb9

Patch for bwd data comments (#174) · 6717168c
ltqin authored Apr 05, 2022
```
* change function name and the way to set input to zero

* change enable if
```
6717168c

NHWC Conv2d Bwd weight fp16 ckprofiler and test (#166) · 781cacd2

ltqin authored Apr 05, 2022

* change backward weight name

* start add bwd weight lib and profiler

* change tuning parameter

* change output info

* add bwd weight test

* change test info

* using conv_util

* change wgt to weight

* add }

* add fp32

781cacd2

Improve Reduction kernel api (#152) · 82c8b9f8

Qianfeng authored Apr 05, 2022

* Add ThreadwiseReduction functor as per-thread reduction api

* Use the ThreadwiseReduce api and change how the PartitionedBlockwiseReduction api is used to simplify the kernels

* Add comments and remove useless declarations in the kernels

* Tiny updates

82c8b9f8

03 Apr, 2022 1 commit
- Part of gemm + softmax, Add gemm + reduceMax · d6e053a3
  rocking authored Apr 04, 2022
  
  d6e053a3
01 Apr, 2022 1 commit
- fix build (#171) · 64687816
  Chao Liu authored Mar 31, 2022
  
  64687816
31 Mar, 2022 2 commits

Tune & add conflict-free LDS gemm kernels (#159) · 7db48f90

Anthony Chang authored Apr 01, 2022

* retune & add conflict-free bf16/fp16 c-shuffle gemm instances

amend wrong K1 value in some fp16/bf16 kernel instances

* make gemm cshuffle's timing behavior consistent with all other functions

* clang-format

* retune & add conflict-free fp32 c-shuffle gemm instances

* retune & add conflict-free int8 c-shuffle gemm instances

* update the underlying gridwise gemm of all c-shuffle gemm kernels

* typo

7db48f90

Patch for bwd data #134 (#168) · c0e95f62

ltqin authored Apr 01, 2022

* remove switch for NDimSpatial

* change in, out and wei name

* rename reference thumb function name

* remove test

c0e95f62