Commits · 4379d8d1f2350053082b24ae2a1f00cf48851133 · gaoqiong / composable_kernel

23 May, 2022 5 commits
- Merge remote-tracking branch 'origin/fix_build_0521' into myamlak/cgemm · 4379d8d1
  myamlak authored May 23, 2022

- Merge remote-tracking branch 'origin/develop' into myamlak/cgemm · 326d331c
  myamlak authored May 23, 2022

- amend · 1cfff19f
  Anthony Chang authored May 23, 2022

- post PR #235 merge fix · 3f8e846a
  Anthony Chang authored May 23, 2022

- Revert "fix build" · d00ecab5
  Anthony Chang authored May 23, 2022
  ```
  This reverts commit d7310238.
  ```
21 May, 2022 1 commit
- fix build · d7310238
  Chao Liu authored May 21, 2022
20 May, 2022 10 commits
- example of conv bwd weight 1d/2d/3d fp32/fp16/bf16 xdl (#244) · ac543313
  Shaojie WANG authored May 21, 2022

  * enable example of conv 1d/3d for bwd weight
  * make bf16 kernel not use atomic add
  * use new gridwise gemm for bwd weight on convnd bwd weight

  Co-authored-by: Chao Liu <chao.liu2@amd.com>

- remove options.hpp.in (#240) · 44943e0e
  Chao Liu authored May 20, 2022

- Refactor block to C tile map (#235) · a054f7d6
  Anthony Chang authored May 21, 2022

  * refactor block-to-ctile-map
  * gridwise gemm block2ctile generic validity check
  * format
  * amend split-k gemm block2ctile map refactor
  * add test
  * format
  * amend
  * revert to calculating batch index in kernel instead of passing as block_id_z
  * move file
  * add valid ctile index check to gridwise v2r4

- [conv bwd-weight]Binding gemm k1 to conv n (#202) · 070619fb
  Shaojie WANG authored May 21, 2022

  * add some instances to develop
  * avoid bank conflicts for wrw for all instances
  * add small K1 test
  * delete some unused instances
  * binding gemm k1 to conv n
  * try using half_4 to do ds_read
  * reset buffer load oob and ds memcpy to default option
  * remove useless instances
  * remove redundant space
  * remove printf code
  * clang-format-10 change
  * use fastest config
  * fix clang format for the other files
  * remove gemmk0 pad for output
  * add gemmk padding macro
  * add bank length computation
  * add template to distinguish the instances that need lds padding for wrw
  * use rocm5.1 as docker
  * use integer value for GEMM test
  * add Right padding macro
  * add 2 test asm code
  * using 256x256x32 tile size
  * 1. move dedicated transform into gridwisegemm's header file. 2. make lds tensor params a struct template. 3. remove useless code
  * using small vec
  * 256x128 kernel size for example
  * remove asm files
  * use a new gridwise gemm header for bwd-weight
  * revert gridwise gemm v2r4r2
  * change format
  * reset gridwise gemm v2r4r2
  * remove unused code
  * revert instance file
  * revert example instance
  * format file
  * remove macros
  * resolve compile error
  * rename wrw kernel invoker
  * use gridwisegemm pipeline struct instead of implementing the run function in the same header

  Co-authored-by: Chao Liu <chao.liu2@amd.com>

- remove unused conv bwd data profiler header and cpp (#245) · b31b588d
  Shaojie WANG authored May 21, 2022

- Fix + test reenabled · 5fd5daab
  myamlak authored May 20, 2022

- Revert "Enabling bf16 test" · 18125c3b
  myamlak authored May 20, 2022
  ```
  This reverts commit f497e2ba.
  ```

- [Perf][Bwd-weights]Lds re-layout to avoid ds read/write bank conflict and balance ds ops with address calculations (#190) · b9b9c3b8
  Shaojie WANG authored May 20, 2022

  * add some instances to develop
  * avoid bank conflicts for wrw for all instances
  * add small K1 test
  * delete some unused instances
  * reset buffer load oob and ds memcpy to default option
  * remove useless instances
  * remove redundant space
  * remove printf code
  * clang-format-10 change
  * fix clang format for the other files
  * add bank length computation
  * add template to distinguish the instances that need lds padding for wrw
  * use rocm5.1 as docker
  * use integer value for GEMM test
  * 1. move dedicated transform into gridwisegemm's header file. 2. make lds tensor params a struct template. 3. remove useless code
  * use a new gridwise gemm header for bwd-weight
  * revert gridwise gemm v2r4r2
  * change format
  * rename kernel invoker

  Co-authored-by: Chao Liu <chao.liu2@amd.com>

- Hotfix eltwise op (#242) · bb4b82a9
  rocking5566 authored May 20, 2022

  * Use vector constructor instead
  * Fix typo
  * Move blockSize to the MakeArgumentPointer
  * Fix naming
  * Fix clang format
  * remove blockSize from DeviceBinaryElementwise::Argument()

  Co-authored-by: rocking <chunylai@amd.com>
  Co-authored-by: Chao Liu <chao.liu2@amd.com>

- Gemm reduce max (#209) · 0ffe956a
  rocking5566 authored May 20, 2022

  * [What] Rename the example
    [Why] Prepare to add unary reduction
  * Add global operation to the parameter
  * Add atomicmax
  * Fix compile error
  * Support atomicMax (HIP library)
  * Rename the reduction example
  * Fix target name
  * use p_d1_grid as the indicator directly
  * Prevent performance issue. Let passthrough handle it.
  * Implement the function template and specialize it for float2
  * No need to separate into two lines
  * Remove empty line
  * add comment
  * Fix compile error due to merge from develop
  * make the implementation of atomic_max / atomic_add explicit for each datatype
  * Refine typo
  * For future CI test
  * Fix compiler error in ckProfiler
  * Merge commit 'de2769e3a6695b38a20529261273ddc5cdaab2fe'
  * simply use remove_pointer
  * Rename type and var
  * Refine example
  * Modify reducemax example
  * Fix bug in reduction
  * Change initialization range
  * Implement F64 version of atomicMax
  * Move reduction code together
  * Add buffer atomic_max
  * Fix coding style by clang-format
  * Integrate new api of DeviceGemmReduce_Xdl_CShuffle
  * Integrate Batch gemm reduction
  * Fix example
  * fix example
  * clean up
  * Fix batch gemm tensor operation
  * Fix coding style
  * Fix template argument
  * Fix clang format
  * Keep the flexibility of a different stride for each D tensor
  * Fix compile error for ckProfiler
  * Fix typo
  * [What] Fix naming
    [Why] Prepare to add out elementop
  * Add DoutElementOp

  Co-authored-by: Chao Liu <chao.liu2@amd.com>
  Co-authored-by: rocking <chunylai@amd.com>
19 May, 2022 4 commits
- Enabling bf16 test · f497e2ba
  myamlak authored May 19, 2022

- Format · f63ca8e8
  myamlak authored May 19, 2022

- Merge remote-tracking branch 'origin/develop' into myamlak/cgemm · a7676df9
  myamlak authored May 19, 2022

- elementwise op (#238) · aafc3ac2
  rocking5566 authored May 19, 2022

  * Add elementwise operation kernel and example
  * Add comment
  * Add template argument of dim. Prepare to support multiple dimensions
  * Rename example
  * Support 1 dimension
  * Add static assert
  * Add comment
  * Extract pad
  * Remove redundant argument
  * Support any dimension for elementwise operation
  * Remove line
  * Make it a multiple of the number of CUs
  * Move thread per block to the parameter of constructor
  * rename threadPerBlock to blockSize
  * Support double
  * rename kernel function name
  * remove redundant include header
  * Refine type
  * Need to the final dimension
  * Refine variable name
  * Refine type
  * Use index_t instead of int in API

  Co-authored-by: rocking <chunylai@amd.com>
18 May, 2022 5 commits
- Fix + cosmetics + bf16 test commented out temporarily · 6ebcb667
  myamlak authored May 18, 2022

- Consuming binary ops to do A+B / A-B · 208ac1a5
  myamlak authored May 18, 2022

- Merge remote-tracking branch 'origin/eltwise_op' into myamlak/cgemm · 5e104742
  myamlak authored May 18, 2022

- Move thread per block to the parameter of constructor · c4d610be
  rocking authored May 18, 2022

- Make it a multiple of the number of CUs · 83f75313
  rocking authored May 18, 2022
17 May, 2022 13 commits
- Remove line · b7a82d29
  rocking authored May 18, 2022

- Support any dimension for elementwise operation · 7d44e782
  rocking authored May 18, 2022

- Remove redundant argument · 06e52d90
  rocking authored May 18, 2022

- Extract pad · 0f840256
  rocking authored May 18, 2022

- Second auxiliary buffer added · 5ae304df
  myamlak authored May 17, 2022

- Add comment · ecdfe960
  rocking authored May 17, 2022

- Add static assert · 492da459
  rocking authored May 17, 2022

- Support 1 dimension · 4af77e1f
  rocking authored May 17, 2022

- Rename example · 0d26477a
  rocking authored May 17, 2022

- Add template argument of dim. Prepare to support multiple dimensions · b456d5e5
  rocking authored May 17, 2022

- Merge remote-tracking branch 'origin/eltwise_op' into myamlak/cgemm · b3767dbe
  myamlak authored May 17, 2022

- Merge remote-tracking branch 'origin/develop' into myamlak/cgemm · e00a943e
  myamlak authored May 17, 2022

- Add comment · c2626122
  rocking authored May 17, 2022
16 May, 2022 2 commits
- Add elementwise operation kernel and example · a61f34f7
  rocking authored May 17, 2022

- Cosmetics · ffe12e2e
  myamlak authored May 16, 2022