Commits · ba1756e3700993325a99ed8bd3498f392d5c273e · gaoqiong / composable_kernel

20 Sep, 2022 1 commit
- Add batched attention special kernel instances (#424) · 7c788e10
  Anthony Chang authored Sep 20, 2022
```
* sanity check

* add attribution

* add irregular k tile size for batched attention

* format
```
19 Sep, 2022 4 commits

- Remove template BaseInvokerCRTP<> · ea23062c
  Po-Yen, Chen authored Sep 19, 2022
- work around inline asm potential hazard using intrinsic (#416) · c6b8b472
  Anthony Chang authored Sep 20, 2022

- Grouped batched attention + permute (#412) · 9287b7c6
  Anthony Chang authored Sep 20, 2022

* grouped attn without batch validates; now move toward grouped batched attn

* grouped batched attention

* working

* remove debug logging

clean up

clean up

* reintroduce g_ prefix back to host tensor variables

* format

* rename file

* restore old file

* rename

* consolidate padded/non-padded attention example

* harmonize padding specialization in attn examples

- Conv bwd data multiple d (#404) · 27858374
  Shaojie WANG authored Sep 20, 2022

* init commit of convnd bwd data

* begin compiling example

* have a first version that produce a right result

* refine device level launch kernel code

* add more instances in example and get right results

* clang-format

* format example file

* add more instances

* fix instances

* adding conv_bwd_data multiple_d

* adding conv_bwd_data multiple_d

* adding conv_bwd multiple d

* adding conv_bwd multiple d

* adding conv_bwd multiple d

* refactor

* refactor

* adding conv bwd data multiple d

* adding conv bwd data multiple d

* adding conv bwd data multiple d

* adding conv bwd data multiple d

* adding conv bwd data multiple d

* adding conv bwd data multiple d

* adding conv bwd data multiple d

* refactor

* update conv fwd's bias impl

* refactor

* reorg file

* clean up cmake

* clean

* clean

* clean

Co-authored-by: Chao Liu <lc.roy86@gmail.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

16 Sep, 2022 1 commit
- disable print for group conv multiple D (#421) · 43c898f6
  Chao Liu authored Sep 16, 2022
15 Sep, 2022 9 commits
- Add comments in 'GridwisePermute' · 5cfa0368
  Po-Yen, Chen authored Sep 15, 2022
- Unify naming style in 'DevicePermute' · 057ffb90
  Po-Yen, Chen authored Sep 15, 2022
- Use type alias to reduce code · ee40f5a9
  Po-Yen, Chen authored Sep 15, 2022
- Move 'Block2TileMap' definition into 'GridwisePermute' · f17fa4d7
  Po-Yen, Chen authored Sep 15, 2022
- Add 'noexcept' specifier to CRTP generated method · b681fc26
  Po-Yen, Chen authored Sep 15, 2022
- Rename 'DevicePermute' to 'DevicePermuteImpl' · 734a12da
  Po-Yen, Chen authored Sep 15, 2022
- Move 'NumDim' template param to the first · 16b116a9
  Po-Yen, Chen authored Sep 15, 2022
- Create new base type for 'DevicePermute' implementations · b56ddad3
  Po-Yen, Chen authored Sep 15, 2022
- Add BaseInvokerCRTP<> class template to generate method · b4e2b28c
  Po-Yen, Chen authored Sep 15, 2022
14 Sep, 2022 1 commit

- batched_gemm + multiple_d + gemm + multiple_d (#394) · 370efa6c
  ltqin authored Sep 15, 2022

* refactor

* start

* add device gemm file

* add BatchStrideD0

* add stridd0

* add gridwise file

* add d0 parameters to gridwise gemm

* add c layout transformer

* add d0 threadwise copy

* init kernel

* init kernel

* regular code

* nm desc put to out

* kernel parameter can not use reference

* host add bias+gelu

* run right for bias+gelu

* change AddFastGelu into another file

* interface add d1 bias parameters

* add d1 parameter to argument

* add d1 parameter to gridwise

* first all code,not verify

* gelu change to relu and GetElementSpaceSize bug

* add instance

* start add to ckprofiler

* ckprofiler finish code

* change input parameter for ckProfiler

* fix host bias+gelu bug

* show help for ckProfiler

* fix bug for launch kernel ignore parameters

* add pad and fix about bug

* multiple d0

* add dynamic d0_element_op

* change profiler and instance to multiple d0

* example have 2 d0

* remove some comments not using

* change 2 d0 have self  parameters

* change d element_op name

* change class name(multiple_d)

* fix bug

* fix bug that don't find file

* update profiler

* refactor

* update profiler

* clean

* revert example change

* add gon layout

* optimize parameter for gno

* add gon to gemm+gemm

* change helping input parameters

* change to GemmPadder_v2

* using ForEach

* fix gb_per_sec

Co-authored-by: Chao Liu <lc.roy86@gmail.com>
Co-authored-by: ltqin <letaoqin@amd.com>

13 Sep, 2022 1 commit
- Remove global load/store loop in kernel code · f650c019
  Po-Yen, Chen authored Sep 13, 2022
12 Sep, 2022 7 commits
- Embed shape info in name of descriptor constructor · 5502bac2
  Po-Yen, Chen authored Sep 12, 2022
- Use fixed 'VectorDim' & 'ScalarPerVector' for LDS · 5a7c738d
  Po-Yen, Chen authored Sep 12, 2022
- Use more verbose name to avoid name collision · 23c3a395
  Po-Yen, Chen authored Sep 12, 2022
- Check scalar-per-vector with padded length · 910a26b4
  Po-Yen, Chen authored Sep 12, 2022
- Remove redundant parameter in helper lambda function · b80b34f5
  Po-Yen, Chen authored Sep 12, 2022
- Fix ambiguous ctor call · f629397b
  Po-Yen, Chen authored Sep 12, 2022
- Add span<> class template · 3fdff5bb
  Po-Yen, Chen authored Sep 12, 2022
09 Sep, 2022 3 commits

- embedding fuse layernorm (#405) · efd1d257
  carlushuang authored Sep 09, 2022

* add gridwise/device sparse embedding

* update code

* update code

* remove useless makefile

* code fix

* workable

* work properly

* emb add

* add more instance

* format

* remove useless code

* fix format

* fix clang-tidy

* clean

* fix a compile error

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>

- Fix wrong descriptor creation logics · 1989df75
  Po-Yen, Chen authored Sep 08, 2022
- Remove no-longer used template parameter 'NPerBlock' · 3672d070
  Po-Yen, Chen authored Sep 08, 2022
08 Sep, 2022 13 commits
- Let 'DstVectorDim' equal 'SrcVectorDim' after transpose out grid desc · ac9d0a67
  Po-Yen, Chen authored Sep 08, 2022
- Add check for the 'VectorDim' & 'ScalarPerVector' template params · c3943745
  Po-Yen, Chen authored Sep 08, 2022
- Make sure 'SrcVectorDim' is not same as 'DstVectorDim' · a399b408
  Po-Yen, Chen authored Sep 08, 2022
- Add comment · 59fef16f
  Po-Yen, Chen authored Sep 08, 2022
- Avoid too-large block id · 360383bb
  Po-Yen, Chen authored Sep 08, 2022
- Calculate new SrcVectorDim/DstVectorDim after merge descriptor dimensions · 33aa4d45
  Po-Yen, Chen authored Sep 08, 2022
- Add more template parameters (vector width related) · a70d9f63
  Po-Yen, Chen authored Sep 08, 2022
- Rename local type alias · 4eaa502b
  Po-Yen, Chen authored Sep 08, 2022
- Extract local types as template parameters · e5e7adbd
  Po-Yen, Chen authored Sep 08, 2022
- Add GridwisePermute::CheckValidity() · 5e28dcda
  Po-Yen, Chen authored Sep 08, 2022
- Embed layout in the variable names · 0c23d6fa
  Po-Yen, Chen authored Sep 08, 2022
- Remove no-longer used template parameters · ad1a639b
  Po-Yen, Chen authored Sep 08, 2022
- Re-arrange template arguments for blockwise copy · 48df84a4
  Po-Yen, Chen authored Sep 08, 2022