Commits · feature/add-permute-device-op · gaoqiong / composable_kernel

20 Sep, 2022 2 commits
- Merge branch 'develop' into feature/add-permute-device-op · ba1756e3
  Po-Yen, Chen authored Sep 19, 2022
- Add batched attention special kernel instances (#424) · 7c788e10
  Anthony Chang authored Sep 20, 2022
```
* sanity check

* add attribution

* add irregular k tile size for batched attention

* format
```
19 Sep, 2022 8 commits

- Remove template BaseInvokerCRTP<> · ea23062c
  Po-Yen, Chen authored Sep 19, 2022
- work around inline asm potential hazard using intrinsic (#416) · c6b8b472
  Anthony Chang authored Sep 20, 2022

- Grouped batched attention + permute (#412) · 9287b7c6
  Anthony Chang authored Sep 20, 2022

  * grouped attn without batch validates; now move toward grouped batched attn
  * grouped batched attention
  * working
  * remove debug logging

  clean up

  clean up

  * reintroduce g_ prefix back to host tensor variables
  * format
  * rename file
  * restore old file
  * rename
  * consolidate padded/non-padded attention example
  * harmonize padding specialization in attn examples

- Remove opt-ed out assertion · 1ca0b97c
  Po-Yen, Chen authored Sep 19, 2022
- Make sure we use unsigned type for shape & indices · 00207401
  Po-Yen, Chen authored Sep 19, 2022
- Merge branch 'develop' into feature/add-permute-device-op · 69a88385
  Po-Yen, Chen authored Sep 19, 2022
- Rename '38_permute' to '39_permute' · 420e3b84
  Po-Yen, Chen authored Sep 19, 2022

- Conv bwd data multiple d (#404) · 27858374
  Shaojie WANG authored Sep 20, 2022

  * init commit of convnd bwd data
  * begin compiling example
  * have a first version that produce a right result
  * refine device level launch kernel code
  * add more instances in example and get right results
  * clang-format
  * format example file
  * add more instances
  * fix instances
  * adding conv_bwd_data multiple_d
  * adding conv_bwd_data multiple_d
  * adding conv_bwd multiple d
  * adding conv_bwd multiple d
  * adding conv_bwd multiple d
  * refactor
  * refactor
  * adding conv bwd data multiple d
  * adding conv bwd data multiple d
  * adding conv bwd data multiple d
  * adding conv bwd data multiple d
  * adding conv bwd data multiple d
  * adding conv bwd data multiple d
  * adding conv bwd data multiple d
  * refactor
  * update conv fwd's bias impl
  * refactor
  * reorg file
  * clean up cmake
  * clean
  * clean
  * clean

  Co-authored-by: Chao Liu <lc.roy86@gmail.com>
  Co-authored-by: Chao Liu <chao.liu2@amd.com>

16 Sep, 2022 3 commits
- Merge branch 'develop' into feature/add-permute-device-op · 738d585a
  Po-Yen, Chen authored Sep 16, 2022
- disable print for group conv multiple D (#421) · 43c898f6
  Chao Liu authored Sep 16, 2022
- Use larger shape in examples · 4a6e2701
  Po-Yen, Chen authored Sep 15, 2022
15 Sep, 2022 15 commits
- Use std::cerr to report error · f56cef53
  Po-Yen, Chen authored Sep 15, 2022
- Merge branch 'develop' into feature/add-permute-device-op · 7b6fb72b
  Po-Yen, Chen authored Sep 15, 2022
- Rename permute example folder · 6ba38dd0
  Po-Yen, Chen authored Sep 15, 2022
- Add comments in 'GridwisePermute' · 5cfa0368
  Po-Yen, Chen authored Sep 15, 2022
- Unify naming style in 'DevicePermute' · 057ffb90
  Po-Yen, Chen authored Sep 15, 2022
- Use type alias to reduce code · ee40f5a9
  Po-Yen, Chen authored Sep 15, 2022
- Move 'Block2TileMap' definition into 'GridwisePermute' · f17fa4d7
  Po-Yen, Chen authored Sep 15, 2022
- Add 'noexcept' specifier to CRTP generated method · b681fc26
  Po-Yen, Chen authored Sep 15, 2022
- Rename 'DevicePermute' to 'DevicePermuteImpl' · 734a12da
  Po-Yen, Chen authored Sep 15, 2022
- Move 'NumDim' template param to the first · 16b116a9
  Po-Yen, Chen authored Sep 15, 2022
- Create new base type for 'DevicePermute' implementations · b56ddad3
  Po-Yen, Chen authored Sep 15, 2022
- Add BaseInvokerCRTP<> class template to generate method · b4e2b28c
  Po-Yen, Chen authored Sep 15, 2022
- Move long return type as trailing return type · 68a2443a
  Po-Yen, Chen authored Sep 14, 2022
- Declare variable right before first use · 10b99e51
  Po-Yen, Chen authored Sep 14, 2022
- Add check for range types · 78f72412
  Po-Yen, Chen authored Sep 14, 2022
14 Sep, 2022 12 commits

- batched_gemm + multiple_d + gemm + multiple_d (#394) · 370efa6c
  ltqin authored Sep 15, 2022

  * refactor
  * start
  * add device gemm file
  * add BatchStrideD0
  * add stridd0
  * add gridwise file
  * add d0 parameters to gridwise gemm
  * add c layout transformer
  * add d0 threadwise copy
  * init kernel
  * init kernel
  * regular code
  * nm desc put to out
  * kernel parameter can not use reference
  * host add bias+gelu
  * run right for bias+gelu
  * change AddFastGelu into another file
  * interface add d1 bias parameters
  * add d1 parameter to argument
  * add d1 parameter to gridwise
  * first all code, not verify
  * gelu change to relu and GetElementSpaceSize bug
  * add instance
  * start add to ckprofiler
  * ckprofiler finish code
  * change input parameter for ckProfiler
  * fix host bias+gelu bug
  * show help for ckProfiler
  * fix bug for launch kernel ignore parameters
  * add pad and fix about bug
  * multiple d0
  * add dynamic d0_element_op
  * change profiler and instance to multiple d0
  * example have 2 d0
  * remove some comments not using
  * change 2 d0 have self parameters
  * change d element_op name
  * change class name(multiple_d)
  * fix bug
  * fix bug that don't find file
  * update profiler
  * refactor
  * update profiler
  * clean
  * revert example change
  * add gon layout
  * optimize parameter for gno
  * add gon to gemm+gemm
  * change helping input parameters
  * change to GemmPadder_v2
  * using ForEach
  * fix gb_per_sec

  Co-authored-by: Chao Liu <lc.roy86@gmail.com>
  Co-authored-by: ltqin <letaoqin@amd.com>

- Remove useless static_assert() · 515b4c8d
  Po-Yen, Chen authored Sep 14, 2022
- Remove redundant variables · 5a19b884
  Po-Yen, Chen authored Sep 14, 2022
- Move 'using' directive to proper code position · 2ac1a6b9
  Po-Yen, Chen authored Sep 14, 2022
- Remove no-longer-used 'using' directives · e81f54bb
  Po-Yen, Chen authored Sep 14, 2022
- Use AsSpan() to shorten check_err() calls · 0fa35b29
  Po-Yen, Chen authored Sep 14, 2022
- Add Tensor<>::AsSpan<>() to create view of tensor values · c0c1d247
  Po-Yen, Chen authored Sep 14, 2022
- Add to_array() conversion tool to eliminate more variables · b85e8611
  Po-Yen, Chen authored Sep 14, 2022
- Use function return value directly to eliminate variables · fb05bd38
  Po-Yen, Chen authored Sep 14, 2022
- Use rangified copy() to copy elements · 097506c3
  Po-Yen, Chen authored Sep 14, 2022
- Use more meaningful names in permute element examples · ff6a04fd
  Po-Yen, Chen authored Sep 14, 2022
- Use more meaningful names in permute bundle example · d53443d5
  Po-Yen, Chen authored Sep 14, 2022