1. 02 Dec, 2021 15 commits
  2. 01 Dec, 2021 2 commits
  3. 30 Nov, 2021 2 commits
  4. 24 Nov, 2021 1 commit
  5. 23 Nov, 2021 6 commits
  6. 22 Nov, 2021 3 commits
  7. 20 Nov, 2021 4 commits
  8. 18 Nov, 2021 3 commits
    • Use __builtin_memcpy to implement bit_cast and for accessing vector from pointer of scalars (#53) · 64350aff
      Chao Liu authored
      * reworking vector_type
      
      * use __builtin_memcpy for bit_cast and vector access of scalar pointer
      
      * clean up
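      A minimal sketch of the bit_cast-via-__builtin_memcpy idea from the commit above (illustrative names, not the repository's actual implementation; the vector type assumes clang's ext_vector_type extension, which hipcc provides):

```cpp
#include <type_traits>

// Reinterpret the bits of one trivially copyable type as another of the same size.
// __builtin_memcpy is fully visible to the optimizer, so the copy is elided and this
// lowers to a plain register move, without violating strict aliasing.
template <typename Dst, typename Src>
Dst bit_cast_sketch(const Src& src)
{
    static_assert(sizeof(Dst) == sizeof(Src), "sizes must match");
    static_assert(std::is_trivially_copyable<Src>::value, "Src must be trivially copyable");
    static_assert(std::is_trivially_copyable<Dst>::value, "Dst must be trivially copyable");
    Dst dst;
    __builtin_memcpy(&dst, &src, sizeof(Dst));
    return dst;
}

// The same trick loads a vector from a pointer to scalars: copy the bytes into the
// vector object instead of reinterpret_cast-ing the pointer.
using float4_t = float __attribute__((ext_vector_type(4)));

float4_t load_float4(const float* p)
{
    float4_t v;
    __builtin_memcpy(&v, p, sizeof(v));
    return v;
}
```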
    • v5r1 fusion kernels for inference (#49) · 970fa3e9
      zjing14 authored
      
      
      * init
      
      * refactor for 1x1
      
      * rename e0_e1
      
      * add e1 with bugs
      
      * debug
      
      * fixed
      
      * fixed e1
      
      * add timer
      
      * improve threadwise gemm with dot2
      
      * add e2
      
      * tuning
      
      * separate c2
      
      * add nhwc
      
      * restore nchwc
      
      * clean
      
      * opt
      
      * fixed; tuning
      
      * add BGlobalMoveSliceWindowStepHacks{}
      
      * tuning
      
      * repeat running
      
      * adjust
      
      * merge v5r1 nchwc
      
      * add adaptors
      
      * split k0 k1 in c_thread_grid
      
      * split h and w
      
      * remove v5r1 nhwc
      
      * clean for pr
      
      * remove host_conv_add
      
      * clean code
      
      * clean
      
      * add dynamic support
      
      * static mode
      
      * test static
      
      * add conv+add fusion
      
      * fixed validation
      
      * naming fix
      
      * use activ_enum
      
      * make static
      
      * refactor conv_add for InMem::add
      
      * add bias
      
      * add conv_out
      
      * add configurable makeddesc
      
      * add maxpool fusion
      
      * add maxpool host for validation
      
      * enable static desc
      
      * conv-only use v5r1_add
      
      * test
      
      * test
      
      * for binary dumps
      
      * fixed incorrect results due to typo
      
      * clean
      
      * debugging maxpool
      
      * workaround with offset trick
      
      * clean code
      
      * modularize ops of fusion
      
      * add gridwise_gemm_v3
      
      * create separate fusion function
      
      * enable dynamic mode of conv and conv+resize_add
      
      * add dynamic mode of maxpool
      
      * add pass by pointer
      
      * add activ_type as arguments
      
      * merge develop
      
      * clean
      
      * reset config to old default
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
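      For context on what the fusion in this PR buys: instead of writing the convolution result to global memory and launching separate bias/activation/add kernels, the extra ops are applied to each output value while it is still in registers. A host-side sketch of that epilogue idea (illustrative only; the actual v5r1 kernels are templated device code whose interfaces are not shown here):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative fused epilogue: bias add + ReLU + residual add on one element.
// In a fused kernel this runs on the accumulator before it is ever stored, so the
// intermediate conv output never makes a round trip through global memory.
inline float fused_epilogue(float conv_out, float bias, float residual)
{
    float x = conv_out + bias;   // conv + bias
    x       = std::max(x, 0.f);  // activation (ReLU chosen as an example)
    return x + residual;         // conv + add / resize_add style fusion
}

// Apply the epilogue over a channel-major buffer with one bias value per channel.
void apply_epilogue(std::vector<float>& out,
                    const std::vector<float>& bias,
                    const std::vector<float>& residual,
                    std::size_t channels)
{
    const std::size_t per_channel = out.size() / channels;
    for(std::size_t c = 0; c < channels; ++c)
        for(std::size_t i = 0; i < per_channel; ++i)
        {
            const std::size_t idx = c * per_channel + i;
            out[idx] = fused_epilogue(out[idx], bias[c], residual[idx]);
        }
}
```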
    • Fixed bfp16 host_conv_fwd (#52) · a651ea4f
      zjing14 authored
      
      
      * fixed bfloat16 issues
      
      * refactor type_convert
      
      * fixed host_convolution_forward for ushort
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
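      The host-side fix here revolves around bfloat16 values carried as ushort: a bfloat16 is the upper 16 bits of an IEEE-754 float, so reference code must convert through the bit pattern rather than cast the integer value. A minimal sketch of that conversion (truncating on the way down; the repository's type_convert may round differently):

```cpp
#include <cstdint>
#include <cstring>

// bfloat16 stored in a 16-bit integer: its bits are the high 16 bits of a float.
using bf16_bits = std::uint16_t;

inline float bf16_to_float(bf16_bits x)
{
    std::uint32_t bits = static_cast<std::uint32_t>(x) << 16; // low mantissa bits become zero
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

inline bf16_bits float_to_bf16(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return static_cast<bf16_bits>(bits >> 16); // simple truncation, no round-to-nearest-even
}
```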
  9. 16 Nov, 2021 2 commits
  10. 15 Nov, 2021 2 commits
    • Add bfp16/int8 support into XDL GEMM operator (#50) · 3737bb03
      zjing14 authored
      
      
      * init StaticBufferV2
      
      * clean
      
      * adopt old output stage for staticBufferV2
      
      * clean
      
      * remove hack
      
      * clean
      
      * clean
      
      * add parameters
      
      * clean code
      
      * move c_buffer alloc into blockwise gemm
      
      * add adaptors for m/n_thread_data_on_grid
      
      * tweak gemm
      
      * adjust blockwise_gemm_xdlops
      
      * tweak
      
      * update conv
      
      * update script
      
      * adding bwd 1x1
      
      * update script
      
      * adding 1x1 bwd
      
      * debugging bwd 1x1 failure
      
      * update script
      
      * update script
      
      * test
      
      * test v100
      
      * add bf16_1k
      
      * clang-format
      
      * clean
      
      * add bfp16 for gfx908
      
      * add verification
      
      * clean up
      
      * clean code
      
      * restore bfl16
      
      * clean
      
      * add bfp16 support into gemm_driver
      
      * apply new generator to other drivers
      
      * add int8 support
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      Co-authored-by: Chao Liu <lc.roy86@gmail.com>
      Co-authored-by: root <root@hayabusa6111.amd.com>
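      The common thread when adding bfp16 and int8 paths to a GEMM is that the accumulator type is wider than the input type: int8 products accumulate in int32, and bf16/fp16 products accumulate in float, with a convert on the final store. A reference-level sketch of that pattern (illustrative; the XDL kernels map this onto MFMA hardware instructions rather than a scalar loop, and bf16 inputs would first go through a bits-to-float conversion like the one sketched above):

```cpp
#include <cstdint>
#include <vector>

// Reference GEMM with distinct input, accumulator, and output types, e.g.
//   ADataType/BDataType = std::int8_t, AccDataType = std::int32_t
// so the K-loop accumulates without overflow, converting only when storing C.
template <typename ADataType, typename BDataType, typename AccDataType, typename CDataType>
void reference_gemm(const std::vector<ADataType>& a, // M x K, row-major
                    const std::vector<BDataType>& b, // K x N, row-major
                    std::vector<CDataType>& c,       // M x N, row-major
                    int M, int N, int K)
{
    for(int m = 0; m < M; ++m)
        for(int n = 0; n < N; ++n)
        {
            AccDataType acc = 0;
            for(int k = 0; k < K; ++k)
                acc += static_cast<AccDataType>(a[m * K + k]) *
                       static_cast<AccDataType>(b[k * N + n]);
            c[m * N + n] = static_cast<CDataType>(acc);
        }
}

// Example instantiation: int8 inputs with int32 accumulation and output.
// reference_gemm<std::int8_t, std::int8_t, std::int32_t, std::int32_t>(a, b, c, M, N, K);
```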
    • FP16 data in-register transpose (#41) · b491ebf3
      Chao Liu authored
      * start fixing 16bit data packing
      
      * adding StaticTensor
      
      * adding StaticTensor
      
      * adding StaticTensor
      
      * add missing constexpr
      
      * adding static tensor
      
      * adding static tensor
      
      * adding transpose
      
      * add inline asm for transpose 2x2 of half_t
      
      * add general transpose_vectors(), but have unnecessary register initialization using v_mov
      
      * fix unnecessary register initialization in transpose_vector by using more pass-by-reference
      
      * add hardcoded logic for NHWC wrw
      
      * improve asm for v_pack
      
      * make ThreadwiseTensorSliceTransfer_v3r2 support any tensor
      
      * tweak
      
      * reorganize file
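      The 2x2 transpose in this PR takes two registers that each hold a pair of packed half values, x = {a0, a1} and y = {b0, b1}, and produces {a0, b0} and {a1, b1}; the inline asm mentioned above does this with the GPU's pack instructions so the data never leaves registers. A portable C++ sketch of the same data movement (the half2_t alias is an assumption of this sketch, using clang's _Float16 and ext_vector_type, not the repository's actual vector types):

```cpp
// Two packed half values per "register".
using half2_t = _Float16 __attribute__((ext_vector_type(2)));

// In:  x = {a0, a1}, y = {b0, b1}
// Out: x = {a0, b0}, y = {a1, b1}
// On gfx9-class hardware the same movement is done with pack instructions
// (the v_pack improvement noted in the commit messages) instead of scalar copies.
inline void transpose_2x2(half2_t& x, half2_t& y)
{
    const _Float16 a0 = x[0], a1 = x[1];
    const _Float16 b0 = y[0], b1 = y[1];
    x[0] = a0; x[1] = b0;
    y[0] = a1; y[1] = b1;
}
```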