- 25 Jul, 2022 3 commits
- 24 Jul, 2022 1 commit
  - Chao Liu authored
- 22 Jul, 2022 1 commit
  - Chao Liu authored
- 21 Jul, 2022 2 commits
- 20 Jul, 2022 4 commits
- 19 Jul, 2022 2 commits
- 18 Jul, 2022 8 commits
- 17 Jul, 2022 3 commits
- 14 Jul, 2022 3 commits
- 13 Jul, 2022 2 commits
  - rocking5566 authored
    * Implement layernorm kernel and deviceOp
    * Verify GPU kernel with host code
    * 1. Separate gamma and beta from affine 2. Check if argument is valid
    * Clean
    * Sync the naming
    * Support sweep-once mode if we can put the K-dimension data inside one block
    * [What] Get length from upper length. [Why] If we get the length directly, we may get the length after padding.
    * We only use one block in the K dimension. Hence, we can simplify the indexing of global R/W.
    * Use 1-D descriptor for gamma and beta
    * Add accElementwiseOp
    * Extract layernorm host code
    * Support different YVectorDim in GridwiseLayernorm
    * Rename XSrcVectorDim to XYSrcVectorDim, because we use the same parameter in deviceOp
    * Gamma and beta can share the VGPR
    * Add tests for fp32 and fp16
    * Fix concurrency bug and add a test case that could fail originally
    * Propagate NaN for layernorm
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
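For reference, the layernorm computed here is the standard affine form y = (x - mean) / sqrt(var + eps) * gamma + beta, and NaN inputs propagate through the arithmetic naturally. A minimal host-side sketch (a hypothetical helper, not the repository's actual verification code):

```cpp
#include <cmath>
#include <vector>

// Minimal host reference for row-wise layernorm (hypothetical helper, not the
// repo's verification code). NaN inputs propagate through mean/variance/output.
void layernorm_host(const std::vector<float>& x, const std::vector<float>& gamma,
                    const std::vector<float>& beta, std::vector<float>& y,
                    int rows, int cols, float eps = 1e-5f)
{
    for(int r = 0; r < rows; ++r)
    {
        float mean = 0.f, var = 0.f;
        for(int c = 0; c < cols; ++c)
            mean += x[r * cols + c];
        mean /= cols;
        for(int c = 0; c < cols; ++c)
        {
            const float d = x[r * cols + c] - mean;
            var += d * d;
        }
        var /= cols;
        const float inv_std = 1.f / std::sqrt(var + eps);
        for(int c = 0; c < cols; ++c) // affine: per-column gamma/beta
            y[r * cols + c] = (x[r * cols + c] - mean) * inv_std * gamma[c] + beta[c];
    }
}
```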
  - Chao Liu authored
- 12 Jul, 2022 3 commits
- 11 Jul, 2022 1 commit
  - Chao Liu authored
- 08 Jul, 2022 1 commit
  - Po Yen Chen authored
    * format
    * improving pipeline
    * fix typo
    * format
    * adding thread group
    * adding thread group
    * adding thread group
    * adding gemm pipeline
    * tweak
    * refactor
    * refactor
    * add missing type convert
    * refactor
    * refactor
    * refactor
    * clean
    * fix build
    * refactor
    * format
    * clean up
    * use remove_cvref_t
    * clean
    * use pipeline_v2 for gemm kernel
    * Remove inconsistent indent
    * Fix compilation errors due to incomplete merge process
    * Add missing include directives
    * Fix compilation errors in currently unused files
    * Add license in newly added files
    * Re-format touched files by clang-format-10
    * Fix wrong template argument count of DeviceGemm<>
    * Use language construct to choose between types
    * Use language construct to choose GEMM example instance
    * Fix compilation error due to interface change
    * Re-use type alias to avoid duplication
    * Unify type alias usage in source file
    * Only use v2 pipeline in one gridwise GEMM type
    * Remove no-longer used include directives
    * Add static_assert() to check pipeline type requirements
    * Revert "Add static_assert() to check pipeline type requirements" (reverts commit f0985f0a132671a1caaea92810c9f30dcf062bde)
    * clean
    * clean
    * clean
    * clean
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
    Co-authored-by: shaojiewang <wsjmessi@163.com>
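The "use remove_cvref_t" bullet refers to the pre-C++20 spelling of std::remove_cvref_t; a sketch of the usual definition (assuming the repository's alias matches the standard semantics):

```cpp
#include <type_traits>

// Strip reference, then const/volatile, from a type; equivalent to C++20's
// std::remove_cvref_t. Useful when deducing value types from forwarding refs.
template <typename T>
using remove_cvref_t = std::remove_cv_t<std::remove_reference_t<T>>;

static_assert(std::is_same_v<remove_cvref_t<const int&>, int>);
```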
- 07 Jul, 2022 1 commit
  - Chao Liu authored
    * adding contraction
    * add contraction example
    * update example
    * update example
    * format
    * update readme
    * clean header
    * clean header
    * contraction with multiple D
    * rename
    * fix naming issue; add instances for contraction+bilinear
    * change assumed virtual layout of contraction; add client example
    * update example
    * update
    * contraction+scale
    * use type_convert
    * rename
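As background, the contraction here generalizes GEMM by splitting each of M, N, and K into multiple dimensions, and the bilinear variant combines the result with an extra D tensor. A naive reference sketch, assuming E = alpha * contract(A, B) + beta * D with illustrative [M0, M1, K] / [N0, N1, K] layouts (names are hypothetical):

```cpp
#include <vector>

// Naive reference for a contraction with a bilinear epilogue (hypothetical
// helper). A is [M0, M1, K], B is [N0, N1, K], D and E are [M0, M1, N0, N1];
// E = alpha * sum_k A[m0][m1][k] * B[n0][n1][k] + beta * D[m0][m1][n0][n1].
void contraction_bilinear(const std::vector<float>& A, const std::vector<float>& B,
                          const std::vector<float>& D, std::vector<float>& E,
                          int M0, int M1, int N0, int N1, int K,
                          float alpha, float beta)
{
    for(int m0 = 0; m0 < M0; ++m0)
        for(int m1 = 0; m1 < M1; ++m1)
            for(int n0 = 0; n0 < N0; ++n0)
                for(int n1 = 0; n1 < N1; ++n1)
                {
                    float acc = 0.f;
                    for(int k = 0; k < K; ++k)
                        acc += A[(m0 * M1 + m1) * K + k] * B[(n0 * N1 + n1) * K + k];
                    const int e_idx = ((m0 * M1 + m1) * N0 + n0) * N1 + n1;
                    E[e_idx] = alpha * acc + beta * D[e_idx];
                }
}
```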
- 06 Jul, 2022 1 commit
  - zjing14 authored
    * init commit
    * add c_permute
    * add mnk padding
    * fixed comments
    * Fixed comments
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
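The "mnk padding" item rounds GEMM dimensions up so arbitrary M/N/K sizes can be covered by fixed tile sizes; a minimal sketch of that arithmetic (helper name is hypothetical):

```cpp
// Round a dimension up to the next multiple of the tile size (hypothetical
// helper). E.g., M = 1000 with MPerBlock = 256 is padded to 1024, and the
// kernel masks off the out-of-bounds rows when reading/writing.
constexpr int pad_to_tile(int size, int tile_size)
{
    return ((size + tile_size - 1) / tile_size) * tile_size;
}

static_assert(pad_to_tile(1000, 256) == 1024);
```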
- 02 Jul, 2022 1 commit
  - Chao Liu authored
    * refactor
    * update example
    * update example
    * gemm bilinear
    * clean
    * update
- 01 Jul, 2022 2 commits
  - Anthony Chang authored
    * dump lds content in appropriate precision type
    * add squared add reduction op; allows sq sum
    * initial stub from regular gemm impl
    * layernorm example code & host verification
    * initial layernorm implementation
    * tidy up
    * make C0 precision type consistent with C
    * clang-tidy and additional comments
    * tighten up example code
    * account for extra flops/bytes from normalization
    * clang-format
    * c0 bias/beta/gamma now have their own precision type
    * AccElemOp for gemm outputs prior to feeding to layernorm
    * update workgroup mapping
    * rename kernel template param to reflect its dual use
    * use LDS mem pool for reduction workspace
    * change cshuffle precision type to f16; clean up
    * clang-format
    * correct naming
    * explicit cast
    * fully implemented gemm + bias + activation + add + norm
    * activation in correct order
    * reflect reduction API's recent change
    * amend
    * clean up; add comment
    * keep up with recent changes in reduction API
    * format
    * resolve merge conflicts
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
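The "squared add" reduction op accumulates x^2 rather than x, which lets a layernorm-style kernel recover mean and variance from two running sums in a single pass via Var(x) = E[x^2] - E[x]^2. A minimal sketch under that standard identity (names are illustrative, not the library's API):

```cpp
// Accumulate sum and sum-of-squares in one pass, then recover mean/variance:
// Var(x) = E[x^2] - E[x]^2. A "squared add" reduction provides the second sum.
struct SquaredAdd
{
    float operator()(float acc, float x) const { return acc + x * x; }
};

void mean_var_one_pass(const float* x, int n, float& mean, float& var)
{
    float sum = 0.f, sq_sum = 0.f;
    SquaredAdd sq{};
    for(int i = 0; i < n; ++i)
    {
        sum += x[i];
        sq_sum = sq(sq_sum, x[i]);
    }
    mean = sum / n;
    var  = sq_sum / n - mean * mean;
}
```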
  - zjing14 authored
    * init commit
    * add desc
    * finished c permute
    * fixed vector lens
- 30 Jun, 2022 1 commit
  - Anthony Chang authored
    * use 'sweep once' softmax kernel where applicable
    * threadwise copy's dst buffer can specify invalid element value
    * add int8 in/out, float compute softmax support; give a bit of leeway for int absolute tolerance, as a single data point across all test cases shows an off-by-1 error
    * format
    * softmax inherits DeviceNormalization
    * softmax profiler stub
    * tighten up reference softmax interface
    * example prints tensor dimension
    * add fp32 to softmax profiler
    * rename header
    * hook with ckProfiler
    * format
    * resolve merge conflict
    * resolve merge conflicts
    * update normalization profiler help string
    * resolve conflict
    * typo
    * remove residual
    * softmax profiler: address feedback
    * test for mixed precision input/output
    * fully qualify ck::math::isnan
    * add comment for device normalization interface
    * revise wording
    * constness for alpha/beta scaler pointer
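The int8 in/out, float compute path converts inputs to float, applies the usual max-subtracted softmax for numerical stability, and rounds back to int8, which is where the tolerated off-by-1 difference can appear. A host-reference sketch (the helper and the output scale are assumptions, not the library's API):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Host reference: int8 input/output with float accumulation. Subtracting the
// row max before exp() keeps the computation numerically stable; the final
// round-to-int8 is where a one-off difference versus the GPU can appear.
void softmax_i8_host(const std::vector<int8_t>& x, std::vector<int8_t>& y, int n,
                     float alpha = 127.f) // assumed output scale for int8
{
    std::vector<float> tmp(n);
    float max_val = -INFINITY, sum = 0.f;
    for(int i = 0; i < n; ++i)
        max_val = std::max(max_val, static_cast<float>(x[i]));
    for(int i = 0; i < n; ++i)
    {
        tmp[i] = std::exp(static_cast<float>(x[i]) - max_val);
        sum += tmp[i];
    }
    for(int i = 0; i < n; ++i)
        y[i] = static_cast<int8_t>(std::lround(alpha * tmp[i] / sum));
}
```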