"docs/archive_en_US/Tutorial/SetupNniDeveloperEnvironment.md" did not exist on "1d893dda8755fc2e29167584cd1fc18a44872c1b"
- 26 Jul, 2022 3 commits
- 25 Jul, 2022 6 commits
- 24 Jul, 2022 1 commit
  - Chao Liu authored
- 22 Jul, 2022 2 commits
  - zjing14 authored
    * add batched_gemm_multiD
    * add ds
    * rename file
    * add batched_gemm_bias example
    * add batch_strides into bmm_c_permute
    * clean
    * rename example_28 to example_29
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
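The batched GEMM with a fused bias "D" tensor described in this commit can be sketched as a plain host reference. This is a minimal illustration only; the function name and the layout assumptions (row-major `[G][M][K]` x `[G][K][N]`, bias broadcast over rows) are hypothetical and not the library's device API:

```python
def batched_gemm_bias(a, b, bias):
    """Naive host reference for a batched GEMM with a bias tensor fused
    into the epilogue: E[g] = A[g] @ B[g] + bias (broadcast over rows).
    a: [G][M][K], b: [G][K][N], bias: [N].
    Hypothetical reference code, not the library's device kernel."""
    G, M, K = len(a), len(a[0]), len(a[0][0])
    N = len(b[0][0])
    out = [[[0.0] * N for _ in range(M)] for _ in range(G)]
    for g in range(G):
        for m in range(M):
            for n in range(N):
                acc = 0.0
                for k in range(K):
                    acc += a[g][m][k] * b[g][k][n]
                # bias add fused into the epilogue, one pass over the output
                out[g][m][n] = acc + bias[n]
    return out
```

A device implementation fuses the bias add into the GEMM epilogue so the output tensor is written only once; the host loop above mirrors that by adding `bias[n]` at store time.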
  - Chao Liu authored
- 21 Jul, 2022 3 commits
  - Chao Liu authored
  - zjing14 authored
    * replace gridwise_v2r3 with multiD
    * adjust parameters
    * add instances
    * fix test_grouped_gemm
    * fix standalone softmax race condition around blockwise reduction
    * fix CI
    * fix comment: remove redundant workspace
    * use instanceFactory
    * add test layout
    * add empty Ds
    * add bias example
    * use array
    * separate examples
    Co-authored-by: Anthony Chang <ac.chang@outlook.com>
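The standalone softmax touched by this commit is a row-wise, max-subtracted softmax. A host-side sketch follows (the helper name and 2-D row-major input are assumptions for illustration; on the GPU the per-row max and sum are blockwise reductions, which is where the race condition was fixed):

```python
import math

def softmax_rows(x):
    """Row-wise numerically stable softmax: subtract the row max before
    exponentiating so large inputs do not overflow. Host reference only;
    the device kernel computes the max and the sum as blockwise reductions."""
    out = []
    for row in x:
        m = max(row)                          # reduction 1: row max
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)                         # reduction 2: row sum
        out.append([e / s for e in exps])
    return out
```

Subtracting the row max keeps every exponent at or below zero, so even rows of very large values stay finite.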
  - Chao Liu authored
- 20 Jul, 2022 4 commits
- 19 Jul, 2022 2 commits
- 18 Jul, 2022 8 commits
- 17 Jul, 2022 3 commits
- 14 Jul, 2022 3 commits
- 13 Jul, 2022 2 commits
  - rocking5566 authored
    * Implement layernorm kernel and deviceOp
    * Verify GPU kernel with host code
    * Separate gamma and beta from affine; check that the argument is valid
    * Clean
    * Sync the naming
    * Support sweep-once mode if the K-dimension data fits inside one block
    * [What] Get length from upper length. [Why] If we get the length directly, we may get the length after padding.
    * Use only one block in the K dimension, which simplifies the indexing of global reads/writes
    * Use a 1d descriptor for gamma and beta
    * Add accElementwiseOp
    * Extract layernorm host code
    * Support different YVectorDim in GridwiseLayernorm
    * Rename XSrcVectorDim to XYSrcVectorDim, because we use the same parameter in the deviceOp
    * Gamma and beta can share the VGPR
    * Add tests for fp32 and fp16
    * Fix a concurrency bug and add a test case that could fail originally
    * Propagate NaN for layernorm
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
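The layernorm this commit implements normalizes over the reduced (K) dimension and applies separate gamma and beta affine parameters; NaN inputs propagate to the output. A host-verification sketch, in the spirit of the "verify GPU kernel with host code" step (the function name and `eps` default are assumptions, not the repository's API):

```python
import math

def layernorm(x, gamma, beta, eps=1e-5):
    """Reference layernorm over one row with separate gamma/beta affine
    parameters. A NaN anywhere in x propagates through mean and variance
    to every output element. Hypothetical host reference, not the kernel."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n   # biased variance
    inv_std = 1.0 / math.sqrt(var + eps)        # eps guards a zero variance
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]
```

Because mean and variance are reductions over the whole row, a single NaN poisons both, which is the propagation behavior the last commit bullet asks for.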
  - Chao Liu authored
- 12 Jul, 2022 3 commits