- 09 Sep, 2022 1 commit

Po-Yen, Chen authored

- 08 Sep, 2022 3 commits

Po-Yen, Chen authored

Po-Yen, Chen authored

Anthony Chang authored
* Fix example; make padding on by default in the example; fix argument checks
* Fix Gemm1KPacK, which had regressed since PR #399
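For context on the "fix argument checks" item: device ops in this library typically validate problem sizes against the kernel's padding specialization at runtime. Below is a minimal hedged sketch of that pattern; the names (GemmSpec, MPerBlock, IsSupportedArgument) are illustrative assumptions, not code taken from the commit.

```cpp
#include <cstdint>

// Hypothetical sketch of a padding-aware argument check: a kernel built
// without M/N/K padding can only run on sizes that tile evenly.
enum class GemmSpec { Default, MNKPadding };

template <GemmSpec Spec, std::int64_t MPerBlock, std::int64_t NPerBlock, std::int64_t KPerBlock>
bool IsSupportedArgument(std::int64_t M, std::int64_t N, std::int64_t K)
{
    if constexpr (Spec == GemmSpec::MNKPadding)
        return true; // padded kernel handles ragged edges itself
    return M % MPerBlock == 0 && N % NPerBlock == 0 && K % KPerBlock == 0;
}
```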

- 07 Sep, 2022 4 commits

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

- 06 Sep, 2022 22 commits

Po-Yen, Chen authored

Anthony Chang authored
* Modify comment
* Trim unnecessary check
* Add gemm spec in kernel name
* Add TNTT gemm_gemm + attention kernel instances
* Refactor attention padding to better fit in unit tests. This streamlines usage: "ResetNaNToMinusInf" is now hidden from the user-facing device op. Also added compile-time conditionals that load the OOB value as NaN only when padding is enabled.
* Add ad-hoc padding test for attention
* Shrink input value range for attention kernel validation to avoid occasional errors of about 1e-3. Still unsure whether this kind of deterministic floating-point accuracy issue is expected. May want to try the exact same approach as the GPU kernel in the host reference GEMM+Softmax+GEMM function to see whether the accuracy discrepancy goes away. Until then, shrink the input value range, as it is less likely to produce errors of around 1e-3.
* Add proper granular padding for all 4 dims of the attention kernel
* Add IsSupportedArgument checks
* Test more padded cases
* Block PadK specialization in attention kernels
* Work around clang crash for gfx908 (gfx908 only). This is a workaround for a compiler crash in fused kernels on mainline #9110; #10738 seems OK. The error message was "fatal error: error in backend: Error while trying to spill VGPR0 from class VGPR_32: Cannot scavenge register without an emergency spill slot!" This falls back to a less ideal way of handling NPadding in the fused attention kernel.
* Comment out kernels giving wrong results on MI100; MI200 doesn't seem affected
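The padding refactor above pairs out-of-bounds loads with a NaN sentinel that is later reset to -inf before the softmax. A minimal hedged sketch of that compile-time pattern, assuming hypothetical names (load_element, PadEnabled) rather than the kernel's actual code:

```cpp
#include <limits>

// Hedged sketch: read out-of-bounds elements as NaN only when padding is
// compiled in, so a later "reset NaN to -inf" pass can mask padded entries
// before the softmax; unpadded kernels skip the bounds check entirely.
template <bool PadEnabled>
float load_element(const float* buf, int idx, int valid_len)
{
    if constexpr (PadEnabled)
        return idx < valid_len ? buf[idx]
                               : std::numeric_limits<float>::quiet_NaN();
    else
        return buf[idx]; // unpadded kernel: every index is in bounds by construction
}
```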

Po-Yen, Chen authored

Adam Osewski authored
* Update Softmax device operation interface
* Update ckProfiler
* Update Softmax UT
* Update example
* Client example
* Clang format

Co-authored-by: Adam Osewski <aosewski@amd.com>
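For reference, the math such a Softmax device op computes is the numerically stable row softmax. A hedged host-side sketch of that reference computation (illustrative only, not the updated device interface):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Host reference for one softmax row: subtract the row max before exp()
// so large inputs cannot overflow, then normalize by the sum.
std::vector<float> softmax_row(const std::vector<float>& x)
{
    const float max_val = *std::max_element(x.begin(), x.end());
    std::vector<float> y(x.size());
    float sum = 0.f;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        y[i] = std::exp(x[i] - max_val); // stable: exponent is always <= 0
        sum += y[i];
    }
    for (auto& v : y)
        v /= sum;
    return y;
}
```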

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

- 05 Sep, 2022 7 commits

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

Po-Yen, Chen authored

- 01 Sep, 2022 1 commit

Chao Liu authored
* Refactor
* Refactor
* Adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm
* Adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm
* Clean
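For context, a fused GEMM+GEMM computes E = (A x B0) x B1 in one kernel. A hedged host-side sketch of the reference math being instantiated for these data types (hypothetical helper, not the library's device API):

```cpp
#include <vector>

// Host reference for GEMM+GEMM: E = (A * B0) * B1, row-major, no scaling
// or elementwise ops; float stands in for the int4/int8/fp16/bf16 variants.
std::vector<float> gemm_gemm(const std::vector<float>& a,  // M x K
                             const std::vector<float>& b0, // K x N0
                             const std::vector<float>& b1, // N0 x N1
                             int M, int K, int N0, int N1)
{
    std::vector<float> c(M * N0, 0.f), e(M * N1, 0.f);
    for (int m = 0; m < M; ++m)          // first GEMM: C = A * B0
        for (int k = 0; k < K; ++k)
            for (int n = 0; n < N0; ++n)
                c[m * N0 + n] += a[m * K + k] * b0[k * N0 + n];
    for (int m = 0; m < M; ++m)          // second GEMM: E = C * B1
        for (int k = 0; k < N0; ++k)
            for (int n = 0; n < N1; ++n)
                e[m * N1 + n] += c[m * N0 + k] * b1[k * N1 + n];
    return e;
}
```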

- 31 Aug, 2022 2 commits

Po Yen Chen authored
* Refactor the design of DeviceGemmMultipleDMultipleR_Xdl_CShuffle
* Add 'DeviceGroupedConvFwdMultipleDMultipleR' interface
* Add DeviceGroupedConvFwdMultipleDMultipleR_Xdl_CShuffle
* Remove 'GridwiseConvFwdMultipleDMultipleR_xdl_cshuffle'
* Add 'TransformConvFwdToGemm<>' utility class (from Chao)
* Use 'TransformConvFwdToGemm<>' to shorten code
* Fix ill-formed method declaration
* Re-implement MakeRGridDescriptor_M() function
* Change problem description
* Use macro to define layout types
* Define K-reduced output tensor layout types
* Let user decide R output tensor layout
* Rename variables
* Add padding to the reduced output tensor if necessary
* Extract common code as helper method
* Remove debug message
* Add missing include directive
* Add partial fp16 Conv + Reduction example
* Add example verification code for 2D Conv problem
* Use type alias to simplify code
* Share code across different-dimension Conv problems
* Rename files/functions from run_conv_fwd* to run_convnd_fwd*
* Make example code more verbose
* Add code to support 1D & 3D Conv + Reduction on host
* Add more examples for data types: bf16, fp32
* Add example for int8
* Add custom target to group examples
* Use more general custom target name
* Change the description in error message
* Disable testing for examples other than fp32
* Add example for int4 (just copy from int8)
* Fix wrong data type
* Use larger data type for intermediate tensors
* Finish int4 example
* Undefine macro PP_DEFINE_LAYOUT_TYPE() after use
* Use named variables to replace magic numbers
* Remove debug messages
* Use same A/B data type for host Conv in int4 example
* Add check for the 'RLayout' type argument
* Group same-dim layouts together in 'LayoutSetting<>'
* Add 'final' specifier to utility classes
* Use different initialization method for examples
* Remove macro PP_DEFINE_LAYOUT_TYPE()
* Fix code-comment mismatch
* Use more reasonable initialization values for all data types
* Default to init_method=1 for all examples
* Remove never-used code
* Remove confusing out-of-date comments
* Clean

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
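The 'TransformConvFwdToGemm<>' utility mentioned above maps forward convolution onto an implicit GEMM. A hedged sketch of the standard mapping it encapsulates, using the usual conv dimension names (the struct and function here are illustrative, not the class's actual members):

```cpp
// Standard implicit-GEMM view of forward conv (e.g. NHWC input, KYXC weights):
// each output pixel becomes a GEMM row, each output channel a GEMM column,
// and the reduction runs over input channels and the filter window.
struct ConvToGemmDims
{
    long GemmM; // N * Ho * Wo
    long GemmN; // K
    long GemmK; // C * Y * X
};

ConvToGemmDims transform_conv_fwd_to_gemm(long N, long K, long C,
                                          long Y, long X, long Ho, long Wo)
{
    return {N * Ho * Wo, K, C * Y * X};
}
```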

Chao Liu authored
* Refactor conv
* Add conv+conv example, 1x1 only
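The 1x1 restriction is the easy fusion case: with a 1x1 filter and unit stride, forward conv degenerates to a plain GEMM over pixels, so conv+conv is just back-to-back GEMMs with no halo exchange between the stages. A hedged illustration of one such 1x1 stage (shapes and names are assumptions, not the example's code):

```cpp
#include <cstddef>
#include <vector>

// One 1x1 conv stage as a GEMM over pixels: in[pixels][c_in] * w[c_in][c_out].
// Chaining two calls (c_out of the first == c_in of the second) models conv+conv.
std::vector<float> conv1x1(const std::vector<float>& in, const std::vector<float>& w,
                           int pixels, int c_in, int c_out)
{
    std::vector<float> out(static_cast<std::size_t>(pixels) * c_out, 0.f);
    for (int p = 0; p < pixels; ++p)
        for (int ci = 0; ci < c_in; ++ci)
            for (int co = 0; co < c_out; ++co)
                out[p * c_out + co] += in[p * c_in + ci] * w[ci * c_out + co];
    return out;
}
```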