Commits · 2377c2e8628e7918fcc71aeb6f8dfb9af8192609 · gaoqiong / composable_kernel

06 Sep, 2022 24 commits
- Remove unnecessary include directives · 2377c2e8
  Po-Yen, Chen authored Sep 06, 2022
  
- Use CRTP to generate overridden virtual method · beed1068
  Po-Yen, Chen authored Sep 06, 2022
  
- Re-format 'DeviceElementwise' · c7aa455f
  Po-Yen, Chen authored Sep 06, 2022
  
- Simplify 'DevicePermute' interface · 339e51d1
  Po-Yen, Chen authored Sep 06, 2022
  
- Only accept single-input-single-output for 'DevicePermute' · 5ae42120
  Po-Yen, Chen authored Sep 06, 2022
  
- Remove 'is_device_op<>' type traits · 179092df
  Po-Yen, Chen authored Sep 06, 2022
  
- Use indirect base type to generate methods · 32a2d78b
  Po-Yen, Chen authored Sep 06, 2022
  
- Create 'DevicePermuteBase' to generate methods · e53b50e8
  Po-Yen, Chen authored Sep 06, 2022
  
- Add static_assert() to check type constraints · ea343345
  Po-Yen, Chen authored Sep 06, 2022
  
- Add simple type traits to validate device op type · 70757860
  Po-Yen, Chen authored Sep 06, 2022
  
- Let 'DevicePermute' inherit from 'BaseOperator' · bc26a2fa
  Po-Yen, Chen authored Sep 06, 2022
  
- Remove base class of 'DevicePermute' · f015e568
  Po-Yen, Chen authored Sep 06, 2022
  
- Remove 'elementwise' from file paths · 6c4268f9
  Po-Yen, Chen authored Sep 06, 2022
  
- Remove 'elementwise' from identifiers · 18781f56
  Po-Yen, Chen authored Sep 06, 2022
  
- Use 'DevicePermute' device op in example · 9c5dd6bf
  Po-Yen, Chen authored Sep 06, 2022
  
- Add device op 'DevicePermute' · 1fdcf492
  Po-Yen, Chen authored Sep 06, 2022
```
This device op is a clone of 'DeviceElementwise'
```
- Generalize variable naming in example code · 60ab70d8
  Po-Yen, Chen authored Sep 06, 2022
  
- Refine error message for check_err() · 31d758fb
  Po-Yen, Chen authored Sep 06, 2022
  
- Remove debug messages · 43d4bd7a
  Po-Yen, Chen authored Sep 06, 2022
  
- Add checks in helper functions · 7ebb1cbf
  Po-Yen, Chen authored Sep 06, 2022
  
- Use better name for tensor indices · e1f959fd
  Po-Yen, Chen authored Sep 06, 2022
  
- Generalize transpose utility functions · db32635c
  Po-Yen, Chen authored Sep 06, 2022
  
- Add transpose_shape() to generalize shape permute · 98498486
  Po-Yen, Chen authored Sep 06, 2022
  
- Add check to template type argument · 185f7844
  Po-Yen, Chen authored Sep 06, 2022
  
05 Sep, 2022 8 commits
- Allow specifying problem 'axes' through command-line argument · 75831d9e
  Po-Yen, Chen authored Sep 05, 2022
  
- Allow specifying problem through command-line argument · 8e71cad0
  Po-Yen, Chen authored Sep 05, 2022
  
- Use more specific method to write example · 19147f59
  Po-Yen, Chen authored Sep 05, 2022
  
- Add more helper methods in 'DeviceElementwise' · 665b73ff
  Po-Yen, Chen authored Sep 05, 2022
  
- Use more strict input · 8a1ccdd4
  Po-Yen, Chen authored Sep 05, 2022
  
- Move common parts into common.hpp · 58945ac2
  Po-Yen, Chen authored Sep 05, 2022
  
- Re-structure example files · ccd26cbd
  Po-Yen, Chen authored Sep 05, 2022
  
- Add example folder for 'DeviceElementwise' · ef22508c
  Po-Yen, Chen authored Sep 05, 2022
  
02 Sep, 2022 1 commit

[Hotfix] SplitK Gemm fp32 (#401) · 75891161

zjing14 authored Sep 02, 2022

* add scripts

* fixed splitK_gemm_fp32

* clean

* clean

* use gemm_xdl_splitK_c_shuffle into profiler

* remove device_gemm_xdl_splitk.hpp


01 Sep, 2022 1 commit

add more datatype to gemm+gemm and conv+conv example (#397) · 204ef976

Chao Liu authored Sep 01, 2022

* refactor

* refactor

* adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm

* adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm

* clean


31 Aug, 2022 2 commits

Add examples of Conv + reduction (data type: int4, int8, bf16, fp16, fp32) (#380) · 46a675aa

Po Yen Chen authored Sep 01, 2022



* Refactor the design of DeviceGemmMultipleDMultipleR_Xdl_CShuffle

* Add 'DeviceGroupedConvFwdMultipleDMultipleR' interface

* Add DeviceGroupedConvFwdMultipleDMultipleR_Xdl_CShuffle

* Remove 'GridwiseConvFwdMultipleDMultipleR_xdl_cshuffle'

* Add 'TransformConvFwdToGemm<>' utility class (from Chao)

* Use 'TransformConvFwdToGemm<>' to shorten code

* Fix ill-formed method declaration

* Re-implement MakeRGridDescriptor_M() function

* Change problem description

* Use macro to define layout types

* Define K-reduced output tensor layout types

* Let user decide R output tensor layout

* Rename variables

* Add padding to the reduced output tensor if necessary

* Extract common code as helper method

* Remove debug message

* Add missing include directive

* Add partial fp16 Conv + Reduction example

* Add example verification code for 2D Conv problem

* Use type alias to simplify code

* Share code across different-dimension Conv problems

* Rename file/functions from run_conv_fwd* to run_convnd_fwd*

* Make example code more verbose

* Add code to support 1D & 3D Conv + Reduction on host

* Add more examples for data type: bf16, fp32

* Add example for int8

* Add custom target to group examples

* Use more general custom target name

* Change the description in error message

* Disable testing for examples other than fp32

* Add example for int4 (just copy from int8)

* Fix wrong data type

* Use larger data type for intermediate tensors

* Finish int4 example

* Undefine macro PP_DEFINE_LAYOUT_TYPE() after use

* Use named variables to replace magic numbers

* Remove debug messages

* Use same A/B data type for host Conv in int4 example

* Add check for the 'RLayout' type argument

* Group same-dim-layouts together in 'LayoutSetting<>'

* Add 'final' specifier to utility classes

* Use different initialization method for examples

* Remove macro PP_DEFINE_LAYOUT_TYPE()

* Fix code-comment mismatch

* Use more reasonable initialization value for all data types

* Use init_method=1 by default for all examples

* Remove never-used code

* Remove confusing out-of-date comments

* clean

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>


conv+conv (1x1 only) example using gemm+gemm (#393) · 4df6d93f
Chao Liu authored Aug 31, 2022
```
* refactor conv

* add conv+conv example, 1x1 only
```

30 Aug, 2022 2 commits

Gemm reduce examples int4/int8/fp32/bf16 (#368) · d00e6115

Adam Osewski authored Aug 30, 2022



* GEMM + Reduce max fp16+fp32

* GEMM + Max bf16 + int8

* Refactor common definitions.

* Refactor common func of mean meansquare example.

* More examples for mean meansquare.

* Update int8 examples and skip them because of random errors.

* Int4 examples.

* Fix examples for max int4/8

* Tensor conversion for int4 input data for mean meansquare example.

* Remove int4 mean_meansquare example

* Fix int8 mean_meansquare example.

- All ReductionAccData and R<N>DataType have to be F32. The INT32 data type is giving wrong results.

* Guard int4 with ifdef

* Change int8 example to add_addsquare due to division rounding error.

* Clang format

* Change the return type of common function.

* Get back int8 example with division.

* Remove int8 mean meansquare.

* Use proper cast for BF16 data type.

* Use ck::literals.

* Use proper data type for host tensors & reference.

- Use ReduceAccDataType for reference gemm output data type.
- Cast host reference output tensor to EDataType
- Fix ifdefs for int4.

Co-authored-by: Adam Osewski <aosewski@amd.com>


Padding for attention: bmm+scale+softmax+bmm kernel (#385) · 45adb736

Shaojie WANG authored Aug 31, 2022



* add padding algo for bmm+scale+softmax+bmm. Version for verification

* remove verification code

* remove comments

* add padded bmm scale softmax bmm example

* format

* refactor

* add comments for usages of padding bmm+scale+softmax+bmm

Co-authored-by: Chao Liu <lc.roy86@gmail.com>


29 Aug, 2022 2 commits
- Try to work around flaky GemmSoftmaxGemm tests (#386) · 138faf39
  Anthony Chang authored Aug 29, 2022
```
* avoid potential hazard; flaky test issue persists

* pin down the random seed to avoid flakiness
```
- Fix the slow cpu reference batched gemm kernels. (#388) · 9061d39b
  Illia Silin authored Aug 29, 2022
```
* fix the performance of the batched gemm verification

* fix tabs
```