Commits · 7719005848cd02cf7d58dbc4bd69fce795e0523e · gaoqiong / composable_kernel

07 Jun, 2023 3 commits
- Formatting · 77190058
  Alan Turner authored Jun 07, 2023
  
- Correct namespace for instances · 19d207df
  Alan Turner authored Jun 07, 2023
  
- Remove extra namespaces from instance · ffc5c906
  Alan Turner authored Jun 07, 2023
  
06 Jun, 2023 1 commit
- Add unit tests · 8e0beb65
  Alan Turner authored Jun 06, 2023
  
02 Jun, 2023 1 commit
- Add missing header · 33f88fa8
  Paul authored Jun 02, 2023
  
01 Jun, 2023 2 commits

Updates to ck host library API (#731) · 59e2dc29

Paul Fultz II authored Jun 01, 2023

* Move functions to cpp file

* Move another function to cpp file

* Fix semicolon

* Move solution to common.hpp

* Fix compile errors

* Use enum for data types

* Remove -Werror

* Fix header install

* Fix relative path

* Fix header path

* Install all headers


No commit message · 7295e38d
Alan Turner authored Jun 01, 2023

31 May, 2023 2 commits
- update copyright headers (#726) · b94fd0b2
  Illia Silin authored May 31, 2023
  
- Install all headers · 9bf51c4c
  Paul authored May 31, 2023
  
30 May, 2023 2 commits

Multiple fixes to GroupedGemm+SplitK (#707) · 70e4eb56

Adam Osewski authored May 30, 2023



* Add license header.

* Reduce number of logged output. Add constant initialization.

* Add functional tests for grouped_gemm with different kbatch value.

* Add debug log information + remove unused code.

* Don't pass kbatch to CalculateKPadded.

* Turn on logging in grouped gemm and gemm splitk profiler

* Debug: limit number of test cases to run;

* Log more information and initialize with constant value.

* Turn on DEBUG_LOG

* Add more debug log information.

* Limit the number of instances to compile.

* Use GridwiseGemmPipeline

* Use KBatch to calculate K0

* Multiple DebugLog messages.

* Unit tests for multiple KBatch values.

* Refactoring

* Disable logging

* extract out of if statement KBatch update.

* Uncomment instances.

* Disable DebugLog.

* Use Kbatch when calculating KPadded.

* Fix CGridDesc padding.

* Use available helper functions.

* Uncomment code commented for debugging.

* Remove unnecessary debug log messages.

* Uncomment previously commented code for debug purposes.

* Add KBatch info to profiler output summary log.

* Add gtests for gemm splitk using ckProfiler API.

* Add more test-cases for different data layout.

* Add more test cases for gemm splitk

* Remove old test.

* Unit tests for MKNK ggemm interface.

* Fix and add more unit-tests.

* Constexpr everything!

* Increase error threshold for fp16 and splitk.

Since we're using fp16 atomic add for splitk there's a
known precision loss.

---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
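
The fp16 split-K precision loss noted in this commit's message can be illustrated outside the library. The following is a minimal Python sketch, not code from composable_kernel: it emulates an fp16 atomic add by rounding to half precision after every addition (using the standard `struct` format character `e`) and compares that with summing in a wider type and rounding once. The helper names and the partial-sum values are hypothetical and deliberately contrived.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def accumulate_fp16(partials):
    """Emulate one fp16 atomic add per K-batch: round after every add."""
    acc = 0.0
    for p in partials:
        # The sum of two fp16 values is exact in double precision,
        # so rounding it once matches a correctly rounded fp16 add.
        acc = to_fp16(acc + to_fp16(p))
    return acc


def accumulate_wide(partials):
    """Sum in double precision and round to fp16 only once at the end."""
    return to_fp16(sum(partials))


# Hypothetical partial results from 8 K-batches (contrived values).
partials = [2048.0] + [1.0] * 7
exact = sum(partials)
fp16_result = accumulate_fp16(partials)
wide_result = accumulate_wide(partials)

print(exact, fp16_result, wide_result)  # 2055.0 2048.0 2056.0
```

In this example every per-add rounding drops the small contribution (2048 + 1 is a tie that rounds back to 2048 in fp16), so the accumulated result is off by 7, while rounding once is off by 1. Real split-K partials are rarely this lopsided, but the same effect is a plausible reason for using a looser tolerance in fp16 split-K tests.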


Add instances for fp16/int8 Gemm kernels (Navi21) (#717) · c2d7a29d

Bartłomiej Kocot authored May 30, 2023

* Add instances for fp16/int8 Gemm kernels (Navi21)

* Extend instances with smaller tiles

* Fix SrcVectorTensor for km_kn_mn int8


25 May, 2023 11 commits
- Fix header path · 506798de
  Paul authored May 25, 2023
  
- Fix relative path · cddcb856
  Paul authored May 25, 2023
  
- Fix header install · f89f3440
  Paul authored May 25, 2023
  
- Use enum for data types · 420c0312
  Paul authored May 25, 2023
  
- Fix compile errors · 3905f4a2
  Paul authored May 25, 2023
  
- Move solution to common.hpp · b155a0ac
  Paul authored May 25, 2023
  
- Fix semicolon · e42607a5
  Paul authored May 25, 2023
  
- Move another function to cpp file · 856419e8
  Paul authored May 25, 2023
  
- Move functions to cpp file · dd6fd8bb
  Paul authored May 25, 2023
  
- Add edatatype and scalars_per_vector workaround · 61386bf9
  Alan Turner authored May 25, 2023
  
- Add int8 instances · 6289e36f
  Alan Turner authored May 25, 2023
  
24 May, 2023 2 commits

Use vectors for Ds types and layouts params · dc65f4c6
Alan Turner authored May 24, 2023


Pool3d fwd (#697) · 76ec0089

rocking authored May 24, 2023

* Expand the base class of pool2d, prepare to share base class with pool3d

* Add pool3d device op

* Add pool3d f16 example

* Refactor the base class. implement generic pooling in the future

* clang format

* get original index in max pooling

* Add outputindex to base class

* Fix dimension

* Add pooling instance

* Use indexType instead

* Remove useless header

* Extract IndexDataType to template

* Extract pooling reference code

* clang format

* clang format

* Fix typo

* Add tensor stride

* Add missing header

* Add index stride and output stride

* Refine naming

* Add type to base class

* Rename file

* Use proper size

* Fix typo

* Refine naming

* Modify the argument into vector.

* Add max pool profiler

* Refine naming

* Support f32 pool

* Fix typo

* Add avg pool2d fwd in profiler

* clang format

* Rename AccDatatype to ComputeDatatype

* Fix init

* test pool

* Extract variable

* Add client example

* Check the pooling dim

* clang format

* Connect argv and arg_parser

* Add found check

* Remove useless header

* Refine naming

* Adjust the order of device_pool_fwd


25 Apr, 2023 1 commit
- Renaming and fix top level cmakelists · 1ec96717
  Alan Turner authored Apr 25, 2023
  
24 Apr, 2023 4 commits

Grouped Gemm + SplitK + simplified Kernel Args (#669) · 8bb2bb4a

Adam Osewski authored Apr 24, 2023



* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* B2C with 3D grid for KSplit

* Remove unused code.

* Use default B2C (3D grid) in grid gemm v2r4r2.

* Device gemm splitk use B2C map.

* Device GroupedGemmXdlSplitKCShuffle

* Example for GroupedGemm Xdl SplitK

* Introduce Device GroupedGemmSplitK

* Fix updating kbatch size.

* Add instance mk-nk-mn

* Enable set kbatch in profiler.

* Add GGemmSplitK mk-kn-mn instances

* Add more instances & split into multiple files.

* minor fix

* tuning

* clean

* disabled failed instances

* use pipe v2

* Ignore arg on not supported arch.

* fix warning

---------
Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>


Clean up · 600a9870
Alan Turner authored Apr 24, 2023

Add jit library · 17acbbf4
Alan Turner authored Apr 24, 2023


Revise layout of group convolution (#675) · 3eecbfb6

rocking authored Apr 24, 2023

* [What] Remove pure conv int8 instance
[Why] We will never use pure int8 conv in AI, use int8 quantization instead

* Change layout

* Share the kernel parameter

* Support more type of NHWGC for group conv

* Revise client example of conv 2d, use NHWGC layout

* Add instance to cmake

* Revise layout of group conv quantization instance

* Revise layout of external api of group conv quantization

* Revise layout of group conv quantization client example

* Fix clang format

* Add comment to describe meaning of each parameter


22 Apr, 2023 1 commit

Put back the split-k gemm code. (#684) · 903cd19c

Illia Silin authored Apr 21, 2023



* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* use name from tensor layout

---------
Co-authored-by: carlushuang <carlus.huang@amd.com>


17 Apr, 2023 1 commit
- Add (#677) · fd11a4a1
  rocking5566 authored Apr 17, 2023
  
10 Apr, 2023 1 commit

Groupnorm + swish external api (#668) · ed3a2e52

rocking5566 authored Apr 10, 2023

* Rename to proper naming

* Add example of groupnorm + swish

* Extract duplicate code in example

* Add groupnorm + swish instances

* Refactor instance generation, split into multiple cpp files

* Add external api and client example

* Refine profiler message

* Use ck math version of exp

* Refine problem size in example

* Add host version of exp


07 Apr, 2023 1 commit
- Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665) · 3248387b
  Jun Liu authored Apr 06, 2023
```
This reverts commit bb5530af.
```
30 Mar, 2023 2 commits

add fp64 instances (#658) · fde6d274
zjing14 authored Mar 30, 2023
```
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
```

simplify karg in device/grid of split-k op (#644) · bb5530af

carlushuang authored Mar 30, 2023

* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* use name from tensor layout


29 Mar, 2023 1 commit

Conv + quantization + tanh (#645) · 389e84a8

rocking5566 authored Mar 30, 2023



* Rename file. Prepare to support another activation

* Add comment for quantization

* Extract out_elementop

* Add tanh example

* Add conv + bias + tanh quantization instance

* Add missing parameter

* Refine cmake

* Add external api and client example

* Extract variable in example

* Fix the comment

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>


20 Mar, 2023 1 commit

workaround 637 (#640) · 6ae12434

ltqin authored Mar 21, 2023



* add workaround 637

* format

* change id

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>


15 Mar, 2023 1 commit

gemm/Conv xdlops + dlops quantization (#625) · 16dc18e0

rocking5566 authored Mar 16, 2023

* Add conv perlayer quantization

* Add gemm_dlops quantization

* Support int8 for innerproduct

* Refine gemm dlops int8 kernel parameter

* Support gfx908(MI100) and gfx90a(MI200)

* clang-format

* Rename example number

* Support different layout for d tensor

* Add conv dlops perchannel quantization example

* Move to example 40

* Extract the common code for different platform (dlops and xdlops)

* Move to subfolder. Prepare to add other op of quantization

* Refine the quantization instance library

* Add conv dl instances and client example

* Remove unnecessary type

* Add gemm quantization instance

* Add external api and client example

* Refine num_bytes

* Separate different layout to different cpp

* Add more xdl instances

* Revert "Remove unnecessary type"

This reverts commit 82086918.

* Remove CShuffleDataType in dlops
Let acc and CShuffleDataType be the same in xdlops

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>


08 Mar, 2023 1 commit

GroupedGEMM + Gelu client example/instances/profiler (#614) · 9096b1c7

Adam Osewski authored Mar 08, 2023



* Grouped gemm + Gelu instances.

* Device Instance Factory for GroupedGemm+Gelu

* Client example

* Rangify fill helper functions.

* Fix name clash.

* Profiler for grouped_gemm+gelu

* No need to use full namespace name.

* Add check for MRaw divisible by vector load.

* Ugly fix for big errors.

* Add grouped_gemm+gelu to profiler CMakelists.

* Store in argument additional info.

* Information about Mraw, Nraw, Kraw values.

* Use FastGelu instead of Gelu.

* Change client ex to use FastGelu

* Remove relaxed error precision.

* Remove duplicate output elementwise-op

---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>


15 Feb, 2023 1 commit

Improve normalization (#580) · 6a6163a3

rocking5566 authored Feb 16, 2023

* Sync the order of type string with template parameter

* Add more instances

* Check the vector size and remove redundant var

* Extract var to static, prepare to separate sweep once kernel

* Separate sweeponce flow and optimize the flow

* 1. Rename AccDatatype in normalization to computeData
2. Rename AccElementwiseOperation to YElementwiseOperation in normalization

* Remove useless code

* Update naive variance kernel

* Refine string

* Fix typo

* Support naive variance for device_normalization

* Check the blocksize

* Share the VGPR of x and y

* Share the VGPR of gamma and beta

* Add more instances

* Support fp16 sqrt for experiment

* Add CHANGELOG

* Fix typo

* clang-format
