Commits · 98f349c6735a5b558e78055000767f2875bd81a7 · gaoqiong / composable_kernel

26 Sep, 2023 1 commit
- Minor stylistic fixes · 98f349c6
  Bartlomiej Kocot authored Sep 26, 2023
22 Sep, 2023 1 commit
- Disable add instances functions for disabled data types · 728b8695
  Bartlomiej Kocot authored Sep 22, 2023
21 Sep, 2023 4 commits

Disable tests for disabled dtypes · 63fedbd0
Bartlomiej Kocot authored Sep 21, 2023

Minor fixes for dtypes and client examples · c0c6fa59
Bartlomiej Kocot authored Sep 21, 2023

Add column to image kernel · ad24acb6
Bartlomiej Kocot authored Sep 20, 2023

Refactoring cmake files to build data types separately. (#932) · bba085d2

Illia Silin authored Sep 20, 2023

* refactor cmake files for the tests

* refactor cmake files for examples

* fix cmake for gemm example

* fix the cmake file for all examples

* add splitting by data types in gemm_splitk instance header

* rename test to reflect only dl instances are used

* clean up CI workspace, update cmake for instances

* change the jenkinsfile syntax

* build all instances except DL on gfx11

* move workspace cleanup after stages

* clean up workspace after every stage

* isolate data types in grouped_conv_fwd header

* isolate dl instances for grouped_conv2d_fwd

* fix syntax

* fix cmake and batchnorm instances

* fix typo

* fix reduction instances

* fix grouped_conv headers

* fix syntax

* replace parsing logic for instances, replace bfp16 with bf16

* fix the client examples build

* clean up DTYPES from instances cmake files

* update the parsing logic in cmake files

* make an exception for reduction kernels

* update few remaining cmake files to handle DTYPES

* fix syntax

* fix cmake conflicts

* replace f8 with fp8 test name

* resolve conflicts for dpp instances

20 Sep, 2023 1 commit
- fix the building of the amd-stg-open compiler (#927) · 58817bf9
  Illia Silin authored Sep 19, 2023
19 Sep, 2023 2 commits
- update to rocm5.7 by default (#925) · 718065eb
  Illia Silin authored Sep 19, 2023
```
* update to rocm5.7 by default

* fix jenkinsfile syntax
```
- fix the ckprofiler package build in a loop (#926) · 5a4416c8
  Illia Silin authored Sep 19, 2023
18 Sep, 2023 2 commits
- Fix DL GEMM instances with too large vector size (#901) · 63cd4592
  Bartlomiej Wroblewski authored Sep 18, 2023
```
* Fix vector lengths of DL GEMM instances with padding
* Add checks for correctness of vector lengths in DL GEMM
```
- Add native conversions fp8<->fp32 (#908) · f17af2e9
  Rostyslav Geyyer authored Sep 17, 2023
```
* Add native conversions

* Add bf8 conversions
```
15 Sep, 2023 2 commits

Stylistic improvements for grouped convolution code · bc2d0583
Bartlomiej Kocot authored Sep 13, 2023
```
Remove unnecessary ignoring

Update test/grouped_convnd_bwd_weight/test_grouped_convnd_bwd_weight.cpp
```

Add fp16/fp8 support into Grouped gemm FixedNK (#874) · f9d0eddb

zjing14 authored Sep 14, 2023

* move all arguments into device

* add b2c_tile_map

* add examples

* add SetDeviceKernelArgs

* dedicated fixed_nk solution

* init client api

* add grouped_gemm_bias example

* add an instance

* add instances

* formatting

* fixed cmake

* Update EnableCompilerWarnings.cmake

* Update cmake-ck-dev.sh

* clean; fixed comments

* fixed comment

* add instances for fp32 output

* add instances for fp32 output

* add fp32 out client example

* fixed CI

* init commit for kbatch

* add splitk gridwise

* format

* fixed

* clean deviceop

* clean code

* finish splitk

* fixed instances

* change m_loops to tile_loops

* add setkbatch

* clean code

* add splitK+bias

* add instances

* opt mk_nk instances

* clean examples

* fixed CI

* remove zero

* finished non-zero

* clean

* clean code

* optimized global_barrier

* fixed ci

* fixed CI

* instance and client

* removed AddBias

* format

* fixed CI

* fixed CI

* move 20_grouped_gemm to 21_grouped_gemm

* clean

* formatting

* clean

* clean

* fixed computeType

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

14 Sep, 2023 1 commit
- change the cmake update method (#918) · 0d8efaa1
  Illia Silin authored Sep 14, 2023
13 Sep, 2023 4 commits

[Cmake] Set cmake default build type Release and path to /opt/rocm (#914) · 5fe687fa
Jun Liu authored Sep 13, 2023

Add grouped conv bwd weight dl instances and new layout (#897) · 475188ca

Bartłomiej Kocot authored Sep 13, 2023

* Add grouped conv bwd weight dl instances and new layout

* Add M and N padding

* Remove todo comment

* Enable grouped conv fwd dl k,c=1 generic instance

* Comment fixes

fixed fp8 issues (#894) · a66d14ed

zjing14 authored Sep 12, 2023

* fixed fp8 init; and reference gemm

* Update host_tensor_generator.hpp

* fixed convert

* fixed reference gemm

* fixed comments

* fixed comments

* fixed ci

* fixed computeType

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

Add a switch to build DL kernels and build them with staging compiler. (#907) · 74d32f07
Illia Silin authored Sep 12, 2023
```
* enable building DL kernels with the daily staging compiler

* move the DL_KERNELS flag to another function
```
12 Sep, 2023 3 commits

Refactor f8_t, add bf8_t (#792) · 62d4af74

Rostyslav Geyyer authored Sep 12, 2023

* Refactor f8_t to add bf8_t

* Add check_err impl for f8_t

* Update fp8 test

* Format

* Revert the fix

* Update vector_type implementation

* Add bf8 test

* Add bf8, use BitInt types

* Add bf8 conversion methods

* Update type_convert for fp8/bf8

* Add check_err fp8/bf8 support

* Add subnorm fp8 tests

* Add subnorm bf8 tests

* Fix conversion

* Add bf8 cmake bindings

* Add macros to enable build with disabled fp8/bf8

* Remove is_native method

* Update flag combination for mixed precision instances

* Add more flag checks

* Add another flag to a client example

* Add type traits, decouple f8/bf8 casting

* Clean up

* Decouple fp8 and bf8 flags

* Remove more redundant flags

* Remove leftover comments

clean up the workspace after every stage (#909) · 56c0279b
Illia Silin authored Sep 12, 2023

Add new instances and support for small cases in DPP8 GEMM (#896) · 547dbcfb
Bartlomiej Wroblewski authored Sep 12, 2023

11 Sep, 2023 1 commit
- Add codeowners for documentation (#902) · 85e2e1e2
  Sam Wu authored Sep 11, 2023
```
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
```
08 Sep, 2023 2 commits

Enable DPP8 GEMM on Navi3 (#892) · 8f84a012
Bartlomiej Wroblewski authored Sep 08, 2023

[Navi3x] Add fp16/int8 wmma conv forward instances (#746) · 562b4cec

Haocong WANG authored Sep 08, 2023

* fix wmma gemm int8; add grouped conv int8 example

* Add int8 gemm-bilinear instances

* compile sanity check unknown

* Sanity pass + clang-format

* add int8 conv profiler instances

* solve merge conflict

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

06 Sep, 2023 3 commits

Redesign the DPP8 GEMM kernel to use warp-wise component (#863) · 37a8c1f7

Bartlomiej Wroblewski authored Sep 06, 2023

* Redesign the DPP8 GEMM kernel to use warp-wise component

* Review: Improve error messages

* Review: Remove unnecessary empty lines

* Review: Fix M, N per thread names

* Review: Rename mfma_input_type to dpp_input_type

* Review: Fix tensor adaptor; remove unnecessary element

* Review: Remove calls to dpp_gemm's MakeCDescriptor

* Review: Add blockwise doc, change function names to include dimension names

* Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file

* Review: Add __restrict__ keywords

* Review: Use MatrixPadder for padding A, B, C matrices

* Review: Remove hardcoded datatypes

* Review: Change names from FloatX to XDataType

* Review: Introduce AK0 and BK0 instead of a single K0

* Review: Remove construction of dpp_datatypes object

* Review: Rename DppInstrRunner to DppLanegroupGemm

added padding of K into gemm_v2r3 (#887) · 3786bfe1

zjing14 authored Sep 06, 2023

* added kpad support into v2r3

* add generic instances

* fixed comments

* fixed mnk padding

* Update device_batched_gemm_xdl.hpp

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

Fixed fp8 gemm (#882) · a61b8b78

zjing14 authored Sep 06, 2023

* add generic instances; fixed init with fp8

* fixed comment

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

05 Sep, 2023 6 commits
- set warnings as errors in doxygen (#864) · aae4df55
  Illia Silin authored Sep 05, 2023
- Add contribution guidelines to the documentation (#843) · 1e1f82d9
  Bartlomiej Wroblewski authored Sep 05, 2023
```
Add contribution guidelines to the documentation
```
- fix syntax (#890) · 7dcb14d9
  Illia Silin authored Sep 05, 2023
- Add image to column kernel (#867) · 0077eeb3
  Bartłomiej Kocot authored Sep 05, 2023
```
* Add image to column kernel

* Add instances, tests, profiler, example

* Add client example

* Several fixes of image to column

* Fix variable name in device_image_to_column_impl

* Several fixes of image to column profiler

* Fix num_btype calculation

* Make new measurements for correct bytes calculation
```
- Add nhwgc dl generic instances for grouped conv fwd (#879) · 0c9a1d25
  Bartłomiej Kocot authored Sep 05, 2023
- Fix K padding calculation for grouped conv data (#876) · c981f6d0
  Bartłomiej Kocot authored Sep 05, 2023
```
* Fix K padding calculation for grouped conv data

* Restore previous padding for 1x1 specialization
```
04 Sep, 2023 1 commit
- Fix config header installation (#880) · bd8024b8
  Lauren Wrubleski authored Sep 04, 2023
31 Aug, 2023 3 commits

Grouped Gemm with Fixed K and N with SplitK (#818) · f5ec04f0

zjing14 authored Aug 31, 2023

* move all arguments into device

* add b2c_tile_map

* add examples

* add SetDeviceKernelArgs

* dedicated fixed_nk solution

* init client api

* add grouped_gemm_bias example

* add an instance

* add instances

* formatting

* fixed cmake

* Update EnableCompilerWarnings.cmake

* Update cmake-ck-dev.sh

* clean; fixed comments

* fixed comment

* add instances for fp32 output

* add instances for fp32 output

* add fp32 out client example

* fixed CI

* init commit for kbatch

* add splitk gridwise

* format

* fixed

* clean deviceop

* clean code

* finish splitk

* fixed instances

* change m_loops to tile_loops

* add setkbatch

* clean code

* add splitK+bias

* add instances

* opt mk_nk instances

* clean examples

* fixed CI

* remove zero

* finished non-zero

* clean

* clean code

* optimized global_barrier

* fixed ci

* fixed CI

* removed AddBias

* format

* fixed CI

* fixed CI

* move 20_grouped_gemm to 21_grouped_gemm

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861) · 866377de

rocking authored Aug 31, 2023

* Add maxpool instances

* Rename index pool to max pool.

* Add maxpool bwd bf16 instances

* Add avg pool bwd instances

* Rename avgpool and maxpool to avg_pool3d and max_pool

* Add bf16 pool fwd instances

* Add max pool bwd to ckProfiler

* Add avg pool3d bwd to ckProfiler

* Add avg pool bwd test

* Fix bug of reference pool fwd (dilation)

* Fix bug of max pool bwd (dilation and initZero)

* Support bf16 compute data type

* Force compute type to be f32, because atomicAdd only supports f32

* Add max pool bwd test

* Rename folder

* Rename pool

* Add max pool bwd client example

* Add avg pool bwd client example

* Add missing workspace

* clang format

* Rename macro

* remove useless header

* remove useless layout

fix gemm_streamk example on mi300 (#875) · bf1912ed
Illia Silin authored Aug 30, 2023

30 Aug, 2023 1 commit
- Add number of errors on failure (#868) · 9e86ebd6
  Bartłomiej Kocot authored Aug 30, 2023
29 Aug, 2023 1 commit

add an example of customized type convert - bfp16_rtn (#869) · 38ada109

zjing14 authored Aug 29, 2023

* add an example of customized bfp16_rtn

* fixed threadwise_copy

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

28 Aug, 2023 1 commit

Fp16/fp8 mixed-precision Gemm with multiply+add fusion (#865) · 31ea132a

zjing14 authored Aug 28, 2023

* add compute_type

* add multiply_add ckProfiler

* add f8_fp16 support

* clean

* clean

* fixed lds size calc

* format

---------
Co-authored-by: Jing Zhang <jizha@amd.com>