Commits · 4daedf8ca56f3bd93481708bd9d762045839ec20 · gaoqiong / composable_kernel

05 Oct, 2023 2 commits
- Revert "Add support for mixed precision in contraction scale and bilinear" (#967) · 4daedf8c
  Illia Silin authored Oct 05, 2023
```
* Revert "Add support for mixed precision in contraction scale and bilinear (#936)"

This reverts commit f0748506.

* revert commits #957 and #960
```
- remove example 60 (#963) · 570ff3dd
  zjing14 authored Oct 05, 2023
```
Co-authored-by: Jing Zhang <jizha@amd.com>
```
04 Oct, 2023 3 commits

Grouped conv bwd data with fp16 input and bf8fp8 comp (#962) · 04f93aad

zjing14 authored Oct 04, 2023



* Add f8 bf8 gemm example

* Add element-wise ops

* Add intrinsics

* Update reference calculation

* Add an additional type option for xdlops gemm

* Fix build process

* Add bf8 to buffer addressing

* Update blockwise op, split typeA and typeB

* Update for compatibility

* Update naming to f8->fp8

* Update naming

* Format

* Update naming (#937)

* Add a client example

* Add computetypes to device and gridwise ops

* Add instances, update instance factory

* Format

* Fix a flag

* Add ckProfiler mode

* Fix typos

* Add an example

* Add bf8 generator

* add bf8 mfma; fixed type_convert for bf8

* move verification ahead of timing

* Update reference calculation

* Fix reference

* Narrow down float init range

* Fix bf8 bf8 mfma

* Add bf8 @ fp8 mfma

* Update example

* Update instances

* Update profiler api

* Update for compatibility

* Format

* Remove extra example

* Clean up

* workaround convert

* added instance of f16_bf8f8, and client example

* fixed mfma selector

* format

---------
Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
Co-authored-by: Jing Zhang <jizha@amd.com>


Add conv bwd weight fp16 comp bf8 fp8 op, instances and example (#945) · 42facfc6

Rostyslav Geyyer authored Oct 04, 2023



* Add f8 bf8 gemm example

* Add element-wise ops

* Add intrinsics

* Update reference calculation

* Add an additional type option for xdlops gemm

* Fix build process

* Add bf8 to buffer addressing

* Update blockwise op, split typeA and typeB

* Update for compatibility

* Update naming to f8->fp8

* Update naming

* Format

* Update naming (#937)

* Add a client example

* Add computetypes to device and gridwise ops

* Add instances, update instance factory

* Format

* Fix a flag

* Add ckProfiler mode

* Fix typos

* Add an example

* Add bf8 generator

* add bf8 mfma; fixed type_convert for bf8

* move verification ahead of timing

* Update reference calculation

* Fix reference

* Narrow down float init range

* Fix bf8 bf8 mfma

* Add bf8 @ fp8 mfma

* Update example

* Update instances

* Update profiler api

* Update for compatibility

* Format

* Remove extra example

* Clean up

* workaround convert

---------
Co-authored-by: Jing Zhang <jizha@amd.com>


3d grouped conv fwd with input/output fp16 and comp fp8 (#931) · e921e1f0

zjing14 authored Oct 03, 2023



* add f8 comp instance

* fixed

* fixed comments

* rename

* fixed dtype

* format

* fixed CI

* fixed ci

* add missing ComputeType

* fixed ci

* fixed

* Update cmake-ck-dev.sh

---------
Co-authored-by: Jing Zhang <jizha@amd.com>


03 Oct, 2023 3 commits
- changed test for grouped_gemm to be random (#959) · 5311d1b3
  zjing14 authored Oct 03, 2023
```
Co-authored-by: Jing Zhang <jizha@amd.com>
```
- Fixed contraction issues (#960) · aa46039f
  zjing14 authored Oct 03, 2023
```
* add missing ComputeType

* fixed

* Update cmake-ck-dev.sh

---------
Co-authored-by: Jing Zhang <jizha@amd.com>
```
- add generic instances (#947) · f477fca4
  zjing14 authored Oct 03, 2023
```
Co-authored-by: Jing Zhang <jizha@amd.com>
```
02 Oct, 2023 3 commits

Add fp8 @ bf8 gemm support and example (#933) · bd09b5c5

Rostyslav Geyyer authored Oct 02, 2023

* Add f8 bf8 gemm example

* Add element-wise ops

* Add intrinsics

* Update reference calculation

* Add an additional type option for xdlops gemm

* Fix build process

* Add bf8 to buffer addressing

* Update blockwise op, split typeA and typeB

* Update for compatibility

* Update naming to f8->fp8

* Update naming

* Format


get rid of gfx900/906, set rocm5.7 as default (#958) · 59dbb01f
Illia Silin authored Oct 02, 2023


Contraction multi abd (#957) · 9d58c421

zjing14 authored Oct 02, 2023



* add gridwise_multi_abd

* move element_op into RunRead

* merge element_wise op with data read

* add multiABD example

* allow packed elementwise_op

* changed example

* clean

* clean

* add is_detected

* fix

* minor fix

* add scaleAdd_vec4 example

* init commit for contraction_multi_ABD

* add examples

* add examples of multiA and broadcast

* update example

* fixed comments

* Update cmake-ck-dev.sh

* Update cmake-ck-dev.sh

* Add comments into the example

---------
Co-authored-by: Jing Zhang <jizha@amd.com>


29 Sep, 2023 2 commits

add gfx942 target to the daily ckprofiler package (#955) · 6b5f6473
Illia Silin authored Sep 29, 2023


Add support for mixed precision in contraction scale and bilinear (#936) · f0748506

Bartlomiej Wroblewski authored Sep 29, 2023

* Extract common functionality to separate files

* Reference contraction: Remove incorrect consts from type_converts

* Reference contraction: Add missing type_convert for dst value

* Reference contraction: Fix incorrect order of B matrix dimensions

* Add support for mixed precision in contraction scale and bilinear

* Move using statements from instances to a common file

* Move using statements from examples to a common file

* Fix the order of B matrix dimensions across examples and profiler

* Fix the computation of error threshold

* Make ComputeDataType an optional argument

* Include possible DataType -> ComputeDataType casting error in the threshold

* Remove commented code


28 Sep, 2023 2 commits

Add grouped conv bwd data wmma (#950) · cb538740

Bartłomiej Kocot authored Sep 28, 2023

* Add grouped conv bwd data wmma

* Fix copyrights

* Add instances with smaller NPerBlock

* Update interface test

* Minor stylistic fixes

* Minor stylistic fixes


Add grouped convolution changes to changelog (#952) · 271ef645

Bartłomiej Kocot authored Sep 28, 2023



* Add grouped convolution changes to changelog

* Fix 0.2.0 ck release rocm version

* Suggested CHANGELOG.md edits

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

---------
Co-authored-by: Lisa <lisajdelaney@gmail.com>


27 Sep, 2023 5 commits

Fix gemm_splitk test, add hip_check_error after kernel calls in kernel_launch. (#951) · bc1108bb

Illia Silin authored Sep 27, 2023



* Added error check after kernel launch (#919)
Co-authored-by: Xiaodong Wang <xdwang@meta.com>
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>

* remove M=0 test cases for test_gemm_splitk

---------
Co-authored-by: Xiaodong Wang <xdwang@meta.com>
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>


Handle type conversions to a const datatype (#944) · f4af5aed

Bartlomiej Wroblewski authored Sep 27, 2023

* Handle type conversions to a const datatype

* Review: Handle X being const data type as well

* Review: Remove typo


Add column to image kernel (#930) · e2243a4d

Bartłomiej Kocot authored Sep 27, 2023

* Add column to image kernel

* Minor fixes for dtypes and client examples

* Disable tests for disabled dtypes

* Disable add instances functions for disabled data types

* Minor stylistic fixes

* Revert "Disable add instances functions for disabled data types"

This reverts commit 728b8695.

* Instances reduction

* Add comments in device_column_to_image_impl

* Update changelog and Copyrights

* Improve changelog


Add multiple A/B support (#906) · 11676c7e

zjing14 authored Sep 26, 2023



* add gridwise_multi_abd

* move element_op into RunRead

* merge element_wise op with data read

* add multiABD example

* allow packed elementwise_op

* changed example

* clean

* clean

* add is_detected

* fix

* minor fix

* add scaleAdd_vec4 example

---------
Co-authored-by: Jing Zhang <jizha@amd.com>


Use lower case for ckprofiler package. (#948) · 420b5a03
Illia Silin authored Sep 26, 2023
```
* split ckProfiler gfx9 package into gfx90 and gfx94

* use lower case for package names
```

26 Sep, 2023 4 commits

Fixed Gemmv2r3 kpad (#938) · 48ba6e8a

zjing14 authored Sep 26, 2023



* added kpad support into v2r3

* add generic instances

* fixed comments

* fixed mnk padding

* Update device_batched_gemm_xdl.hpp

* fixed kpad

---------
Co-authored-by: Jing Zhang <jizha@amd.com>


Add fp8 gemm instances (#920) · 94bfa502
Rostyslav Geyyer authored Sep 26, 2023
```
* Add fp8 gemm instances

* Update instance naming
```
split ckProfiler gfx9 package into gfx90 and gfx94 (#946) · 0b296a27
Illia Silin authored Sep 26, 2023


Resolve some data type issues and cmake policy. (#940) · 2ea75bd6

Illia Silin authored Sep 26, 2023

* split the types in gemm_bilinear instances, add condition to cmake policy

* fix syntax

* split the data types in batchnorm examples

* fix the batchnorm_bwd test

* fix types in the batchnorm_bwd test


23 Sep, 2023 1 commit

Add 3d grouped conv fwd wmma instances (#935) · c9553832

Bartłomiej Kocot authored Sep 23, 2023

* Add 3d grouped conv fwd wmma instances

* Refactor fwd conv tests

* Split wmma instances for each specialization

* Minor stylistic fixes


22 Sep, 2023 1 commit
- Update naming (#937) · ede64ae9
  Rostyslav Geyyer authored Sep 22, 2023
  
21 Sep, 2023 1 commit

Refactoring cmake files to build data types separately. (#932) · bba085d2

Illia Silin authored Sep 20, 2023

* refactor cmake files for the tests

* refactor cmake files for examples

* fix cmake for gemm example

* fix the cmake file for all examples

* add splitting by data types in gemm_splitk instance header

* rename test to reflect only dl instances are used

* clean up CI workspace, update cmake for instances

* change the jenkinsfile syntax

* build all instances except DL on gfx11

* move workspace cleanup after stages

* clean up workspace after every stage

* isolate data types in grouped_conv_fwd header

* isolate dl instances for grouped_conv2d_fwd

* fix syntax

* fix cmake and batchnorm instances

* fix typo

* fix reduction instances

* fix grouped_conv headers

* fix syntax

* replace parsing logic for instances, replace bfp16 with bf16

* fix the client examples build

* clean up DTYPES from instances cmake files

* update the parsing logic in cmake files

* make an exception for reduction kernels

* update few remaining cmake files to handle DTYPES

* fix syntax

* fix cmake conflicts

* replace f8 with fp8 test name

* resolve conflicts for dpp instances


20 Sep, 2023 1 commit
- fix the building of the amd-stg-open compiler (#927) · 58817bf9
  Illia Silin authored Sep 19, 2023
  
19 Sep, 2023 2 commits
- update to rocm5.7 by default (#925) · 718065eb
  Illia Silin authored Sep 19, 2023
```
* update to rocm5.7 by default

* fix jenkinsfile syntax
```
- fix the ckprofiler package build in a loop (#926) · 5a4416c8
  Illia Silin authored Sep 19, 2023
  
18 Sep, 2023 2 commits
- Fix DL GEMM instances with too large vector size (#901) · 63cd4592
  Bartlomiej Wroblewski authored Sep 18, 2023
```
* Fix vector lengths of DL GEMM instances with padding
* Add checks for correctness of vector lengths in DL GEMM
```
- Add native conversions fp8<->fp32 (#908) · f17af2e9
  Rostyslav Geyyer authored Sep 17, 2023
```
* Add native conversions

* Add bf8 conversions
```
15 Sep, 2023 2 commits

Stylistic improvements for grouped convolution code · bc2d0583
Bartlomiej Kocot authored Sep 13, 2023
```
Remove unnecessary ignoring

Update test/grouped_convnd_bwd_weight/test_grouped_convnd_bwd_weight.cpp
```

Add fp16/fp8 support into Grouped gemm FixedNK (#874) · f9d0eddb

zjing14 authored Sep 14, 2023



* move all arguments into device

* add b2c_tile_map

* add examples

* add SetDeviceKernelArgs

* dedicated fixed_nk solution

* init client api

* add grouped_gemm_bias example

* add an instance

* add instances

* formatting

* fixed cmake

* Update EnableCompilerWarnings.cmake

* Update cmake-ck-dev.sh

* clean; fixed comments

* fixed comment

* add instances for fp32 output

* add instances for fp32 output

* add fp32 out client example

* fixed CI

* init commit for kbatch

* add splitk gridwise

* format

* fixed

* clean deviceop

* clean code

* finish splitk

* fixed instances

* change m_loops to tile_loops

* add setkbatch

* clean code

* add splitK+bias

* add instances

* opt mk_nk instances

* clean examples

* fixed CI

* remove zero

* finished non-zero

* clean

* clean code

* optimized global_barrier

* fixed ci

* fixed CI

* instance and client

* removed AddBias

* format

* fixed CI

* fixed CI

* move 20_grouped_gemm to 21_grouped_gemm

* clean

* formatting

* clean

* clean

* fixed computeType

---------
Co-authored-by: Jing Zhang <jizha@amd.com>


14 Sep, 2023 1 commit
- change the cmake update method (#918) · 0d8efaa1
  Illia Silin authored Sep 14, 2023
  
13 Sep, 2023 4 commits

[Cmake] Set cmake default build type Release and path to /opt/rocm (#914) · 5fe687fa
Jun Liu authored Sep 13, 2023


Add grouped conv bwd weight dl instances and new layout (#897) · 475188ca

Bartłomiej Kocot authored Sep 13, 2023

* Add grouped conv bwd weight dl instances and new layout

* Add M and N padding

* Remove todo comment

* Enable grouped conv fwd dl k,c=1 generic instance

* Comment fixes


fixed fp8 issues (#894) · a66d14ed

zjing14 authored Sep 12, 2023



* fixed fp8 init; and reference gemm

* Update host_tensor_generator.hpp

* fixed convert

* fixed reference gemm

* fixed comments

* fixed comments

* fixed ci

* fixed computeType

---------
Co-authored-by: Jing Zhang <jizha@amd.com>


Add a switch to build DL kernels and build them with staging compiler. (#907) · 74d32f07
Illia Silin authored Sep 12, 2023
```
* enable building DL kernels with the daily staging compiler

* move the DL_KERNELS flag to another function
```

12 Sep, 2023 1 commit

Refactor f8_t, add bf8_t (#792) · 62d4af74

Rostyslav Geyyer authored Sep 12, 2023

* Refactor f8_t to add bf8_t

* Add check_err impl for f8_t

* Update fp8 test

* Format

* Revert the fix

* Update vector_type implementation

* Add bf8 test

* Add bf8, use BitInt types

* Add bf8 conversion methods

* Update type_convert for fp8/bf8

* Add check_err fp8/bf8 support

* Add subnorm fp8 tests

* Add subnorm bf8 tests

* Fix conversion

* Add bf8 cmake bindings

* Add macros to enable build with disabled fp8/bf8

* Remove is_native method

* Update flag combination for mixed precision instances

* Add more flag checks

* Add another flag to a client example

* Add type traits, decouple f8/bf8 casting

* Clean up

* Decouple fp8 and bf8 flags

* Remove more redundant flags

* Remove leftover comments
