Commits · cbaa29dde0289a0c94f5f005c02e7d1b775be130 · gaoqiong / composable_kernel

02 Oct, 2023 5 commits

Merge remote-tracking branch 'origin/develop' into 3d_grouped_conv_fp16_comp_fp8 · cbaa29dd
Jing Zhang authored Oct 02, 2023

Add fp8 @ bf8 gemm support and example (#933) · bd09b5c5

Rostyslav Geyyer authored Oct 02, 2023

* Add f8 bf8 gemm example

* Add element-wise ops

* Add intrinsics

* Update reference calculation

* Add an additional type option for xdlops gemm

* Fix build process

* Add bf8 to buffer addressing

* Update blockwise op, split typeA and typeB

* Update for compatibility

* Update naming to f8->fp8

* Update naming

* Format

get rid of gfx900/906, set rocm5.7 as default (#958) · 59dbb01f
Illia Silin authored Oct 02, 2023

Merge branch 'develop' into 3d_grouped_conv_fp16_comp_fp8 · 9b062051
zjing14 authored Oct 02, 2023

Contraction multi abd (#957) · 9d58c421

zjing14 authored Oct 02, 2023



* add gridwise_multi_abd

* move element_op into RunRead

* merge element_wise op with data read

* add multiABD example

* allow packed elementwise_op

* changed example

* clean

* clean

* add is_detected

* fix

* minor fix

* add scaleAdd_vec4 example

* init commit for contraction_multi_ABD

* add examples

* add examples of multiA and broadcast

* update example

* fixed comments

* Update cmake-ck-dev.sh

* Update cmake-ck-dev.sh

* Add comments into the example

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

29 Sep, 2023 2 commits

add gfx942 target to the daily ckprofiler package (#955) · 6b5f6473
Illia Silin authored Sep 29, 2023

Add support for mixed precision in contraction scale and bilinear (#936) · f0748506

Bartlomiej Wroblewski authored Sep 29, 2023

* Extract common functionality to separate files

* Reference contraction: Remove incorrect consts from type_converts

* Reference contraction: Add missing type_convert for dst value

* Reference contraction: Fix incorrect order of B matrix dimensions

* Add support for mixed precision in contraction scale and bilinear

* Move using statements from instances to a common file

* Move using statements from examples to a common file

* Fix the order of B matrix dimensions across examples and profiler

* Fix the computation of error threshold

* Make ComputeDataType an optional argument

* Include possible DataType -> ComputeDataType casting error in the threshold

* Remove commented code

28 Sep, 2023 5 commits

Merge branch 'develop' into 3d_grouped_conv_fp16_comp_fp8 · a937fad1
zjing14 authored Sep 28, 2023

Add grouped conv bwd data wmma (#950) · cb538740

Bartłomiej Kocot authored Sep 28, 2023

* Add grouped conv bwd data wmma

* Fix copyrights

* Add instances with smaller NPerBlock

* Update interface test

* Minor stylistic fixes

* Minor stylistic fixes

Add grouped convolution changes to changelog (#952) · 271ef645

Bartłomiej Kocot authored Sep 28, 2023



* Add grouped convolution changes to changelog

* Fix 0.2.0 ck release rocm version

* Suggested CHANGELOG.md edits

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

---------
Co-authored-by: Lisa <lisajdelaney@gmail.com>

format · 956426b5
Jing Zhang authored Sep 28, 2023

merge conflict · b43ac5ef
Jing Zhang authored Sep 28, 2023

27 Sep, 2023 6 commits

fixed dtype · b892a14a
Jing Zhang authored Sep 27, 2023

Fix gemm_splitk test, add hip_check_error after kernel calls in kernel_launch. (#951) · bc1108bb

Illia Silin authored Sep 27, 2023



* Added error check after kernel launch (#919)
Co-authored-by: Xiaodong Wang <xdwang@meta.com>
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>

* remove M=0 test cases for test_gemm_splitk

---------
Co-authored-by: Xiaodong Wang <xdwang@meta.com>
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>

Handle type conversions to a const datatype (#944) · f4af5aed

Bartlomiej Wroblewski authored Sep 27, 2023

* Handle type conversions to a const datatype

* Review: Handle X being const data type as well

* Review: Remove typo

Add column to image kernel (#930) · e2243a4d

Bartłomiej Kocot authored Sep 27, 2023

* Add column to image kernel

* Minor fixes for dtypes and client examples

* Disable tests for disabled dtypes

* Disable add instances functions for disabled data types

* Minor stylistic fixes

* Revert "Disable add instances functions for disabled data types"

This reverts commit 728b8695.

* Instances reduction

* Add comments in device_column_to_image_impl

* Update changelog and Copyrights

* Improve changelog

Add multiple A/B support (#906) · 11676c7e

zjing14 authored Sep 26, 2023



* add gridwise_multi_abd

* move element_op into RunRead

* merge element_wise op with data read

* add multiABD example

* allow packed elementwise_op

* changed example

* clean

* clean

* add is_detected

* fix

* minor fix

* add scaleAdd_vec4 example

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

Use lower case for ckprofiler package. (#948) · 420b5a03
Illia Silin authored Sep 26, 2023
```
* split ckProfiler gfx9 package into gfx90 and gfx94

* use lower case for package names
```

26 Sep, 2023 4 commits

Fixed Gemmv2r3 kpad (#938) · 48ba6e8a

zjing14 authored Sep 26, 2023



* added kpad support into v2r3

* add generic instances

* fixed comments

* fixed mnk padding

* Update device_batched_gemm_xdl.hpp

* fixed kpad

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

Add fp8 gemm instances (#920) · 94bfa502
Rostyslav Geyyer authored Sep 26, 2023
```
* Add fp8 gemm instances

* Update instance naming
```

split ckProfiler gfx9 package into gfx90 and gfx94 (#946) · 0b296a27
Illia Silin authored Sep 26, 2023

Resolve some data type issues and cmake policy. (#940) · 2ea75bd6

Illia Silin authored Sep 26, 2023

* split the types in gemm_bilinear instances, add condition to cmake policy

* fix syntax

* split the data types in batchnorm examples

* fix the batchnorm_bwd test

* fix types in the batchnorm_bwd test

23 Sep, 2023 1 commit

Add 3d grouped conv fwd wmma instances (#935) · c9553832

Bartłomiej Kocot authored Sep 23, 2023

* Add 3d grouped conv fwd wmma instances

* Refactor fwd conv tests

* Split wmma instances for each specialization

* Minor stylistic fixes

22 Sep, 2023 3 commits
- fixed merge · adfcb029
  Jing Zhang authored Sep 22, 2023
- Update naming (#937) · ede64ae9
  Rostyslav Geyyer authored Sep 22, 2023
- rename · 141efec8
  Jing Zhang authored Sep 22, 2023
21 Sep, 2023 2 commits

Refactoring cmake files to build data types separately. (#932) · bba085d2

Illia Silin authored Sep 20, 2023

* refactor cmake files for the tests

* refactor cmake files for examples

* fix cmake for gemm example

* fix the cmake file for all examples

* add splitting by data types in gemm_splitk instance header

* rename test to reflect only dl instances are used

* clean up CI workspace, update cmake for instances

* change the jenkinsfile syntax

* build all instances except DL on gfx11

* move workspace cleanup after stages

* clean up workspace after every stage

* isolate data types in grouped_conv_fwd header

* isolate dl instances for grouped_conv2d_fwd

* fix syntax

* fix cmake and batchnorm instances

* fix typo

* fix reduction instances

* fix grouped_conv headers

* fix syntax

* replace parsing logic for instances, replace bfp16 with bf16

* fix the client examples build

* clean up DTYPES from instances cmake files

* update the parsing logic in cmake files

* make an exception for reduction kernels

* update few remaining cmake files to handle DTYPES

* fix syntax

* fix cmake conflicts

* replace f8 with fp8 test name

* resolve conflicts for dpp instances

fixed comments · d96ccc1d
Jing Zhang authored Sep 21, 2023

20 Sep, 2023 4 commits
- fixed · d670c5a6
  Jing Zhang authored Sep 20, 2023
- Merge branch '3d_grouped_conv_fp16_comp_fp8' of... · 963bc7a3
  Jing Zhang authored Sep 20, 2023
```
Merge branch '3d_grouped_conv_fp16_comp_fp8' of github.com:ROCmSoftwarePlatform/composable_kernel into 3d_grouped_conv_fp16_comp_fp8
```
- add f8 comp instance · a18342d6
  Jing Zhang authored Sep 20, 2023
- fix the building of the amd-stg-open compiler (#927) · 58817bf9
  Illia Silin authored Sep 19, 2023
19 Sep, 2023 2 commits
- update to rocm5.7 by default (#925) · 718065eb
  Illia Silin authored Sep 19, 2023
```
* update to rocm5.7 by default

* fix jenkinsfile syntax
```
- fix the ckprofiler package build in a loop (#926) · 5a4416c8
  Illia Silin authored Sep 19, 2023
18 Sep, 2023 2 commits
- Fix DL GEMM instances with too large vector size (#901) · 63cd4592
  Bartlomiej Wroblewski authored Sep 18, 2023
```
* Fix vector lengths of DL GEMM instances with padding
* Add checks for correctness of vector lengths in DL GEMM
```
- Add native conversions fp8<->fp32 (#908) · f17af2e9
  Rostyslav Geyyer authored Sep 17, 2023
```
* Add native conversions

* Add bf8 conversions
```

15 Sep, 2023 2 commits

Stylistic improvements for grouped convolution code · bc2d0583
Bartlomiej Kocot authored Sep 13, 2023
```
Remove unnecessary ignoring

Update test/grouped_convnd_bwd_weight/test_grouped_convnd_bwd_weight.cpp
```

Add fp16/fp8 support into Grouped gemm FixedNK (#874) · f9d0eddb

zjing14 authored Sep 14, 2023



* move all arguments into device

* add b2c_tile_map

* add examples

* add SetDeviceKernelArgs

* dedicated fixed_nk solution

* init client api

* add grouped_gemm_bias example

* add an instance

* add instances

* formatting

* fixed cmake

* Update EnableCompilerWarnings.cmake

* Update cmake-ck-dev.sh

* clean; fixed comments

* fixed comment

* add instances for fp32 output

* add instances for fp32 output

* add fp32 out client example

* fixed CI

* init commit for kbatch

* add splitk gridwise

* format

* fixed

* clean deviceop

* clean code

* finish splitk

* fixed instances

* change m_loops to tile_loops

* add setkbatch

* clean code

* add splitK+bias

* add instances

* opt mk_nk instances

* clean examples

* fixed CI

* remove zero

* finished non-zero

* clean

* clean code

* optimized global_barrier

* fixed ci

* fixed CI

* instance and client

* removed AddBias

* format

* fixed CI

* fixed CI

* move 20_grouped_gemm to 21_grouped_gemm

* clean

* formatting

* clean

* clean

* fixed computeType

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

14 Sep, 2023 1 commit
- change the cmake update method (#918) · 0d8efaa1
  Illia Silin authored Sep 14, 2023
13 Sep, 2023 1 commit
- [Cmake] Set cmake default build type Release and path to /opt/rocm (#914) · 5fe687fa
  Jun Liu authored Sep 13, 2023