Commits · jizhan/universal_gemm_multi_d · gaoqiong / composable_kernel_ROCM

22 Apr, 2024 3 commits
- add · ea38a958
  root authored Apr 22, 2024
  
- add client example · 162d0305
  root authored Apr 22, 2024
  
- add instances · d8ab41d5
  root authored Apr 22, 2024
  
21 Apr, 2024 1 commit
- clean up · bdf6cddb
  root authored Apr 21, 2024
  
20 Apr, 2024 4 commits
- add multi_d · 1bc44df2
  root authored Apr 20, 2024
  
- add multi_d · f537f83f
  root authored Apr 20, 2024
  
- add multi_d deviceop · 3afb2f74
  root authored Apr 20, 2024
  
- add multiD support into gridwise and deviceOp · 489599ba
  Jing Zhang authored Apr 20, 2024
  
19 Apr, 2024 2 commits

Refactor elementwise kernels (#1222) · ad1597c4

Bartłomiej Kocot authored Apr 19, 2024

* Refactor elementwise kernels

* Instances fixes

* Fix cmake

* Fix max pool bwd test

* Update two stage gemm split k

* Restore elementwise scale for hiptensor backward compatibility

* Fix Acc data type check in conv fwd multiple abd

* Disable conv fp64 fwd example

* Update grouped conv weight multi d


Add bf16 and bf16@int8 mk_nk_mn instances for grouped gemm two stage (#1228) · e0f3f918
jakpiase authored Apr 19, 2024
```
* added bf16 and bf16@int8 mk_nk_mn instances

* fix preprocessor guards
```

18 Apr, 2024 3 commits

Add grouped conv bwd weight multi d kernel (#1237) · fd923b6d

Bartłomiej Kocot authored Apr 18, 2024

* Add grouped conv bwd weight multi d kernel

* Reference fix

* Fix cmake files

* bwd weight scale only xdl

* Fixes

* Fix client conv fwd example


Make daily cron jobs use the rocm6.1 compiler. (#1253) · 930f889c

Illia Silin authored Apr 18, 2024

* add rocm6.1 docker and make it default for CI

* fix typo

* move the rocm6.1 image into public dockerhub repo

* upgrade daily cron jobs to use rocm6.1


Upgrade to ROCm6.1 and turn on the -enable-post-misched=0 compiler flag. (#1250) · caae537d
Illia Silin authored Apr 18, 2024
```
* add rocm6.1 docker and make it default for CI

* fix typo

* move the rocm6.1 image into public dockerhub repo
```

16 Apr, 2024 3 commits

docs: fix broken contributing link (#1244) · 501a6b68
peter authored Apr 16, 2024


Added Multi_ABD support into Gemm and GroupedGemmFixedNK (#978) · 12865fbf

zjing14 authored Apr 15, 2024



* added an example grouped_gemm_multi_abd

* fixed ci

* add setElementwiseOp

* changed API

* clean code: add multiA into example

* fixed v7r2 copy

* add transpose

* clean

* fixed vector_load check

* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update example/15_grouped_gemm/grouped_gemm_multi_abd_xdl_fixed_nk_bias_fp16.cpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_multiple_abd_xdl_cshuffle.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/grid/gridwise_gemm_multiple_abd_xdl_cshuffle.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* add reduce

* testing

* add example_b16_i8

* refactor example

* clean

* add mpadding

* disable reduce for kbatch = 1

* separate reduce device op

* add reduce op

* add guard for workspace_size

* add instances

* format

* fixed

* add client example

* add a colmajor

* add instances

* Update cmake-ck-dev.sh

* Update profile_gemm_splitk.cpp

* Update gridwise_gemm_xdlops_v2r4r2.hpp

* format

* Update profile_gemm_splitk.cpp

* fixed

* fixed

* adjust test

* adjust precision loss

* adjust test

* fixed

* add bf16_i8 scale bias

* fixed scale

* fixed scale elementwise_op

* revert contraction deviceop changes

* fixed

* Add AddFastGelu

* Revert "Merge branch 'jizhan/gemm_splitk_reduce' into grouped_gemm_multi_abd_fixed_nk_example"

This reverts commit 3b5d001efd74335b38dcb7d8c8877580b49d23a4, reversing
changes made to 943199a99191661c5597c51ca8371a90bf57837e.

* add Scales into elementwise

* add gemm_multi_abd client example

* add client examples

* add rcr and crr

* add grouped gemm client example

* add grouped gemm client example

* add instance for rcr crr

* format

* fixed

* fixed cmake

* fixed

* fixed client_example

* format

* fixed contraction isSupport

* Update include/ck/tensor_operation/gpu/device/device_grouped_gemm_multi_abd_fixed_nk.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

* Update device_reduce_threadwise.hpp

* clean

* Fixes

* Fix example

---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>


introducing ck_tile! (#1216) · db376dd8

carlushuang authored Apr 16, 2024

* enable gfx940

* switch between intrinsic mfma routines on mi100/200 and mi300

* fix mfma_int8 on MI300

* disable 2 int8 examples on MI300

* Update cmake-ck-dev.sh

* restore gitignore file

* modify Jenkinsfile to the internal repo

* Bump rocm-docs-core from 0.24.0 to 0.29.0 in /docs/sphinx

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.24.0 to 0.29.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.24.0...v0.29.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>

* initial enablement of gfx950

* fix clang format

* disable examples 31 and 41 int8 on gfx950...


15 Apr, 2024 1 commit
- add CK_USE_XDL/WMMA for client examples (#1238) · dd34ab6e
  Illia Silin authored Apr 15, 2024
  
14 Apr, 2024 1 commit

[GEMM] Gemm universal device operation (#1154) · f83e9701

Haocong WANG authored Apr 14, 2024



* Optimize GEMM on MI200/300:
1. Add new blockwise gemm pipeline
2. Add irregular splitk instances

* clang format + typo fix

* Fix a bug

* initial commit

* Add more instances to irregular splitk

* blkgemm pipeline v1~4 prototype

* Sanity Checked. Known issue:
1. Poor performance of splitk
2. Register spill on blkgemmpipeline v3

* Sanity and Performance fix:
1. fix a bug related to sanity in grouped b2c mapping
2. fix a bug related to sanity and performance in splitk offset

* Sanity and API update:
1. Remove prefetch stage
2. Fix valid check bug
3. Add first gemm_universal instance into ckProfiler

* Add NN instances for gemm universal

* 1. Add NT instances for gemm_universal
2. Fix a bug about Kpadding in gemm_universal

* Fix a bug regarding padding Odd K number

* remove kernel print

* Fix KPadding bug...

* Update safety check

* another try to fix kpadding..

* Sanity checked

* new instances..

* clang format+typo fix

* remove clang format script's change

* Add non-hotloop compile option

* 1. Add fp16xfp8 example
2. pull packed convert f8 from pr1150

* Some miscs.. opt and fix

* Add pipeline description docs

* Split universal gemm instance library to cut profiler compiling time

* uncomment cmakefile

* Fix a bug caused by blockwise_gemm_pipe_v2

* reduce default splitk to 1

* Add 224x256x64 tile size

* update, including:
1. Experiment pipeline 5~7
2. Optimization for pipeline 4
3. Organized instance library

* temp save

* temp save

* Permuted lds layout, sanity and function checked

* clang format

* Move OOB check from RunRead to RunWrite, for better software pipeline.
TODO: agpr spill when NN layout

* clangformat

* A/B splitpipe scheduler for v3

* Fix two bugs

* bug fix

* fix a bug in oob check

* Example for mixed fp16_fp8 gemm

* Clean experimental code blocks

* Add mixed precision gemm into profiler

* tempsave

* optimize m/n major lds layout

* Add RRR GEMM mixed precision instances

* Optimize f8 matrix transpose

* Add test_gemm_universal

* A/B split schedule for blkpip v5

* Take ds_read2 into iglp scheduling scheme

* format

* fixed cmake

* Add llvm-option into CI cmake flag

---------
Co-authored-by: Jing Zhang <jizhan@amd.com>


12 Apr, 2024 1 commit
- Update the config.h after the CK_USE_XDL/WMMA are set. (#1236) · 7cdf5a96
  Illia Silin authored Apr 12, 2024
```
* pass XDL and WMMA macros to libs that use CK

* update config.h after XDL and WMMA macros get set
```
11 Apr, 2024 3 commits

[HotFix] pass XDL and WMMA macros to libs that use CK (#1234) · d7f05fb9
Illia Silin authored Apr 11, 2024

Add instances for conv_scale with bf8@fp8->fp8 (#1231) · bbefc12a
Rostyslav Geyyer authored Apr 11, 2024
```
* Add instances

* Add example

* Add profiler mode

* Add client example
```

Bump rocm-docs-core from 0.38.0 to 0.38.1 in /docs/sphinx (#1232) · b2735caf

dependabot[bot] authored Apr 11, 2024

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.38.0 to 0.38.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.0...v0.38.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


10 Apr, 2024 1 commit
- add yigex (#1230) · 381d44aa
  zjing14 authored Apr 09, 2024
  
09 Apr, 2024 3 commits
- Extend support for contraction 6D (#1207) · ced5af16
  Bartłomiej Kocot authored Apr 09, 2024
```
* Extend support for contraction up to 5D

* Extend contraction bilinear instances

* Fix interface test

* Add 6d support, remove 3d,4d,5d

* Fixes

* Fix readme

* Make default dim for contraction instances
```
- Add an example (#1227) · 366592b0
  Rostyslav Geyyer authored Apr 09, 2024
  
- Add an example (#1225) · 50cc0a13
  Rostyslav Geyyer authored Apr 09, 2024
  
04 Apr, 2024 2 commits

fix the latest errors with staging compiler (#1229) · 7e5c81fe
Illia Silin authored Apr 04, 2024


Add Grouped Gemm Multiple D SplitK TwoStage (#1212) · c7010716

jakpiase authored Apr 04, 2024



* Support A/B/C elementwise ops.

* First part of GGEMM multiD splitk two stage.

* WIP - changes for debugging.

* tmp save

* working version

* added bf16@int8 version

* fixes

* add reviewers' suggestions

* pre-committed missing files

* switched to ifs from elseifs

---------
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>


03 Apr, 2024 1 commit

Add instances for conv_scale with fp8@bf8->fp8 (#1220) · a61e73bc

Rostyslav Geyyer authored Apr 03, 2024

* Update device op api to support BComputeType

* Add example

* Add instances

* Add profiler mode

* Add client example

* Update copyright year

* Add BComputeType check

* Fix compute types


02 Apr, 2024 3 commits

Introduce combined elementwise ops (#1217) · 9a194837
Bartłomiej Kocot authored Apr 03, 2024
```
* Introduce combined elementwise ops

* Introduce reference elementwise
```

Split the instances by architecture. (#1223) · ae57e593

Illia Silin authored Apr 02, 2024

* parse examples inside the add_example_executable function

* fix the example 64 cmake file

* add xdl flag to the gemm_bias_softmax_gemm_permute example

* add filtering of tests based on architecture type

* enable test_grouped_gemm for gfx9 only

* enable test_transpose only for gfx9

* only link test_transpose if it gets built

* split the gemm instances by architectures

* split gemm_bilinear,grouped_conv_bwd_weight instances by targets

* split instances by architecture

* split grouped_conv instances by architecture

* fix clang format

* fix the if-else logic in group_conv headers

* small fix for grouped convolution instances

* fix the grouped conv bwd weight dl instances

* fix client examples

* only enable client examples 3 and 4 on gfx9

* set the gfx9 macro

* make sure the architecture macros are set by cmake

* use separate set of xdl/wmma flags for host code

* simplify the main cmake file

* add co...


improved zeroing (#1221) · 303d4594
zjing14 authored Apr 02, 2024


27 Mar, 2024 1 commit

Bump rocm-docs-core from 0.37.1 to 0.38.0 in /docs/sphinx (#1218) · 5f2c89e8

dependabot[bot] authored Mar 27, 2024

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.37.1 to 0.38.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.37.1...v0.38.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


22 Mar, 2024 3 commits

allow the CI to pass even if can't connect to db (#1214) · cc1f733d
Illia Silin authored Mar 22, 2024


Bump rocm-docs-core from 0.37.0 to 0.37.1 in /docs/sphinx (#1211) · 2ae16e90

dependabot[bot] authored Mar 22, 2024

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.37.0 to 0.37.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.37.0...v0.37.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


Add elementwise with dynamic vector dim (#1198) · 9c052804
Bartłomiej Kocot authored Mar 22, 2024
```
* Add elementwise with dynamic vector dim

* Reduce number of instances

* Fixes

* Fixes
```

21 Mar, 2024 1 commit

Add instances for conv_scale with bf8 in / fp8 out (#1200) · fd0d093e

Rostyslav Geyyer authored Mar 21, 2024

* Add bf8 conv fwd instances

* Add example

* Add profiler mode

* Add client example

* Fix copyright headers

* Format


20 Mar, 2024 1 commit

Bump rocm-docs-core from 0.36.0 to 0.37.0 in /docs/sphinx (#1208) · 9e504269

dependabot[bot] authored Mar 20, 2024

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.36.0 to 0.37.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.36.0...v0.37.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


19 Mar, 2024 1 commit

Fix a couple of docker issues. (#1206) · f5210953

Illia Silin authored Mar 19, 2024

* do not install sccache by default, only install rocm-llvm-dev for rocm6.1

* add sccache flag to docker build options


18 Mar, 2024 1 commit

update the changelog for ROCm6.1 release (#1205) · 9e011bcd

Illia Silin authored Mar 18, 2024

* update the changelog for ROCm6.1 release

* modify the order of items in changelog, capitalize GEMMs
