Commits · 0606e5498e7aa085a91c083d9c49794d30d371dc · gaoqiong / composable_kernel_ROCM

13 Aug, 2024 1 commit
- Support large: 12d tensor size for reduction kernel (#1465) · 0606e549
  Mateusz Ozga authored Aug 13, 2024
  
  0606e549
12 Aug, 2024 2 commits

Disable inapplicable xdl and mha instances for gfx12 (#1464) · cbb6f2ab
Illia Silin authored Aug 12, 2024

cbb6f2ab

Rewrite *sh reduce unit tests to gtest: part 1 (#1407) · ab60b390

Mateusz Ozga authored Aug 12, 2024



* Rewrite .sh test to Gtest

* review changes

* Remove unused comments

* Review v2

* Typo

* Separate UT: AMAX, MAX, MIN; added template params to trigger them

* Update test/reduce/reduce_no_index.cpp

---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

ab60b390

10 Aug, 2024 1 commit
- Fix bug with n block id calculation in DeviceGroupedConvXdlCShuffle (#1457) · 4a870942
  Bartłomiej Kocot authored Aug 10, 2024
```
* Fix typo in TransformConvFwdToGemm

* Fix bug in n offset calculation
```
  4a870942
09 Aug, 2024 2 commits

Codegen build w/CK (#1428) · da214a5a

arai713 authored Aug 09, 2024



* initial push

* cleaned up compiler errors

* removed commented code

* build codegen folder only for gfx9 targets

* remove separate stage for codegen tests from CI

* removed commented code from CMake

---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

da214a5a

Revert "Revert Revert Support access per groups and filter2x3 in grouped conv... · 5ff8eeeb

Jun Liu authored Aug 08, 2024

Revert "Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415)" (#1455)

This reverts commit 33b399cc.

5ff8eeeb

08 Aug, 2024 3 commits
- Enable CI on gfx12. (#1454) · 4a5ab678
  Illia Silin authored Aug 08, 2024
```
* enable CI build and test on gfx1201

* skip DL kernels in CI for gfx12

* only run CI on gfx12 if rocm version >= 6.2

* remove the rocm version check for CI on gfx12

* add a switch for CI builds on gfx12
```
  4a5ab678
- check if the coerce-illegal-types flag is supported (#1451) · ae3b8ff8
  Illia Silin authored Aug 08, 2024
  
  ae3b8ff8
- add rocm-llvm-dev package to docker image (#1452) · 8a757284
  Illia Silin authored Aug 08, 2024
  
  8a757284
07 Aug, 2024 4 commits

Remove reinterpret_cast uses that result in undefined behaviour. (#1445) · 901e5f15

Juan Manuel Martinez Caamaño authored Aug 07, 2024

* Remove reinterpret_cast uses that result in undefined behaviour. Use a bitcast instead.

See https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_accessibility



Closes #1439

* fix clang format

---------
Co-authored-by: illsilin <Illia.Silin@amd.com>

901e5f15
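
The commit above replaces `reinterpret_cast`-based type punning with a bitcast. A minimal sketch of that idea follows, assuming a `memcpy`-based helper. The names `bit_cast_sketch` and `float_bits_sketch` are hypothetical and are not taken from the CK sources, and the example assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper, not CK's actual API: reinterpret the bytes of `src`
// as a value of type To by copying the object representation.
template <typename To, typename From>
To bit_cast_sketch(const From& src)
{
    static_assert(sizeof(To) == sizeof(From), "source and destination sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                      std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    To dst;
    // Copying bytes is well defined, unlike reading through
    // *reinterpret_cast<const To*>(&src), which violates the aliasing rules.
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}

// Example use: inspect the bit pattern of a float (assumes 32-bit IEEE 754).
inline std::uint32_t float_bits_sketch(float f)
{
    return bit_cast_sketch<std::uint32_t>(f);
}
```

Compilers typically lower the `memcpy` to a plain register move, so this form usually costs nothing at run time.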

upgrade to rocm6.2 as new default compiler (#1448) · 5df10432
Illia Silin authored Aug 07, 2024

5df10432

Bump rocm-docs-core from 1.6.1 to 1.6.2 in /docs/sphinx (#1449) · a71d407e

dependabot[bot] authored Aug 07, 2024

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.1 to 1.6.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.1...v1.6.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

a71d407e

Run CK_TILE FMHA benchmarks and collect the performance data. (#1447) · 12c1f68d

Illia Silin authored Aug 07, 2024

* run ck_tile benchmarks after the smoke tests and store logs

* change the path of fmha benchmark logs

* change the way of stashing ck_tile fmha logs

* prevent the errors in stages where no logs are generated

* fix the ck_tile fmha log names and headers

* generate the fmha performance logs in the root folder

* change jenkins script arguments format

* use exact file names for stashing

* modify scripts to process FMHA performance results

* unstash FMHA logs before parsing them

12c1f68d

06 Aug, 2024 7 commits

modify python wrapper for addmm (#1441) · 886d14cc
Max Podkorytov authored Aug 06, 2024

886d14cc
Limit fp8only operator build arch in ckProfiler (#1443) · 6fc7bff5
Haocong WANG authored Aug 07, 2024

6fc7bff5
Fix ROCm 6.2 compiler not fully supporting gfx12 when building CK with INSTANCES_ONLY (#1446) · afbf6350
Jun Liu authored Aug 06, 2024

afbf6350
Add missing constexpr to if conditions (#1444) · fd9ef4e6
Juan Manuel Martinez Caamaño authored Aug 06, 2024

fd9ef4e6
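
The commit title above only says that `constexpr` was added to `if` conditions. As a general illustration of why this matters in templated code (not a reproduction of the actual change), the sketch below uses `if constexpr` so that the branch that is invalid for a given type is discarded at compile time. The function names are hypothetical, and C++17 is assumed.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical example: count elements of either a scalar or a container.
// With a plain `if`, `x.size()` would also be compiled for scalar T and fail;
// `if constexpr` discards the branch that does not apply.
template <typename T>
std::size_t element_count_sketch(const T& x)
{
    if constexpr(std::is_arithmetic<T>::value)
    {
        (void)x; // a scalar counts as one element
        return 1;
    }
    else
    {
        return x.size();
    }
}

// Small usage helper: one scalar plus a four-element vector.
inline std::size_t total_elements_sketch()
{
    std::vector<float> v(4);
    return element_count_sketch(2.5f) + element_count_sketch(v);
}
```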

adding mha as static lib (#1366) · 840c5397

bibek authored Aug 06, 2024



* adding mha as static lib

* add fmha fwd compile options

* typo

* fix python version

* python version to 3

* increase path length

* add max path flag in mha cmake

* fix long path issue

* mha currently only runs in gfx94x

* only build mha in mi300

* populate gpu_list

* add mha compile flags

* avoid building mha in gpu other than gfx94x

* some comments and include ck_tile in rocm

* use rocm_install

* place ck_tile in include

* correct ck_tile path

---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

840c5397

Fix for beta!=0 in reduce (#1440) · b74d4d4d
jakpiase authored Aug 06, 2024
```
* fix for beta!=0 in reduce

* add reviewers suggestions
```
b74d4d4d
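
For context on what `beta` usually means in a reduction, here is a minimal host-side sketch. It assumes the BLAS-style convention `out = alpha * reduce(in) + beta * out`, which the commit message does not spell out. The function name and the row-wise sum are illustrative choices, not CK's kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: row-wise sum of a row-major rows x cols matrix,
// blended with the previous contents of `out` (assumed convention):
//   out[r] = alpha * sum_c in[r][c] + beta * out[r]
inline void reduce_rows_sketch(const std::vector<float>& in,
                               std::size_t rows,
                               std::size_t cols,
                               float alpha,
                               float beta,
                               std::vector<float>& out)
{
    assert(in.size() == rows * cols);
    assert(out.size() == rows);
    for(std::size_t r = 0; r < rows; ++r)
    {
        float acc = 0.0f;
        for(std::size_t c = 0; c < cols; ++c)
        {
            acc += in[r * cols + c];
        }
        // When beta == 0 the prior value is not read at all, so stale or
        // NaN contents of `out` cannot leak into the result.
        out[r] = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * out[r];
    }
}
```

A reference like this is handy for checking a device kernel on the `beta != 0` path, since a bug there only shows up when the output buffer holds meaningful prior values.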

Add Grouped Conv Fwd Large Tensor kernel (#1432) · 4ec5c52a

Bartłomiej Kocot authored Aug 06, 2024

* Support 64 bit indexing

* Add new grouped conv fwd kernel for large tensors

* Add instances large tensor

* Fixes for transform conv to gemm

* Fixes

* fixes

* Remove not needed instances

* examples fixes

* Remove unneeded ds arrays

* Fix tests

* Add 2GB check in gridwise dl

* Fixes

4ec5c52a
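
The bullets above mention 64-bit indexing and a 2GB check. The sketch below illustrates the general idea under stated assumptions: sizes are computed in 64-bit arithmetic, and a tensor "fits" when its byte size does not exceed the largest signed 32-bit value. Both the threshold and the function names are assumptions for illustration, not taken from the CK sources.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: total size in bytes of a packed tensor, computed in
// 64-bit arithmetic so the product itself cannot wrap for realistic shapes.
inline std::int64_t tensor_bytes_sketch(const std::vector<std::int64_t>& lengths,
                                        std::int64_t bytes_per_element)
{
    std::int64_t elements = 1;
    for(std::int64_t len : lengths)
    {
        elements *= len;
    }
    return elements * bytes_per_element;
}

// Assumed criterion: byte offsets stay usable with signed 32-bit indexing
// only while the tensor is no larger than INT32_MAX bytes.
inline bool fits_32bit_offsets_sketch(std::int64_t bytes)
{
    return bytes <= static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::max());
}
```

A dispatcher could use such a check to choose between a 32-bit-indexed kernel and a large-tensor variant.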

05 Aug, 2024 2 commits

add --offload-compress compiler flag (#1433) · 7f57b2e0

Illia Silin authored Aug 05, 2024



* add --offload-compress compiler flag

* only apply the --offload-compress flag to the ckProfiler

* move the --offload-compress flag back to main cmake file

* add offload-compress to target compile option of ckProfiler

---------
Co-authored-by: carlushuang <carlus.huang@amd.com>

7f57b2e0

[CI][Jenkins] delete CI docker container upon exit (#1437) · f31ba04a
Illia Silin authored Aug 05, 2024

f31ba04a

01 Aug, 2024 1 commit

Add compiler flags for ROCm versions 6.2+ (#1429) · d311c953

Illia Silin authored Aug 01, 2024

* add compiler flags to fix compiler issues

* fix typo.

* disable test_smfmac_op on all devices except gfx942

* specify full path to compiler in CI

d311c953

31 Jul, 2024 4 commits
- Update doc requirements (#1423) · 6648fd3b
  Sam Wu authored Jul 31, 2024
  
  6648fd3b
- [HotFix] Fixed a typo in profile_gemm_multiply_multiply (#1425) · f31e8dfa
  zjing14 authored Jul 31, 2024
```
* fixed a typo

* clean

---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
```
  f31e8dfa
- Codegen: isSupportedArgument check (#1417) · d32997a7
  arai713 authored Jul 31, 2024
```
* added isSupportedArgument check into codegen device op

* adding function call

* remove commented code
```
  d32997a7
- workaround rocm-6.2 compiler issue (#1421) · b3f86e79
  carlushuang authored Jul 31, 2024
  
  b3f86e79
30 Jul, 2024 2 commits
- add docker for rocm6.2_rc4 compiler (#1424) · b527cad4
  Illia Silin authored Jul 30, 2024
  
  b527cad4
- Revert Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) (#1415) · 33b399cc
  Bartłomiej Kocot authored Jul 30, 2024
  
  33b399cc
26 Jul, 2024 2 commits

Bump rocm-docs-core from 1.6.0 to 1.6.1 in /docs/sphinx (#1420) · b9ba5b26

dependabot[bot] authored Jul 26, 2024

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.0 to 1.6.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.0...v1.6.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

b9ba5b26

Introduce cmake USE_GLIBCXX_ASSERTIONS option (#1404) · 733f33af

trixirt authored Jul 25, 2024



A standard option in Fedora packaging, used to check that
C++ code uses the standard C++ library correctly.
Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

733f33af

25 Jul, 2024 2 commits

Add rotating buff for gemm_multi_d (#1411) · 105bd708

zjing14 authored Jul 25, 2024



* add rotating_buff for gemm_multi_d

* format

* Update flush_cache.hpp

* Update gtest.cmake

---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>

105bd708

Bump rocm-docs-core from 1.5.1 to 1.6.0 in /docs/sphinx (#1416) · 1208082e

dependabot[bot] authored Jul 24, 2024

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.5.1 to 1.6.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.5.1...v1.6.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

1208082e

24 Jul, 2024 3 commits

Adding more instances of grouped convolution 3d forward for FP8 with... · 4a8a1bef

Andriy Roshchenko authored Jul 24, 2024

Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412)

* Add CMakePresets configurations.

* Add binary elementwise ConvScaleAdd and an example.

* Numerical verification of results.

Observed significant irregularities in F8 to F32 type conversions:
```log
ConvScaleAdd: float=145.000000   f8_t=160.000000    e=144.000000
ConvScaleAdd: float=97.000000   f8_t=96.000000    e=104.000000
ConvScaleAdd: float=65.000000   f8_t=64.000000    e=72.000000
```

* Implemented ConvScaleAdd + Example.

* Add ConvScale+Bias Instances

* Add Client Example for ConvScale+Bias

* Fix number of bytes in an example.

* Cleanup.

4a8a1bef
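
The log excerpt in the commit above pairs each float with two 8-bit values. The commit message does not say what `e` denotes or which 8-bit layout `f8_t` uses. Assuming a format with 3 explicit mantissa bits, the two values on each line are exactly the representable neighbours below and above the float. The sketch below computes those neighbours for positive inputs, ignoring range limits and subnormals. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical helper: nearest representable values at or below and at or
// above x for a binary floating-point format with `mant_bits` explicit
// mantissa bits (assumption: 3, as in e4m3-style 8-bit floats).
inline std::pair<float, float> fp_neighbours_sketch(float x, int mant_bits = 3)
{
    assert(x > 0.0f);
    int exp = 0;
    std::frexp(x, &exp); // x = m * 2^exp with m in [0.5, 1)
    // Spacing between representable values inside the binade containing x.
    const float step = std::ldexp(1.0f, exp - 1 - mant_bits);
    const float lo   = std::floor(x / step) * step;
    const float hi   = std::ceil(x / step) * step;
    return {lo, hi};
}
```

Under this assumption the neighbours of 145 are 144 and 160, those of 97 are 96 and 104, and those of 65 are 64 and 72, matching the values in the log.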

Add support for half_t and bfloat to reduction operations (#1395) · ffabd70a
Bartłomiej Kocot authored Jul 24, 2024
```
* Add support for half_t and bfloat to reduction operations

* Fix bhalf convert

* Next fix bf16
```
ffabd70a

Bump rocm-docs-core from 1.5.0 to 1.5.1 in /docs/sphinx (#1414) · 33b2a2bd

dependabot[bot] authored Jul 24, 2024

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.5.0 to 1.5.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.5.0...v1.5.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

33b2a2bd

23 Jul, 2024 1 commit
- disable bad instance (#1410) · d22713a7
  Haocong WANG authored Jul 24, 2024
  
  d22713a7
22 Jul, 2024 1 commit
- Revert Support access per groups and filter2x3 in grouped conv fwd (#1382) (#1406) · 5d8c3d81
  Bartłomiej Kocot authored Jul 22, 2024
  
  5d8c3d81
19 Jul, 2024 2 commits

[GEMM] F8 GEMM, performance optimized. (#1384) · 8c90f25b

Haocong WANG authored Jul 19, 2024



* add ab_scale init support

* enabled interwave

* add scale type; update isSupport

* adjust example

* clean

* enable f8 pure gemm rcr ckprofiler

* Add gemm_multiply_multiply instances

* clang format

* Optimize for ScaleBlockMNK=128

* enable abscale f8 gemm ck profiler

* Add pure f8 gemm test suite

* Reverting to the state of project at f60fd77

* update copyright

* clang format

* update copyright

---------
Co-authored-by: root <jizhan@amd.com>

8c90f25b

Universal gemm splitk using reduce (with multi-d) (#1341) · c544eb4d

ltqin authored Jul 19, 2024



* init for reduce_threadwise multi_d

* add reduce_threadwise_multi_d

* add reduce_multi_d

* clean

* start adding another splitk device op

* add reduce template parameter to SplitKBatchOffset

* add reduce c matrix

* clean up code

* change example data type to bf16

* add bf16Ai8B example

* remove reduce template parameter

* add splitk atomic status to v4

* example add multi d parameters

* device op add multi-d parameters

* add multi-d to reduce

* fix kbatch=1 bug

* change B layout to col in bf16Ai8B example

* remove float adding struct

* change multi-d interface

* change file and class name

* remove multi-d of bf16Ai8B example

* change IsReduce function to IsReduceAdd

* change example layout to RRR from RCR

* set ds stride according to layout

* reset parameter layout

* add gemm universal reduce instance

* add reduce factory

* add profile_gemm_universal_reduce

* add reduce to profiler

* fix reduce instance

* fix profiler reduce compiling bug

* format

* format library instance code

* add mem instance for reduce library

* fix call instance names

* add workspace for reduce in ckProfiler

* format

* add mnpadding to reduce library instance

* add fp16 instance to reduce of profiler

* change copyright time

* restore profiler cmake file

* add reduce text to instances

* add DsLayout and DsDataType to instances template parameter

* fixed gemm_reduce_multi_d

* add an example without multi_d

* Update common.hpp

* Update gtest.cmake

* Update gemm_xdl_splitk_reduce_bf16.cpp

* clean

* Update gtest.cmake

* format

* fix api

* format

* default parameter change to RRR

* add vector_len for multi_d

* format

* Update gtest.cmake

* fix bf16Ai8B elementwiseop

* add ReduceDataType

* move ReduceDataType to end position

* format

* remove googletest git method address

* fix copyright time

* update init data

---------
Co-authored-by: root <jizhan@amd.com>
Co-authored-by: letaoqin <letaoqin@amd.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

c544eb4d
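
The commit above implements split-K GEMM with a separate reduction step instead of atomic accumulation. A minimal host-side sketch of the general technique follows. It assumes row-major matrices and a workspace that holds one partial `C` per K-slice. The function name and layout are illustrative and do not mirror the CK device op.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: C[M x N] = A[M x K] * B[K x N], all row-major.
// K is split into `kbatch` slices; each slice writes its partial product
// into its own workspace region, and a final pass sums the slices.
inline std::vector<float> gemm_splitk_reduce_sketch(const std::vector<float>& a,
                                                    const std::vector<float>& b,
                                                    std::size_t M,
                                                    std::size_t N,
                                                    std::size_t K,
                                                    std::size_t kbatch)
{
    assert(kbatch >= 1);
    assert(a.size() == M * K && b.size() == K * N);

    const std::size_t k_per_batch = (K + kbatch - 1) / kbatch;
    std::vector<float> workspace(kbatch * M * N, 0.0f);

    // Stage 1: independent partial GEMMs, one per K-slice.
    for(std::size_t s = 0; s < kbatch; ++s)
    {
        const std::size_t k_begin = std::min(K, s * k_per_batch);
        const std::size_t k_end   = std::min(K, k_begin + k_per_batch);
        for(std::size_t m = 0; m < M; ++m)
        {
            for(std::size_t n = 0; n < N; ++n)
            {
                float acc = 0.0f;
                for(std::size_t k = k_begin; k < k_end; ++k)
                {
                    acc += a[m * K + k] * b[k * N + n];
                }
                workspace[(s * M + m) * N + n] = acc;
            }
        }
    }

    // Stage 2: reduce the partial results over the split dimension.
    std::vector<float> c(M * N, 0.0f);
    for(std::size_t s = 0; s < kbatch; ++s)
    {
        for(std::size_t i = 0; i < M * N; ++i)
        {
            c[i] += workspace[s * M * N + i];
        }
    }
    return c;
}
```

With `kbatch == 1` the reduction simply copies the single partial result, so that degenerate case is worth covering in tests.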