Commits · fd7600ce80871da258e6aa1828382dd39d1dbd0b · gaoqiong / composable_kernel_ROCM

01 Feb, 2025 2 commits
- Fix clang format · f24bb5a0
  jefyang1 authored Jan 31, 2025
  
- Fix gemm gemm on gfx950 · e95cb82b
  jefyang1 authored Jan 31, 2025
  
31 Jan, 2025 6 commits
- Update tests and pack functions · df1bad99
  Rostyslav Geyyer authored Jan 31, 2025
  
- Fix a typo · 544aad11
  Rostyslav Geyyer authored Jan 31, 2025
  
- Use pointers instead of array indices · 91fa13b0
  Rostyslav Geyyer authored Jan 31, 2025
  
- Move flag logic to scaled_type_convert header · 9f58449c
  Rostyslav Geyyer authored Jan 31, 2025
  
- Test the functionality of V_MFMA_F32_16X16X128_F8F6F4 and ... · c38163cd
  Andriy Roshchenko authored Jan 31, 2025
```
Test the functionality of V_MFMA_F32_16X16X128_F8F6F4 and V_MFMA_F32_32X32X64_F8F6F4 instructions. (#293)

* Introduced MFMA tests

* Verified f8f6f4 MFMA Instructions
```
- Add a flag to config file · 7336b04b
  Rostyslav Geyyer authored Jan 31, 2025
  
30 Jan, 2025 4 commits
- Add docstrings · d032ea56
  Rostyslav Geyyer authored Jan 30, 2025
  
- Remove unneeded AsType accessors · 97c7e725
  Rostyslav Geyyer authored Jan 30, 2025
  
- Update pack/unpack methods · bcc12098
  Rostyslav Geyyer authored Jan 30, 2025
  
- Fix build logic · acf8854e
  Rostyslav Geyyer authored Jan 30, 2025
  
29 Jan, 2025 2 commits
- Add conversions · b8f4de71
  Rostyslav Geyyer authored Jan 29, 2025
  
- Add a flag · c98974ee
  Rostyslav Geyyer authored Jan 29, 2025
  
27 Jan, 2025 1 commit
- Add size checks in pack function · 2a807013
  Rostyslav Geyyer authored Jan 27, 2025
  
24 Jan, 2025 2 commits
- Fix merge · 7c6a541b
  Rostyslav Geyyer authored Jan 24, 2025
  
- Update unpack signature · 86950b3a
  Rostyslav Geyyer authored Jan 24, 2025
  
22 Jan, 2025 3 commits
- fix typo · 6a747f03
  illsilin authored Jan 22, 2025
  
- fix typo · 108f2733
  illsilin authored Jan 22, 2025
  
- fix build for multiple archs · 50010cf9
  illsilin authored Jan 21, 2025
  
16 Jan, 2025 3 commits
- Fix and optimize dynamic unary elementwise (#1818) · 1519ce91
  Bartłomiej Kocot authored Jan 16, 2025
```
* Fix and optimize dynamic unary elementwise

* fix
```
- Add missing type aliases · 17d1e68b
  Rostyslav Geyyer authored Jan 16, 2025
  
- Add vector support · 3a64757f
  Rostyslav Geyyer authored Jan 16, 2025
  
15 Jan, 2025 1 commit

Add rounding for float to bf16 conversion as default (#1812) · 7790e8c3

Bartłomiej Kocot authored Jan 15, 2025

* Add rounding for float to bf16 conversion

* Add bhalf test

* Add inf test bhalf

* Refactor

* update cmake

* Fixes

10 Jan, 2025 1 commit

Grouped convolution backward weight special vector size loads (#1772) · fd46a01d

Bartłomiej Kocot authored Jan 10, 2025

* Grouped convolution backward weight special vector size loads

* Instances and tests

* Fixes

* Add 7 and 13 special cases

* fix comments

* Fix

* Fix2

* fixes

* fix atomic add bf16

08 Jan, 2025 1 commit

Disable building DPP kernels by default (#1804) · 26b3829c

darren-amd authored Jan 08, 2025

* Disable building DPP kernels by default

* Disable building dpp instances, examples, or tests if DPP_KERNELS is not set

* Add new DPP_KERNELS flag to readme

07 Jan, 2025 1 commit

[MX FP8] Add Scaled Type Convert Functions for OCP FP8/BF8 data types (#271) · c4a05057

Andriy Roshchenko authored Jan 07, 2025

* Move scaled_type_convert functions to a separate header

* Introduce MX data tests

* Build MX tests only on relevant architectures

* Refactor E8M0 scale implementation

* Fix `config.h` typo

* Cleanup deprecated symbols

* Refactor `amd_ck_fp8.hpp`

* `scaled_type_convert` for `f8_ocp_t`

* Implement test for MX FP8 scaled type convert

* Implement test for MX BF8 scaled type convert

* Scaled type convert for vectors of 2 FP8 elements

* Scaled type convert for vectors of 16 FP8 elements

* Implementation of scaled conversion from F32 to F8

* Add tests for scaled conversions from FP32 to FP8

* Add documentation to the test functions

* Implementation of scaled conversion from F32x2 to F8x2

* Implementation of scaled conversion from F32x16 to F8x16

* Implementation of scaled conversion from F32x32 to F8x32

* Implementation of scaled conversion from F8x32 to F32x32

* Verified on the emulator

06 Jan, 2025 1 commit

Add MXFP6 and MXBF6 conversion methods (#270) · e093146e

Rostyslav Geyyer authored Jan 06, 2025

* Add conversions

* Add tests

* Add docstrings

* Add scaled conversions

* Add fp6/bf6 tests

* Remove misleading fp4 test case

* Add docstrings

* Clean up

* Address comments

* Set stricter tolerances for RNE tests

* Add missing tests

* Add native conversions to float

* Revert "Add native conversions to float"

This reverts commit 09467111f73b753c8cc3d597533b187940353dab.

* Update copyright years

04 Jan, 2025 2 commits
- Fix universal gemm profiler for pk_i4_t (#1790) · 888317e6
  Bartłomiej Kocot authored Jan 04, 2025
```
* Fix universal gemm profiler for pk_i4_t

* fix
```
- terminology clean-up (#1792) · 8ea375bb
  Illia Silin authored Jan 03, 2025
  
03 Jan, 2025 1 commit

Implement the fp16xint4 scale weight only kernel for Ali (#1786) · 4f62f6e9

Mingtao Gu authored Jan 03, 2025



* enable int4 scale (weight only) kernel

* format some files

* Add unit test for int4 weight only

* fixed and formatted code

* fixed

* formatted

* formatted

* fixed

* fixed a bug in the ckProfiler, and formatted the code

---------
Co-authored-by: mtgu0705 <mtgu@amd.com>

02 Jan, 2025 2 commits

BF16 GEMM Stream-K (#1541) · 9e95d54c

Muhammed Emin Ozturk authored Jan 02, 2025



* initial

* Cmake file

* successful compilation but validation failed

* Cmake

* update

* gpu validation

* gemm universal

* gemm universal sk update

* sk bf16 universal instance

* gemm_universal_streamk.hpp

* only build for gfx94

* Cmakelist

* profiler update, bf16 sk only works at gfx42

* clang

* clang

* clang all

* no need flags

* cmake script

* delete comment

* gemm universal sk fix

* clang

* profiler fix

* clang

* update

* update

* delete comment

* code formatting

* cmake

* fix instance

* clang

* argument supported

* argument supported and clang

* update

* fix

* removing unnecessary comments

* clang formatting

* Update library/src/tensor_operation_instance/gpu/CMakeLists.txt
Co-authored-by: afagaj <john.afaganis@gmail.com>

* CopyRight Comment 2025

* clang reformatting

* copy right 2025

---------
Co-authored-by: Emin Ozturk <ozturk.27@osu.edu>
Co-authored-by: root <root@ctr-ubbsmc16.amd.com>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t004-008.hpcfund>
Co-authored-by: root <root@splinter-126-wr-d3.amd.com>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t006-001.hpcfund>
Co-authored-by: Muhammed Emin Ozturk <meozturk@login1.hpcfund>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t004-004.hpcfund>
Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t008-001.hpcfund>
Co-authored-by: afagaj <john.afaganis@gmail.com>

Jing's contribution: prototype of mixed-precision FP16/BF16xint4 GEMM (#1762) · 1d8e4ec2

Adam Osewski authored Jan 02, 2025



* add a prototype of int4

* clean

* debug

* clean

* clean

* move packed into dynamic_buffer

* fixed coord reset

* add fast pki4 to half conversion

* fix

* fixed reference and host_tensor

* fixed tensor init

* format

* debug i4_to_f16_convert

* format

* fixed splitk

* weight permute

* add b tile permute

* clean

* weight permute with splitki

* format

* improve weight layout

* add and_or_b32

* fixed splitk crash

* add permute switch as a template

* recover v3r1

* clean

* failure with intrawave v2

* fixed

* fixed

* add ckProfiler

* add bfp16 support

* add bf16 example

* fixed int4 to bhalf_t conversion

* format

* fixed int4 to bf16 conversion

* clean

* add instances for mem

* clean

* fixed host tensor size

* fixed

* debug

* fixed

* add pk_i4_t as a struct

* fix

* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* revert

* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* fixed comments

* revert

* clean

* revert

* revert

* fixed

* Update CMakeLists.txt

* Update script/cmake-ck-dev.sh
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update include/ck/tensor_operation/gpu/element/unary_element_wise_operation.hpp
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* Update CMakeLists.txt
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

* fixed

* fixed

* fixed

* revert

* revert

* add comments

* format

* fixed assert

* fixed

* Fix I4 define in ckProfiler

* Fixed example_gemm_xdl_bf16_pk_i4_v3 test failure

---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>

20 Dec, 2024 1 commit
- fix typo for CK_USE_OCP_FP8 (#1769) · 07339c73
  Illia Silin authored Dec 20, 2024
  
19 Dec, 2024 1 commit
- Refactor E8M0 scale implementation (#262) · 9598b9a0
  Andriy Roshchenko authored Dec 18, 2024
```
* Refactor E8M0 scale implementation
```
18 Dec, 2024 4 commits
- fix one more typo · 12b16cc3
  illsilin authored Dec 18, 2024
  
- Add FP6 and BF6 types (#261) · c847d5be
  Rostyslav Geyyer authored Dec 18, 2024
```
* Add a rounding flag

* Add FP6 and BF6

* Add tests
Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>

* Clean up

---------
Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>
```
- fix typo for CK_USE_OCP_FP8 · 401ce3ff
  illsilin authored Dec 18, 2024
  
- fix typo for CK_USE_OCP_FP8 · 105d2d1b
  illsilin authored Dec 18, 2024
  
17 Dec, 2024 1 commit
- Pass build flags to config.h (#1760) · 689a5ae4
  Illia Silin authored Dec 17, 2024
```
* pass the build flags to config.h

* fix clang format
```