Commits · 14daa201e9f2140ae1d151dae447f7c45114a18e · gaoqiong / composable_kernel

08 Nov, 2023 1 commit
- renamed files · 14daa201
  Astha Rai authored Nov 08, 2023
  
07 Nov, 2023 2 commits
- updating profiler · 6bfdd98a
  Astha Rai authored Nov 07, 2023
  
- updated example with 1d kernel · ddefb951
  Astha Rai authored Nov 07, 2023
  
01 Nov, 2023 1 commit
- updated test/profiler files · 4a20c076
  Astha Rai authored Nov 01, 2023
  
31 Oct, 2023 1 commit
- updated vector dim access to enable vector load · baaad9ec
  Astha Rai authored Oct 31, 2023
  
29 Oct, 2023 1 commit
- fixed errors in test/profiler · f3b6e205
  Astha Rai authored Oct 29, 2023
  
26 Oct, 2023 1 commit
- update profiler and client example tensor layouts · 4dab86fe
  Astha Rai authored Oct 26, 2023
  
25 Oct, 2023 2 commits
- remove comments · 0f8c6a60
  Astha Rai authored Oct 25, 2023
  
- fixed error in instance generation · 6b5bce42
  Astha Rai authored Oct 25, 2023
  
24 Oct, 2023 3 commits
- removing instances · fb6efd98
  Astha Rai authored Oct 24, 2023
  
- updated instance list for client example, added different layout example · 0768642d
  Astha Rai authored Oct 24, 2023
  
- removed unnecessary comments, renamed files · 57b9cf69
  Astha Rai authored Oct 24, 2023
  
19 Oct, 2023 2 commits
- minor fix · a3115568
  Astha Rai authored Oct 19, 2023
  
- fixing minor error · 5a6d8251
  Astha Rai authored Oct 19, 2023
  
18 Oct, 2023 1 commit
- adding test files and profiler · 244681cf
  Astha Rai authored Oct 18, 2023
  
17 Oct, 2023 2 commits
- minor formatting and naming fixes · 991ce41a
  Astha Rai authored Oct 17, 2023
  
- removed extra files · 09b0780d
  Astha Rai authored Oct 17, 2023
  
16 Oct, 2023 1 commit
- implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp · 77a60235
  Astha Rai authored Oct 16, 2023
  
15 Oct, 2023 1 commit
- Merge branch 'develop' into transpose_5d · a2ddbd2b
  arai713 authored Oct 14, 2023
  
14 Oct, 2023 1 commit
- fixed errors in client example · e9ecf8d1
  Astha Rai authored Oct 14, 2023
  
13 Oct, 2023 2 commits

Add splitk gemm fp16 @ fp16 with fp8 compute instances (#983) · fa753f27
Rostyslav Geyyer authored Oct 13, 2023
```
* Add ComputeType

* Update for compatibility

* Add instances

* Update profiler api
```

add vector_type support into thread_copy_v3r1 (#969) · 2ce9b56c

zjing14 authored Oct 13, 2023



* add vector_type support into thread_copy_v3r1

* remove unnecessary type_convert

* fixed datatype

* fixed dataType

* changed API with is_packx_invocable

* changed example

* add missing cmake file

* fixed ci

* fixed cmake

---------
Co-authored-by: Jing Zhang <jizha@amd.com>


12 Oct, 2023 2 commits

Bump gitpython from 3.1.31 to 3.1.35 in /docs/sphinx (#898) · a3c80265

dependabot[bot] authored Oct 12, 2023

Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.31 to 3.1.35.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.31...3.1.35)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


simplified buffer_load/store (#971) · f3b02ecf

zjing14 authored Oct 11, 2023



* simplified buffer_load/store

* add bfp8/fp8

* fixed

* fixed buffer_load

* fixed buffer_store

---------
Co-authored-by: Jing Zhang <jizha@amd.com>


11 Oct, 2023 2 commits

Revert "Grouped Gemm with looping over the tiles. (#788)" (#982) · c99323be
zjing14 authored Oct 11, 2023
```
This reverts commit a4f72a31.
```

Grouped Gemm with looping over the tiles. (#788) · a4f72a31

Adam Osewski authored Oct 11, 2023



* Introduce LocalBlockToCTileMap.

* Change the signature of the CalculateBottomIndex() function, which no
longer accepts any argument. The B2C map, which is already passed as an
argument to the kernel Run function, now computes the block's local ID
outside, at the kernel entry point (the __global__ function).
The LocalB2C map stores the local block ID as a member.

* Use LocalBlockToCTile map in device ops.

* First draft of tile loop work distribution.

* Fix typo.

* Simplify kernel arguments.

Calculate descriptors & B2C maps on the device.

* Use looping kernel.

* Fix B2C constructor.

* Fix Navi21 errors.

* Calculate tile start/end in device kernel.

* Change Run API to accept user provided workspace buffer.

* Add new line at EOF.

* Move Gemm KernelArguments to device op interface.

* Remove unused code.

* Update API.

* Launch grid size which is min of occupancy vs tile count

* Go back to using constant memory for gemm descriptors.

* Remove unused code.

* Add default virtual method implementation.

* Update comments to conform with doxygen style.

* Fix doc style and unused parameters.

* Add thread cluster lengths to kernel name.

* Remove old splitk impl and replace it with tile looping one.

* Modify instances.

* set KPerBlock to 64
* maximize wherever possible vector load size.

* Fix instances cluster lengths.

* Change comment style.

* Use 128b store where possible in instances.

* Update test cases, since KPerBlock has doubled.

* Update output stream operator for Sequence.

* Add pipeline version to GroupedGEMM device op type string.

* Fix pipeline version type logging.

* Fix input tensors type after merge.

* Fix compiler error.

* Fix output stream operator for Pipeline version.

* Store using 128b.

* Set of instances with kpb 32/64

* Limit number of instances

* Remove commented out instances.

* Fix function name.

* Limit the number of instances.

Add pipeline version to the regular instances

* Change thr cluster layout for reading B tensor.

* disabled failed instances

---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Jing Zhang <jizha@amd.com>


10 Oct, 2023 3 commits
- Fix MNKPadding in gridwise_gemm_xdlops_v2r3 (#981) · 98c80714
  Bartłomiej Kocot authored Oct 10, 2023
  
- Fixed f8_gemm NaN (#975) · ac9595a9
  zjing14 authored Oct 10, 2023
```
* workaround nan problem by changing output to fp16

* enable f8/bf8 gemm tests on MI200

* workaround f16 to f8 conversion

---------
Co-authored-by: Jing Zhang <jizha@amd.com>
```
- Merge branch 'develop' into transpose_5d · 11001fa3
  arai713 authored Oct 09, 2023
  
09 Oct, 2023 2 commits
- added instances for client example · c4926252
  Astha Rai authored Oct 09, 2023
  
- adding client example · 574fd35a
  Astha Rai authored Oct 09, 2023
  
05 Oct, 2023 3 commits
- Replace CMake `return` from later CMake (#970) · 59136091
  Lauren Wrubleski authored Oct 05, 2023
  
- Revert "Add support for mixed precision in contraction scale and bilinear" (#967) · 4daedf8c
  Illia Silin authored Oct 05, 2023
```
* Revert "Add support for mixed precision in contraction scale and bilinear (#936)"

This reverts commit f0748506.

* revert commits #957 and #960
```
- remove example 60 (#963) · 570ff3dd
  zjing14 authored Oct 05, 2023
```
Co-authored-by: Jing Zhang <jizha@amd.com>
```
04 Oct, 2023 3 commits

Grouped conv bwd data with fp16 input and bf8fp8 comp (#962) · 04f93aad

zjing14 authored Oct 04, 2023



* Add f8 bf8 gemm example

* Add element-wise ops

* Add intrinsics

* Update reference calculation

* Add an additional type option for xdlops gemm

* Fix build process

* Add bf8 to buffer addressing

* Update blockwise op, split typeA and typeB

* Update for compatibility

* Update naming to f8->fp8

* Update naming

* Format

* Update naming (#937)

* Add a client example

* Add computetypes to device and gridwise ops

* Add instances, update instance factory

* Format

* Fix a flag

* Add ckProfiler mode

* Fix typos

* Add an example

* Add bf8 generator

* add bf8 mfma; fixed type_convert for bf8

* move verification ahead of timing

* Update reference calculation

* Fix reference

* Narrow down float init range

* Fix bf8 bf8 mfma

* Add bf8 @ fp8 mfma

* Update example

* Update instances

* Update profiler api

* Update for compatibility

* Format

* Remove extra example

* Clean up

* workaround convert

* added instance of f16_bf8f8, and client example

* fixed mfma selector

* format

---------
Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
Co-authored-by: Jing Zhang <jizha@amd.com>


Add conv bwd weight fp16 comp bf8 fp8 op, instances and example (#945) · 42facfc6

Rostyslav Geyyer authored Oct 04, 2023



* Add f8 bf8 gemm example

* Add element-wise ops

* Add intrinsics

* Update reference calculation

* Add an additional type option for xdlops gemm

* Fix build process

* Add bf8 to buffer addressing

* Update blockwise op, split typeA and typeB

* Update for compatibility

* Update naming to f8->fp8

* Update naming

* Format

* Update naming (#937)

* Add a client example

* Add computetypes to device and gridwise ops

* Add instances, update instance factory

* Format

* Fix a flag

* Add ckProfiler mode

* Fix typos

* Add an example

* Add bf8 generator

* add bf8 mfma; fixed type_convert for bf8

* move verification ahead of timing

* Update reference calculation

* Fix reference

* Narrow down float init range

* Fix bf8 bf8 mfma

* Add bf8 @ fp8 mfma

* Update example

* Update instances

* Update profiler api

* Update for compatibility

* Format

* Remove extra example

* Clean up

* workaround convert

---------
Co-authored-by: Jing Zhang <jizha@amd.com>


3d grouped conv fwd with input/output fp16 and comp fp8 (#931) · e921e1f0

zjing14 authored Oct 03, 2023



* add f8 comp instance

* fixed

* fixed comments

* rename

* fixed dtype

* format

* fixed CI

* fixed ci

* add missing ComputeType

* fixed ci

* fixed

* Update cmake-ck-dev.sh

---------
Co-authored-by: Jing Zhang <jizha@amd.com>


03 Oct, 2023 3 commits
- changed test for grouped_gemm to be random (#959) · 5311d1b3
  zjing14 authored Oct 03, 2023
```
Co-authored-by: Jing Zhang <jizha@amd.com>
```
- Fixed contraction issues (#960) · aa46039f
  zjing14 authored Oct 03, 2023
```
* add missing ComputeType

* fixed

* Update cmake-ck-dev.sh

---------
Co-authored-by: Jing Zhang <jizha@amd.com>
```
- add generic instances (#947) · f477fca4
  zjing14 authored Oct 03, 2023
```
Co-authored-by: Jing Zhang <jizha@amd.com>
```