- 21 May, 2024 4 commits
-
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
Add additional unit-test.
-
- 14 May, 2024 4 commits
-
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
- 11 May, 2024 1 commit
-
-
Illia Silin authored
-
- 10 May, 2024 3 commits
-
-
Illia Silin authored
* code clean-up * remove the profiling output samples
-
carlushuang authored
* add random norm * normalized default to 0/3 * change squant->auto
-
Bartłomiej Kocot authored
-
- 09 May, 2024 2 commits
-
-
Adam Osewski authored
-
Adam Osewski authored
-
- 08 May, 2024 2 commits
-
-
Illia Silin authored
-
Bartłomiej Kocot authored
-
- 07 May, 2024 2 commits
-
-
Illia Silin authored
* enable logging using environment variable * update ck.hpp header * fix typo * fix clang format * Update include/ck/utility/env.hpp Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> --------- Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
-
carlushuang authored
* add alibi support * fix code * update code based on comment * Support more hdim * fix fp8 bias * support seqlen_k=0 case * remove unused printf * fix format --------- Co-authored-by: rocking <ChunYu.Lai@amd.com>
-
- 06 May, 2024 3 commits
-
-
Sam Wu authored
Also add component owners as codeowners for header directory
-
Adam Osewski authored
* fix Accumulation when there's only one workgroup per K dim. * Update occupancy values after KBatch update and fix its calculation.
-
Adam Osewski authored
-
- 02 May, 2024 1 commit
-
-
Illia Silin authored
-
- 01 May, 2024 4 commits
-
-
Illia Silin authored
-
Illia Silin authored
-
Rostyslav Geyyer authored
-
Sam Wu authored
* Update documentation requirements: set rocm-docs-core to v1.1.1 * Update RTD config: set Python 3.10 for rocm-docs-core >= v1.0.0
-
- 30 Apr, 2024 2 commits
-
-
Illia Silin authored
* add a daily build for instances for gfx9;gfx10;gfx11 * fix jenkins logic for instances only build * fix the path for instance_only build * reduce the number of build threads to 32
-
Adam Osewski authored
Add proper dependency target.
-
- 29 Apr, 2024 3 commits
-
-
Rostyslav Geyyer authored
* Add a flag * Add flag check and messages --------- Co-authored-by: root <root@aus-g7-rogeyyer.amd.com>
-
Adam Osewski authored
-
Adam Osewski authored
-
- 26 Apr, 2024 4 commits
-
-
Haocong WANG authored
* Add bf16 instances * Add bf16 gemm universal example * tempsave * Add guard to navi compilation * workaround on a specific mixed gemm instance (bring it back when the compiler fix is uploaded) * fix formatting condition statement issue * solve conflict --------- Co-authored-by: Jun Liu <Liu.Jun@amd.com>
-
Rostyslav Geyyer authored
-
zjing14 authored
* Overload output stream operator for LoopScheduler and PiplineVersion * Add Run overload accepting grid descriptors MK. * Add __device__ keyword for CalculateGridSize * Create device op GroupedGemmMultipleD * Add GroupedGemm MultipleD Tile Loop implementation. * Add an example for GroupedGemm MultipleD tile loop. * Device Op GroupedGEMMTileLoop. * Bunch of small changes in example. * CkProfiler * Remove unused tparam. * changed the copy function to v7r2 * adding multi_abd * in-progress * add post-load oob check * Fix include statement. * Fix output stream overloads. * Do not make descriptors and check validity until we find group. * Fix gemm desc initialization. * debugging * adjust instances * add run_lds * add elemntwise_op * replace multi_abd_device with v3 * clean up * clean * clean * Revert device op * Fix compilation for DTYPES=FP16 * Validate tensor transfers parameters. * Added LDSType * profiling * adjust oobcheck * add missing file * Validate on host only NK dims if M is not known. * add * clean * refactor * clean * add examples * add fuse * add fusion and client example * Fix bug. * A convenient debug func for selecting threads. * Fix has main k block loop bug. * Make sure that b2c has up to date tile offset. * Output stream operator for Sequence type. * Cmake file formatting. * clean --------- Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
-
zjing14 authored
* changed the copy function to v7r2 * adding multi_abd * in-progress * add post-load oob check * debugging * adjust instances * add run_lds * add elemntwise_op * replace multi_abd_device with v3 * clean up * clean * clean * Added LDSType * profiling * adjust oobcheck * add missing file * refactor * clean * add examples
-
- 25 Apr, 2024 2 commits
-
-
Adam Osewski authored
* Overload output stream operator for LoopScheduler and PiplineVersion * Add Run overload accepting grid descriptors MK. * Add __device__ keyword for CalculateGridSize * Create device op GroupedGemmMultipleD * Add GroupedGemm MultipleD Tile Loop implementation. * Add an example for GroupedGemm MultipleD tile loop. * Device Op GroupedGEMMTileLoop. * Bunch of small changes in example. * CkProfiler * Remove unused tparam. * Fix include statement. * Fix output stream overloads. * Do not make descriptors and check validity until we find group. * Fix gemm desc initialization. * Revert device op * Fix compilation for DTYPES=FP16 * Validate tensor transfers parameters. * Validate on host only NK dims if M is not known. * Fix bug. * A convenient debug func for selecting threads. * Fix has main k block loop bug. * Make sure that b2c has up to date tile offset. * Output stream operator for Sequence type. * Cmake file formatting.
-
ltqin authored
* add flush cache to device op * add flush cache parameter to ckProfiler * change calculate size a and b method * change evaluation time method from AVERAGE to MEDIAN * format code * adjust some code * fix core dumped * remove loop call flush icache in kernel * remove loop(outer) call flush icache --------- Co-authored-by: letaoqin <letaoqin@amd.com>
-
- 24 Apr, 2024 2 commits
-
-
Adam Osewski authored
-
Adam Osewski authored
-
- 23 Apr, 2024 1 commit
-
-
Bartłomiej Kocot authored
-