- 07 Aug, 2024 6 commits
-
-
Illia Silin authored
-
Juan Manuel Martinez Caamaño authored
* Remove reinterpret_cast uses that result in undefined behaviour. Use a bitcast instead. See https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_accessibility Closes #1439
* fix clang format

Co-authored-by: illsilin <Illia.Silin@amd.com>
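For context on the change above, the following is a minimal sketch of the general technique, not the code from this commit. It assumes a memcpy-based helper; the names `bit_cast_sketch` and `float_bits` are hypothetical and do not come from the repository. The expected bit pattern in the comment assumes `float` is IEEE 754 binary32.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (illustration only, not the library's API):
// copies the object representation of `src` into a new value of type To.
template <typename To, typename From>
To bit_cast_sketch(const From& src)
{
    static_assert(sizeof(To) == sizeof(From), "source and destination sizes must match");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");

    To dst;
    std::memcpy(&dst, &src, sizeof(To)); // well-defined byte-wise copy
    return dst;
}

// Undefined behaviour (violates the type-accessibility rules):
//   std::uint32_t bits = *reinterpret_cast<const std::uint32_t*>(&f);

// Well-defined alternative using the helper above.
inline std::uint32_t float_bits(float f)
{
    return bit_cast_sketch<std::uint32_t>(f);
}
```

With IEEE 754 floats, `float_bits(1.0f)` yields `0x3F800000`. In C++20, `std::bit_cast` from `<bit>` provides the same operation.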
-
Illia Silin authored
-
dependabot[bot] authored
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.1 to 1.6.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.1...v1.6.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Illia Silin authored
* run ck_tile benchmarks after the smoke tests and store logs
* change the path of fmha benchmark logs
* change the way of stashing ck_tile fmha logs
* prevent the errors in stages where no logs are generated
* fix the ck_tile fmha log names and headers
* generate the fmha performance logs in the root folder
* change jenkins script arguments format
* use exact file names for stashing
* modify scripts to process FMHA performance results
* unstash FMHA logs before parsing them
-
Jing Zhang authored
-
- 06 Aug, 2024 10 commits
-
-
Max Podkorytov authored
-
Haocong WANG authored
-
Jun Liu authored
-
Juan Manuel Martinez Caamaño authored
-
bibek authored
* adding mha as static lib
* add fmha fwd compile options
* typo
* fix python version
* python version to 3
* increase path length
* add max path flag in mha cmake
* fix long path issue
* mha currently only runs in gfx94x
* only build mha in mi300
* populate gpu_list
* add mha compile flags
* avoid building mha in gpu other than gfx94x
* some comments and include ck_tile in rocm
* use rocm_install
* place ck_tile in include
* correct ck_tile path

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-
jakpiase authored
* fix for beta!=0 in reduce
* add reviewers' suggestions
-
Jing Zhang authored
-
Jing Zhang authored
Merge branch 'jizhan/enable_bf16_atomic_add' of github.com:zjing14/composable_kernel into jizhan/enable_bf16_atomic_add
-
Jing Zhang authored
-
Bartłomiej Kocot authored
* Support 64 bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add instances large tensor
* Fixes for transform conv to gemm
* Fixes
* fixes
* Remove unneeded instances
* examples fixes
* Remove unneeded ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes
-
- 05 Aug, 2024 5 commits
-
-
Illia Silin authored
-
illsilin authored
-
Illia Silin authored
* add --offload-compress compiler flag
* only apply the --offload-compress flag to the ckProfiler
* move the --offload-compress flag back to main cmake file
* add offload-compress to target compile option of ckProfiler

Co-authored-by: carlushuang <carlus.huang@amd.com>
-
Illia Silin authored
-
Illia Silin authored
-
- 02 Aug, 2024 5 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
Merge branch 'jizhan/enable_bf16_atomic_add' of github.com:zjing14/composable_kernel into jizhan/enable_bf16_atomic_add
-
Jing Zhang authored
-
Illia Silin authored
-
- 01 Aug, 2024 2 commits
-
-
Illia Silin authored
* add compiler flags to fix compiler issues
* fix typo.
* disable test_smfmac_op on all devices except gfx942
* specify full path to compiler in CI
-
zjing14 authored
-
- 31 Jul, 2024 12 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
Merge branch 'jizhan/enable_bf16_atomic_add' of github.com:zjing14/composable_kernel into jizhan/enable_bf16_atomic_add
-
Jing Zhang authored
-
zjing14 authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-