- 19 Aug, 2024 1 commit

Bartłomiej Kocot authored
* Add script to convert MIOpen driver to ckProfiler
* Fix

- 16 Aug, 2024 3 commits

Illia Silin authored
* re-enable fp8 and bf8 for all targets
* restore the fp8 gemm instances
* re-enable conv_3d fp8 on all architectures
* disable several fp8 gemm instances on all architectures except gfx94
* clang format fix

Dan Yao authored
* tmp save
* fix batch deterministic bugs
* fix group deterministic bugs
* codegen update
* reorder files
* bias support
* hd256 bias support
* bwd smoke test update
* simplify convert dq
* fix hd256 dropout scratch
* do{}while() -> while(){}
* comments
* remove FmhaBwdTilePartitioner
* save clear_tile
* refactor dropout
* code cleanup
* code cleanup
* comments
* fix epilogue problem
* fix fwd dropout
* group convert_dq opt
* fix dq alignment
* Do not store storerandval in bwd for flash attention integration
* fix hd32 error and boost performance
* revert
* Remove duplicated WarpGemm definitions in the policy file
* dropout patch for mrepeat 16*16
* code sync up
* dq_acc stride
* dq_acc stride stuff
* codegen update
* fwd dropout revert
* fix hd128 scratches and boost performance
* receipt 3 for simplified smoke test
* more strides for fa integration
* fix hd64 scratches and boost performance
* non-iglp pipeline for headdim padding cases
* dpad same as dvpad for flash attention integration
* unpadded lse&d for group mode
* Support unpad layout for group lse
* Support unpad lse layout for splitkv
* Fix stride for splitkv kernel
* fix unpadded lse issue in fwd splitkv
* comment
* solve lds read&write conflicts
* rename
* bias rename
* tile index revert

---------

Co-authored-by: danyao12 <danyao12>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>

Bartłomiej Kocot authored
* Add performance and large tensor tests for grouped conv
* Resize tests
* Resize tests
* update the python script to parse the grouped_conv results
* Remove int8 tests
* change bwd wei layout

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>

- 15 Aug, 2024 2 commits

dependabot[bot] authored
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.2 to 1.7.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.2...v1.7.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

trixirt authored
* Check compiler flags before using them. The user's compiler may not support these flags, so check them first. Resolves failures on Fedora.
  Signed-off-by: Tom Rix <trix@redhat.com>
* fix syntax CMakeLists.txt
  Fix syntax in the check_cxx_compiler_flag call.

---------

Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

- 14 Aug, 2024 1 commit

Haocong WANG authored
* replace buffer_atomic with global_atomic
* fixed global_atomic_add
* added bf16 atomic_add
* format
* clang-format-12
* clean
* clean
* add guards
* Update gtest.cmake
* enabled splitk_gemm_multi_d
* format
* add ckProfiler
* format
* fixed naming
* format
* clean
* clean
* add guards
* fix clang format
* format
* add kbatch printout
* clean
* Add rocm6.2 related gemm optimization
* Limit bf16 atomic usage
* remove redundant RCR gemm_universal instance
* Add RRR fp8 gemm universal instance
* Bug fix
* Add GPU_TARGET guard to FP8/BF8 target
* bug fix
* update cmake
* remove all fp8/bf8 examples if the arch does not support them
* Enable fp8 RRR support in ckProfiler
* limit greedy-reverse flag to gemm_universal in ckProfiler

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
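The 14 Aug commit adds a bf16 atomic_add. When no native 16-bit floating-point atomic is available, a common way to emulate one is a compare-and-swap loop on the raw bits. The host-side C++ sketch below illustrates that technique using `std::atomic`. The helper names (`bf16_to_float`, `float_to_bf16`, `atomic_add_bf16`) are hypothetical, and this is not CK's GPU global_atomic implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// bf16 is the upper 16 bits of an IEEE-754 binary32 value.
inline float bf16_to_float(std::uint16_t b) {
    std::uint32_t u = static_cast<std::uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// float -> bf16 with round-to-nearest-even (NaN handling omitted in this sketch).
inline std::uint16_t float_to_bf16(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    u += 0x7FFFu + ((u >> 16) & 1u);
    return static_cast<std::uint16_t>(u >> 16);
}

// Emulated atomic bf16 add: recompute the sum from the latest observed value
// until no other thread has modified the target in between. Returns the old value.
inline float atomic_add_bf16(std::atomic<std::uint16_t>& target, float value) {
    std::uint16_t expected = target.load();
    std::uint16_t desired;
    do {
        desired = float_to_bf16(bf16_to_float(expected) + value);
    } while (!target.compare_exchange_weak(expected, desired));
    return bf16_to_float(expected);  // on success `expected` still holds the pre-add bits
}
```

bf16 keeps only 8 significant bits, so repeated small additions can stall. For example, 256 + 1 rounds back to 256. This is worth keeping in mind when deciding where bf16 atomics are used.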
- 13 Aug, 2024 2 commits

AngryLoki authored
This fixes two issues when compiling with libc++. The first issue is an attempt to call std::numeric_limits<ranges::range_value_t<_Float16>>::min(). _Float16 is an extension of libstdc++; it does not exist in the C++ standard[2]. Luckily, composable_kernel has a NumericLimits class that does everything needed. The second issue is that the call to 'check_err' is ambiguous: there are two candidates. This happens because composable_kernel relies on the idea that f8_t (defined as _BitInt(8)) does not pass the is_integral trait. However, libc++ treats _BitInt(N) as integral (per the standard, "any implementation-defined extended integer types" can be integral).

Closes: #1460
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
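The fix above relies on a library-owned NumericLimits class rather than `std::numeric_limits` for types the standard library may not cover. Below is a minimal sketch of that pattern. `NumericLimits` here is a hypothetical stand-in, and CK's real class may differ in names and members. `half_bits` is a hypothetical bit-storage type standing in for _Float16.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical 16-bit float type unknown to the standard library; it only stores bits.
struct half_bits {
    std::uint16_t bits;
};

// Library-owned limits trait: forwards to std::numeric_limits for standard types...
template <typename T>
struct NumericLimits {
    static constexpr T Min() { return std::numeric_limits<T>::min(); }
    static constexpr T Max() { return std::numeric_limits<T>::max(); }
};

// ...and is specialized explicitly for the extended type, using IEEE-754 binary16
// bit patterns: 0x0400 is the smallest positive normal, 0x7BFF the largest finite value.
template <>
struct NumericLimits<half_bits> {
    static constexpr half_bits Min() { return half_bits{0x0400}; }
    static constexpr half_bits Max() { return half_bits{0x7BFF}; }
};
```

Standard types behave exactly as with `std::numeric_limits`. Extended types get explicit specializations instead of depending on whichever standard library happens to be in use.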
Mateusz Ozga authored

- 12 Aug, 2024 2 commits

Illia Silin authored

Mateusz Ozga authored
* Rewrite .sh test to Gtest
* review changes
* Remove unused comments
* Review v2
* Typo
* Separate UT: AMAX, MAX, MIN; added template params to trigger them
* Update test/reduce/reduce_no_index.cpp

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
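This commit splits the reduction unit tests into AMAX, MAX and MIN variants selected by template parameters. The sketch below illustrates that idea. It assumes a hypothetical `ReduceOp` enum and a host reference reduction, and is not the actual code in test/reduce/reduce_no_index.cpp. AMAX is taken here to mean the largest absolute value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

enum class ReduceOp { Max, Min, AMax };

// Reference reduction chosen at compile time, so one test body can be
// instantiated once per operation. Requires C++17 for `if constexpr`.
template <ReduceOp Op>
float reduce_reference(const std::vector<float>& v) {
    assert(!v.empty());
    float acc = (Op == ReduceOp::AMax) ? std::fabs(v[0]) : v[0];
    for (std::size_t i = 1; i < v.size(); ++i) {
        if constexpr (Op == ReduceOp::Max) {
            acc = std::max(acc, v[i]);
        } else if constexpr (Op == ReduceOp::Min) {
            acc = std::min(acc, v[i]);
        } else {
            acc = std::max(acc, std::fabs(v[i]));  // AMax: largest magnitude
        }
    }
    return acc;
}
```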
- 10 Aug, 2024 1 commit

Bartłomiej Kocot authored
* Fix typo in TransformConvFwdToGemm
* Fix bug in n offset calculation

- 09 Aug, 2024 2 commits

arai713 authored
* initial push
* cleaned up compiler errors
* removed commented code
* build codegen folder only for gfx9 targets
* remove separate stage for codegen tests from CI
* removed commented code from CMake

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

- 08 Aug, 2024 3 commits

Illia Silin authored
* enable CI build and test on gfx1201
* skip DL kernels in CI for gfx12
* only run CI on gfx12 if rocm version >= 6.2
* remove the rocm version check for CI on gfx12
* add a switch for CI builds on gfx12

Illia Silin authored

Illia Silin authored

- 07 Aug, 2024 4 commits

Juan Manuel Martinez Caamaño authored
* Remove reinterpret_cast uses that result in undefined behaviour. Use a bitcast instead. See https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_accessibility
  Closes #1439
* fix clang format

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
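The undefined behaviour this commit removes comes from reading an object through a pointer to an unrelated type. Below is a minimal sketch of the usual replacement: a memcpy-based bit cast, which follows the same idea as C++20 `std::bit_cast`. `sketch::bit_cast` is a hypothetical helper, not the bitcast function the commit actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace sketch {

// Copies the object representation of `src` into a new `To` object.
// Unlike *reinterpret_cast<const To*>(&src), this never accesses an object
// through an lvalue of an unrelated type, so its behaviour is well defined.
template <typename To, typename From>
To bit_cast(const From& src) {
    static_assert(sizeof(To) == sizeof(From), "bit_cast requires types of equal size");
    static_assert(std::is_trivially_copyable<To>::value &&
                      std::is_trivially_copyable<From>::value,
                  "bit_cast requires trivially copyable types");
    static_assert(std::is_default_constructible<To>::value,
                  "this sketch default-constructs the destination");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}

}  // namespace sketch

// The pattern being removed (undefined behaviour):
//   float f = 1.0f;
//   std::uint32_t u = *reinterpret_cast<const std::uint32_t*>(&f);
```

With C++20, `std::bit_cast` from `<bit>` does the same thing and is also usable in constant expressions.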
Illia Silin authored

dependabot[bot] authored
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.1 to 1.6.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.1...v1.6.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Illia Silin authored
* run ck_tile benchmarks after the smoke tests and store logs
* change the path of fmha benchmark logs
* change the way of stashing ck_tile fmha logs
* prevent the errors in stages where no logs are generated
* fix the ck_tile fmha log names and headers
* generate the fmha performance logs in the root folder
* change jenkins script arguments format
* use exact file names for stashing
* modify scripts to process FMHA performance results
* unstash FMHA logs before parsing them

- 06 Aug, 2024 7 commits

Max Podkorytov authored

Haocong WANG authored

Jun Liu authored

Juan Manuel Martinez Caamaño authored

bibek authored
* adding mha as static lib
* add fmha fwd compile options
* typo
* fix python version
* python version to 3
* increase path length
* add max path flag in mha cmake
* fix long path issue
* mha currently only runs in gfx94x
* only build mha in mi300
* populate gpu_list
* add mha compile flags
* avoid building mha on GPUs other than gfx94x
* some comments and include ck_tile in rocm
* use rocm_install
* place ck_tile in include
* correct ck_tile path

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

jakpiase authored
* fix for beta!=0 in reduce
* add reviewers' suggestions
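For context on the beta != 0 fix: reduction APIs in this family commonly follow the BLAS-style convention out = alpha * reduce(in) + beta * out. Under that convention, a nonzero beta means the previous output must be read and blended in. The host-side sketch below assumes that convention. `reduce_rows_sum` is a hypothetical reference function, not the kernel the commit changes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-wise sum over a rows x cols row-major matrix with BLAS-style scaling:
//   y[i] = alpha * sum_j x[i][j] + beta * y[i]
// When beta == 0 the old y is never read, so uninitialized or NaN contents
// cannot leak into the result; otherwise y must be read before it is overwritten.
void reduce_rows_sum(const std::vector<float>& x, std::size_t rows, std::size_t cols,
                     float alpha, float beta, std::vector<float>& y) {
    assert(x.size() == rows * cols && y.size() == rows);
    for (std::size_t i = 0; i < rows; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < cols; ++j) acc += x[i * cols + j];
        y[i] = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * y[i];
    }
}
```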
Bartłomiej Kocot authored
* Support 64-bit indexing
* Add new grouped conv fwd kernel for large tensors
* Add instances for large tensors
* Fixes for transform conv to gemm
* Fixes
* fixes
* Remove unneeded instances
* examples fixes
* Remove unneeded ds arrays
* Fix tests
* Add 2GB check in gridwise dl
* Fixes
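The large-tensor work above, which adds 64-bit indexing and a 2GB check, concerns offsets that no longer fit in a signed 32-bit integer. Below is a minimal sketch of such a check, assuming the limit is 2^31 bytes. `element_count` and `fits_in_2gb` are hypothetical helpers, not CK functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Total number of elements, computed in 64-bit arithmetic so the product does
// not overflow the way a 32-bit index would for large tensors.
inline std::int64_t element_count(const std::vector<std::int64_t>& lengths) {
    std::int64_t n = 1;
    for (std::int64_t len : lengths) n *= len;
    return n;
}

// Conservative check: true when the tensor occupies fewer than 2^31 bytes (2 GiB),
// so any byte offset into it fits in a signed 32-bit integer.
inline bool fits_in_2gb(const std::vector<std::int64_t>& lengths, std::int64_t bytes_per_element) {
    const std::int64_t limit = std::int64_t{1} << 31;
    return element_count(lengths) * bytes_per_element < limit;
}
```

A caller could take a 32-bit fast path when the check passes and fall back to a 64-bit-indexed kernel otherwise. That split is an assumption for illustration.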
- 05 Aug, 2024 2 commits

Illia Silin authored
* add --offload-compress compiler flag
* only apply the --offload-compress flag to the ckProfiler
* move the --offload-compress flag back to main cmake file
* add offload-compress to target compile option of ckProfiler

---------

Co-authored-by: carlushuang <carlus.huang@amd.com>

Illia Silin authored

- 01 Aug, 2024 1 commit

Illia Silin authored
* add compiler flags to fix compiler issues
* fix typo.
* disable test_smfmac_op on all devices except gfx942
* specify full path to compiler in CI

- 31 Jul, 2024 4 commits

Sam Wu authored

zjing14 authored
* fixed a typo
* clean

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>

arai713 authored
* added isSupportedArgument check into codegen device op
* adding function call
* remove commented code

carlushuang authored

- 30 Jul, 2024 2 commits

Illia Silin authored

Bartłomiej Kocot authored

- 26 Jul, 2024 2 commits

dependabot[bot] authored
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.6.0 to 1.6.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.0...v1.6.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

trixirt authored
A standard option in Fedora packaging that is used to check the correctness of C++ code's use of the standard C++ library.

Signed-off-by: Tom Rix <trix@redhat.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

- 25 Jul, 2024 1 commit

zjing14 authored
* add rotating_buff for gemm_multi_d
* format
* Update flush_cache.hpp
* Update gtest.cmake

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Haocong WANG <haocwang@amd.com>
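The 25 Jul commit adds rotating_buff for gemm_multi_d alongside changes to flush_cache.hpp. The usual idea behind such rotating buffers is to cycle through several copies of the inputs, so that consecutive timed runs do not simply re-read data already held in cache. The sketch below is a minimal host-side version of that idea. `RotatingBuffer` is a hypothetical class, not CK's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class RotatingBuffer {
public:
    // Makes `copies` independent copies of `src`; at least one copy is required.
    RotatingBuffer(const std::vector<T>& src, std::size_t copies) : copies_(copies, src) {
        assert(copies > 0);
    }

    // Returns the copy for the current iteration and advances to the next one,
    // wrapping around after the last copy.
    const std::vector<T>& next() {
        const std::vector<T>& current = copies_[index_];
        index_ = (index_ + 1) % copies_.size();
        return current;
    }

    std::size_t num_copies() const { return copies_.size(); }

private:
    std::vector<std::vector<T>> copies_;
    std::size_t index_ = 0;
};
```

For the rotation to have any effect, the combined size of all copies needs to exceed the cache whose influence is being avoided.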