- 15 Nov, 2024 2 commits
-
-
Illia Silin authored
-
Illia Silin authored
-
- 05 Nov, 2024 1 commit
-
-
Illia Silin authored
-
- 01 Nov, 2024 1 commit
-
-
Illia Silin authored
* disable fp8 gemm_universal on gfx90a and gfx908 by default * fix cmake syntax * fix clang format * add ifdefs in amd_xdlops * disable fp8 gemm instances on gfx90a by default * update readme
-
- 23 Oct, 2024 1 commit
-
-
Illia Silin authored
-
- 18 Oct, 2024 1 commit
-
-
Illia Silin authored
-
- 10 Oct, 2024 1 commit
-
-
Illia Silin authored
-
- 09 Oct, 2024 1 commit
-
-
Illia Silin authored
-
- 07 Oct, 2024 2 commits
-
-
Illia Silin authored
* add a CK_USE_CODEGEN build argument to enable codegen * fix cmake codegen logic
-
Illia Silin authored
* update build logic with GPU_ARCHS * fix the GPU_ARCHS build for codegen * unset GPU_TARGETS when GPU_ARCHS are set
-
- 04 Oct, 2024 1 commit
-
- 13 Sep, 2024 1 commit
-
-
Jun Liu authored
* Legacy support: customized filesystem * Update cmakefile for python alternative path * fix build issues * CK has no boost dependency * More fixes to issues found on legay systems * fix clang format issue * Check if blob is correctly generated in cmake * fix the python issues * add a compiler flag for codegen when using alternative python * use target_link_options instead of target_compile_options --------- Co-authored-by:illsilin <Illia.Silin@amd.com>
-
- 04 Sep, 2024 1 commit
-
-
Illia Silin authored
* locate a newwer version of python when -DRHEL=ON flag is set * allow setting python version on cmake command line
-
- 23 Aug, 2024 1 commit
-
-
Illia Silin authored
-
- 22 Aug, 2024 1 commit
-
-
arai713 authored
* initial push - altering codegen build * fix the codegen cmake * enable codegen build for gfx908 and gfx90a * enable building codegen with INSTANCES_ONLY=ON * updating ck_rtc * remove gpu targets for codegen and rename tests * make codegen tests dependencies of tests and check targets --------- Co-authored-by:
illsilin <Illia.Silin@amd.com> Co-authored-by:
Illia Silin <98187287+illsilin@users.noreply.github.com>
-
- 16 Aug, 2024 1 commit
-
-
Illia Silin authored
* re-enable fp8 and bf8 for all targets * restore the fp8 gemm instances * re-enable conv_3d fp8 on all architectures * diasble several fp8 gemm instances on all architectures except gfx94 * clang format fix
-
- 15 Aug, 2024 1 commit
-
-
trixirt authored
* Check compiler flags before using The user's compiler may not support these flags, so check. Resolves failures on Fedora. Signed-off-by:
Tom Rix <trix@redhat.com> * fix syntax CMakeLists.txt Fix syntax in the check_cxx_compiler_flag. --------- Signed-off-by:
Tom Rix <trix@redhat.com> Co-authored-by:
Tom Rix <trix@redhat.com> Co-authored-by:
Illia Silin <98187287+illsilin@users.noreply.github.com>
-
- 14 Aug, 2024 1 commit
-
-
Haocong WANG authored
* replace buffer_atomic with global_atomic * fixed global_atomic_add * added bf16 atomic_add * format * clang-format-12 * clean * clean * add guards * Update gtest.cmake * enabled splitk_gemm_multi_d * format * add ckProfiler * format * fixed naming * format * clean * clean * add guards * fix clang format * format * add kbatch printout * clean * Add rocm6.2 related gemm optimization * Limit bf16 atomic usage * remove redundant RCR gemm_universal instance * Add RRR fp8 gemm universal instance * Bug fix * Add GPU_TARGET guard to FP8/BF8 target * bug fix * update cmake * remove all fp8/bf8 example if arch not support * Enable fp8 RRR support in ckProfiler * limit greedy-reverse flag to gemm_universal in ckProfiler --------- Co-authored-by:
Jing Zhang <jizhan@fb.com> Co-authored-by:
Jing Zhang <jizhan@meta.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com> Co-authored-by:
Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by:
illsilin <Illia.Silin@amd.com>
-
- 09 Aug, 2024 1 commit
-
-
arai713 authored
* initial push * cleaned up compiler errors * removed commented code * build codegen folder only for gfx9 targets * remove separate stage for codegen tests from CI * removed commented code from CMake --------- Co-authored-by:
Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by:
illsilin <Illia.Silin@amd.com>
-
- 08 Aug, 2024 1 commit
-
-
Illia Silin authored
-
- 06 Aug, 2024 1 commit
-
-
Jun Liu authored
-
- 01 Aug, 2024 1 commit
-
-
Illia Silin authored
* add compiler flags to fix compiler issues * fix typo. * disable test_smfmac_op on all devices except gfx942 * specify full path to compiler in CI
-
- 26 Jul, 2024 1 commit
-
-
trixirt authored
A standard option in Fedora packaging that is used to check the correctness of c++ use of the standard c++ library. Signed-off-by:
Tom Rix <trix@redhat.com> Co-authored-by:
Illia Silin <98187287+illsilin@users.noreply.github.com>
-
- 16 Jul, 2024 2 commits
-
-
Mateusz Ozga authored
-
Illia Silin authored
* add a build parameter to build only XNACK targets * use ENABLE_ASAN_PACKAGING flag to set targets for ASAN builds --------- Co-authored-by:Bartłomiej Kocot <barkocot@amd.com>
-
- 10 Jul, 2024 1 commit
-
-
Illia Silin authored
-
- 27 Jun, 2024 1 commit
-
-
Illia Silin authored
-
- 19 Jun, 2024 1 commit
-
-
zjing14 authored
-
- 22 May, 2024 1 commit
-
-
Illia Silin authored
* set individual gpu targets for instances, examples, tests * fix path to hip compiler * fix path to hip compiler once more * aggregate device macros in ck_tile config header * fix the cmake logic for instances * fix clang format * add gfx900 and gfx906 to default set of targets
-
- 10 May, 2024 1 commit
-
-
Illia Silin authored
* code clean-up * remove the profiling output samples
-
- 01 May, 2024 1 commit
-
-
Illia Silin authored
-
- 18 Apr, 2024 1 commit
-
-
Illia Silin authored
* add rocm6.1 docker and make it default for CI * fix typo * move the rocm6.1 image into public dockerhub repo
-
- 16 Apr, 2024 1 commit
-
-
carlushuang authored
* enable gfx940 * switch between intrinsic mfma routines on mi100/200 and mi300 * fix mfma_int8 on MI300 * disable 2 int8 examples on MI300 * Update cmake-ck-dev.sh * restore gitignore file * modify Jenkinsfile to the internal repo * Bump rocm-docs-core from 0.24.0 to 0.29.0 in /docs/sphinx Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.24.0 to 0.29.0. - [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases) - [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.24.0...v0.29.0 ) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by:
dependabot[bot] <support@github.com> * initial enablement of gfx950 * fix clang format * disable examples 31 and 41 int8 on gfx950 * add code * fix build wip * fix xx * now can build * naming * minor fix * wip fix * fix macro for exp2; fix warpgemm a/b in transposedC * unify as tuple_array * Update the required Python version to 3.9 * Update executable name in test scripts * re-structure tuple/array to avoid spill * Merge function templates * Fix format * Add constraint to array<> ctor * Re-use function * Some minor changes * remove wrong code in store_raw() * fix compile issue in transpose * Rename enum Rename 'cood_transform_enum' to 'coord_transform_enum' * let more integral_constant->constant, and formating * make sure thread_buffer can be tuple/array * temp fix buffer_store spill * not using custom data type by default, now we can have ISA-level same code as opt_padding * fix compile error, fp8 not ready now * fix fp8 duplicated move/shift/and/or problem * Default use CK_TILE_FLOAT_TO_FP8_STOCHASTIC rounding mode * fix scratch in fp8 kernel * update some readme * fix merge from upstream * sync with upstream * sync upstream again * sync 22 * remove unused * fix clang-format * update README of ck_tile example * fix several issue * let python version to be 3.8 as minimal * remove ck_tile example from default cmake target like all/install/check * remove mistake * 1).support receipe in generate.py 2).use simplified mask type 3).change left/right to pass into karg * fix some bug in group-mode masking and codegen. update README * F8 quantization for FMHA forward (#1224) * Add SAccElementFunction, PComputeElementFunction, OAccElementFunction in pipeline * Add element function to fmha api * Adjust P elementwise function * Fix bug of elementwise op, our elementwise op is not inout * Add some elementwise op, prepare to quantization * Let generate.py can generate different elementwise function * To prevent compiler issue, remove the elementwise function we have not used. * Remove f8 pipeline, we should share the same pipeline even in f8 * Remove remove_cvref_t * Avoid warning * Fix wrong fp8 QK/KV block gemm setting * Check fp8 rounding error in check_err() * Set fp8 rounding error for check_err() * Use CK_TILE_FLOAT_TO_FP8_STANDARD as default fp8 rounding mode * 1. codgen the f8 api and kernel 2. f8 host code * prevent warning in filter mode * Remove not-in-use elementwise function kargs * Remove more not-in-use elementwise function kargs * Small refinements in C++ source files * Use conditional_t<> to simplify code * Support heterogeneous argument for binary function types * Re-use already-existing scales<> functor template * Fix wrong value produced by saturating * Generalize the composes<> template * Unify saturates<> implementation * Fix type errors in composes<> * Extend less_equal<> * Reuse the existing template less_equal<> in check_err() * Add equal<float> & equal<double> * Rename check_err() parameter * Rename check_err() parameter * Add FIXME comment for adding new macro in future * Remove unnecessary cast to void * Eliminate duplicated code * Avoid dividing api pool into more than 2 groups * Use more clear variable names * Use affirmative condition in if stmt * Remove blank lines * Donot perfect forwarding in composes<> * To fix compile error, revert generate.py back to 4439cc107dd90302d68a6494bdd33113318709f8 * Fix bug of p element function * Add compute element op to host softmax * Remove element function in api interface * Extract user parameter * Rename pscale and oscale variable * rename f8 to fp8 * rename more f8 to fp8 * Add pipeline::operator() without element_functor * 1. Remove deprecated pipeline enum 2. Refine host code parameter * Use quantization range as input * 1. Rename max_dtype to dtype_max. 2. Rename scale to scale_s 3.Add init description * Refine description * prevent early return * unify _squant kernel name in cpp, update README * Adjust the default range. * Refine error message and bias range * Add fp8 benchmark and smoke test * fix fp8 swizzle_factor=4 case --------- Co-authored-by:
Po Yen Chen <PoYen.Chen@amd.com> Co-authored-by:
carlushuang <carlus.huang@amd.com> --------- Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
illsilin <Illia.Silin@amd.com> Co-authored-by:
Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by:
Jing Zhang <jizha@amd.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by:
Po-Yen, Chen <PoYen.Chen@amd.com> Co-authored-by:
rocking <ChunYu.Lai@amd.com>
-
- 12 Apr, 2024 1 commit
-
-
Illia Silin authored
* pass XDL and WMMA macros to libs that use CK * update config.h after XDL and WMMA macros get set
-
- 02 Apr, 2024 1 commit
-
-
Illia Silin authored
* parse examples inside the add_example_executable function * fix the example 64 cmake file * add xdl flag to the gemm_bias_softmax_gemm_permute example * add filtering of tests based on architecture type * enable test_grouped_gemm for gfx9 only * enable test_transpose only for gfx9 * only linnk test_transpose if it gets built * split the gemm instances by architectures * split gemm_bilinear,grouped_conv_bwd_weight instances by targets * split instances by architecture * split grouped_conv instances by architecture * fix clang format * fix the if-else logic in group_conv headers * small fix for grouped convolution instances * fix the grouped conv bwd weight dl instances * fix client examples * only enable client examples 3 and 4 on gfx9 * set the gfx9 macro * make sure the architecture macros are set by cmake * use separate set of xdl/wmma flags for host code * sinmplify the main cmake file * add conv_fwd_bf8 instance declaration
-
- 03 Jan, 2024 1 commit
-
-
Illia Silin authored
-
- 02 Jan, 2024 1 commit
-
-
Illia Silin authored
-
- 20 Dec, 2023 1 commit
-
-
Artur Wojcik authored
* enable compilation of INSTANCES_ONLY for Windows * suppress ROCMChecks warnings on GoogleTests * suppress -Wfloat-equal warning on GoogleTests --------- Co-authored-by:Illia Silin <98187287+illsilin@users.noreply.github.com>
-
- 19 Dec, 2023 2 commits
-
-
Jun Liu authored
* ROCm 6.0 replaces all __HIP_PLATFORM_HCC__ with __HIP_PLATFORM_AMD__ * make it backward compatible * Update .clang-tidy * Update ClangTidy.cmake
-
Illia Silin authored
-