- 10 Mar, 2024 1 commit
-
-
Jing Zhang authored
-
- 09 Mar, 2024 5 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 08 Mar, 2024 2 commits
-
-
Jing Zhang authored
-
illsilin authored
-
- 06 Mar, 2024 1 commit
-
-
Paul Fultz II authored
* Format
* Format
* Format
* Remove const
* Use the right template
* Format
* Format
* add row/col instances
* Add missing file
* fixed
* Format
* Updates
* Format
* fixed rrr layout
* Format
* Update test and embed modules
* Restore older version
* Update year
* Set -fPIC
* Format
* Use double for isnan
* rename host folder to codegen + minor fix
* add codegen CI test
* add option to build components without building CK
* fix the groovy syntax
* fix typo
* use the correct function for the codegen stage
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
-
- 01 Mar, 2024 1 commit
-
-
Rostyslav Geyyer authored
* Update clipping for fp8 conversion
* Add clipping for bf8 conversion
* Format
-
- 29 Feb, 2024 3 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 28 Feb, 2024 2 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
- 27 Feb, 2024 4 commits
-
-
Illia Silin authored
* clip fp8 to +/-240 on all targets
* if inputs to fp8 conversion are +/-inf, they remain unaltered
* increase tolerance for test_elementwise_layernorm to prevent false errors
* change the input values for gemm examples to floats
* reduce gemm example float input values to prevent errors
* increase the tolerance for gemm examples
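The clipping rule in this commit message can be illustrated with a minimal sketch. It is not the code from the commit: `clip_for_fp8` is a hypothetical helper name, and only the +/-240 limit and the pass-through of +/-inf come from the message itself. The conversion to an fp8 bit pattern is left out.

```cpp
#include <cmath>

// Hypothetical helper (not CK's actual API): saturate a float to the
// +/-240 range named in the commit message before an fp8 conversion.
inline float clip_for_fp8(float x)
{
    constexpr float fp8_max = 240.0f; // limit taken from the commit message

    // +/-inf inputs remain unaltered, as the commit message describes.
    if(std::isinf(x))
        return x;

    // Finite values outside the range saturate to the nearest limit.
    if(x > fp8_max)
        return fp8_max;
    if(x < -fp8_max)
        return -fp8_max;

    // In-range values (and NaN, which fails both comparisons) pass through.
    return x;
}
```

Under these assumptions, `clip_for_fp8(300.0f)` yields `240.0f`, while an infinite input comes back unchanged.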
-
aska-0096 authored
-
aska-0096 authored
-
aska-0096 authored
-
- 26 Feb, 2024 1 commit
-
-
aska-0096 authored
-
- 24 Feb, 2024 1 commit
-
-
Jing Zhang authored
-
- 21 Feb, 2024 1 commit
-
-
jakpiase authored
* add support for mixed precision bf16&int8 grouped gemm
* fix gfx versions and add bf16 kbatch condition
* added reviewers comments
-
- 20 Feb, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Extend permute scale support up to 6D
* Fixes
* Fixes
* Update profiler/README.md
  Co-authored-by: Lisa <lisajdelaney@gmail.com>
* Update profiler/README.md
  Co-authored-by: Lisa <lisajdelaney@gmail.com>
* Update profiler/README.md
  Co-authored-by: Lisa <lisajdelaney@gmail.com>
* Update profiler/README.md
  Co-authored-by: Lisa <lisajdelaney@gmail.com>
* Update profiler/README.md
  Co-authored-by: Lisa <lisajdelaney@gmail.com>
* Update profiler/README.md
  Co-authored-by: Lisa <lisajdelaney@gmail.com>
* Update profiler/README.md
  Co-authored-by: Lisa <lisajdelaney@gmail.com>
---------
Co-authored-by: Lisa <lisajdelaney@gmail.com>
-
- 17 Feb, 2024 1 commit
-
-
Jing Zhang authored
-
- 16 Feb, 2024 1 commit
-
-
Jing Zhang authored
-
- 15 Feb, 2024 1 commit
-
-
illsilin authored
-
- 14 Feb, 2024 1 commit
-
-
illsilin authored
-
- 13 Feb, 2024 2 commits
-
-
Bartłomiej Kocot authored
* Add optimized blockwise gemm using ck wrapper
* Add basic gemm example
* Update docs
* Add tutorial for gemm using ck wrapper
* Add perf note
* edits
* Fix cmake
* Fixes
---------
Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>
-
Bartłomiej Kocot authored
-
- 12 Feb, 2024 1 commit
-
-
zjing14 authored
* add delayed cvt
* extend fp16 gemm_splitk instances for fp8_fp16 gemm
* add f8 example
* add 128 kperblk instances for fp8
* add kpb128 instance
* added more instances into kpb128
* clean code
* clean code
* fix
* fix
* fixed
* Update example/35_splitK_gemm/splitK_gemm_xdl_fp16_fp8.cpp
  Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer.hpp
  Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
* Update library/src/tensor_operation_instance/gpu/gemm_splitk/device_gemm_xdl_splitk_f16_fp8_f16_mk_nk_mn_kpb128_instance.cpp
  Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
-
- 08 Feb, 2024 1 commit
-
-
Lakhinder Walia authored
-
- 07 Feb, 2024 2 commits
-
-
jakpiase authored
-
Bartlomiej Wroblewski authored
* WIP: Implement direct loads split-K GEMM kernel
* Clean the review
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
-
- 02 Feb, 2024 2 commits
-
-
Illia Silin authored
* add support for navi2x and navi3x models
* fix syntax
* use common macro for different mi300 architectures
-
Bartłomiej Kocot authored
-
- 31 Jan, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Add blockwise gemm to ck wrapper
* Add blockwise gemm traits
* Disable test_gemm for non xdl devices
* Fixes
* Add c layout descriptions
-
- 24 Jan, 2024 1 commit
-
-
Illia Silin authored
* fix cppcheck errors, first pass
* fix format
* fix returned value in examples
* add macro definitions for cppcheck
* fix the profile_gemm logic
* update the gemm profiler logic
* add more definitions to cppcheck, fix a couple more errors
* replace runtime error with message in device function
* fix a couple of int4 issues
* no return for fill function
* fix errors in data_types.hpp
* fix format
* fix a few remaining errors
* fix errors in data_types.hpp
* fix last couple of errors in data_types.hpp
-
- 19 Jan, 2024 2 commits
-
-
Haocong WANG authored
* Optimize GEMM on MI200/300:
  1. Add new blockwise gemm pipeline
  2. Add irregular splitk instances
* clang format + typo fix
* Fix a bug
-
Bartłomiej Kocot authored
* Add optimized copy to ck wrapper
* Example optimizations
* Fixes
* Move img2col test to client example
* Refactor example
* Fix docs
* Fixes
* Fix
* Fixes
* Fixes
* Fixes
* Fixes
* Fixes
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
- 15 Jan, 2024 1 commit
-
-
Illia Silin authored
* add cppcheck to the CK CI
* fix the path to CK source for cppcheck
* fix the path to CK source for cppcheck one more time
* fix the path to CK source for cppcheck a third time
* change the path to ck_cppcheck.log
* install latest cppcheck from source
* fix bug in ck.hpp and use 20 threads for cppcheck
* create a switch to turn cppcheck on and off in CI
-