- 21 Nov, 2023 1 commit
-
-
Chao Liu authored
* fix build for old ck examples * fix build for old ck
-
- 15 Nov, 2023 2 commits
-
-
Chao Liu authored
-
carlushuang authored
* support hdim=64/128 in same example code * support v transpose * revert gemm.cpp, not intended to modify it * remove useless code * fix a bug for swizzle C encoding, no perf change * optimize LDS encoding * update LDS layout * clean up code
-
- 03 Nov, 2023 2 commits
-
-
carlushuang authored
* unify q persistent in register * add refactor warp_gemm dispatcher
-
Po Yen Chen authored
-
- 30 Oct, 2023 1 commit
-
-
Po Yen Chen authored
-
- 27 Oct, 2023 1 commit
-
-
carlushuang authored
* support batch & nhead * support scale * tile scheduler * rename tile-scheduler to tile-partitioner * add some exp2 math * fix a bug when changing tile size
-
- 26 Oct, 2023 1 commit
-
-
carlushuang authored
-
- 19 Oct, 2023 3 commits
-
-
Chao Liu authored
* refactor gemm+softmax+gemm using block-gemm * reorg files * clean
-
carlushuang authored
* Revert "Extract gemm0 prefetch0 out from loop" This reverts commit d3b56f39f9fd12edb476b24ae9cf480841d311e4. * add fmha fwd pipeline * Extract gemm0 prefetch0 out from loop * move blockSize to another place; fix a missing header in tile_window_impl_static_distribution.hpp * remove KArgs from tile modules --------- Co-authored-by: Po-Yen, Chen <PoYen.Chen@amd.com> Co-authored-by: Chao Liu <chao.liu2@amd.com>
-
- 18 Oct, 2023 1 commit
-
-
Po Yen Chen authored
* Extract store_tile() logics as method * Extract load_tile() logics as method * Rename type alias * Extract common logics as traits * Remove unnecessary access specifier * Add ComputeMode for TileWindowWithStaticDistribution * Put field check into Traits * More definition of Traits types * Use more clear static_assert() message * Enable pre-compute coordinates in store_tile() * Reformat static assert * Undo changes to the wrong method * Enable pre-compute coords for store_tile() * Remove static_vector usage * Add method to move non-member coordinates * Force using pre-computed coordinates in Store() * Fix wrong access for SFC_Ys * Change comment * Allow users to hint # access per coord * Add comment for noting remove data members later * Unify FIXME comments * Replace FIXME comments by TODO * Let user specify HintNumCoords * clean * clean * clean * clean * refactor load/store for window * clean * clean * bug fix for window; clean --------- Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 12 Oct, 2023 2 commits
-
-
Chao Liu authored
* refactor * refactor * change load_tile, update block gemm * debug * clean * clean * experiment lod * workaround spilling issue * clean
-
carlushuang authored
* slice kv, and use 3d padding LDS layout * add missing sync * put sync to another place * move sync place * revert to normal
-
- 06 Oct, 2023 1 commit
-
-
carlushuang authored
* add tensor slicing API * remove redundant ck namespace * better gemm_gemm interface * modify gemm_gemm * add slice_tile api * fix merge bug * update to 3d padding, since we no longer need that much LDS size * clean * clean * clean * clean * clean * clean * clean * clean * clean * clean * clean * clean * clean * clean * clean --------- Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 03 Oct, 2023 1 commit
-
-
Chao Liu authored
* adding in-thread shuffle * update softmax example * refactor grid gemm * refactor gemm: layouts * bug fix * clean * clean
-
- 14 Sep, 2023 2 commits
- 13 Sep, 2023 1 commit
-
-
Chao Liu authored
* adding gemm+softmax+gemm
-
- 07 Sep, 2023 1 commit
-
-
Po Yen Chen authored
-
- 05 Sep, 2023 5 commits
-
-
Chao Liu authored
-
Chao Liu authored
Tile Program init bulk PR --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Po-Yen, Chen <PoYen.Chen@amd.com>
-
Bartłomiej Kocot authored
* Add image to column kernel * Add instances, tests, profiler, example * Add client example * Several fixes of image to column * Fix variable name in device_image_to_column_impl * Several fixes of image to column profiler * Fix num_btype calculation * Make new measurements for correct bytes calculation
-
Bartłomiej Kocot authored
-
Bartłomiej Kocot authored
* Fix K padding calculation for grouped conv data * Restore previous padding for 1x1 specialization
-
- 04 Sep, 2023 1 commit
-
-
Lauren Wrubleski authored
-
- 31 Aug, 2023 3 commits
-
-
zjing14 authored
* move all arguments into device * add b2c_tile_map * add examples * add SetDeviceKernelArgs * dedicated fixed_nk solution * init client api * add grouped_gemm_bias example * add an instance * add instances * formatting * fixed cmake * Update EnableCompilerWarnings.cmake * Update cmake-ck-dev.sh * clean; fixed comments * fixed comment * add instances for fp32 output * add instances for fp32 output * add fp32 out client example * fixed CI * init commit for kbatch * add splitk gridwise * format * fixed * clean deviceop * clean code * finish splitk * fixed instances * change m_loops to tile_loops * add setkbatch * clean code * add splitK+bias * add instances * opt mk_nk instances * clean examples * fixed CI * remove zero * finished non-zero * clean * clean code * optimized global_barrier * fixed ci * fixed CI * removed AddBias * format * fixed CI * fixed CI * move 20_grouped_gemm to 21_grouped_gemm --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
rocking authored
* Add maxpool instances * Rename index pool to max pool. * Add maxpool bwd bf16 instances * Add avg pool bwd instances * Rename avgpool and maxpool to avg_pool3d and max_pool * Add bf16 pool fwd instances * Add max pool bwd to ckProfiler * Add avg pool3d bwd to ckProfiler * Add avg pool bwd test * Fix bug of reference pool fwd (dilation) * Fix bug of max pool bwd (dilation and initZero) * Support bf16 compute data type * Force compute type to be f32, because atomicAdd only supports f32 * Add max pool bwd test * Rename folder * Rename pool * Add max pool bwd client example * Add avg pool bwd client example * Add missing workspace * clang format * Rename macro * remove useless header * remove useless layout
-
Illia Silin authored
-
- 30 Aug, 2023 1 commit
-
-
Bartłomiej Kocot authored
-
- 29 Aug, 2023 1 commit
-
-
zjing14 authored
* add an example of customized bfp16_rtn * fixed threadwise_copy --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 28 Aug, 2023 1 commit
-
-
zjing14 authored
* add compute_type * add multiply_add ckProfiler * add f8_fp16 support * clean * clean * fixed lds size calc * format --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 23 Aug, 2023 4 commits
-
-
Jun Liu authored
* experiment with config file * experiment with version.h config * add more info to version.h * minor updates * minor updates * fix case where DTYPE is not used * large amount of files but minor changes * remove white space * minor changes to add more MACROs * fix cmakedefine01 * fix issue with CK internal conflict * fix define and define value * fix clang-format * fix formatting issue * experiment with cmake * clang format v12 to be consistent with miopen * avoid clang-format for config file
-
Qianfeng authored
-
Illia Silin authored
-
zjing14 authored
Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 22 Aug, 2023 3 commits
-
-
zjing14 authored
* updated regular gemm * update ckProfiler * fixed gtests --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
Bartłomiej Kocot authored
* Fix transform and instances for grouped conv bwd data * Add instances for small K and small C * Remove workaround after fix * Fix interface tests
-
Rostyslav Geyyer authored
* Add ComputeType arg to splitk device and gridwise ops * Update for gridwise op compatibility * Update bf16 and int8 splitk gemm examples with ComputeType * Add instances * Update ckProfiler for mixed precision cases * Add a mixed precision splitK gemm client example --------- Co-authored-by:zjing14 <zhangjing14@gmail.com>
-
- 18 Aug, 2023 1 commit
-
-
cloudhan authored
-