- 25 Jul, 2024 1 commit
-
-
zjing14 authored
* add rotating_buff for gemm_multi_d * format * Update flush_cache.hpp * Update gtest.cmake --------- Co-authored-by:
Jing Zhang <jizhan@fb.com> Co-authored-by:
Haocong WANG <haocwang@amd.com>
-
- 27 Jun, 2024 1 commit
-
-
Illia Silin authored
-
- 22 May, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Optimize grouped conv bwd weight for small M and N * Fixes
-
- 17 May, 2024 1 commit
-
-
Illia Silin authored
-
- 10 May, 2024 1 commit
-
-
Illia Silin authored
* code clean-up * remove the profiling output samples
-
- 08 May, 2024 1 commit
-
-
Illia Silin authored
-
- 07 May, 2024 1 commit
-
-
Illia Silin authored
* enable logging using environment variable * update ck.hpp header * fix typo * fix clang format * Update include/ck/utility/env.hpp Co-authored-by:
Bartłomiej Kocot <barkocot@amd.com> --------- Co-authored-by:
Bartłomiej Kocot <barkocot@amd.com>
-
- 02 May, 2024 1 commit
-
-
Illia Silin authored
-
- 25 Apr, 2024 1 commit
-
-
ltqin authored
* add flush cache to device op * add flush cache parameter to ckProfiler * change calculate size a and b method * chang evaluation time method foro AVERAGE to MEDIAN * format code * adjust some code * fix core dumped * remove loop call flush icache in kernel * remove loop(outer) call flush icache --------- Co-authored-by:letaoqin <letaoqin@amd.com>
-
- 02 Feb, 2024 1 commit
-
-
Illia Silin authored
* add support for navi2x and navi3x models * fix syntax * use common macro for different mi300 architectures
-
- 09 Jan, 2024 2 commits
-
-
Illia Silin authored
* allow setting the number of warmup cycles and iterations for profiler * fix the gemm_splitk and grouped_gemm examples
-
raramakr authored
SWDEV-439954 - Use hard coded filename rather than using the macro __FILE__ for debug prints. (#1123) * SWDEV-439954 - Use hard coded filename rather than using the macro __FILE__ for debug prints. Hiptensor library is using the header files from CK. Hard coded ROCm path was getting embedded into the hiptensor library, since the header file was having the macro __FILE__. Replace the macro with filename. * fix syntax --------- Co-authored-by:illsilin <Illia.Silin@amd.com>
-
- 07 Dec, 2023 1 commit
-
-
Illia Silin authored
* switch from ROCmSoftwarePlatform to ROCm org * replace ROCmSoftwarePlatform with ROCm in few more places
-
- 25 Nov, 2023 1 commit
-
-
Bartlomiej Wroblewski authored
* Add basic support for direct loads from global to LDS * Clean the code and comments * Add support for fp16 * Add comments * Add check for thread cluster lengths * Align non-direct-load fp16 example * Small fixes * Extend IsSupported to check for supported GPU gens * Build examples only on the supported HW * Do not throw when instance not supported in 04 example * Review: Apply review suggestions * Review: small fix * Review: small fix
-
- 17 Nov, 2023 1 commit
-
-
zjing14 authored
* improve 4k gemm perf * add f8 instances * format --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 19 Oct, 2023 1 commit
-
-
Qianfeng authored
* reinterpret_cast to const char* in dumpBufferToFile to be compatible with both const and non-const input pointers * Add seed input to GeneratorTensor_4 for normal_distribution generator * Add GetTypeString() for DeviceElementwiseImpl * Add HIP_CHECK_ERROR macro
-
- 27 Sep, 2023 1 commit
-
-
Illia Silin authored
* Added error check after kernel launch (#919) Co-authored-by:
Xiaodong Wang <xdwang@meta.com> Co-authored-by:
Xiaodong Wang <xw285@cornell.edu> * remove M=0 test cases for test_gemm_splitk --------- Co-authored-by:
Xiaodong Wang <xdwang@meta.com> Co-authored-by:
Xiaodong Wang <xw285@cornell.edu>
-
- 26 Jul, 2023 2 commits
-
-
carlushuang authored
* initial stream-k implementation with example * fix unexpected change in err * improve a little bit performance by reorganize pipeline. * improve perf a little bit by swizzle block idx * add profiler * update example * fix spelling * shrink karg for streamk * support dynamic buffer using memory coherence glc_slc bit from template * control memory coherence while construct dynamic buffer * update reduction for streamk(not ready yet) * Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting * fix build issue * fix several bug * now result is correct, everything works (but has scratch) * remove scratch by manually reset coordinate * update device code * fix a bug in final reduce * fix something in example * update async memset * fix enum as camel case * modify coherence enum name * clean code and use atomic streamk by default * remove unused var * throw exception if have empty pointer * fix format * fix CI warning * fix type in init * modify CI error * filter out on gfx10+ * restore changed example code --------- Co-authored-by:Qianfeng Zhang <Qianfeng.Zhang@amd.com>
-
Bartłomiej Kocot authored
* Disable XDL kernels on unsupported HW; Add ck::is_xdl_supported function (#765) * Do not throw an error when GEMM problem is not supported. --------- Co-authored-by:
Bartlomiej Wroblewski <bwroblewski10@gmail.com> Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
Illia Silin <98187287+illsilin@users.noreply.github.com>
-
- 19 Jun, 2023 1 commit
-
-
rocking authored
* Add maxpool f32 kernel and example * Revise copyright * Add device pool bwd device op * Support f16 and bf16 * Add compute datatype for reference code. Prevent error in bf16 * Fix type error * Remove layout * Fix bf16 error * Add f16 and bf16 example * Add more operations * Implement IsSupportedArgument * Add changelog * Add comment * Add comment * Remove useless header * Move initialize of workspace to the run * Move set din zero to the device operator * Save din_length_raw * Remove useless header * Calculate gridsize according to the number of CU * Calculate gridSize according to the number of CU. Remove useless header * Add put example * Remove useless header * Fix CI fail
-
- 15 Jun, 2023 1 commit
-
-
Qianfeng authored
* Add getAvailableComputeUnitCount() interface * Use available number of compute units to set kernel grid size
-
- 31 May, 2023 1 commit
-
-
Illia Silin authored
-
- 15 Feb, 2023 1 commit
-
-
Illia Silin authored
* clean up output from kernel_launch * set RUN_WARMUP to 0 by default * split the warm-up into a separate issue --------- Co-authored-by:zjing14 <zhangjing14@gmail.com>
-
- 29 Jul, 2022 1 commit
-
-
Chao Liu authored
* convnd_fwd fp16 example * update example * update example * update instance * updating refernce conv * update reference conv * update conv fwd profiler * update conv 1d and 3d instance * update include path * clean * update profiler for conv bwd data and weight * update conv bwd weight * clean * update conv example * update profiler for conv bwd weight * update ckprofiler for conv bwd data * fix reference conv bwd data bug; update conv bwd data test * update examples * fix initialization issue * update test for conv fwd * clean * clean * remove test case too sensitive to error threshhold * fix test * clean * fix build * adding conv multiple d * adding conv multiple D * add matrix padder * add gemm padding to convnd * adding group conv * update gemm multi-d * refactor * refactor * refactor * clean * clean * refactor * refactor * reorg * add ds * add bias * clean * add G * adding group * adding group * adding group * update Tensor * clean * update example * update DeviceGemmMultipleD_Xdl_CShuffle * update conv bwd-data and bwd-weight * upate contraction example * update gemm and batch gemm with e permute * fix example build * instance for grouped conv1d * update example * adding group conv instance * update gemm bilinear instance * update gemm+add+add+fastgelu instance * update profiler * update profiler * update test * update test and client example * clean * add grouped conv into profiler * update profiler * clean * add test grouped conv, update all conv test to gtest * update test
-
- 25 Jun, 2022 1 commit
-
-
Chao Liu authored
* ad gelu and fast_gelu * added GeLU and fast GeLU * clean up * add gemm+fastgelu example * add gemm+gelu instances * update profiler * clean up * clean up * adding gemm+bias+activation * clean * adding bias * clean * adding gemm multiple d * debugging * add gemm bias add fastgelu * rename, clean * refactoring; add readme * refactor * refactor * refactor * refactor * refactor * refactor * fix * fix * update example * update example * rename * update example * add ckProfiler * clean * clean * clean * clean * add client app example * update readme * delete obselete files * remove old client app * delete old file * cleaning * clean * remove half * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path for all examples * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * revert client app example * clean build * fix build * temporary disable client test on Jenkins * clean * clean * clean
-
- 24 May, 2022 1 commit
-
-
Jianfeng Yan authored
* start adding navi21 GEMM * navi_gemm_km_kn_mn_fp32 compiles and passes one test. * rename variables and functions in gridwise_gemm_dlops_v1r3 * add other 3 layouts; format instance * adding more tuning parameters add tuning parameters for other 3 layouts * add gemm_dlops_f16 * tmp * add dependence of DeviceGemm::IsSupportedArg() on arch * minor changes * minor changes * minor changes * minor changes * minor changes * minor changes * minor changes * push gemm_dlops into profiler * minor changes * if using xdl or dlops is moved into profiler_gemm_impl * minor changes * minor changes * remove is_xdl from profile_gemm_impl * make IsSupportedArg dependent on arch for other device_gemm * minor changes * minor changes * fix a bug in f_generate_tensor_value * add 64x64x64 for gemm_dlops_int8 * add 64x64x64 for gemm_dlops_int8 * comment out 3 layouts in gemm_dlops_int8; add 32x32x32 for gemm_dlops_int8; init A values to 1 * fix * start fixing tuning parameters * monir * minor changes * minor changes * minor changes * fixing * adding example * adding example * adding example * add gemm fp32 example * clean up * use 128x128x16 as MNK tile in navi21 gemm example * bug fix * fix test * use new block c tile * clean * fix build Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
shaojiewang <wsjmessi@163.com>
-