- 18 Jan, 2023 1 commit
Raman R jana authored
* wavelet gemm programming model support for CK
* GEMM pipeline update for wavelet programming model
* Updated wavelet programming pipeline
* fixes for global-write for math-wave
* fixed bug in global writes
* Updated comments for better readability
* fixed clang format errors
* added block_lds without barrier sync
* clean
* clean
* clean
* clean
* refactor
* prototype 4 layouts fix default stride all problem sizes tidy move file update build script restore old file fix build
* refactor standalone test to use gemm test harness
* simplify gemm test
* update build script
* remove redundant
* early return when cmd arg doesn't match
* tidy
* report failure when result not validated
* tidy
* Add comment depicting B2C mapping pattern.
* Formatting & comments.
* Comparison with custom B2C mapping pattern.
* Example for wavelet gemm.
* Add wavelet to Gemm standalone test.
* Remove debug code.
* Remove dangling #endif directive.
Co-authored-by: root <Raman Jana>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Anthony Chang <ac.chang@outlook.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
- 13 Oct, 2022 1 commit
Adam Osewski authored
* Move kernel implementation files under impl directory.
* Update examples paths.
* Update device kernel impl include paths.
* Update tensor operation instances include paths.
* Update profiler and tests include paths.
* Clang-format
* Update include paths for batched gemm reduce
* Refactor UnitTest ConvNDBwdWeight.
* Refactor fwd and bwd data convND UT.
* Fix used test macro.
* Fix include path.
* Fix include paths.
* Fix include paths in profiler and tests.
* Fix include paths.
Co-authored-by: Adam Osewski <aosewski@amd.com>
- 23 Aug, 2022 1 commit
Po Yen Chen authored
* Add GEMM examples for int4
  Currently the source files are just copied from int8 examples
* Re-use pre-defined alias in int4 examples
* Distinguish user-side type from kernel-side type
* Add int4_t support for check_err()
* Allow conversion between Tensor<> specializations
* Re-format source files
* Use different type for host tensors
* Re-use CopyAsType<>() to implement copy ctor
* Re-use element-wise operation type alias
* Fix typo in alias names
* Complete the int4 examples
* Add constraint to Tensor<> templated methods
* Add type traits 'is_signed_integral<>'
* Add type constraints for integer version check_err<>()
* Allow comparing different-sized integral types in check_err()
* Check converted Tensor<int4_t> with golden Tensor<int8_t>
* Remove constraint of Tensor<>::CopyAsType()
* Avoid compilation error while disabling ck::int4_t support
* Remove debug messages
* Add #error directive to prevent compiling sources with wrong setting
* Simplify tensor usages in examples
* Add constraint to check_err() input reference type
* Align design with other PR
* Use ""_uz to simplify example code
* Avoid too much generalizing check_err()
* Re-format GEMM instance template arguments
* Extract int4 example common codes
* Sort include directives
* Move #include directives into new header
* Move common codes together
* Re-format template argument in example code
* Reuse same implementation code for most of GEMM examples
* Re-format common.hpp
* Unify structured comment in examples
* Use reinterpret_cast<>() for cross-type pointer conversion
* Revert "Add type traits 'is_signed_integral<>'"
  This reverts commit f2c148efaedf42c8ee66032dac6d13a1003b0f3a.
* Allow unsigned integer arguments for check_err()
* Fix compilation error in check_err()
* Remove unnecessary copy ctor for Tensor<>
* Mark Tensor<> special member functions as 'default'
* Use more strict condition to add code in examples
* Fix wrong program return value of GEMM examples
* Handle the case where the user specifies all the strides
* Fix never-ran examples
* Exit successfully if GEMM instance does not support given problem
* Add missing 'else' keyword
* Re-format CMakeLists.txt
* Add wrapper function to hide value conversion while copying memory
* Add new DeviceMem API to copy memory
* Use new DeviceMem API to implement examples
* Revert "Add new DeviceMem API to copy memory"
  This reverts commit 3f190b0779ceedf7aaf0b380712fda0518de72c1.
* Add conversion ctor for Tensor<>
* Write Tensor<> conversion logics explicitly in example code
* Convert Tensor<> values after transferring data to host
- 29 Jul, 2022 1 commit
Chao Liu authored
* convnd_fwd fp16 example
* update example
* update example
* update instance
* updating reference conv
* update reference conv
* update conv fwd profiler
* update conv 1d and 3d instance
* update include path
* clean
* update profiler for conv bwd data and weight
* update conv bwd weight
* clean
* update conv example
* update profiler for conv bwd weight
* update ckprofiler for conv bwd data
* fix reference conv bwd data bug; update conv bwd data test
* update examples
* fix initialization issue
* update test for conv fwd
* clean
* clean
* remove test case too sensitive to error threshold
* fix test
* clean
* fix build
* adding conv multiple d
* adding conv multiple D
* add matrix padder
* add gemm padding to convnd
* adding group conv
* update gemm multi-d
* refactor
* refactor
* refactor
* clean
* clean
* refactor
* refactor
* reorg
* add ds
* add bias
* clean
* add G
* adding group
* adding group
* adding group
* update Tensor
* clean
* update example
* update DeviceGemmMultipleD_Xdl_CShuffle
* update conv bwd-data and bwd-weight
* update contraction example
* update gemm and batch gemm with e permute
* fix example build
* instance for grouped conv1d
* update example
* adding group conv instance
* update gemm bilinear instance
* update gemm+add+add+fastgelu instance
* update profiler
* update profiler
* update test
* update test and client example
* clean
* add grouped conv into profiler
* update profiler
* clean
* add test grouped conv, update all conv test to gtest
* update test
- 08 Jul, 2022 1 commit
Po Yen Chen authored
* format
* improving pipeline
* fix typo
* format
* adding thread group
* adding thread group
* adding thread group
* adding gemm pipeline
* tweak
* refactor
* refactor
* add missing type convert
* refactor
* refactor
* refactor
* clean
* fix build
* refactor
* format
* clean up
* use remove_cvref_t
* clean
* use pipeline_v2 for gemm kernel
* Remove inconsistent indent
* Fix compilation errors due to incomplete merge process
* Add missing include directives
* Fix compilation errors in currently unused files
* Add license in newly added files
* Re-format touched files by clang-format-10
* Fix wrong template argument count of DeviceGemm<>
* Use language construct to choose between types
* Use language construct to choose GEMM example instance
* Fix compilation error due to interface change
* Re-use type alias to avoid duplication
* Unify type alias usage in source file
* Only use v2 pipeline in one gridwise GEMM type
* Remove no-longer used include directives
* Add static_assert() to check pipeline type requirements
* Revert "Add static_assert() to check pipeline type requirements"
  This reverts commit f0985f0a132671a1caaea92810c9f30dcf062bde.
* clean
* clean
* clean
* clean
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: shaojiewang <wsjmessi@163.com>
- 25 Jun, 2022 2 commits
Chao Liu authored
Chao Liu authored
* add gelu and fast_gelu
* added GeLU and fast GeLU
* clean up
* add gemm+fastgelu example
* add gemm+gelu instances
* update profiler
* clean up
* clean up
* adding gemm+bias+activation
* clean
* adding bias
* clean
* adding gemm multiple d
* debugging
* add gemm bias add fastgelu
* rename, clean
* refactoring; add readme
* refactor
* refactor
* refactor
* refactor
* refactor
* refactor
* fix
* fix
* update example
* update example
* rename
* update example
* add ckProfiler
* clean
* clean
* clean
* clean
* add client app example
* update readme
* delete obsolete files
* remove old client app
* delete old file
* cleaning
* clean
* remove half
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path for all examples
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* revert client app example
* clean build
* fix build
* temporarily disable client test on Jenkins
* clean
* clean
* clean
- 19 Jun, 2022 1 commit
Chao Liu authored
* add gelu and fast_gelu
* added GeLU and fast GeLU
* clean up
* add gemm+fastgelu example
* add gemm+gelu instances
* update profiler
* clean up
* clean up
* adding gemm+bias+activation
* clean
* adding bias
* clean
* adding gemm multiple d
* debugging
* add gemm bias add fastgelu
* rename, clean
* refactoring; add readme
* refactor
* refactor
* refactor
* refactor
* refactor
* refactor
* fix
* fix
* update example
* update example
* rename
* update example
* add ckProfiler
* clean
* clean
* clean
* clean
* add comment
* use type_convert
* clean
* clean element wise op
- 31 May, 2022 1 commit
Chao Liu authored
* fix example
* update IsSupportedArgument
* fix
* disable fp64 conv example as test
- 26 May, 2022 1 commit
ltqin authored
* add intrin_mfma_f64_16x16x4f64
* add example
* gemm reference add double data type
* change init data
* fix M N PerXdlops
* fix ifdef
* add comparison config
* add conv fwd example
* format log out
* change rc matrix register layout
* reorganize example
* reorganize example 2
* format, because of merging develop
* fix call impl adding acc data type
* lost ;
* add compiler warning
* change example tuning parameters
* add test for fp64
* add instance
* add test/gemm/gemm_fp64.cpp
* fix get name issue
* remove some tuning parameter
* fix conflict
* format
* use integer value for GEMM test
* add acc data type
* remove typeid because fp16
* fix streamconfig etc bug from merging develop
* format
* remove test_gemm_xdl_fp64
* add AccDataType
* AccDataType problem
Co-authored-by: qinletao <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
- 13 May, 2022 1 commit
Anthony Chang authored
* validate examples in ctest runs
* format
* fix usage of check_err
* amend
* add example codes to custom target 'check'
Co-authored-by: Chao Liu <chao.liu2@amd.com>
- 12 May, 2022 1 commit
JD authored
* Add host API
* manually rebase on develop
* clean
* manually rebase on develop
* exclude tests from all target
* address review comments
* update client app name
* fix missing lib name
* clang-format update
* refactor
* refactor
* refactor
* refactor
* refactor
* fix test issue
* refactor
* refactor
* refactor
* update cmake and readme
Co-authored-by: Chao Liu <chao.liu2@amd.com>
- 09 May, 2022 1 commit
Chao Liu authored
* format
* improving pipeline
* fix typo
* format
* adding thread group
* adding thread group
* adding thread group
* adding gemm pipeline
* tweak
* refactor
* refactor
* add missing type convert
* refactor
* refactor
* refactor
* clean
* fix build
* refactor
* format
* clean up
* use remove_cvref_t
* clean
* clean up
* clean up
* clean up
- 05 Apr, 2022 1 commit
Adam Osewski authored
* Convolution ND
* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances
* Move convnd f32 instance file to comply with repo structure.
* Conv 1D tensor layouts.
* Formatting and use ReferenceConv
* Reference ConvFwd supporting 1D and 2D convolution.
* Debug printing TensorLayout name.
* Conv fwd 1D instance f32
* Refactor conv ND example.
  Needed to support various conv dimensions.
* Rename conv nd example directory to prevent conflicts.
* Refactor some common utility to single file.
  Plus some tests.
* Refactor GetHostTensorDescriptor + UT.
* Add 1D test case.
* Test reference convolution 1d/2d
* Remove some leftovers.
* Fix convolution example error for 1D
* Refactor test check errors utility function.
* Test Conv2D Fwd XDL
* More UT for 1D case.
* Parameterize input & weight initializers.
* Rename example to prevent conflicts.
* Split convnd instance into separate files for 1d/2d
* Address review comments.
* Fix data type for flops/gbytes calculations.
* Assign example number 11.
* 3D cases for convolution utility functions.
* 3D reference convolution.
* Add support for 3D convolution.
* Check for inputs bigger than 2GB.
* Formatting
* Support for bf16/f16/f32/i8 - conv instances + UT.
* Use check_err from test_util.hpp.
* Split convnd test into separate files for each dim.
* Fix data generation and use proper instances.
* Formatting
* Skip tensor initialization if not necessary.
* Fix CMakefiles.
* Remove redundant conv2d_fwd test.
* Lower problem size for conv3D UT.
* 3D case for convnd example.
* Remove leftovers after merge.
* Add Conv Specialization string to GetTypeString
* Skip instance causing numerical errors.
* Small fixes.
* Remove redundant includes.
* Fix namespace name error.
* Script for automatic testing and logging convolution fwd UTs
* Comment out numactl cmd.
* Refine weights initialization and relax rtol for fp16
* Move test_util.hpp to check_err.hpp
* Refine weights initialization and relax rtol for fp16
* Refactor common part of test conv utils.
* Move utility function to single common place.
* Add additional common functions to utility.
* Refactor convnd_fwd_xdl examples.
* Remove redundant files.
* Unify structure.
* Add constructor to ConvParams.
* And add input parameters validation.
* Modify conv examples to use single utility file.
* Remove check_error from host_tensor.hpp
* Get rid of check_indices function.
* Remove bf16_to_f32 function overload for scalars.
* Fix namespace.
* Add half_float::half for check_err.
* Fix conv params size in UT.
* Fix weights initialization for int8.
* Fix weights initialization for int8.
* Add type_convert when store output in ref conv 1D.
* Get back old conv2d_fwd_xdl operation.
* Silence conv debug print.
* format
* clean
* clean
* Fix merge.
* Fix namespace for check_err
* Formatting.
* Fix merge artifacts.
* Remove deleted header.
* Fix some includes and use ck::utils::check_err.
* Remove unused check_indices restored by previous merge.
* Fix namespaces after merge.
* Fix compilation error.
* Small fixes.
* Use common functions.
* Fix filename
* Fix namespaces.
* Fix merge artifact - retrieve removed by accident fun.
* Fix ConvForwardSpecialization.
* Adhere to coding style rules.
* Fix merge artifacts.
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
- 31 Mar, 2022 2 commits
Anthony Chang authored
* retune & add conflict-free bf16/fp16 c-shuffle gemm instances
  amend wrong K1 value in some fp16/bf16 kernel instances
* make gemm cshuffle's timing behavior consistent with all other functions
* clang-format
* retune & add conflict-free fp32 c-shuffle gemm instances
* retune & add conflict-free int8 c-shuffle gemm instances
* update the underlying gridwise gemm of all c-shuffle gemm kernels
* typo
Chao Liu authored
* adding compilation for multiple targets
* fix build
* clean
* update Jenkinsfile
* update readme
* update Jenkins
* use ck::half_t instead of ushort for bf16
* rename enum classes
* clean
* rename
* clean
- 30 Mar, 2022 1 commit
Jianfeng Yan authored
* adding batched_gemm_and_reduction
* batched_gemm_reduce works with batch_count=1
* fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1
* adding profiler for batched_gemm_fp16
* fixed a bug in declaration of d1 and d0; both example and profiler work
* clang-format
* cleanup
* batched_gemm_reduce: add test
* minor change
* fixed some typos in function names
- 24 Mar, 2022 1 commit
Chao Liu authored
* add gridwise gemm v4r1
* rename
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* use sfc in shuffling
* remove hardcode
* remove hardcode
* refactor
* fix build
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* format
* clean
* adding gemm+reduce
* adding profiler for gemm+reduce
* adding gemm+reduce profiler
* fix build
* clean up
* gemm+reduce
* fix build
* update DeviceGemm_Xdl_CShuffle; update enum to enum class
* clean up
* add test for gemm+reduce
* clean up
* refactor
* fix build
* fix build
- 09 Mar, 2022 1 commit
Chao Liu authored
* delete obsolete files
* move files
* build
* update cmake
* update cmake
* fix build
* reorg examples
* update cmake for example and test
- 05 Mar, 2022 1 commit
rocking5566 authored
* Add int8 of mk_nk_mn to the ckProfiler
* Add example of int8 gemm
* Fix typo, use ushort instead of half_t for bfloat16
* replace ushortXXX_t with bhalfXXX_t
* rename ushort to bhalf_t
* Add bf16 example
* Add bf16 gemm to ckProfiler
* Fix alignment
* Fix typo
* Add unit test for gemm_xdl int8
* Add gemm_xdl fp32 unit test
* Add gemm_xdl bf16 unit test
* fix build
* fix build issue due to merge conflict
* Fix build
* Fix build error
* [What] gemm + relu inference
  [How] gemm + requant + relu + requant + clamp
* clean
Co-authored-by: rocking <chunylai@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
- 04 Mar, 2022 1 commit
Jianfeng Yan authored
* add space_filling_curve
* cleanup and move space_filling_curve into test
* WIP: start refactoring threadwise_transfer_v1r3
* threadwise_copy works but needs further refactoring
* add some comments
* add SpaceFillingCurve::GetIndices()
* minor changes
* removed GetIndices; refactored GetDstCoordinateResetStep
* add DynamicBuffer::Transfer, but Add is not tested
* rebased against develop
* threadwise_copy_v6r1/v6r2/v6r3 using space-filling curve start to work
* minor changes
* refactored threadcopy v3r1, v2; removed old implementations
* clang-format
* cleanup
* fix a typo in v6r3
* format
Co-authored-by: Chao Liu <chao.liu2@amd.com>
- 28 Feb, 2022 1 commit
Anthony Chang authored
* add gitignore
* host tensor: allow generating sequentially increasing value in a given dimension
* gridwise gemm v3r1: allow distinct K0/K1 values for A/B block descriptor
  - remove dangling header include
  - modify example gemm_xdl accordingly
  - infer KPack value from M/NPerXdl
  - device conv2d fwd: update parameters accordingly for the underlying gridwise gemm v3r1 (API for conv2d fwd stays the same for now until we decide to expose individual K0s for activation and weight)
* add LDS data dump utility
* profiler: reflect API change for distinct K0/K1 for A/B matrices
* profiler: add conflict-free LDS write FP16 kernel instances
* fix accidental perf regression
* address feedback; cosmetic changes
* clang-format for new files
* format
Co-authored-by: Chao Liu <chao.liu2@amd.com>
- 23 Feb, 2022 2 commits
Chao Liu authored
* clean up
* add multiple thread scratch to ThreadwiseTensorSliceTransfer_v3r1
* add 2 stage prefetch
* add more sanity check into transform_tensor_descriptor
* tweak
* enabling 2 stage prefetch to existing gridwise gemm; tweak
* enabling 2 stage prefetch to existing gridwise gemm
* move gridwise gemm pipeline in class; clean up
* add some irregular tile size
* update CalculateHasMainK0BlockLoop for multi-stage-prefetch
* refactor gridwise gemm pipeline class
Jianfeng Yan authored
* conv3d compiles but has memory error
* conv3d works
* fix performance issue by using __builtin_amdgcn_readfirstlane
* change MakeBlock2CTileMap to MakeDefaultBlock2CTileMap; change c_blockid_to* to cblockid_to*
* clang-format
* remove CK_EXPERIMENTAL_PASS_TENSOR_DECRIPTOR_BY_*; moved wrapper into DeviceConv3d
* format
* remove useless macro
* add comment
Co-authored-by: Chao Liu <chao.liu2@amd.com>
- 07 Feb, 2022 1 commit
Chao Liu authored
* tweak conv for odd C
* update script
* clean up elementwise op
* fix build
* clean up
* added example for gemm+bias+relu+add
* added example for gemm+bias+relu
* add profiler for gemm_s_shuffle; re-org files
* add profiler
* fix build
* clean up
* clean up
* clean up
* fix build
- 21 Jan, 2022 1 commit
rocking5566 authored
* [What]
  1. Add DeviceGemmXdl_C_Shuffle
  2. Revise example of gemm_xdl
  [Why] Prepare to add shuffle version of D = alpha * (A * B) + beta * C
  [How] Imitate DeviceGemmXdl and device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp
- 18 Jan, 2022 1 commit
Chao Liu authored
* fix build issue
- 26 Dec, 2021 1 commit
Chao Liu authored
* fix relu
* clean up
* clean up
* adding 1x1 conv
* adding 1x1 conv
* added 1x1 conv
* refactor
* refactor
* refactor
* added profiler for conv+bias+relu+add
* clean up
* adding conv+bias+relu
* adding conv+bias+relu
* added conv+bias+relu
* Update README.md
* update cpu verification
* adding c shuffle
* update static_tensor for dealing with invalid element
* adding c shuffle
* debugging
* fix bug
* convert to fp16 before shuffle
* shuffle more than one M/NRepeat
* clean up
* remove coordinate step hack from GridwiseGemm_k0mk1_k0nk1_mn_xdlops_v3r1
* clean up
* remove coordinate step hack from all gridwise gemm xdl
* clean up coordinate step hack
* clean up coordinate step hack
* ThreadwiseTensorSliceTransfer_v3r2 support pointwise op on both src and dst
* adding output shuffle in conv+bias+relu+add
* update
* added conv+bias+relu+add with c shuffle
* added conv+bias+relu+add with c shuffle
* fix forward_sweep bugs in threadwise copy
* clean up
* refactor
* clean up
* clean up
* added conv_c_shuffle+bias_relu
* clean up
* added conv+bias+relu+atomic_add
* clean up
* clean up
* clean up
* clean up
* clean up
* clean up
* misc fixes; add 1x1 specialization
* clean up
* delete unused device op
* clean up
* add support for odd C value
- 04 Dec, 2021 1 commit
Chao Liu authored
* fix relu
* clean up
* clean up
- 03 Dec, 2021 1 commit
Chao Liu authored
* gemm+activation
* move C pointwise operation into threadwise copy
* add pointwise operation to A/B matrix
* update ckProfiler
* adding bias add
* adding bias add
* adding bias add
* added bias add; worked around compiler issues
* clean up
* clean up
* Update README.md
* Update README.md
* Update README.md
* clean up
* add conv_xdl example
* adding conv_xdl_bias_relu_add example
* add conv+bias+relu+add, but has register spill issue
* tweak
* tweak
* refactor
* Update README.md
  update readme for example/2_gemm_xdl_bias_relu_add
* clean up
* Update README.md
  update readme for example/3_conv_xdl
* Update README.md
- 18 Nov, 2021 1 commit
Chao Liu authored
* reworking vector_type
* use __builtin_memcpy for bit_cast and vector access of scalar pointer
* clean up
- 14 Nov, 2021 1 commit
Chao Liu authored
* add DeviceGemmXdl
* update script
* fix naming issue
* fix comment
* output HostTensorDescriptor
* rename
* padded GEMM for fwd v4r4r4 nhwc
* refactor
* refactor
* refactor
* adding ckProfiler
* adding ckProfiler
* refactor
* fix tuning parameter bug
* add more gemm instances
* add more fp16 GEMM instances
* fix profiler driver
* fix bug in tuning parameter
* add fp32 gemm instances
* small fix
* refactor
* rename
* refactor gemm profiler; adding DeviceConv and conv profiler
* refactor
* fix
* add conv profiler
* refactor
* adding more GEMM and Conv instance
* Create README.md
  Add build instruction for ckProfiler
* Create README.md
  Add Readme for gemm_xdl example
* Update README.md
  Remove build instruction from top most folder
* Update README.md
* clean up