- 09 May, 2022 2 commits
-
-
myamlak authored
* Turning compare warnings on * Cleaning part I * Cleaning part II * Explicit static_cast to ck::type_convert * Resolving large tensor size issue. * format * revert change to tensor descriptor; promote lementSpaceSize to 64bit * use integer value for GEMM test * Review remarks * Review remarks + issues with (un)signed arithmetic * Format fix * Format * Clang-format. * fix 2gb limit issue Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
Adam Osewski <aosewski@amd.com>
-
Chao Liu authored
* format * improving pipeline * fix typo * format * adding thread group * adding thread group * adding thread group * adding gemm pipeline * tweak * refactor * refactor * add missing type convert * refactor * refactor * refactor * clean * fix build * refactor * format * clean up * use remove_cvref_t * clean * clean up * clean up * clean up
-
- 30 Apr, 2022 2 commits
-
-
Adam Osewski authored
* Use googletest for tests. Add conv2d_fwd UT. * Add conv1D/3D to gtest UT. * Fix: not duplicate test with CTest. * Convert more tests to googltests. * Fix: GIT_SHALLOW is not allowed for git commit hash. * Clang-format * use integer value for GEMM test Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
Chao Liu <lc.roy86@gmail.com>
-
Chao Liu authored
-
- 29 Apr, 2022 1 commit
-
-
Anthony Chang authored
* pass by ref to avoid throwing away initialization results * EOL CRLF -> LF
-
- 21 Apr, 2022 2 commits
-
-
Adam Osewski authored
* Convolution ND * Code unification across dimensions for generating tensor descriptors. * Example * Instances * Move convnd f32 instance file to comply with repo structure. * Conv 1D tensor layouts. * Formatting and use ReferenceConv * Reference ConvFwd supporting 1D and 2D convolution. * Debug printing TensorLayout name. * Conv fwd 1D instance f32 * Refactor conv ND example. Needed to support various conv dimensio. Needed to support various conv dimensions * Rename conv nd example director to prevent conflicts. * Refactor some common utility to single file. Plus some tests. * Refactor GetHostTensorDescriptor + UT. * Add 1D test case. * Test reference convolution 1d/2d * Remove some leftovers. * Fix convolution example error for 1D * Refactor test check errors utility function. * Test Conv2D Fwd XDL * More UT for 1D case. * Parameterize input & weight initializers. * Rename example to prevent conflicts. * Split convnd instance into separate files for 1d/2d * Address review comments. * Fix data type for flops/gbytes calculations. * Assign example number 11. * 3D cases for convolution utility functions. * 3D reference convolution. * Add support for 3D convolution. * Check for inputs bigger than 2GB. * Formatting * Support for bf16/f16/f32/i8 - conv instances + UT. * Use check_err from test_util.hpp. * Split convnd test into separate files for each dim. * Fix data generation and use proper instances. * Formatting * Skip tensor initialization if not necessary. * Fix CMakefiles. * Remove redundant conv2d_fwd test. * Lower problem size for conv3D UT. * 3D case for convnd example. * Remove leftovers after merge. * Add Conv Specialization string to GetTypeString * Skip instance causing numerical errors. * Small fixes. * Remove redundant includes. * Fix namespace name error. * Script for automatic testing and logging convolution fwd UTs * Comment out numactl cmd. * Refine weights initalization and relax rtol for fp16 * Move test_util.hpp to check_err.hpp * Refine weights initalization and relax rtol for fp16 * Refactor common part of test conv utils. * Move utility function to single common place. * Add additional common functions to utility. * Refactor convnd_fwd_xdl examples. * Remove redundant files. * Unify structure. * Add constructor to ConvParams. * And add input parameters validation. * Modify conv examples to use single utility file. * Remove check_error from host_tensor.hpp * Get rid of check_indices function. * Remove bf16_to_f32 function overload for scalars. * Fix namespace. * Add half_float::half for check_err. * Fix conv params size in UT. * Fix weights initialization for int8. * Fix weights initialization for int8. * Add type_convert when store output in ref conv 1D. * Get back old conv2d_fwd_xdl operation. * Silence conv debug print. * format * clean * clean * Fix merge. * Fix namespace for check_err * Formatting. * Fix merge artifacts. * Remove deleted header. * Fix some includes and use ck::utils::check_err. * Remove unused check_indices restored by previous merge. * Fix namespaces after merge. * Fix compilation error. * Small fixes. * Use common functions. * Fix filename * Fix namespaces. * Fix merge artifact - retrieve removed by accident fun. * Fix ConvForwardSpecialization. * Working example of OpInstanceRunEngine for conv2dfwd UT. * Adhere to coding style rules. * Formatting and adhere to coding style rules. * Fix merge artifacts. * Utility for collecting conv fwd instances. + Plus commmon part for parsing cmdline params. * Refactor FillUniform because of segfault for int8_t. * Naming convention. * Elegant version of device mem allocation. * Use OpInstanceRunEngine in conv fwd nd tests. * Multiple refinements. * conditional init * don't run reference op if not provided. * Use OpInstanceRunEngine for ckProfiler conv_fwd * Refactor common tensor fill function to separate file. * Clean up unused functions. * Support different init methods. * Create CMake target for conv_fwd_util. * Add header for profile_convnd_fwd.cpp * Fix CMakefiles to link with conv_fwd_util where needed. * Fix some clutter. Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
Qianfeng authored
* Add math functions for host * Change to host reduction to use ck::math: * Remove the using of half_float::half and half.hpp from reduction example/profiler/ctest
-
- 05 Apr, 2022 2 commits
-
-
Adam Osewski authored
* Convolution ND * Code unification across dimensions for generating tensor descriptors. * Example * Instances * Move convnd f32 instance file to comply with repo structure. * Conv 1D tensor layouts. * Formatting and use ReferenceConv * Reference ConvFwd supporting 1D and 2D convolution. * Debug printing TensorLayout name. * Conv fwd 1D instance f32 * Refactor conv ND example. Needed to support various conv dimensio. Needed to support various conv dimensions * Rename conv nd example director to prevent conflicts. * Refactor some common utility to single file. Plus some tests. * Refactor GetHostTensorDescriptor + UT. * Add 1D test case. * Test reference convolution 1d/2d * Remove some leftovers. * Fix convolution example error for 1D * Refactor test check errors utility function. * Test Conv2D Fwd XDL * More UT for 1D case. * Parameterize input & weight initializers. * Rename example to prevent conflicts. * Split convnd instance into separate files for 1d/2d * Address review comments. * Fix data type for flops/gbytes calculations. * Assign example number 11. * 3D cases for convolution utility functions. * 3D reference convolution. * Add support for 3D convolution. * Check for inputs bigger than 2GB. * Formatting * Support for bf16/f16/f32/i8 - conv instances + UT. * Use check_err from test_util.hpp. * Split convnd test into separate files for each dim. * Fix data generation and use proper instances. * Formatting * Skip tensor initialization if not necessary. * Fix CMakefiles. * Remove redundant conv2d_fwd test. * Lower problem size for conv3D UT. * 3D case for convnd example. * Remove leftovers after merge. * Add Conv Specialization string to GetTypeString * Skip instance causing numerical errors. * Small fixes. * Remove redundant includes. * Fix namespace name error. * Script for automatic testing and logging convolution fwd UTs * Comment out numactl cmd. * Refine weights initalization and relax rtol for fp16 * Move test_util.hpp to check_err.hpp * Refine weights initalization and relax rtol for fp16 * Refactor common part of test conv utils. * Move utility function to single common place. * Add additional common functions to utility. * Refactor convnd_fwd_xdl examples. * Remove redundant files. * Unify structure. * Add constructor to ConvParams. * And add input parameters validation. * Modify conv examples to use single utility file. * Remove check_error from host_tensor.hpp * Get rid of check_indices function. * Remove bf16_to_f32 function overload for scalars. * Fix namespace. * Add half_float::half for check_err. * Fix conv params size in UT. * Fix weights initialization for int8. * Fix weights initialization for int8. * Add type_convert when store output in ref conv 1D. * Get back old conv2d_fwd_xdl operation. * Silence conv debug print. * format * clean * clean * Fix merge. * Fix namespace for check_err * Formatting. * Fix merge artifacts. * Remove deleted header. * Fix some includes and use ck::utils::check_err. * Remove unused check_indices restored by previous merge. * Fix namespaces after merge. * Fix compilation error. * Small fixes. * Use common functions. * Fix filename * Fix namespaces. * Fix merge artifact - retrieve removed by accident fun. * Fix ConvForwardSpecialization. * Adhere to coding style rules. * Fix merge artifacts. Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
ltqin authored
* change backward weight name * start add bwd weight lib and profiler * change tuning paramter * change output info * add bwd weight test * change test info * using conv_util * change wgt to weight * add } * add fp32
-
- 31 Mar, 2022 4 commits
-
-
Chao Liu authored
* adding compilation for multiple targets * fix build * clean * update Jekinsfile * update readme * update Jenkins * use ck::half_t instead of ushort for bf16 * rename enum classes * clean * rename * clean
-
Anthony Chang authored
-
Jianfeng Yan authored
-
Adam Osewski authored
Co-authored-by:Adam Osewski <aosewski@amd.com>
-
- 30 Mar, 2022 1 commit
-
-
Jianfeng Yan authored
* adding batched_gemm_and_reduction * batched_gemm_reduce works with bactch_count=1 * fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1 * adding profiler for batched_gemm_fp16 * fixed a bug in declaration of d1 and d0; both example and profiler work * clang-format * cleanup * batched_gemm_reduce: add test * minor change * fixed some typo in function names
-
- 29 Mar, 2022 1 commit
-
-
ltqin authored
* start convnd bwd data * add 3d laoyout name * add conv1d reference * add con3d reference * finished example client code * conv1d kernel finished * fix input error * add conv3d * add 3d layout in conv_utils.hpp * fix sepecial check * addconvnd lib * add test for bwd data * finished test * add check slice length * convnd bwd data start * profiler can be compiled * fix some bug * set input to zero * modify readme for example * fix test_convnd_bwd_data bug * test_convnd_bwd_data parameter desc * workaround for 1d * workaroud for 2d * change init value * workaround for 3d int8 * fix init value bug * remove workaround * fix acc data type * add int32 * change select function to template * tilda to tilde * remove int32 instance * fix commit for device hpp * fix comments for profiler * using profile imp to test * add pass verification * fix conv2d reference * fix conflict * remove double batched_gemm * fix exampel conv2d data and test convnd * format * change conv2d_bwd_data return value * remove repeat = 1 * remove conv bwd data Co-authored-by:
ltqin <letaoqin@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
- 28 Mar, 2022 1 commit
-
-
zjing14 authored
* fixed test: return res; rand gemm shapes * fixed return
-
- 25 Mar, 2022 1 commit
-
-
Jianfeng Yan authored
* ctest of batched_gemm returns 0 or 1 * minor change
-
- 24 Mar, 2022 2 commits
-
-
rocking5566 authored
* Add return value * Replace _Float16 to ck::half_t * A test should return 0 if success and return non-zero if fail
-
Chao Liu authored
* add gridwise gemm v4r1 * rename * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * use sfc in shuffling * remove hardcode * remove hardcode * refactor * fix build * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * format * clean * adding gemm+reduce * adding profiler for gemm+reduce * adding gemm+reduce profiler * fix build * clean up * gemm+reduce * fix build * update DeviceGemm_Xdl_CShuffle; update enum to enum class * clean up * add test for gemm+reduce * clean up * refactor * fix build * fix build
-
- 23 Mar, 2022 2 commits
-
-
Adam Osewski authored
* Convolution ND * Code unification across dimensions for generating tensor descriptors. * Example * Instances * Move convnd f32 instance file to comply with repo structure. * Conv 1D tensor layouts. * Formatting and use ReferenceConv * Reference ConvFwd supporting 1D and 2D convolution. * Debug printing TensorLayout name. * Conv fwd 1D instance f32 * Refactor conv ND example. Needed to support various conv dimensio. Needed to support various conv dimensions * Rename conv nd example director to prevent conflicts. * Refactor some common utility to single file. Plus some tests. * Refactor GetHostTensorDescriptor + UT. * Add 1D test case. * Test reference convolution 1d/2d * Remove some leftovers. * Fix convolution example error for 1D * Refactor test check errors utility function. * Test Conv2D Fwd XDL * More UT for 1D case. * Parameterize input & weight initializers. * Rename example to prevent conflicts. * Split convnd instance into separate files for 1d/2d * Address review comments. * Fix data type for flops/gbytes calculations. * Assign example number 11. * 3D cases for convolution utility functions. * 3D reference convolution. * Add support for 3D convolution. * Check for inputs bigger than 2GB. * Formatting * Support for bf16/f16/f32/i8 - conv instances + UT. * Use check_err from test_util.hpp. * Split convnd test into separate files for each dim. * Fix data generation and use proper instances. * Formatting * Skip tensor initialization if not necessary. * Fix CMakefiles. * Remove redundant conv2d_fwd test. * Lower problem size for conv3D UT. * 3D case for convnd example. * Remove leftovers after merge. * Add Conv Specialization string to GetTypeString * Skip instance causing numerical errors. * Small fixes. * Remove redundant includes. * Fix namespace name error. * Script for automatic testing and logging convolution fwd UTs * Comment out numactl cmd. * Refine weights initalization and relax rtol for fp16 * Fix weights initialization for int8. * Add type_convert when store output in ref conv 1D. * Get back old conv2d_fwd_xdl operation. * Silence conv debug print. * format * clean * clean * Fix merge. * Fix namespace for check_err Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
Chao Liu authored
-
- 22 Mar, 2022 2 commits
-
-
zjing14 authored
* init of grouped_gemm * 2 gemm test * perf test * clean * wrap desc into a struct * test cast static_arr to pointer * add ptr to GemmDesc * add grouped gemm profiler * fixed mem issue with unique_ptr * clean * clean * finished ckprofiler * Update README.md * readme * fixed readme * add example * improve code * fixed comments: reserve, seperate ptr and gemm_shapes * merge group and non-group * fixed comments: replace push_back with emplace_back to avoid copy constructor * fixed comments: unified blk2ctile; add test * ci fix * fixed ci * fixed ci * fixed ci
-
Qianfeng authored
* Use thread cluster descriptor and explicit M_K 2d descriptor to simply Blockwise Reduction * Change by replacing ReduceDims by NumReduceDims as Device Reduce interface template parameter * Rename the folder name for the pool2d and reduce examples * Update to reduction test scripts * Add Readme for pool2d_fwd and reduce_blockwise examples * Add support for int8_t reduction (ADD/AVG, MIN/MAX/AMAX) * Tiny fix in reduce profiler and tiny update in reduce testing scripts * Tiny fix in testing script profile_reduce_no_index.sh * Tiny fix in testing script profile_reduce_no_index.sh * Add support for bfp16 reduction (using bhalf_t = ushort) * Tiny fix in amd_buffer_addressing.hpp * Tiny change in script/profile_reduce_with_index.sh * Use AccDataType for Beta value and use element_wise::PassThrough * Use type_convert for type converting in host layer reduction * Renaming and refining in Reduction profiler/device layer/examples * Renaming and refining in Reduction profiler/device layer/examples * Renaming all NumReduceDims to NumReduceDim * Fix the leaked type_convert in ThreadwiseTensorSliceTransfer_v2 * Update to testing scripts to add bf16 support * added more static_assert * Remove buggy tunable configurations defined in device_reduce_instance_xxx.hpp * Add static_assert to give compile-time warning for incorrect thread slice-size/vector-size configurations * minor change * Refine and fix (in GetWorkspaceSizeInBytes of MultiBlockPartialReduce) to make int8 completely pass * Tiny renaming in gridwise_2d_reduction_multiblock_partial_reduce.hpp * Tiny fix in script/profile_reduce_no_index.sh * Refine in DeviceReduce layer with regard to using NumInvariantDim/NumReduceDim or InvariantDims/ReduceDims * Generic renaming in host reduction and DeviceReduce layer * Add support for 4-d all dimension reduction in the profiler and add_device_reduce_xxx instances * Use multi-thread and simplification for host Reduction implementation * Add ctest for reduction * Update to clarify the using of data init method in produce_reduce/example_reduce/test_reduce/ * Update to the reduce CTest executables to enable default testing behavior when no command argument * Renaming Co-authored-by:Jianfeng yan <jfyan008@gmail.com>
-
- 21 Mar, 2022 3 commits
-
-
Jianfeng Yan authored
changed long_index_t to index_t when computing memory offset uncomment other ops in profiler added test for batched_gemm
-
rocking5566 authored
* [What] Separate fixpoint gemm from gemm example [Why] let example of gemm_int8 be pure gemm. [What] 1. Add gemm_requant_relu_requant, 2. Let CDataType be int32 in pure gemm, because no one use int8 CDataType. It is also part of gemm_requant_relu_requant * Fix path * Revise cmakelist due to merge develop * Add gemm fp16 test * Extract PrepareGemmTensor * Extract TestGemm * Add test for different layout * Add 4 layouts of shuffle version of fp32 * Add 4 layouts of shuffle version of int8 * Add 4 layouts of shuffle version of bf16 * replace all DeviceGemmPtr_ with DeviceGemmNoOpPtr to fit naming convension * Add test for non-shuffle verstion of gemm * Fix typo * Print kernel information * Add rest of the fp32 kernel to the test * 1. Add rest of the fp16 device iop. 2. Mark the invalid device operation Co-authored-by:rocking <chunylai@amd.com>
-
ltqin authored
* fix bwd data filter1strid2 bug * fichangeshort to ck::bhalf_t * reset input to zero Co-authored-by:ltqin <letaoqin@amd.com>
-
- 11 Mar, 2022 1 commit
-
-
Jianfeng Yan authored
* fixed a corner case in GetCoordinateResetStep * clean * rename num_accesses to num_access Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 09 Mar, 2022 1 commit
-
-
Chao Liu authored
* delete obselete files * move files * build * update cmake * update cmake * fix build * reorg examples * update cmake for example and test
-
- 05 Mar, 2022 1 commit
-
-
Chao Liu authored
* fix tests * remove useless file * fix test build * reduce parallelism when compiling * fix test
-
- 04 Mar, 2022 3 commits
-
-
rocking5566 authored
* Add int8 of mk_nk_mn to the ckProfiler * Add example of int8 gemm * Fix typo, use ushort instead of half_t for bfloat16 * replace ushortXXX_t to bhalfXXX_t * rename ushort to bhalf_t * Add bf16 example * Add bf16 gemm to ckProfiler * Fix alignment * Fix typo * Add unit test for gemm_xdl int8 * Add gemm_xdl fp32 unit test * Add gemm_xdl bf16 unit test * fix build * fix build issue due to merge conflict * Fix build * Fix build error Co-authored-by:
rocking <chunylai@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
Jianfeng Yan authored
* add space_filling_curve * cleanup and move space_filling_curve into test * WIP: start refactoring threadwise_transfer_v1r3 * threadwise_copy works but needs further refactoring * add some comments * add SpaceFillingCurve::GetIndices() * minor changes * removed GetIndices; refactored GetDstCoordinateResetStep * add DynamicBuffer::Transfer, but Add is not tested * rebased agaist develop * threadwise_copy_v6r1/v6r2/v6r3 using space-filling curve start to work * minor changes * refactored threadcopy v3r1, v2; removed old implementations * clang-format * cleanup * fix a typo in v6r3 * format Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
ltqin authored
* start conv2d bwd api * kernel running * add bwd reference * change to no shuffle * fix bwd reference * pass verification * add Filter1x1Stride1Pad0 and start testing * change some tuning parameter * fix test error * add fp16 tuning parameter * add bf16 tuning parameter * add int8 tuning parameters * change fp32 tuning parameter * add bwd to profiler * fix bug for bwd profiler * fix ckProfiler bug * change conv2d_bwd_xdl to fp16 * fix bug in comments * fix precompile id * fix enum conv name * chage _bwd_ to _bwd_data_ * change conv2d_bwd example id * bwd to bwd data * fix prehead * fix MakeDefaultBlock2CTileMap ,import form merge develop * format bwd instance * bwd to bwd data * change name bwd to bwd data * change name bwd to bwd data in example * formate code * change conv2d bwd data id in example * rewrite readme for example * fix CalculateMagicNumbers about div zero * add workaround CK_WORKAROUND_SWDEV_325164 * change test_conf2d_bwd_data show info * format * fix bug for workaround:CK_WORKAROUND_SWDEV_325164 * formate tuning parameters * formate tuning parameters again * formate tuning parameters 3 * formate tuning parameters 4 * remove add function template * format * update comment Co-authored-by:
ltqin <letaoqin@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
- 03 Mar, 2022 1 commit
-
-
JD authored
* add docker file and make default target buildable * add Jenkinsfile * remove empty env block * fix package stage * remove render group from docker run * clean up Jenkins file * add cppcheck as dev dependency * update cmake file * Add profiler build stage * add hip_version config file for reduction operator * correct jenkins var name * Build release instead of debug * Update test CMakeLists.txt reorg test dir add test stage * reduce compile threads to prevent compiler crash * add optional debug stage, update second test * remove old test target * fix tests to return proper results and self review * Fix package name and make test run without args * change Dockerfile to ues rocm4.3.1 * remove parallelism from build * Lower paralellism Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 25 Feb, 2022 1 commit
-
-
Jianfeng Yan authored
* add space_filling_curve * cleanup and move space_filling_curve into test * add functions for backward and forward step; hard coded results in unit test * minor changes
-
- 23 Feb, 2022 2 commits
-
-
Adam Osewski authored
* Convolution ND * Code unification across dimensions for generating tensor descriptors. * Example * Instances * Move convnd f32 instance file to comply with repo structure. * Conv 1D tensor layouts. * Formatting and use ReferenceConv * Reference ConvFwd supporting 1D and 2D convolution. * Debug printing TensorLayout name. * Conv fwd 1D instance f32 * Refactor conv ND example. Needed to support various conv dimensio. Needed to support various conv dimensions * Rename conv nd example director to prevent conflicts. * Refactor some common utility to single file. Plus some tests. * Refactor GetHostTensorDescriptor + UT. * Add 1D test case. * Test reference convolution 1d/2d * Remove some leftovers. * Fix convolution example error for 1D * Refactor test check errors utility function. * Test Conv2D Fwd XDL * More UT for 1D case. * Parameterize input & weight initializers. * Rename example to prevent conflicts. * Split convnd instance into separate files for 1d/2d * Address review comments. * Fix data type for flops/gbytes calculations. * Assign example number 11. Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
Jianfeng Yan authored
* conv3d compiles but has memory error * conv3d works * fix performance issue by using __builtin_amdgc_readfirstlane * change MakeBlock2CTileMap to MakeDefaultBlock2CTileMap; change c_blockid_to* to cblockid_to* * clang-format * remove CK_EXPERIMENTAL_PASS_TENSOR_DECRIPTOR_BY_*; moved wrapper into DeviceConv3d * format * remove useless marc * add comment Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 12 Feb, 2022 1 commit
-
-
ltqin authored
* add fwd bf16 conv * change tunning parametor * add int8 for conv fwd * remove comments * change tunning parametor for int8 * change init int8 example * add test for conv2d fwd * change device operation file pos because merge develop * fwd int8 use reference * test_conv_fwd use reference * add braket for if statement * rename fwd example name * remove StaticBufferOfVectorTypeV2 * tweak example Co-authored-by:
ltqin <letaoqin@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
- 03 Feb, 2022 1 commit
-
-
ltqin authored
* add DeviceGemmSplitKXdl * add file device_gemm_splitk_xdl.hpp * set c matrix zero * using atomic * add all tuning parameter to f32 mkkn * grid size change to 720 * add tunning parameter for NT * add tunning parameter for TN * add tunning parameter for TT * add m=96tunning parameter * add lost config * add element wise operation * fixed MPerBlock=96 * remove marco for slpitk swtich * add test * add new line at the end of device_gemm_xdl_instance.hpp * remove step hack * seperate split-k instance files * add tunning parameters * change disired grid size to parameters * remove slice length * add desiredgridsize parameter to ckProfiler * add losting file device_gemm_xdl_splitk_instance.hpp * change desired gride size to kbatch * format * format * clean up * add selection of device_instances * clean code * fix build issue Co-authored-by:
ltqin <letaoqin@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
Jing Zhang <jizhan@amd.com>
-
- 30 Nov, 2021 1 commit
-
-
Chao Liu authored
-