- 21 Sep, 2023 1 commit
-
-
Illia Silin authored
* refactor cmake files for the tests * refactor cmake files for examples * fix cmake for gemm example * fix the cmake file for all examples * add splitting by data types in gemm_splitk instance header * rename test to reflect only dl instances are used * clean up CI workspace, update cmake for instances * change the jenkinsfile syntax * build all instances except DL on gfx11 * move workspace cleanup after stages * clean up workspace after every stage * isolate data types in grouped_conv_fwd header * isolate dl instances for grouped_conv2d_fwd * fix syntax * fix cmake and batchnorm instances * fix typo * fix reduction instances * fix grouped_conv headers * fix syntax * replace parsing logic for instances, replace bfp16 with bf16 * fix the client examples build * clean up DTYPES from instances cmake files * update the parsing logic in cmake files * make an exception for reduction kernels * update few remaining cmake files to handle DTYPES * fix syntax * fix cmake conflicts * replace f8 with fp8 test name * resolve conflicts for dpp instances
-
- 10 Nov, 2022 1 commit
-
-
Lauren Wrubleski authored
* Add packages for example and profiler * correct TEST_NAME -> EXAMPLE_NAME
-
- 19 Sep, 2022 1 commit
-
-
Shaojie WANG authored
* init commit of convnd bwd data * begin compiling example * have a first version that produce a right result * refine device level launch kernel code * add more instances in example and get right results * clang-format * format example file * add more instances * fix instances * adding conv_bwd_data multile_d * adding conv_bwd_data multile_d * adding conv_bwd multiple d * adding conv_bwd multiple d * adding conv_bwd multiple d * refactor * refactor * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * adding conv bwd data multiple d * refactor * update conv fwd's bias impl * refactor * reorg file * clean up cmake * clean * clean * clean Co-authored-by:
Chao Liu <lc.roy86@gmail.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
- 14 Sep, 2022 1 commit
-
-
ltqin authored
* refactor * start * add device gemm file * add BatchStrideD0 * add stridd0 * add gridwise file * add d0 parameters to gridwise gemm * add c layout transformer * add d0 threadwise copy * init kernel * init kernel * regular code * nm desc put to out * kernel parameter can not use reference * host add bias+gelu * run right for bias+gelu * change AddFastGelu into another file * interface add d1 bias parameters * add d1 parameter to argument * add d1 parameter to gridwise * first all code,not verify * gelu change to relu and GetElementSpaceSize bug * add instance * start add to ckprofiler * ckprofiler finish code * change input parameter for ckProfiler * fix host bias+gelu bug * show help for ckProfiler * fix bug for lunch kernel ignore parametes * add pad and fix about bug * mutiple d0 * add dynamic d0_element_op * change profiler and instance to mutiple d0 * example have 2 d0 * remove some comments not using * change 2 d0 have self parameters * change d element_op name * change class name(multiple_d) * fix bug * fix bug that don't find file * update profiler * refactor * update profiler * clean * revert example change * add gon layout * optimize parameter for gno * add gon to gemm+gemm * change helping input parameters * change to GemmPadder_v2 * using ForEach * fix gb_per_sec Co-authored-by:
Chao Liu <lc.roy86@gmail.com> Co-authored-by:
ltqin <letaoqin@amd.com>
-
- 09 Sep, 2022 1 commit
-
-
carlushuang authored
* add gridwise/device sparse embedding * update code * update code * remove useless makefile * code fix * workable * work properly * emb add * add more instance * format * remove useless code * fix format * fix clang-tidy * clean * fix a compile error Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
Chao Liu <lc.roy86@gmail.com>
-
- 31 Aug, 2022 2 commits
-
-
Po Yen Chen authored
* Refactor the design of DeviceGemmMultipleDMultipleR_Xdl_CShuffle * Add 'DeviceGroupedConvFwdMultipleDMultipleR' interface * Add DeviceGroupedConvFwdMultipleDMultipleR_Xdl_CShuffle * Remove 'GridwiseConvFwdMultipleDMultipleR_xdl_cshuffle' * Add 'TransformConvFwdToGemm<>' utility class (from Chao) * Use 'TransformConvFwdToGemm<>' to shorten code * Fix ill-formed method declaration * Re-implement MakeRGridDescriptor_M() function * Change problem description * Use macro to define layout types * Define K-reduced output tensor layout types * Let user to decide R output tensor layout * Rename variables * Add padding to the reduced output tensor if necessary * Extract common code as helper method * Remove debug message * Add missing include directive * Add partial fp16 Conv + Reduction example * Add example verification code for 2D Conv problem * Use type alias to simplify code * Share code across different-dimension Conv problems * Rename file/functions from run_conv_fwd* to run_convnd_fwd* * Make example code more verbose * Add code to support 1D & 3D Conv + Reduction on host * Add more examples for data type: bf16, fp32 * Add example for int8 * Add custom target to group examples * Use more general custom target name * Change the description in error message * Disable testing for example other than fp32 * Add examplel for int4 (just copy from int8) * Fix wrong data type * Use larger data type for intermediate tensors * Finish int4 example * Undefine macro PP_DEFINE_LAYOUT_TYPE() after use * Use named variables to replace magic numbers * Remove debug messages * Use same A/B data type for host Conv in int4 example * Add check for the 'RLayout' type argument * Group same-dim-layouts together in 'LayoutSetting<>' * Add 'final' specifier to utility classes * Use different initialization method for examples * Remove macro PP_DEFINE_LAYOUT_TYPE() * Fix code-comment mismatch * Use more reasonable initialization value for all data types * Default use init_method=1 for all examples * Remove never-used code * Remove confusing out-of-date comments * clean Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
Chao Liu <lc.roy86@gmail.com>
-
Chao Liu authored
* refactor conv * add conv+conv example, 1x1 only
-
- 23 Aug, 2022 1 commit
-
-
zjing14 authored
* add examples into grouped/batched_gemm * adding splitK examples * fixed splitK * add bfp16 int8 example into splitK * formatting * use static_cast * added common for batched_gemm * add commons for examples of splitK/batched/grouped_gemm * return true * adjust splitK check tol * update example Co-authored-by:Chao Liu <lc.roy86@gmail.com>
-
- 17 Aug, 2022 1 commit
-
-
Chao Liu authored
-
- 15 Aug, 2022 1 commit
-
-
Qianfeng authored
* Implement multiple-reduction in one kernel (kernels, device ops, examples) * Add generic elementwise kernel and device interface * Add generator for normal-distributed data initialization * Add host refer implementation of batchnorm-forward and batchnorm-infer * Add examples for implementing batchnorm-forward and batchnorm-infer using generic kernels * Remove un-needed including in batchnorm example * Renaming generic_elementwise to elementiwise in kernel and device classes/functions * Change in gemm_layernorm examples to use DeviceElementwise instead of Device5AryElementwise * Change in exampe 19_binary_elementwise to use DeviceElementwise instead of DeviceBinaryElementwise * Change in device_cgemm_4gemm_xdl_cshuffle.hpp to use kernel_elementwise instead of kernel_binary_elementwise * Add DeviceElementwiseBase and use it in device_normalize_instance.cpp * Removing and renaming files * Update to synchronize gemm_layernorm client example to the generic element-wise device op API * Update to synchronize with the latest headers directory and HostTensorDescriptor interface renaming * Merge two static member functions in device_elementwise.hpp * Remove unary_elementwise_1d kernel and device
-
- 13 Aug, 2022 3 commits
-
-
Anthony Chang authored
* initial stub for gemm_gemm_xdl_cshuffle * set up example code * compiles * prevent integer overflow * harmonize interface between ref_gemm and ref_batched_gemm * batched_gemm_gemm * fix example * host tensor gen: diagonal pattern in lowest two-dimensions only * make c descriptors containing only integral constants * clean up * add BlockwiseGemmXdlops_v2 while exploring an unified approach * implement proper interface * tidy up example * fix compilation warnings * coarsely controlled 2nd gemm padding * remove rocm-cmake's hard requirement for certain revision * clang-format * resolve merge conflict * fix compilation error on gfx10 * adds acc0 elementwise op to interface * add gemm_gemm instances and tests * avoid LDS data hazard * fix build Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
rocking5566 authored
* Imitate XXX_gemm_multiple_d, add XXX_gemm_multiple_d_multiple_r for gemm + reduction * Implement run of kernel * Add example * Fix parameter of typo * Rewrite the reduceMax example * Rewrite the reduceMean + reduceMeanSquare example * Refine naming * Refine folder name * refine naming * Rewrite the gemm + bias + relu + add + layernorm example * Rewrite the gemm + layernorm example * clang-format * Fix bug if sync lds * Fix compile error
-
Anthony Chang authored
* initial stub for gemm_gemm_xdl_cshuffle * set up example code * compiles * prevent integer overflow * harmonize interface between ref_gemm and ref_batched_gemm * batched_gemm_gemm * fix example * host tensor gen: diagonal pattern in lowest two-dimensions only * make c descriptors containing only integral constants * clean up * add BlockwiseGemmXdlops_v2 while exploring an unified approach * implement proper interface * tidy up example * fix compilation warnings * coarsely controlled 2nd gemm padding * remove rocm-cmake's hard requirement for certain revision * clang-format * resolve merge conflict * fix compilation error on gfx10 * adds acc0 elementwise op to interface * attention host validation * add blockwsie softmax v1 * iteratively update softmax+gemm * transpose both gemm0 and gemm1 xdl output so as to avoid broadcasting softmax max/sum * add init method for easier debugging * do away with manual thread cluster calculation * generalize blockwise softmax interface * row-wise softmax sum & max * format * rename to DeviceBatchedGemmSoftmaxGemm * add gemm_softmax_gemm instances and tests * comment Co-authored-by:
ltqin <letao.qin@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
- 12 Aug, 2022 1 commit
-
-
Rostyslav Geyyer authored
* [LWPCK-359] Initial commit * Working version for fp16, add results to readme * Update according to PR #341 * Update results in readme * Add fp32 example * Add bf16 example * Update fp16 and fp32 examples * Add int8 example * Add separate lengths and strides tensors for D tensors Co-authored-by:Rosty Geyyer <rosty.geyyer@amd.com>
-
- 10 Aug, 2022 1 commit
-
-
zjing14 authored
* convnd_fwd fp16 example * update example * update example * update instance * updating refernce conv * update reference conv * update conv fwd profiler * update conv 1d and 3d instance * update include path * clean * update profiler for conv bwd data and weight * update conv bwd weight * clean * update conv example * update profiler for conv bwd weight * update ckprofiler for conv bwd data * fix reference conv bwd data bug; update conv bwd data test * update examples * fix initialization issue * update test for conv fwd * clean * clean * remove test case too sensitive to error threshhold * fix test * clean * fix build * adding conv multiple d * adding conv multiple D * add matrix padder * add gemm padding to convnd * adding group conv * update gemm multi-d * refactor * refactor * refactor * clean * clean * refactor * refactor * reorg * add ds * add bias * clean * add G * adding group * adding group * adding group * update Tensor * clean * update example * update DeviceGemmMultipleD_Xdl_CShuffle * update conv bwd-data and bwd-weight * upate contraction example * update gemm and batch gemm with e permute * fix example build * instance for grouped conv1d * update example * adding group conv instance * update gemm bilinear instance * update gemm+add+add+fastgelu instance * update profiler * update profiler * update test * update test and client example * clean * add grouped conv into profiler * update profiler * clean * add test grouped conv, update all conv test to gtest * update test * change gemm_c_permute with contraction * add grouped_contraction * add contraction in group_gemm * add example of grouped_gemm with contraction * add example of grouped_contraction_bias_e_permute * clean * fixed ds * add m3n2 m2n3 examples into gemm_bias_e_permute Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 29 Jul, 2022 1 commit
-
-
Chao Liu authored
* convnd_fwd fp16 example * update example * update example * update instance * updating refernce conv * update reference conv * update conv fwd profiler * update conv 1d and 3d instance * update include path * clean * update profiler for conv bwd data and weight * update conv bwd weight * clean * update conv example * update profiler for conv bwd weight * update ckprofiler for conv bwd data * fix reference conv bwd data bug; update conv bwd data test * update examples * fix initialization issue * update test for conv fwd * clean * clean * remove test case too sensitive to error threshhold * fix test * clean * fix build * adding conv multiple d * adding conv multiple D * add matrix padder * add gemm padding to convnd * adding group conv * update gemm multi-d * refactor * refactor * refactor * clean * clean * refactor * refactor * reorg * add ds * add bias * clean * add G * adding group * adding group * adding group * update Tensor * clean * update example * update DeviceGemmMultipleD_Xdl_CShuffle * update conv bwd-data and bwd-weight * upate contraction example * update gemm and batch gemm with e permute * fix example build * instance for grouped conv1d * update example * adding group conv instance * update gemm bilinear instance * update gemm+add+add+fastgelu instance * update profiler * update profiler * update test * update test and client example * clean * add grouped conv into profiler * update profiler * clean * add test grouped conv, update all conv test to gtest * update test
-
- 22 Jul, 2022 1 commit
-
-
zjing14 authored
* add batched_gemm_multiD * add ds * rename file * add batched_gemm_bias example * add batch_strides into bmm_c_permute * clean * rename example_28 to example_29 Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 21 Jul, 2022 1 commit
-
-
zjing14 authored
* replace gridwise_v2r3 with multiD * adjust parameters * add instances * fixed test_grouped_gemm * fix standalone softmax race condition around blockwise reduction * fixed ci * fixed comment: remove redundant workspace * use instanceFactory * add test layout * add empty Ds * add bias example * use array * sperate examples Co-authored-by:Anthony Chang <ac.chang@outlook.com>
-
- 13 Jul, 2022 1 commit
-
-
rocking5566 authored
* Implement layernorm kernel and deviceOp * verify gpu kernel with host code * 1. Separate gamma aand beta from affine 2. Check if argument is valid * clean * Sync the naming * Support sweep once mode if we can put k dimension data inside one block * [What] Get length from upper length. [Why] if we get length directly, we may get length after padding. * We only use one block in K dimension. Hence, we can simplify the indexing of global R/W. * Use 1d descriptor for gamma and beta * Add accElementwiseOp * Extract layernorm host code * Support different YVectorDim in GridwiseLayernorm * Rename XSrcVectorDim to XYSrcVectorDim. Because we use same parameter in deviceOp * Gamma and beta can share the VGPR. * Add test for fp32 and fp16 * Fix bug of concurrency and add test case which may fail orignally * Propagate NaN for layernorm Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 07 Jul, 2022 1 commit
-
-
Chao Liu authored
* adding contraction * add contraction example * update examle * update example * format * update readme * clean header * clean header * contraction with multiple D * rename * fix naming issue; add instances for contraction+bilinear * change assumed virtual layout of contraction; add client example * update example * update * contraction+scale * use type_convert * rename
-
- 06 Jul, 2022 1 commit
-
-
zjing14 authored
* init commit * add c_permute * add mnk padding * fixed comments * Fixed comments Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 02 Jul, 2022 1 commit
-
-
Chao Liu authored
* refactor * update example * update example * gemm bilinear * clean * update
-
- 01 Jul, 2022 1 commit
-
-
zjing14 authored
* init commit * add desc * finished c permute * fixed vector lens
-
- 25 Jun, 2022 1 commit
-
-
Chao Liu authored
* ad gelu and fast_gelu * added GeLU and fast GeLU * clean up * add gemm+fastgelu example * add gemm+gelu instances * update profiler * clean up * clean up * adding gemm+bias+activation * clean * adding bias * clean * adding gemm multiple d * debugging * add gemm bias add fastgelu * rename, clean * refactoring; add readme * refactor * refactor * refactor * refactor * refactor * refactor * fix * fix * update example * update example * rename * update example * add ckProfiler * clean * clean * clean * clean * add client app example * update readme * delete obselete files * remove old client app * delete old file * cleaning * clean * remove half * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path for all examples * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * fix header path * revert client app example * clean build * fix build * temporary disable client test on Jenkins * clean * clean * clean
-
- 21 Jun, 2022 1 commit
-
-
Anthony Chang authored
* initial stub for standalone softmax * start device_softmax_mk_to_mk as a wrapper to device_reduce_mk_to_m * host softmax validates * compiles; to implement beta scaling * use NaN trick to efficiently ignore OOB values during sum of exponentials * freeload device_reduce's utility functions * clean up interface * adding prior value (beta scaling) * remove restriction related to perf considerations * apply clang-format * clean; disable diagnostics * resolve conflicts * add exp wrapper * honor HostTensorDesc interface; allow implicit cast from different vector<T> type * test softmax for fp16/fp32 * update readme * amend commit NaN trick * remove redundant param added during development * format * replace ScalarDataType with AccDataType * separate out test programs by precision type * move softmax sample code to its own folder * format * keep up with recent changes in reduction API * remove extra header
-
- 19 Jun, 2022 1 commit
-
-
Chao Liu authored
* ad gelu and fast_gelu * added GeLU and fast GeLU * clean up * add gemm+fastgelu example * add gemm+gelu instances * update profiler * clean up * clean up * adding gemm+bias+activation * clean * adding bias * clean * adding gemm multiple d * debugging * add gemm bias add fastgelu * rename, clean * refactoring; add readme * refactor * refactor * refactor * refactor * refactor * refactor * fix * fix * update example * update example * rename * update example * add ckProfiler * clean * clean * clean * clean * add comment * use type_convert * clean * clean element wise op
-
- 31 May, 2022 1 commit
-
-
myamlak authored
* Reference CGEMM + test stub * Format. * Incomplete simple implementation * Library instances * Sketch of tests * Test fixes. * Example added * Cosmetics * Add elementwise operation kernel and example * Add comment * Add template argument of dim . Prepare to support multiple dimension * Rename example * Support 1 dimension * Add static assert * Add comment * Second auxiliary buffer added * Extract pad * Remove redundant argument * Support any dimension for elementwise operation * Remove line * Let it be the multiple number of CU * Move thread per block to the parameter of constructor * Consuming binary ops to do A+B / A-B * Fix + cosmetics + bf16 test commented out temporarily * Format * Enabling bf16 test * Revert "Enabling bf16 test" This reverts commit f497e2ba441cd38cef062839391ae9fefefdb722. * Fix + test reenabled * fix build * Revert "fix build" This reverts commit d73102384bfbb609e487d6d0cd04a3c8c9c4ec9e. * post PR #235 merge fix * amend * Single workspace for cgemm + helper * Perf calc fix * Review remarks: static_cast * Review remarks: binary ops templated * Cleaning * Removal of instances and their tests * Review remarks from aosew addressed * Review remark: unnecessary attribute * Post-merge fixes * Restrict 4gemm to PassThrough + bug fix * Review remarks * update licence * change cgemm example to fp16 Co-authored-by:
rocking <chunylai@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
Anthony Chang <ac.chang@outlook.com>
-
- 30 May, 2022 1 commit
-
-
rocking5566 authored
* Implement reduction meand and reduction square mean * Refine file name * Add reduce mean and square mean * Fix parameter name * Add normalize device op (not implement invoker::run()) * Remove epislon * Refine deviceop * Add 5ary elementwise for normalization * Add layernorm example * layerNorm verication * Fix compiler error due to merge from develop * Fix typo * Fix compile error * Refine naming * [What] Suport non pointer for invoker and argument [Why] Snyc coding style with gemm * Refine folder name * Refine class name * Evaluate perf of the kernel * Fix compile error * [What] Refine perf evaluation in example of gemm + reduction [Why] evaluation of gemm + reduction may cause verification fail. Because evaluation will not initial global memory * clang-format
-
- 24 May, 2022 1 commit
-
-
Jianfeng Yan authored
* start adding navi21 GEMM * navi_gemm_km_kn_mn_fp32 compiles and passes one test. * rename variables and functions in gridwise_gemm_dlops_v1r3 * add other 3 layouts; format instance * adding more tuning parameters add tuning parameters for other 3 layouts * add gemm_dlops_f16 * tmp * add dependence of DeviceGemm::IsSupportedArg() on arch * minor changes * minor changes * minor changes * minor changes * minor changes * minor changes * minor changes * push gemm_dlops into profiler * minor changes * if using xdl or dlops is moved into profiler_gemm_impl * minor changes * minor changes * remove is_xdl from profile_gemm_impl * make IsSupportedArg dependent on arch for other device_gemm * minor changes * minor changes * fix a bug in f_generate_tensor_value * add 64x64x64 for gemm_dlops_int8 * add 64x64x64 for gemm_dlops_int8 * comment out 3 layouts in gemm_dlops_int8; add 32x32x32 for gemm_dlops_int8; init A values to 1 * fix * start fixing tuning parameters * monir * minor changes * minor changes * minor changes * fixing * adding example * adding example * adding example * add gemm fp32 example * clean up * use 128x128x16 as MNK tile in navi21 gemm example * bug fix * fix test * use new block c tile * clean * fix build Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
shaojiewang <wsjmessi@163.com>
-
- 20 May, 2022 2 commits
-
-
Shaojie WANG authored
* enable example of conv 1d/3d for bwd weight * make bf16 kernel do not use atomic add * using new gridwise gemm for bwd weight on convnd bwd weight Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
rocking5566 authored
* [What] Rename the example [Why] Prepare to add unary reduction * Add global oparation to the parameter * Add atomicmax * Fix compile error * Support atomicMax (hip library) * Rename the reduction example * Fix target name * use p_d1_grid as the indicator directly * Prevent performance issue. Let passthrough handle it. * Implement the function template the specialize the float2 * No need to separate into two lines * Remove empty line * add comment * Fix compile error due to merge from develop * make the implementation of atomic_max / atomic_add explicit for each datatype * Refine typo * For future CI test * Fix compiler error in ckProfiler * Merge commit 'de2769e3a6695b38a20529261273ddc5cdaab2fe' * simply use remove_pointer * Rename type and var * Refine example * Modify reducemax example * Fix bug in reduction * Change initialize range * Implement F64 version of atomicMax * Move reduction code together * Add buffer atomic_max * Fix coding style by clang-format * Integrate new api of DeviceGemmReduce_Xdl_CShuffle * Integrate Batch gemm reduction * Fix example * fix example * clean up * Fix batch gemm tensor operation * Fix coding style * Fix template augument * Fix clang format * Keep flexible of different stride for each D tensor * Fix compile error for ckProfiler * Fix typo * [What] Fix naming [Why] Prepare to add out elementop * Add DoutElementOp Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
rocking <chunylai@amd.com>
-
- 19 May, 2022 1 commit
-
-
rocking5566 authored
* Add elementwise operation kernel and example * Add comment * Add template argument of dim . Prepare to support multiple dimension * Rename example * Support 1 dimension * Add static assert * Add comment * Extract pad * Remove redundant argument * Support any dimension for elementwise operation * Remove line * Let it be the multiple number of CU * Move thread per block to the parameter of constructor * rename threadPerBlock with blockSize * Support double * rename kernel function name * remove redundant include header * Refine type * Need to the final dimension * Refine variable name * Refine type * Use index_t instead of int in API Co-authored-by:rocking <chunylai@amd.com>
-
- 13 May, 2022 1 commit
-
-
Anthony Chang authored
* validate examples in ctest runs * format * fix usage of check_err * amend * add example codes to custom target 'check' Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 05 Apr, 2022 2 commits
-
-
Adam Osewski authored
* Convolution ND * Code unification across dimensions for generating tensor descriptors. * Example * Instances * Move convnd f32 instance file to comply with repo structure. * Conv 1D tensor layouts. * Formatting and use ReferenceConv * Reference ConvFwd supporting 1D and 2D convolution. * Debug printing TensorLayout name. * Conv fwd 1D instance f32 * Refactor conv ND example. Needed to support various conv dimensio. Needed to support various conv dimensions * Rename conv nd example director to prevent conflicts. * Refactor some common utility to single file. Plus some tests. * Refactor GetHostTensorDescriptor + UT. * Add 1D test case. * Test reference convolution 1d/2d * Remove some leftovers. * Fix convolution example error for 1D * Refactor test check errors utility function. * Test Conv2D Fwd XDL * More UT for 1D case. * Parameterize input & weight initializers. * Rename example to prevent conflicts. * Split convnd instance into separate files for 1d/2d * Address review comments. * Fix data type for flops/gbytes calculations. * Assign example number 11. * 3D cases for convolution utility functions. * 3D reference convolution. * Add support for 3D convolution. * Check for inputs bigger than 2GB. * Formatting * Support for bf16/f16/f32/i8 - conv instances + UT. * Use check_err from test_util.hpp. * Split convnd test into separate files for each dim. * Fix data generation and use proper instances. * Formatting * Skip tensor initialization if not necessary. * Fix CMakefiles. * Remove redundant conv2d_fwd test. * Lower problem size for conv3D UT. * 3D case for convnd example. * Remove leftovers after merge. * Add Conv Specialization string to GetTypeString * Skip instance causing numerical errors. * Small fixes. * Remove redundant includes. * Fix namespace name error. * Script for automatic testing and logging convolution fwd UTs * Comment out numactl cmd. * Refine weights initalization and relax rtol for fp16 * Move test_util.hpp to check_err.hpp * Refine weights initalization and relax rtol for fp16 * Refactor common part of test conv utils. * Move utility function to single common place. * Add additional common functions to utility. * Refactor convnd_fwd_xdl examples. * Remove redundant files. * Unify structure. * Add constructor to ConvParams. * And add input parameters validation. * Modify conv examples to use single utility file. * Remove check_error from host_tensor.hpp * Get rid of check_indices function. * Remove bf16_to_f32 function overload for scalars. * Fix namespace. * Add half_float::half for check_err. * Fix conv params size in UT. * Fix weights initialization for int8. * Fix weights initialization for int8. * Add type_convert when store output in ref conv 1D. * Get back old conv2d_fwd_xdl operation. * Silence conv debug print. * format * clean * clean * Fix merge. * Fix namespace for check_err * Formatting. * Fix merge artifacts. * Remove deleted header. * Fix some includes and use ck::utils::check_err. * Remove unused check_indices restored by previous merge. * Fix namespaces after merge. * Fix compilation error. * Small fixes. * Use common functions. * Fix filename * Fix namespaces. * Fix merge artifact - retrieve removed by accident fun. * Fix ConvForwardSpecialization. * Adhere to coding style rules. * Fix merge artifacts. Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
ltqin authored
* change backward weight name * start add bwd weight lib and profiler * change tuning paramter * change output info * add bwd weight test * change test info * using conv_util * change wgt to weight * add } * add fp32
-
- 30 Mar, 2022 1 commit
-
-
Jianfeng Yan authored
* adding batched_gemm_and_reduction * batched_gemm_reduce works with bactch_count=1 * fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1 * adding profiler for batched_gemm_fp16 * fixed a bug in declaration of d1 and d0; both example and profiler work * clang-format * cleanup * batched_gemm_reduce: add test * minor change * fixed some typo in function names
-
- 29 Mar, 2022 1 commit
-
-
ltqin authored
* start convnd bwd data * add 3d laoyout name * add conv1d reference * add con3d reference * finished example client code * conv1d kernel finished * fix input error * add conv3d * add 3d layout in conv_utils.hpp * fix sepecial check * addconvnd lib * add test for bwd data * finished test * add check slice length * convnd bwd data start * profiler can be compiled * fix some bug * set input to zero * modify readme for example * fix test_convnd_bwd_data bug * test_convnd_bwd_data parameter desc * workaround for 1d * workaroud for 2d * change init value * workaround for 3d int8 * fix init value bug * remove workaround * fix acc data type * add int32 * change select function to template * tilda to tilde * remove int32 instance * fix commit for device hpp * fix comments for profiler * using profile imp to test * add pass verification * fix conv2d reference * fix conflict * remove double batched_gemm * fix exampel conv2d data and test convnd * format * change conv2d_bwd_data return value * remove repeat = 1 * remove conv bwd data Co-authored-by:
ltqin <letaoqin@amd.com> Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
- 24 Mar, 2022 1 commit
-
-
Chao Liu authored
* add gridwise gemm v4r1 * rename * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * use sfc in shuffling * remove hardcode * remove hardcode * refactor * fix build * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * format * clean * adding gemm+reduce * adding profiler for gemm+reduce * adding gemm+reduce profiler * fix build * clean up * gemm+reduce * fix build * update DeviceGemm_Xdl_CShuffle; update enum to enum class * clean up * add test for gemm+reduce * clean up * refactor * fix build * fix build
-
- 22 Mar, 2022 1 commit
-
-
zjing14 authored
* init of grouped_gemm * 2 gemm test * perf test * clean * wrap desc into a struct * test cast static_arr to pointer * add ptr to GemmDesc * add grouped gemm profiler * fixed mem issue with unique_ptr * clean * clean * finished ckprofiler * Update README.md * readme * fixed readme * add example * improve code * fixed comments: reserve, seperate ptr and gemm_shapes * merge group and non-group * fixed comments: replace push_back with emplace_back to avoid copy constructor * fixed comments: unified blk2ctile; add test * ci fix * fixed ci * fixed ci * fixed ci
-
- 12 Mar, 2022 1 commit
-
-
rocking5566 authored
* [What] Separate fixpoint gemm from gemm example [Why] let example of gemm_int8 be pure gemm. [What] 1. Add gemm_requant_relu_requant, 2. Let CDataType be int32 in pure gemm, because no one use int8 CDataType. It is also part of gemm_requant_relu_requant * Fix path * Revise cmakelist due to merge develop Co-authored-by:rocking <chunylai@amd.com>
-