- 06 Sep, 2022 28 commits
-
-
Po-Yen, Chen authored
This kernel is a clone of 'GridwiseElementwise_1D'
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
This device op is a clone of 'DeviceElementwise'
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
- 05 Sep, 2022 8 commits
-
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
- 02 Sep, 2022 1 commit
-
-
zjing14 authored
* add scripts
* fixed splitK_gemm_fp32
* clean
* clean
* use gemm_xdl_splitK_c_shuffle into profiler
* remove device_gemm_xdl_splitk.hpp
-
- 01 Sep, 2022 1 commit
-
-
Chao Liu authored
* refactor
* refactor
* adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm
* adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm
* clean
-
- 31 Aug, 2022 2 commits
-
-
Po Yen Chen authored
* Refactor the design of DeviceGemmMultipleDMultipleR_Xdl_CShuffle
* Add 'DeviceGroupedConvFwdMultipleDMultipleR' interface
* Add DeviceGroupedConvFwdMultipleDMultipleR_Xdl_CShuffle
* Remove 'GridwiseConvFwdMultipleDMultipleR_xdl_cshuffle'
* Add 'TransformConvFwdToGemm<>' utility class (from Chao)
* Use 'TransformConvFwdToGemm<>' to shorten code
* Fix ill-formed method declaration
* Re-implement MakeRGridDescriptor_M() function
* Change problem description
* Use macro to define layout types
* Define K-reduced output tensor layout types
* Let user decide R output tensor layout
* Rename variables
* Add padding to the reduced output tensor if necessary
* Extract common code as helper method
* Remove debug message
* Add missing include directive
* Add partial fp16 Conv + Reduction example
* Add example verification code for 2D Conv problem
* Use type alias to simplify code
* Share code across different-dimension Conv problems
* Rename file/functions from run_conv_fwd* to run_convnd_fwd*
* Make example code more verbose
* Add code to support 1D & 3D Conv + Reduction on host
* Add more examples for data type: bf16, fp32
* Add example for int8
* Add custom target to group examples
* Use more general custom target name
* Change the description in error message
* Disable testing for example other than fp32
* Add example for int4 (just copy from int8)
* Fix wrong data type
* Use larger data type for intermediate tensors
* Finish int4 example
* Undefine macro PP_DEFINE_LAYOUT_TYPE() after use
* Use named variables to replace magic numbers
* Remove debug messages
* Use same A/B data type for host Conv in int4 example
* Add check for the 'RLayout' type argument
* Group same-dim-layouts together in 'LayoutSetting<>'
* Add 'final' specifier to utility classes
* Use different initialization method for examples
* Remove macro PP_DEFINE_LAYOUT_TYPE()
* Fix code-comment mismatch
* Use more reasonable initialization value for all data types
* Default use init_method=1 for all examples
* Remove never-used code
* Remove confusing out-of-date comments
* clean

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
-
Chao Liu authored
* refactor conv
* add conv+conv example, 1x1 only
-