- 06 Sep, 2022 (23 commits)
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
    This device op is a clone of 'DeviceElementwise'
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored

- 05 Sep, 2022 (8 commits)
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored
  - Po-Yen, Chen authored

- 02 Sep, 2022 (1 commit)
  - zjing14 authored
    * add scripts
    * fixed splitK_gemm_fp32
    * clean
    * clean
    * use gemm_xdl_splitK_c_shuffle into profiler
    * remove device_gemm_xdl_splitk.hpp
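The split-K variant referenced in this commit partitions the GEMM's reduction (K) dimension so that several work groups each compute a partial product, which are then summed. A rough host-side sketch of the idea only, with a hypothetical helper name rather than composable_kernel's device API:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Host-side illustration of split-K GEMM: C = A * B with A (M x K) and
// B (K x N), both row-major. The K dimension is divided into `k_batch`
// slices; each slice yields a partial C that is accumulated at the end
// (a device kernel would do this with atomics or a second reduction pass).
void gemm_splitk_reference(const std::vector<float>& a,
                           const std::vector<float>& b,
                           std::vector<float>& c,
                           std::size_t M, std::size_t N, std::size_t K,
                           std::size_t k_batch)
{
    c.assign(M * N, 0.f);
    const std::size_t k_per_batch = (K + k_batch - 1) / k_batch;

    for(std::size_t kb = 0; kb < k_batch; ++kb) // one K-slice per "split"
    {
        const std::size_t k_begin = kb * k_per_batch;
        const std::size_t k_end   = std::min(K, k_begin + k_per_batch);

        for(std::size_t m = 0; m < M; ++m)
            for(std::size_t n = 0; n < N; ++n)
            {
                float partial = 0.f; // partial sum over this K-slice
                for(std::size_t k = k_begin; k < k_end; ++k)
                    partial += a[m * K + k] * b[k * N + n];
                c[m * N + n] += partial; // accumulate across slices
            }
    }
}
```

Splitting K mainly helps when M and N are small relative to K, since it exposes extra parallelism at the cost of the final accumulation step.
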
- 01 Sep, 2022 (1 commit)
  - Chao Liu authored
    * refactor
    * refactor
    * adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm
    * adding int4/int8/fp16/bf16 for conv+conv and gemm+gemm
    * clean
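For reference, the gemm+gemm fusion named above chains two matrix multiplies so the intermediate product does not make a round trip through global memory. A minimal host-side statement of what such a chain computes; the shapes and function name are illustrative, not the library's API:

```cpp
#include <cstddef>
#include <vector>

// Reference for a GEMM + GEMM chain: E = (A * B0) * B1, with
// A (M x K), B0 (K x N0), B1 (N0 x N1), all row-major. A fused device
// kernel would keep the intermediate A * B0 tile in registers/LDS; here
// it is simply materialized for host-side verification.
std::vector<float> gemm_gemm_reference(const std::vector<float>& a,
                                       const std::vector<float>& b0,
                                       const std::vector<float>& b1,
                                       std::size_t M, std::size_t K,
                                       std::size_t N0, std::size_t N1)
{
    std::vector<float> acc(M * N0, 0.f); // intermediate C = A * B0
    for(std::size_t m = 0; m < M; ++m)
        for(std::size_t n = 0; n < N0; ++n)
            for(std::size_t k = 0; k < K; ++k)
                acc[m * N0 + n] += a[m * K + k] * b0[k * N0 + n];

    std::vector<float> e(M * N1, 0.f); // final E = C * B1
    for(std::size_t m = 0; m < M; ++m)
        for(std::size_t n = 0; n < N1; ++n)
            for(std::size_t k = 0; k < N0; ++k)
                e[m * N1 + n] += acc[m * N0 + k] * b1[k * N1 + n];
    return e;
}
```
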
- 31 Aug, 2022 (2 commits)
  - Po Yen Chen authored
    * Refactor the design of DeviceGemmMultipleDMultipleR_Xdl_CShuffle
    * Add 'DeviceGroupedConvFwdMultipleDMultipleR' interface
    * Add DeviceGroupedConvFwdMultipleDMultipleR_Xdl_CShuffle
    * Remove 'GridwiseConvFwdMultipleDMultipleR_xdl_cshuffle'
    * Add 'TransformConvFwdToGemm<>' utility class (from Chao)
    * Use 'TransformConvFwdToGemm<>' to shorten code
    * Fix ill-formed method declaration
    * Re-implement MakeRGridDescriptor_M() function
    * Change problem description
    * Use macro to define layout types
    * Define K-reduced output tensor layout types
    * Let user decide R output tensor layout
    * Rename variables
    * Add padding to the reduced output tensor if necessary
    * Extract common code as helper method
    * Remove debug message
    * Add missing include directive
    * Add partial fp16 Conv + Reduction example
    * Add example verification code for 2D Conv problem
    * Use type alias to simplify code
    * Share code across different-dimension Conv problems
    * Rename file/functions from run_conv_fwd* to run_convnd_fwd*
    * Make example code more verbose
    * Add code to support 1D & 3D Conv + Reduction on host
    * Add more examples for data types: bf16, fp32
    * Add example for int8
    * Add custom target to group examples
    * Use more general custom target name
    * Change the description in error message
    * Disable testing for examples other than fp32
    * Add example for int4 (just copy from int8)
    * Fix wrong data type
    * Use larger data type for intermediate tensors
    * Finish int4 example
    * Undefine macro PP_DEFINE_LAYOUT_TYPE() after use
    * Use named variables to replace magic numbers
    * Remove debug messages
    * Use same A/B data type for host Conv in int4 example
    * Add check for the 'RLayout' type argument
    * Group same-dim layouts together in 'LayoutSetting<>'
    * Add 'final' specifier to utility classes
    * Use different initialization method for examples
    * Remove macro PP_DEFINE_LAYOUT_TYPE()
    * Fix code-comment mismatch
    * Use more reasonable initialization values for all data types
    * Default to init_method=1 for all examples
    * Remove never-used code
    * Remove confusing out-of-date comments
    * clean

    Co-authored-by: Chao Liu <chao.liu2@amd.com>
    Co-authored-by: Chao Liu <lc.roy86@gmail.com>
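The 'TransformConvFwdToGemm<>' helper mentioned in this commit recasts the forward convolution as a GEMM problem (the usual implicit-GEMM view of NHWC convolution). A hedged sketch of just the dimension mapping, with made-up names and independent of the library's tensor-descriptor machinery:

```cpp
#include <cstddef>

// Implicit-GEMM view of a 2D forward convolution with NHWC input,
// KYXC weights and NHWK output: every output pixel becomes one GEMM row,
// every output channel one GEMM column, and the reduction runs over C*Y*X.
struct ConvToGemmDims
{
    std::size_t gemm_m; // N * Ho * Wo  (output pixels)
    std::size_t gemm_n; // K            (output channels)
    std::size_t gemm_k; // C * Y * X    (reduction length)
};

ConvToGemmDims transform_conv_fwd_to_gemm(std::size_t N, std::size_t K, std::size_t C,
                                          std::size_t Ho, std::size_t Wo,
                                          std::size_t Y, std::size_t X)
{
    // A K-reduced auxiliary output R (as in the MultipleR device op) would be
    // a length gemm_m vector, i.e. one reduced value per output pixel, which
    // is presumably what MakeRGridDescriptor_M() describes.
    return ConvToGemmDims{N * Ho * Wo, K, C * Y * X};
}
```
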
  - Chao Liu authored
    * refactor conv
    * add conv+conv example, 1x1 only

- 30 Aug, 2022 (2 commits)
  - Adam Osewski authored
    * GEMM + Reduce max fp16+fp32
    * GEMM + Max bf16 + int8
    * Refactor common definitions.
    * Refactor common func of mean meansquare example.
    * More examples for mean meansquare.
    * Update int8 examples and skip them because of random errors.
    * Int4 examples.
    * Fix examples for max int4/8
    * Tensor conversion for int4 input data for mean meansquare example.
    * Remove int4 mean_meansquare example
    * Fix int8 mean_meansquare example.
      - All ReductionAccData and R<N>DataType have to be F32. The INT32 data type is giving wrong results.
    * Guard int4 with ifdef
    * Change int8 example to add_addsquare due to div rounding err.
    * Clang format
    * Change the return type of common function.
    * Get back int8 example with division.
    * Remove int8 mean meansquare.
    * Use proper cast for BF16 data type.
    * Use ck::literals.
    * Use proper data type for host tensors & reference.
      - Use ReduceAccDataType for reference gemm output data type.
      - Cast host reference output tensor to EDataType
      - Fix ifdefs for int4.

    Co-authored-by: Adam Osewski <aosewski@amd.com>
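The note above about ReductionAccDataType is worth a small illustration: for mean / mean-square style reductions the accumulator and reduced outputs need to be floating point, because an integer accumulator truncates the final division (the "div rounding err" mentioned above). A hedged host-side sketch, with hypothetical names rather than the library's types:

```cpp
#include <cstdint>
#include <vector>

// Mean and mean-square of one row of int8 GEMM output, accumulated in float
// (playing the role of ReductionAccDataType). With an int32 accumulator the
// division by n would truncate, giving wrong reduced values.
void row_mean_meansquare(const std::vector<std::int8_t>& row, // assumed non-empty
                         float& mean, float& mean_square)
{
    float acc = 0.f, acc_sq = 0.f;
    for(std::int8_t v : row)
    {
        const float x = static_cast<float>(v);
        acc    += x;
        acc_sq += x * x;
    }
    const float n = static_cast<float>(row.size());
    mean        = acc / n;
    mean_square = acc_sq / n;
}
```
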
  - Shaojie WANG authored
    * add padding algo for bmm+scale+softmax+bmm. Version for verification
    * remove verification code
    * remove comments
    * add padded bmm scale softmax bmm example
    * format
    * refactor
    * add comments for usages of padding bmm+scale+softmax+bmm

    Co-authored-by: Chao Liu <lc.roy86@gmail.com>
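Padding matters for this fused bmm+scale+softmax+bmm pipeline because sequence lengths may not be multiples of the kernel's tile sizes. The usual approach, sketched below with hypothetical names (not necessarily how this kernel implements it), is to round the length up to a tile multiple and mask the padded columns with negative infinity before the softmax so they contribute zero probability:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Round a sequence length up to the next multiple of the kernel tile size.
std::size_t pad_to_tile(std::size_t len, std::size_t tile)
{
    return (len + tile - 1) / tile * tile;
}

// Softmax over one padded row of scaled attention scores. Entries at or
// beyond `valid_len` (assumed >= 1) are masked to -inf, so exp() maps them
// to 0 and they do not disturb the normalization.
void masked_softmax(std::vector<float>& row, std::size_t valid_len)
{
    const float neg_inf = -std::numeric_limits<float>::infinity();
    for(std::size_t i = valid_len; i < row.size(); ++i)
        row[i] = neg_inf;

    float max_val = row[0];
    for(std::size_t i = 1; i < valid_len; ++i)
        max_val = std::max(max_val, row[i]);

    float sum = 0.f;
    for(float& v : row)
    {
        v = std::exp(v - max_val); // exp(-inf) == 0 for the padded tail
        sum += v;
    }
    for(float& v : row)
        v /= sum;
}
```
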
- 29 Aug, 2022 (2 commits)
  - Anthony Chang authored
    * avoid potential hazard; flaky test issue persists
    * pin down the random seed to avoid flakiness
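Pinning the seed, as in the second item, makes randomly generated test tensors identical from run to run, so a failure reproduces instead of flaking. A trivial sketch of the idea; the seed value and function name are made up, not taken from the repository:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Generate test input with a fixed seed so every run sees the same data.
std::vector<float> make_test_data(std::size_t n)
{
    std::mt19937 gen(20220829u); // fixed, arbitrary seed
    std::uniform_real_distribution<float> dist(-1.f, 1.f);
    std::vector<float> data(n);
    for(float& v : data)
        v = dist(gen);
    return data;
}
```
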
  - Illia Silin authored
    * fix the performance of the batched gemm verification
    * fix tabs

- 26 Aug, 2022 (1 commit)
  - Illia Silin authored
    * replace hipcc compiler with clang++
    * build client app with hipcc
    * build client app with clang
    * add an option to build with hipcc or clang
    * fix the environment for client app
    * fix setting up compiler in cmake_build
    * change the way the compiler is set