- 06 Oct, 2021 1 commit
-
-
Qianfeng authored
* Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel * Fix with regard to implementing GetZeroVal() in both kernel and host * Avoid convert to compType from dstDataType before writting the output value * Add half_t support to NumericLimits and make constexpr GetZeroVal() of binary operator * Add CONSTANT decorator for descriptor read buffer * Use get_thread_local_1d_id() for thread local Id * Rename GetZeroVal() to GetReductionZeroVal() in the kernels * Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp * Occasional tiny simplification and update in the kernel files * Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers * Update to remove OpenCL tidy checking failures * Update for better readability * Remove unused codes and not-needed template parameters in the kernel wrappers Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 27 Aug, 2021 1 commit
-
-
Qianfeng authored
* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files * make inner product compatible on gfx900 * Update src/include/miopen/solver/ck_utility_common.hpp * compiler parameter use stream * use int instead of index_t in kernel wrapper * DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element * Add dynamic generic reduction kernel layer (kernel wrappers, kernel implementations and utilities) * Some updates to dynamic composable kernel facility for the need of dynamic generic reduction * Update to generic reduction C++ host interface layer to support dynamic generic reduction * Update to remove tidy complaints in host interface layer * Change the unary operator form from void op(T &x) to T op(T x) * Update to pass single workspace pointer for all kernels (fix for OpenCL backend) * Use cppcheck-suppress to prevent some strange warnings * Re-use operator [] and () for DynamicBuffer and update to depending codes * Remove useless codes in first call threadwise/warpwise/blockwise kernel wrappers * [performance] Remove un-needed local buffer initialization Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
JD <Jehandad.Khan@amd.com>
-