- 06 May, 2023 5 commits
-
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
- 04 May, 2023 30 commits
-
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Rostyslav Geyyer authored
* Add TypeConvert class and start refactoring
* Refactor TypeConvert as a struct
* Get back to template functions type_convert
* Add a type_convert_bf16_rtn, set rtz as default
* Clean up
* Add UnaryConvertPrecision struct for high-precision workloads
* Format
* Update type_convert to UnaryConvert on threadwise level
* Update UnaryConvertPrecision
* Format
* Fix chmod
* Add a flag to pick conversion method
* Format
* Remove the added flag
* Merge elementwise op with type conversion
* Move type_convert to elemwise op, update the op
* Update type_convert_precision -> bf16_convert_rtn
* Clean up
* Update comments
* Update the CK_WORKAROUND_DENORM_FIX flag handling
* Update the unneeded op to work but warn user
* Remove the message
* Use a PassThrough instead of ConvertBF16RTN to calculate reference
* Format
* Add missing include
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
Po-Yen, Chen authored
-
- 03 May, 2023 3 commits
-
-
Illia Silin authored
* replace amd_buffer_atomic_add with hip_atomic_add
* fix grouped_gemm_splitk kernels on mi300
* fix syntax
* revert experimental atomic_add changes
* fix the group of kernels from ticket 723 on MI300
---------
Co-authored-by: Jing Zhang <jizhan@amd.com>
-
Illia Silin authored
* replace amd_buffer_atomic_add with hip_atomic_add
* fix grouped_gemm_splitk kernels on mi300
* fix syntax
* revert experimental atomic_add changes
---------
Co-authored-by: Jing Zhang <jizhan@amd.com>
-
Illia Silin authored
-
- 02 May, 2023 1 commit
-
-
zjing14 authored
-
- 28 Apr, 2023 1 commit
-
-
Illia Silin authored
* enable gfx940
* switch between intrinsic mfma routines on mi100/200 and mi300
* fix mfma_int8 on MI300
* disable 2 int8 examples on MI300
* Update cmake-ck-dev.sh
* restore gitignore file
* modify Jenkinsfile to the internal repo
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
-