- 09 Jun, 2023 4 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 24 May, 2023 3 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Illia Silin authored
* fix headers for gpu instances
* remove unused headers
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
- 23 May, 2023 7 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 18 May, 2023 1 commit
-
-
Rostyslav Geyyer authored
-
- 16 May, 2023 1 commit
-
-
Rostyslav Geyyer authored
-
- 15 May, 2023 2 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 12 May, 2023 5 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 11 May, 2023 3 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 08 May, 2023 4 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 04 May, 2023 1 commit
-
-
Rostyslav Geyyer authored
* Add TypeConvert class and start refactoring
* Refactor TypeConvert as a struct
* Get back to template functions type_convert
* Add a type_convert_bf16_rtn, set rtz as default
* Clean up
* Add UnaryConvertPrecision struct for high-precision workloads
* Format
* Update type_convert to UnaryConvert on threadwise level
* Update UnaryConvertPrecision
* Format
* Fix chmod
* Add a flag to pick conversion method
* Format
* Remove the added flag
* Merge elementwise op with type conversion
* Move type_convert to elemwise op, update the op
* Update type_convert_precision -> bf16_convert_rtn
* Clean up
* Update comments
* Update the CK_WORKAROUND_DENORM_FIX flag handling
* Update the unneeded op to work but warn user
* Remove the message
* Use a PassThrough instead of ConvertBF16RTN to calculate reference
* Format
* Add missing include
-
- 28 Apr, 2023 1 commit
-
-
Illia Silin authored
* enable gfx940
* switch between intrinsic mfma routines on mi100/200 and mi300
* fix mfma_int8 on MI300
* disable 2 int8 examples on MI300
* Update cmake-ck-dev.sh
* restore gitignore file
* modify Jenkinsfile to the internal repo
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
- 10 Apr, 2023 1 commit
-
-
rocking5566 authored
* Rename to proper naming
* Add example of groupnorm + swish
* Extract duplicate code in example
* Add groupnorm + swish instances
* Refactor instance generation, split into multiple cpp files
* Add external api and client example
* Refine profiler message
* Use ck math version of exp
* Refine problem size in example
* Add host version of exp
-
- 29 Mar, 2023 2 commits
-
-
Rostyslav Geyyer authored
* Add type_convert implementations for bf16
* Add the fix for conv_fwd
* Add the fix for conv_bwd_data
* Add the fix for conv_bwd_weight
* Format
* Format
* Another format
* Add a macro to use workaround on MI200 only
* Format
---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
rocking5566 authored
* Rename file. Prepare to support another activation
* Add comment for quantization
* Extract out_elementop
* Add tanh example
* Add conv + bias + tanh quantization instance
* Add missing parameter
* Refine cmake
* Add external api and client example
* Extract variable in example
* Fix the comment
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
- 20 Mar, 2023 1 commit
-
-
Dan Yao authored
* rtn in ternary way
* Check both flags to preserve NaN
* Format
* Rearrange flag1
* Apply suggestions from code review
  Co-authored-by: Ronan Keryell <ronan@keryell.fr>
---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
Co-authored-by: Ronan Keryell <ronan@keryell.fr>
-
- 15 Mar, 2023 2 commits
-
-
rocking5566 authored
* Add conv perlayer quantization
* Add gemm_dlops quantization
* Support int8 for innerproduct
* Refine gemm dlops int8 kernel parameter
* Support gfx908(MI100) and gfx90a(MI200)
* clang-format
* Rename example number
* Support different layout for d tensor
* Add conv dlops perchannel quantization example
* Move to example 40
* Extract the common code for different platform (dlops and xdlops)
* Move to subfolder. Prepare to add other op of quantization
* Refine the quantization instance library
* Add conv dl instances and client example
* Remove unnecessary type
* Add gemm quantization instance
* Add external api and client example
* Refine num_bytes
* Separate different layout to different cpp
* Add more xdl instances
* Revert "Remove unnecessary type"
  This reverts commit 82086918.
* Remove CShuffleDataType in dlops
  Let acc and CShuffleDataType be the same in xdlops
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
Haocong WANG authored
-
- 10 Mar, 2023 1 commit
-
-
Haocong WANG authored
* Change gridwise gemm mD blockwise gemm to naive
* RRR Gemm fix
* Fix RCR gemm bug
* Isolate wmma instructions
* Update amd_inline_asm.hpp
* Update amd_wmma.hpp
* Update amd_wmma.hpp
* fix syntax and update Jenkinsfile
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
-
- 09 Mar, 2023 1 commit
-
-
carlushuang authored
Co-authored-by: zjing14 <zhangjing14@gmail.com>
-