- 04 May, 2023 1 commit
-
-
Rostyslav Geyyer authored
* Add TypeConvert class and start refactoring * Refactor TypeConvert as a struct * Get back to template functions type_convert * Add a type_convert_bf16_rtn, set rtz as default * Clean up * Add UnaryConvertPrecision struct for high-precision workloads * Format * Update type_convert to UnaryConvert on threadwise level * Update UnaryConvertPrecision * Format * Fix chmod * Add a flag to pick conversion method * Format * Remove the added flag * Merge elementwise op with type conversion * Move type_convert to elemwise op, update the op * Update type_convert_precision -> bf16_convert_rtn * Clean up * Update comments * Update the CK_WORKAROUND_DENORM_FIX flag handling * Update the unneeded op to work but warn user * Remove the message * Use a PassThrough instead of ConvertBF16RTN to calculate reference * Format * Add missing include
-
- 03 May, 2023 2 commits
-
-
Illia Silin authored
* replace amd_buffer_atomic_add with hip_atomic_add * fix grouped_gemm_splitk kernels on mi300 * fix syntax * revert experimental atomic_add changes * fix the group of kernels from ticket 723 on MI300 --------- Co-authored-by:Jing Zhang <jizhan@amd.com>
-
Illia Silin authored
* replace amd_buffer_atomic_add with hip_atomic_add * fix grouped_gemm_splitk kernels on mi300 * fix syntax * revert experimental atomic_add changes --------- Co-authored-by:Jing Zhang <jizhan@amd.com>
-
- 28 Apr, 2023 1 commit
-
-
Illia Silin authored
* enable gfx940 * switch between intrinsic mfma routines on mi100/200 and mi300 * fix mfma_int8 on MI300 * disable 2 int8 examples on MI300 * Update cmake-ck-dev.sh * restore gitignore file * modify Jenkinsfile to the internal repo --------- Co-authored-by:
Jing Zhang <jizha@amd.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com>
-
- 26 Apr, 2023 1 commit
-
-
Haocong WANG authored
Co-authored-by:zjing14 <zhangjing14@gmail.com>
-
- 24 Apr, 2023 1 commit
-
-
Adam Osewski authored
* simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * B2C with 3D grid for KSplit * Remove unused code. * Use default B2C (3D grid) in grid gemm v2r4r2. * Device gemm splitk use B2C map. * Device GroupedGemmXdlSplitKCShuffle * Example for GroupedGemm Xdl SplitK * Introduce Device GroupedGemmSplitK * Fix updating kbatch size. * Add instance mk-nk-mn * Enable set kbatch in profiler. * Add GGemmSplitK mk-kn-mn instances * Add more instances & split into multiple files. * minor fix * tuning * clean * disabled failed instances * use pipe v2 * Ignore arg on not supported arch. * fix warning --------- Co-authored-by:
carlushuang <carlus.huang@amd.com> Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com> Co-authored-by:
Jing Zhang <jizhan@amd.com> Co-authored-by:
root <root@ctr-ubbsmc15.amd.com>
-
- 22 Apr, 2023 1 commit
-
-
Illia Silin authored
* simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout --------- Co-authored-by:carlushuang <carlus.huang@amd.com>
-
- 16 Apr, 2023 2 commits
-
-
Haocong WANG authored
-
Rostyslav Geyyer authored
Co-authored-by:Rosty Geyyer <rosty.geyyer@amd.com>
-
- 11 Apr, 2023 2 commits
-
-
Haocong WANG authored
-
zjing14 authored
* add a macro to turn off denorm fix by default * expose the macro --------- Co-authored-by:root <root@ctr-ubbsmc15.amd.com>
-
- 10 Apr, 2023 1 commit
-
-
rocking5566 authored
* Rename to proper naming * Add example of groupnorm + swish * Extract duplicate code in example * Add groupnorm + swish instances * Refactor instance generation, split into multiple cpp files * Add external api and client example * Refine profiler message * Use ck math version of exp * Refine problem size in example * Add host version of exp
-
- 07 Apr, 2023 1 commit
-
- 30 Mar, 2023 2 commits
-
-
Haocong WANG authored
-
carlushuang authored
* simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout
-
- 29 Mar, 2023 2 commits
-
-
Rostyslav Geyyer authored
* Add type_convert implementations for bf16 * Add the fix for conv_fwd * Add the fix for conv_bwd_data * Add the fix for conv_bwd_weight * Format * Format * Another format * Add a macro to use workaround on MI200 only * Format --------- Co-authored-by:
Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com>
-
rocking5566 authored
* Rename file. Prepare to support another activation * Add comment for quantization * Extract out_elementop * Add tanh example * Add conv + bias + tanh quantization instance * Add missing parameter * Refine cmake * Add external api and client example * Extract variable in example * Fix the comment --------- Co-authored-by:zjing14 <zhangjing14@gmail.com>
-
- 23 Mar, 2023 1 commit
-
-
Haocong WANG authored
* Add CMake Option "USE_OPT_NAVI3X" * fix bug
-
- 22 Mar, 2023 1 commit
-
-
Illia Silin authored
* remove XDL parameters from WMMA kernel string * get rid of two more parameters
-
- 20 Mar, 2023 2 commits
-
-
Dan Yao authored
* rtn in ternary way * Check both flags to preserve NaN * Format * Rearrange flag1 * Apply suggestions from code review Co-authored-by:
Ronan Keryell <ronan@keryell.fr> --------- Co-authored-by:
Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by:
Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com> Co-authored-by:
Ronan Keryell <ronan@keryell.fr>
-
ltqin authored
* add workaround 637 * format * change id --------- Co-authored-by:zjing14 <zhangjing14@gmail.com>
-
- 15 Mar, 2023 5 commits
-
-
rocking5566 authored
* Add conv perlayer quantization * Add gemm_dlops quantization * Support int8 for innerproduct * Refine gemm dlops int8 kernel parameter * Support gfx908(MI100) and gfx90a(MI200) * clang-format * Rename example number * Support different layout for d tensor * Add conv dlops perchannel quantization example * Move to example 40 * Extract the common code for different platform (dlops and xdlops) * Move to subfolder. Prepare to add other op of quantization * Refine the quantization instance library * Add conv dl instances and client example * Remove unnecessary type * Add gemm quantization instance * Add external api and client example * Refine num_bytes * Separate different layout to different cpp * Add more xdl instances * Revert "Remove unnecessary type" This reverts commit 820869182f6a8f62b2c9004101ba6bf76b96be14. * Remove CShuffleDataType in dlops Let acc and CShuffleDataType be the same in xdlops --------- Co-authored-by:zjing14 <zhangjing14@gmail.com>
-
Adam Osewski authored
* Pass shared mem pointer as pointer to void. * Device Op GroupedGEMM Multiple D * Example for grouped gemm multiple d. * Add MI200 to supported archs. --------- Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com>
-
Rostyslav Geyyer authored
* Add layout check to IsSupportedArgument * Format --------- Co-authored-by:
Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com>
-
Illia Silin authored
* make conv_fwd_bias_activation kernel id unique * add more parameters to conv and gemm kernel names * update GetTypeString for conv and gemm kernels * fix two more kernel strings
-
Haocong WANG authored
-
- 10 Mar, 2023 2 commits
-
-
Rostyslav Geyyer authored
Co-authored-by:Rosty Geyyer <rosty.geyyer@amd.com>
-
Haocong WANG authored
* Change gridwise gemm mD blockwise gemm to naive * RRR Gemm fix * Fix RCR gemm bug * Isolate wmma instructions * Update amd_inline_asm.hpp * Update amd_wmma.hpp * Update amd_wmma.hpp * fix syntax and update Jenkinsfile --------- Co-authored-by:
zjing14 <zhangjing14@gmail.com> Co-authored-by:
Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by:
illsilin <Illia.Silin@amd.com>
-
- 09 Mar, 2023 2 commits
-
-
carlushuang authored
Co-authored-by:zjing14 <zhangjing14@gmail.com>
-
Illia Silin authored
* enable building on Navi31 * fix syntax * replace GPU_TARGETS with offload-arch * add gfx1102 architecture * fix typo * update changelog
-
- 08 Mar, 2023 1 commit
-
-
Adam Osewski authored
* Grouped gemm + Gelu instances. * Device Instance Factory for GroupedGemm+Gelu * Client example * Rangify fill helper functions. * Fix name clash. * Profiler for grouped_gemm+gelu * No need to use full namespace name. * Add check for MRaw divisible by vector load. * Ugly fix for big errors. * Add grouped_gemm+gelu to profiler CMakelists. * Store in argument additional info. * Information about Mraw, Nraw, Kraw values. * Use FastGelu instead of Gelu. * Change client ex to use FastGelu * Remove relaxed error precision. * Remove duplicate output elementwise-op --------- Co-authored-by:
Adam Osewski <aosewski@amd.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com>
-
- 06 Mar, 2023 2 commits
-
-
Rostyslav Geyyer authored
Co-authored-by:Rosty Geyyer <rosty.geyyer@amd.com>
-
pmaybank authored
* Modify Doxygen config to pick up include directories recursively * Add DeviceMem struct to API Reference guide * Add classes that are used in Flash Attention kernel * Add a reference and config for generating bibliography Co-authored-by:Philip Maybank <Philip.Maybank@amd.com>
-
- 01 Mar, 2023 1 commit
-
-
Haocong WANG authored
* fix a bug blocking wmma_gemm_multipleD * Utilize matrix padder in device_wmma_op * cosmetic change for gemmpadding format * clang format * Change gridwise gemm from FIFO to KMN loop fashion
-
- 27 Feb, 2023 1 commit
-
-
Chao Liu authored
* clean up * fast gelu using builtin function * clean * clean * clean * clean: * clean * fix compilation * clean * clean --------- Co-authored-by:zjing14 <zhangjing14@gmail.com>
-
- 24 Feb, 2023 1 commit
-
-
zjing14 authored
-
- 22 Feb, 2023 1 commit
-
-
Rostyslav Geyyer authored
* Add DeviceOp and examples * Format DeviceOp template arguments * Remove bf16 example * Format * Format * Update MakeABCGridDescriptor_A_K0_M_K1_B_K0_N_K1_C_M_N * Refactor argument preparation * Update conv_bwd_weight_dl to grouped_conv_bwd_weight_dl * Rename device op file * Update include directive in the example file * Update descriptor preparation for grouped op * Update the argument * Update batch handling * Add gridwise gemm supporting batched input * Update blockwise indexing, working version * Update copyright year * Update check if argument is supported * Refactor and make consistent with xdl examples * Update check if argument is supported * Add changelog entry * Added comments on Dl op split_k>1 support --------- Co-authored-by:
Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com>
-
- 16 Feb, 2023 1 commit
-
-
Illia Silin authored
* fix a bug while building for gfx1030 and add gfx1030 to targets * fix syntax
-
- 15 Feb, 2023 2 commits
-
-
Illia Silin authored
* clean up output from kernel_launch * set RUN_WARMUP to 0 by default * split the warm-up into a separate issue --------- Co-authored-by:zjing14 <zhangjing14@gmail.com>
-
zjing14 authored
* add contraction_bilinear * add contraction_scale_xdl_fp64 * reduce tile size to avoid register spill --------- Co-authored-by:root <root@ctr-ubbsmc16.amd.com>
-