- 27 Nov, 2023 1 commit
-
-
Bartlomiej Wroblewski authored
-
- 25 Nov, 2023 1 commit
-
-
Bartlomiej Wroblewski authored
* Add basic support for direct loads from global to LDS * Clean the code and comments * Add support for fp16 * Add comments * Add check for thread cluster lengths * Align non-direct-load fp16 example * Small fixes * Extend IsSupported to check for supported GPU gens * Build examples only on the supported HW * Do not throw when instance not supported in 04 example * Review: Apply review suggestions * Review: small fix * Review: small fix
-
- 17 Nov, 2023 1 commit
-
-
zjing14 authored
* improve 4k gemm perf * add f8 instances * format --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 14 Nov, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Introduce multiABD api and deprecate multiD * Replace multiD with multiABD * Mark structures as deprecated * Change doxygen deprecated to note to avoid warnings
-
- 13 Nov, 2023 1 commit
-
-
arai713 authored
* adding files for F32 example * adding functioning implementation with scalar multiplication and unary operator support * added fp 16 type check in unary square * updating scalar multiplication as an operator * functioning version with scalar operator * changing strides for col major * updated column major implementation * working column major implementation * cleaned up comments, rearranged/renamed files
-
- 10 Nov, 2023 2 commits
-
-
Bartłomiej Kocot authored
* Support multi AB for grouped conv fwd xdl * Add instances * Add client example * Add example * Add interface test * Minor fixes Minor fixes Minor fixes * Comment fixes * Fixes * Reference fix * Test xdl fixes * Improve multi_ab interface test
-
rocking authored
* Add layernorm backward reference code * Add groupnorm backward reference code * Add example * clang format * Fixc bug of reference layernorm and groupnorm * Fix naming * Refine naming * Add device op for normalization bwd gamma and beta * Refine template parameter * Add bwd gamma & beta of kernel * 1. Add groupnorm example 2. Refine layernorm naming * Narrow down the static check for performance * Refine variable name
-
- 09 Nov, 2023 2 commits
-
-
arai713 authored
* added working example for 5D input using 1D kernel * example with 5D input tensor and 2d kernel - not working: issues with arguments * added updated version of 3d device op - changed descriptors/dims * added example file to check kernel * fixed descriptor and isSupportedArgument stride problem * added and modified kernel for 3d - updated tids/loop * adding some more 5d example files * fixed some issues * changes made for testing * working version: fixed error in stride for A, still a bit inefficient * cleaned up formatting/comments * updating formatting * more formatting fixes * fixing cmake, adding back gpu targets in cmake script * adding client example * added instances for client example * fixed errors in client example * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp * removed extra files * minor formatting and naming fixes * adding test files and profiler * fixing minor error * minor fix * removed unneccesary comments, renamed files * updated instance list for client example, added different layout example * removing instances * fixed error in instance generation * remove comments * update profiler and client example tensor layouts * fixed errors in test/profiler * updated vector dim access to enable vector load * updated test/profiler files * updated example with 1d kernel * updating profiler * renamed files --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
rocking authored
* Rename folder * Add layernorm 4d fwd example * Rename original layernorm example * Add layernorm 4d f16 test * Add layernorm4d_fwd client example * Support layernorm4D in ckProfiler * Rename groupnorm to groupnorm fwd in example * Rename layernorm and group fwd in test * Rename normalization to normalization_fwd (instances) * Add fwd to DeviceNormalization * Rename external api header * Rename folder, because we can also add bwd in this folder * Add fwd in layernorm and groupnorm (profiler * Fix compile error --------- Co-authored-by:Po Yen Chen <PoYen.Chen@amd.com>
-
- 03 Nov, 2023 1 commit
-
-
Bartlomiej Wroblewski authored
-
- 02 Nov, 2023 1 commit
-
-
Bartlomiej Wroblewski authored
* Add support for mixed precision in contraction scale and bilinear (#936) * Extract common functionality to separate files * Reference contraction: Remove incorrect consts from type_converts * Reference contraction: Add missing type_convert for dst value * Reference contraction: Fix incorrect order of B matrix dimensions * Add support for mixed precision in contraction scale and bilinear * Move using statements from instances to a common file * Move using statements from examples to a common file * Fix the order of B matrix dimensions across examples and profiler * Fix the computation of error threshold * Make ComputeDataType an optional argument * Include possible DataType -> ComputeDataType casting error in the threshold * Remove commented code * Make the ComputeDataType an optional argument in instance --------- Co-authored-by:Illia Silin <98187287+illsilin@users.noreply.github.com>
-
- 01 Nov, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Add ScaleAddScaleAddRelu post op for conv fwd * Fixes * Fix instance file name * Minor fix
-
- 31 Oct, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Add support for groups in Img2Col/Col2Img * Fix interface test * Fix interface test G to N * Improve performance * Change gemm layout to 3d * Fixes
-
- 28 Oct, 2023 1 commit
-
-
Illia Silin authored
* Fix the fp8 conversion * Try clipping value before conversion * Fix return * Simplify with a const * reduce the gemm input tensor values to reduce round-off error * replace if-else with lambda * fix syntax --------- Co-authored-by:Rostyslav Geyyer <rosty.geyyer@amd.com>
-
- 21 Oct, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Fix instances dtype check * Fix source dtypes seletor for examples and tests * Sync with new cmakefile changes * Remove not needed ifdefs * Remove not needed ifdefs
-
- 20 Oct, 2023 1 commit
-
-
Rostyslav Geyyer authored
* Fix the conversion * Add bf8 functionality * Enable example on MI200 as well
-
- 19 Oct, 2023 2 commits
-
-
Bartłomiej Kocot authored
* Extend available elementwise operations with conv examples * Fixes * Remove not needed convert * Update CMakeFile and dir name
-
Bartlomiej Wroblewski authored
-
- 18 Oct, 2023 3 commits
-
-
rocking authored
* save mean and inverse std in normalization * Save mean and inverse std in splitK * Vector save mean and inv std * Modify instance for save mean and std * simplify the layernorm example * Save mean and std in groupnorm example * Save mean and inv std in ckProfiler and test * Remove compute data type from base class * Save mean and inv std in client example * Add changelog * clang format * Fix compile error * Refine naming * Avoid error in bf16 * revert changelog
-
zjing14 authored
* Add a condition to build fp8 instances * simplified buffer_load/store * add bfp8/fp8 * fixed * remove all f8/bf8 condition include folder * fixed cmake conditions * fixed DTYPES=fp16/bfp16 * fix * fixed buffer_load * fixed buffer_store * fix * clean example cmake files * fixed ci * fixed cit --------- Co-authored-by:
Rostyslav Geyyer <rosty.geyyer@amd.com> Co-authored-by:
Jing Zhang <jizha@amd.com>
-
zjing14 authored
* add gridwise_multi_abd * move element_op into RunRead * merge element_wise op with data read * add multiABD example * allow packed elementwise_op * changed example * clean * clean * add is_detected * fix * minor fix * add scaleAdd_vec4 example * init commit for contraction_multi_ABD * add examples * add examples of multiA and broadcast * update example * fixed comments * Update cmake-ck-dev.sh * Update cmake-ck-dev.sh * Add comments into the example * Update CMakeLists.txt --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 17 Oct, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Add grouped conv bwd weight wmma * Update README, changelog, profiler * Minor fixes * Fix grouped conv bwd wei dl kernel * Minor fixes * Minor stylistic fixes
-
- 13 Oct, 2023 1 commit
-
-
zjing14 authored
* add vector_type support into thread_copy_v3r1 * remove unncessary type_convert * fixed datatype * fixed dataType * changed API with is_packx_invocable * changed example * add missing cmake file * fixed ci * fixed cmake --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 10 Oct, 2023 1 commit
-
-
zjing14 authored
* workaround nan problem by changing output to fp16 * enable f8/bf8 gemm tests on MI200 * workaround f16 to f8 conversion --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 05 Oct, 2023 3 commits
-
-
Lauren Wrubleski authored
-
Illia Silin authored
* Revert "Add support for mixed precision in contraction scale and bilinear (#936)" This reverts commit f0748506. * revert commits #957 and #960
-
zjing14 authored
Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 04 Oct, 2023 1 commit
-
-
Rostyslav Geyyer authored
* Add f8 bf8 gemm example * Add element-wise ops * Add intrinsics * Update reference calculation * Add an additional type option for xdlops gemm * Fix build process * Add bf8 to buffer addressing * Update blockwise op, split typeA and typeB * Update for compatibility * Uppdate naming to f8->fp8 * Update naming * Format * Update naming (#937) * Add a client example * Add computetypes to device and gridwise ops * Add instances, update instance factory * Format * Fix a flag * Add ckProfiler mode * Fix typos * Add an example * Add bf8 generator * add bf8 mfma; fixed type_convert for bf8 * move verfication ahead of timing * Update reference calculation * Fix reference * Narrow down float init range * Fix bf8 bf8 mfma * Add bf8 @ fp8 mfma * Update example * Update instances * Update profiler api * Update for compatibility * Format * Remove extra example * Clean up * workaround convert --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 03 Oct, 2023 2 commits
-
-
zjing14 authored
Co-authored-by:Jing Zhang <jizha@amd.com>
-
zjing14 authored
* add missing ComputeType * fixed * Update cmake-ck-dev.sh --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 02 Oct, 2023 3 commits
-
-
Rostyslav Geyyer authored
* Add f8 bf8 gemm example * Add element-wise ops * Add intrinsics * Update reference calculation * Add an additional type option for xdlops gemm * Fix build process * Add bf8 to buffer addressing * Update blockwise op, split typeA and typeB * Update for compatibility * Uppdate naming to f8->fp8 * Update naming * Format
-
Illia Silin authored
-
zjing14 authored
* add gridwise_multi_abd * move element_op into RunRead * merge element_wise op with data read * add multiABD example * allow packed elementwise_op * changed example * clean * clean * add is_detected * fix * minor fix * add scaleAdd_vec4 example * init commit for contraction_multi_ABD * add examples * add examples of multiA and broadcast * update example * fixed comments * Update cmake-ck-dev.sh * Update cmake-ck-dev.sh * Add comments into the example --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 29 Sep, 2023 1 commit
-
-
Bartlomiej Wroblewski authored
* Extract common functionality to separate files * Reference contraction: Remove incorrect consts from type_converts * Reference contraction: Add missing type_convert for dst value * Reference contraction: Fix incorrect order of B matrix dimensions * Add support for mixed precision in contraction scale and bilinear * Move using statements from instances to a common file * Move using statements from examples to a common file * Fix the order of B matrix dimensions across examples and profiler * Fix the computation of error threshold * Make ComputeDataType an optional argument * Include possible DataType -> ComputeDataType casting error in the threshold * Remove commented code
-
- 28 Sep, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Add grouped conv bwd data wmma * Fix copyrights * Add instances with smaller NPerBlock * Update interface test * Minor stylistic fixes * Minor stylistic fixes
-
- 27 Sep, 2023 2 commits
-
-
Bartłomiej Kocot authored
* Add column to image kernel * Minor fixes for dtypes and client examples * Disable tests for disabled dtypes * Disable add instances functions for disabled data types * Minor stylistic fixes * Revert "Disable add instances functions for disabled data types" This reverts commit 728b8695. * Instances reduction * Add comments in device_column_to_image_impl * Update changelog and Copyrights * Improve changelog
-
zjing14 authored
* add gridwise_multi_abd * move element_op into RunRead * merge element_wise op with data read * add multiABD example * allow packed elementwise_op * changed example * clean * clean * add is_detected * fix * minor fix * add scaleAdd_vec4 example --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 21 Sep, 2023 1 commit
-
-
Illia Silin authored
* refactor cmake files for the tests * refactor cmake files for examples * fix cmake for gemm example * fix the cmake file for all examples * add splitting by data types in gemm_splitk instance header * rename test to reflect only dl instances are used * clean up CI workspace, update cmake for instances * change the jenkinsfile syntax * build all instances except DL on gfx11 * move workspace cleanup after stages * clean up workspace after every stage * isolate data types in grouped_conv_fwd header * isolate dl instances for grouped_conv2d_fwd * fix syntax * fix cmake and batchnorm instances * fix typo * fix reduction instances * fix grouped_conv headers * fix syntax * replace parsing logic for instances, replace bfp16 with bf16 * fix the client examples build * clean up DTYPES from instances cmake files * update the parsing logic in cmake files * make an exception for reduction kernels * update few remaining cmake files to handle DTYPES * fix syntax * fix cmake conflicts * replace f8 with fp8 test name * resolve conflicts for dpp instances
-
- 15 Sep, 2023 1 commit
-
-
zjing14 authored
* move all arguments into device * add b2c_tile_map * add examples * add SetDeviceKernelArgs * dedicated fixed_nk solution * init client api * add grouped_gemm_bias example * add a instance * add instances * formatting * fixed cmake * Update EnableCompilerWarnings.cmake * Update cmake-ck-dev.sh * clean; fixed comments * fixed comment * add instances for fp32 output * add instances for fp32 output * add fp32 out client example * fixed CI * init commit for kbatch * add splitk gridwise * format * fixed * clean deviceop * clean code * finish splitk * fixed instances * change m_loops to tile_loops * add setkbatch * clean code * add splitK+bias * add instances * opt mk_nk instances * clean examples * fixed CI * remove zero * finished non-zero * clean * clean code * optimized global_barrier * fixed ci * fixed CI * instance and client * removed AddBias * format * fixed CI * fixed CI * move 20_grouped_gemm to 21_grouped_gemm * clean * formatting * clean * clean * fixed computeType --------- Co-authored-by:Jing Zhang <jizha@amd.com>
-
- 13 Sep, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Add grouped conv bwd weight dl instances and new layout * Add M and N padding * Remove todo comment * Enable grouped conv fwd dl k,c=1 generic instance * Comment fixes
-