- 29 Sep, 2023 1 commit
-
-
Bartlomiej Wroblewski authored
* Extract common functionality to separate files
* Reference contraction: Remove incorrect consts from type_converts
* Reference contraction: Add missing type_convert for dst value
* Reference contraction: Fix incorrect order of B matrix dimensions
* Add support for mixed precision in contraction scale and bilinear
* Move using statements from instances to a common file
* Move using statements from examples to a common file
* Fix the order of B matrix dimensions across examples and profiler
* Fix the computation of error threshold
* Make ComputeDataType an optional argument
* Include possible DataType -> ComputeDataType casting error in the threshold
* Remove commented code
-
- 28 Sep, 2023 2 commits
-
-
Bartłomiej Kocot authored
* Add grouped conv bwd data wmma
* Fix copyrights
* Add instances with smaller NPerBlock
* Update interface test
* Minor stylistic fixes
* Minor stylistic fixes
-
Bartłomiej Kocot authored
* Add grouped convolution changes to changelog
* Fix 0.2.0 ck release rocm version
* Suggested CHANGELOG.md edits
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
---------
Co-authored-by: Lisa <lisajdelaney@gmail.com>
-
- 27 Sep, 2023 5 commits
-
-
Illia Silin authored
* Added error check after kernel launch (#919)
  Co-authored-by: Xiaodong Wang <xdwang@meta.com>
  Co-authored-by: Xiaodong Wang <xw285@cornell.edu>
* remove M=0 test cases for test_gemm_splitk
---------
Co-authored-by: Xiaodong Wang <xdwang@meta.com>
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>
-
Bartlomiej Wroblewski authored
* Handle type conversions to a const datatype
* Review: Handle X being const data type as well
* Review: Remove typo
-
Bartłomiej Kocot authored
* Add column to image kernel
* Minor fixes for dtypes and client examples
* Disable tests for disabled dtypes
* Disable add instances functions for disabled data types
* Minor stylistic fixes
* Revert "Disable add instances functions for disabled data types"
  This reverts commit 728b8695.
* Instances reduction
* Add comments in device_column_to_image_impl
* Update changelog and Copyrights
* Improve changelog
-
zjing14 authored
* add gridwise_multi_abd
* move element_op into RunRead
* merge element_wise op with data read
* add multiABD example
* allow packed elementwise_op
* changed example
* clean
* clean
* add is_detected
* fix
* minor fix
* add scaleAdd_vec4 example
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
-
Illia Silin authored
* split ckProfiler gfx9 package into gfx90 and gfx94
* use lower case for package names
-
- 26 Sep, 2023 4 commits
-
-
zjing14 authored
* added kpad support into v2r3
* add generic instances
* fixed comments
* fixed mnk padding
* Update device_batched_gemm_xdl.hpp
* fixed kpad
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
-
Rostyslav Geyyer authored
* Add fp8 gemm instances
* Update instance naming
-
Illia Silin authored
-
Illia Silin authored
* split the types in gemm_bilinear instances, add condition to cmake policy
* fix syntax
* split the data types in batchnorm examples
* fix the batchnorm_bwd test
* fix types in the batchnorm_bwd test
-
- 23 Sep, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Add 3d grouped conv fwd wmma instances
* Refactor fwd conv tests
* Split wmma instances for each specialization
* Minor stylistic fixes
-
- 22 Sep, 2023 1 commit
-
-
Rostyslav Geyyer authored
-
- 21 Sep, 2023 1 commit
-
-
Illia Silin authored
* refactor cmake files for the tests
* refactor cmake files for examples
* fix cmake for gemm example
* fix the cmake file for all examples
* add splitting by data types in gemm_splitk instance header
* rename test to reflect only dl instances are used
* clean up CI workspace, update cmake for instances
* change the jenkinsfile syntax
* build all instances except DL on gfx11
* move workspace cleanup after stages
* clean up workspace after every stage
* isolate data types in grouped_conv_fwd header
* isolate dl instances for grouped_conv2d_fwd
* fix syntax
* fix cmake and batchnorm instances
* fix typo
* fix reduction instances
* fix grouped_conv headers
* fix syntax
* replace parsing logic for instances, replace bfp16 with bf16
* fix the client examples build
* clean up DTYPES from instances cmake files
* update the parsing logic in cmake files
* make an exception for reducti...
-
- 20 Sep, 2023 1 commit
-
-
Illia Silin authored
-
- 19 Sep, 2023 2 commits
-
-
Illia Silin authored
* update to rocm5.7 by default
* fix jenkinsfile syntax
-
Illia Silin authored
-
- 18 Sep, 2023 2 commits
-
-
Bartlomiej Wroblewski authored
* Fix vector lengths of DL GEMM instances with padding
* Add checks for correctness of vector lengths in DL GEMM
-
Rostyslav Geyyer authored
* Add native conversions
* Add bf8 conversions
-
- 15 Sep, 2023 2 commits
-
-
Bartlomiej Kocot authored
Remove unnecessary ignoring
Update test/grouped_convnd_bwd_weight/test_grouped_convnd_bwd_weight.cpp
-
zjing14 authored
* move all arguments into device
* add b2c_tile_map
* add examples
* add SetDeviceKernelArgs
* dedicated fixed_nk solution
* init client api
* add grouped_gemm_bias example
* add an instance
* add instances
* formatting
* fixed cmake
* Update EnableCompilerWarnings.cmake
* Update cmake-ck-dev.sh
* clean; fixed comments
* fixed comment
* add instances for fp32 output
* add instances for fp32 output
* add fp32 out client example
* fixed CI
* init commit for kbatch
* add splitk gridwise
* format
* fixed
* clean deviceop
* clean code
* finish splitk
* fixed instances
* change m_loops to tile_loops
* add setkbatch
* clean code
* add splitK+bias
* add instances
* opt mk_nk instances
* clean examples
* fixed CI
* remove zero
* finished non-zero
* clean
* clean code
* optimized global_barrier
* fixed ci
* fixed CI
* instance and client...
-
- 14 Sep, 2023 1 commit
-
-
Illia Silin authored
-
- 13 Sep, 2023 4 commits
-
-
Jun Liu authored
-
Bartłomiej Kocot authored
* Add grouped conv bwd weight dl instances and new layout
* Add M and N padding
* Remove todo comment
* Enable grouped conv fwd dl k,c=1 generic instance
* Comment fixes
-
zjing14 authored
* fixed fp8 init; and reference gemm
* Update host_tensor_generator.hpp
* fixed convert
* fixed reference gemm
* fixed comments
* fixed comments
* fixed ci
* fixed computeType
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
-
Illia Silin authored
* enable building DL kernels with the daily staging compiler
* move the DL_KERNELS flag to another function
-
- 12 Sep, 2023 3 commits
-
-
Rostyslav Geyyer authored
* Refactor f8_t to add bf8_t
* Add check_err impl for f8_t
* Update fp8 test
* Format
* Revert the fix
* Update vector_type implementation
* Add bf8 test
* Add bf8, use BitInt types
* Add bf8 conversion methods
* Update type_convert for fp8/bf8
* Add check_err fp8/bf8 support
* Add subnorm fp8 tests
* Add subnorm bf8 tests
* Fix conversion
* Add bf8 cmake bindings
* Add macros to enable build with disabled fp8/bf8
* Remove is_native method
* Update flag combination for mixed precision instances
* Add more flag checks
* Add another flag to a client example
* Add type traits, decouple f8/bf8 casting
* Clean up
* Decouple fp8 and bf8 flags
* Remove more redundant flags
* Remove leftover comments
-
Illia Silin authored
-
Bartlomiej Wroblewski authored
-
- 11 Sep, 2023 1 commit
-
-
Sam Wu authored
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
-
- 08 Sep, 2023 2 commits
-
-
Bartlomiej Wroblewski authored
-
Haocong WANG authored
* fix wmma gemm int8; add grouped conv int8 example
* Add int8 gemm-bilinear instances
* compile sanity check unknown
* Sanity pass + clang-format
* add int8 conv profiler instances
* solve merge conflict
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
-
- 06 Sep, 2023 3 commits
-
-
Bartlomiej Wroblewski authored
* Redesign the DPP8 GEMM kernel to use warp-wise component
* Review: Improve error messages
* Review: Remove unnecessary empty lines
* Review: Fix M, N per thread names
* Review: Rename mfma_input_type to dpp_input_type
* Review: Fix tensor adaptor; remove unnecessary element
* Review: Remove calls to dpp_gemm's MakeCDescriptor
* Review: Add blockwise doc, change function names to include dimension names
* Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file
* Review: Add __restrict__ keywords
* Review: Use MatrixPadder for padding A, B, C matrices
* Review: Remove hardcoded datatypes
* Review: Change names from FloatX to XDataType
* Review: Introduce AK0 and BK0 instead of a single K0
* Review: Remove construction of dpp_datatypes object
* Review: Rename DppInstrRunner to DppLanegroupGemm
-
zjing14 authored
* added kpad support into v2r3
* add generic instances
* fixed comments
* fixed mnk padding
* Update device_batched_gemm_xdl.hpp
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
-
zjing14 authored
* add generic instances; fixed init with fp8
* fixed comment
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
-
- 05 Sep, 2023 4 commits
-
-
Illia Silin authored
-
Bartlomiej Wroblewski authored
Add contribution guidelines to the documentation
-
Illia Silin authored
-
Bartłomiej Kocot authored
* Add image to column kernel
* Add instances, tests, profiler, example
* Add client example
* Several fixes of image to column
* Fix variable name in device_image_to_column_impl
* Several fixes of image to column profiler
* Fix num_btype calculation
* Make new measurements for correct bytes calculation
-