- 14 Sep, 2023 1 commit
-
-
Illia Silin authored
-
- 13 Sep, 2023 4 commits
-
-
Jun Liu authored
-
Bartłomiej Kocot authored
* Add grouped conv bwd weight dl instances and new layout * Add M and N padding * Remove todo comment * Enable grouped conv fwd dl k,c=1 generic instance * Comment fixes
-
zjing14 authored
* fixed fp8 init; and reference gemm * Update host_tensor_generator.hpp * fixed convert * fixed reference gemm * fixed comments * fixed comments * fixed ci * fixed computeType --------- Co-authored-by: Jing Zhang <jizha@amd.com>
-
Illia Silin authored
* enable building DL kernels with the daily staging compiler * move the DL_KERNELS flag to another function
-
- 12 Sep, 2023 3 commits
-
-
Rostyslav Geyyer authored
* Refactor f8_t to add bf8_t * Add check_err impl for f8_t * Update fp8 test * Format * Revert the fix * Update vector_type implementation * Add bf8 test * Add bf8, use BitInt types * Add bf8 conversion methods * Update type_convert for fp8/bf8 * Add check_err fp8/bf8 support * Add subnorm fp8 tests * Add subnorm bf8 tests * Fix conversion * Add bf8 cmake bindings * Add macros to enable build with disabled fp8/bf8 * Remove is_native method * Update flag combination for mixed precision instances * Add more flag checks * Add another flag to a client example * Add type traits, decouple f8/bf8 casting * Clean up * Decouple fp8 and bf8 flags * Remove more redundant flags * Remove leftover comments
-
Illia Silin authored
-
Bartlomiej Wroblewski authored
-
- 11 Sep, 2023 1 commit
-
-
Sam Wu authored
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
-
- 08 Sep, 2023 2 commits
-
-
Bartlomiej Wroblewski authored
-
Haocong WANG authored
* fix wmma gemm int8; add grouped conv int8 example * Add int8 gemm-bilinear instances * compile sanity check unknown * Sanity pass + clang-format * add int8 conv profiler instances * solve merge conflict --------- Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Chao Liu <chao.liu2@amd.com>
-
- 06 Sep, 2023 3 commits
-
-
Bartlomiej Wroblewski authored
* Redesign the DPP8 GEMM kernel to use warp-wise component * Review: Improve error messages * Review: Remove unnecessary empty lines * Review: Fix M, N per thread names * Review: Rename mfma_input_type to dpp_input_type * Review: Fix tensor adaptor; remove unnecessary element * Review: Remove calls to dpp_gemm's MakeCDescriptor * Review: Add blockwise doc, change function names to include dimension names * Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file * Review: Add __restrict__ keywords * Review: Use MatrixPadder for padding A, B, C matrices * Review: Remove hardcoded datatypes * Review: Change names from FloatX to XDataType * Review: Introduce AK0 and BK0 instead of a single K0 * Review: Remove construction of dpp_datatypes object * Review: Rename DppInstrRunner to DppLanegroupGemm
-
zjing14 authored
* added kpad support into v2r3 * add generic instances * fixed comments * fixed mnk padding * Update device_batched_gemm_xdl.hpp --------- Co-authored-by: Jing Zhang <jizha@amd.com>
-
zjing14 authored
* add generic instances; fixed init with fp8 * fixed comment --------- Co-authored-by: Jing Zhang <jizha@amd.com>
-
- 05 Sep, 2023 6 commits
-
-
Illia Silin authored
-
Bartlomiej Wroblewski authored
Add contribution guidelines to the documentation
-
Illia Silin authored
-
Bartłomiej Kocot authored
* Add image to column kernel * Add instances, tests, profiler, example * Add client example * Several fixes of image to column * Fix variable name in device_image_to_column_impl * Several fixes of image to column profiler * Fix num_btype calculation * Make new measurements for correct bytes calculation
-
Bartłomiej Kocot authored
-
Bartłomiej Kocot authored
* Fix K padding calculation for grouped conv data * Restore previous padding for 1x1 specialization
-
- 04 Sep, 2023 1 commit
-
-
Lauren Wrubleski authored
-
- 31 Aug, 2023 3 commits
-
-
zjing14 authored
* move all arguments into device * add b2c_tile_map * add examples * add SetDeviceKernelArgs * dedicated fixed_nk solution * init client api * add grouped_gemm_bias example * add a instance * add instances * formatting * fixed cmake * Update EnableCompilerWarnings.cmake * Update cmake-ck-dev.sh * clean; fixed comments * fixed comment * add instances for fp32 output * add instances for fp32 output * add fp32 out client example * fixed CI * init commit for kbatch * add splitk gridwise * format * fixed * clean deviceop * clean code * finish splitk * fixed instances * change m_loops to tile_loops * add setkbatch * clean code * add splitK+bias * add instances * opt mk_nk instances * clean examples * fixed CI * remove zero * finished non-zero * clean * clean code * optimized global_barrier * fixed ci * fixed CI * removed AddBias * format * fixed CI * fixed CI * move 20_grouped_gemm to 21_grouped_gemm --------- Co-authored-by: Jing Zhang <jizha@amd.com>
-
rocking authored
* Add maxpool instances * Rename index pool to max pool. * Add maxpool bwd bf16 instances * Add avg pool bwd instances * Rename avgpool and maxpool to avg_pool3d and max_pool * Add bf16 pool fwd instances * Add max pool bwd to ckProfiler * Add avg pool3d bwd to ckProfiler * Add avg pool bwd test * Fix bug of reference pool fwd (dilation) * Fix bug of max pool bwd (dilation and initZero) * Support bf16 compute data type * Force compute type be f32. Because atomicAdd only support f32 * Add max pool bwd test * Rename folder * Rename pool * Add max pool bwd client example * Add avg pool bwd client example * Add missing workspace * clang format * Rename macro * remove useless header * remove useless layout
-
Illia Silin authored
-
- 30 Aug, 2023 1 commit
-
-
Bartłomiej Kocot authored
-
- 29 Aug, 2023 1 commit
-
-
zjing14 authored
* add an example of customized bfp16_rtn * fixed threadwise_copy --------- Co-authored-by: Jing Zhang <jizha@amd.com>
-
- 28 Aug, 2023 1 commit
-
-
zjing14 authored
* add compute_type * add multiply_add ckProfiler * add f8_fp16 support * clean * clean * fixed lds size calc * format --------- Co-authored-by: Jing Zhang <jizha@amd.com>
-
- 23 Aug, 2023 4 commits
-
-
Jun Liu authored
* experiment with config file * experiment with version.h config * add more info to version.h * minor updates * minor updates * fix case where DTYPE is not used * large amount of files but minor changes * remove white space * minor changes to add more MACROs * fix cmakedefine01 * fix issue with CK internal conflict * fix define and define value * fix clang-format * fix formatting issue * experiment with cmake * clang format v12 to be consistent with miopen * avoid clang-format for config file
-
Qianfeng authored
-
Illia Silin authored
-
zjing14 authored
Co-authored-by: Jing Zhang <jizha@amd.com>
-
- 22 Aug, 2023 3 commits
-
-
zjing14 authored
* updated regular gemm * update ckProfiler * fixed gtests --------- Co-authored-by: Jing Zhang <jizha@amd.com>
-
Bartłomiej Kocot authored
* Fix transform and instances for grouped conv bwd data * Add instances for small K and small C * Remove workaround after fix * Fix interface tests
-
Rostyslav Geyyer authored
* Add ComputeType arg to splitk device and gridwise ops * Update for gridwise op compatibility * Update bf16 and int8 splitk gemm examples with ComputeType * Add instances * Update ckProfiler for mixed precision cases * Add a mixed precision splitK gemm client example --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
- 18 Aug, 2023 1 commit
-
-
cloudhan authored
-
- 17 Aug, 2023 1 commit
-
-
Bartlomiej Wroblewski authored
-
- 14 Aug, 2023 2 commits
-
-
Bartlomiej Wroblewski authored
-
rocking authored
* Do not hardcode stride * devicePool2DFwd Inherit devicePool3DFwd * Move instance declaration out of common * Add dilation * use the pool3d rank, because pool2d inherit pool3d * calculate Do Ho Wo for the dilation * Fix header name * Modify ckProfiler * Remove pool2d instance * Remove pool2d in profiler * Remove pool2d and add dilation * In to client example, this commit revise following: 1. Add dilation. 2. Use pool3d to implement pool2d * Refine naming and IsSupportedArgument() * Add dilation to maxpool bwd example * clang format * 1. Remove useless header 2. Fix copyright 3. Refine naming * Add layout parameter to pool fwd * clang format * Fix merge error * Fix compile error * Remove layout parameter in derived class * Refine changelog * Fix compile error * Fix compiler error * Add layout to external api and profiler
-
- 11 Aug, 2023 2 commits
-
-
rocking authored
* Add normalization splitK to layernorm and groupnorm instances * Fix bug of GetKPerThread() * Refine naming * clang format
-
dependabot[bot] authored
* Bump rocm-docs-core from 0.10.3 to 0.20.0 in /docs/sphinx Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.10.3 to 0.20.0. - [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases) - [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.10.3...v0.20.0) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * set min version of rocm-docs-core --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Sam Wu <sam.wu2@amd.com>
-