"vscode:/vscode.git/clone" did not exist on "83b340e371a0151c9fdefac9f07e0f89ba5e6c37"
- 18 Mar, 2024 1 commit
-
-
Illia Silin authored
* update the changelog for ROCm6.1 release * modifty the order of items in changelog, capitalize GEMMs
-
- 31 Jan, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Add blockwise gemm to ck wrapper * Add blockwise gemm traits * Disable test_gemm for non xdl devices * Fixes * Add c layout descritpions
-
- 24 Jan, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Fix possible linting errors in changelog * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md
-
- 19 Jan, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Add optimized copy to ck wrapper * Example optimizations * Fixes * Move img2col test to client example * Refactor example * Fix docs * Fixes * Fix * Fixes * Fixes * Fixes * Fixes * Fixes --------- Co-authored-by:zjing14 <zhangjing14@gmail.com>
-
- 03 Jan, 2024 1 commit
-
-
Bartłomiej Kocot authored
* Add tensor partition and generic copy for ck wrapper * Update changelog * Stylistic fixes * Change shape/strides logic to descriptor transforms * Fixes * Fix client example * Fix comments
-
- 15 Dec, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Add tensor structure to wrapper * update changelog * Fix names * Comment fixes
-
- 06 Dec, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Introduce wrapper library * Update cmake files * Revert "Update cmake files" This reverts commit c27f88b56590c11a88e26d5d0df7aca51a08133d. * Fix comments
-
- 19 Oct, 2023 1 commit
-
-
Bartlomiej Wroblewski authored
-
- 17 Oct, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Add grouped conv bwd weight wmma * Update README, changelog, profiler * Minor fixes * Fix grouped conv bwd wei dl kernel * Minor fixes * Minor stylistic fixes
-
- 28 Sep, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Add grouped convolution changes to changelog * Fix 0.2.0 ck release rocm version * Suggested CHANGELOG.md edits * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md --------- Co-authored-by:Lisa <lisajdelaney@gmail.com>
-
- 27 Sep, 2023 1 commit
-
-
Bartłomiej Kocot authored
* Add column to image kernel * Minor fixes for dtypes and client examples * Disable tests for disabled dtypes * Disable add instances functions for disabled data types * Minor stylistic fixes * Revert "Disable add instances functions for disabled data types" This reverts commit 728b8695. * Instances reduction * Add comments in device_column_to_image_impl * Update changelog and Copyrights * Improve changelog
-
- 14 Aug, 2023 1 commit
-
-
rocking authored
* Do not hardcode stride * devicePool2DFwd Inherit devicePool3DFwd * Move instance declaration out of common * Add dilation * use the pool3d rank, because pool2d inherit pooo3d * calculate Do Ho Wo for the dilation * Fix header name * Modify ckProfiler * Remove pool2d instance * Remove pool2d in profiler * Remove pool2d and add dilation * In to client example, this commit revise following: 1. Add dilation. 2. Use pool3d to implement pool2d * Refine naming and IsSupportedArgument() * Add dilation to maxpool bwd example * clang format * 1. Remove useless header 2. Fix copyright 3. Refine naming * Add layout parameter to pool fwd * clang format * Fix merge error * Fix compile error * Remove layout parameter in derived class * Refine changlog * Fix compile error * Fix compiler error * Add layout to external api and profiler
-
- 26 Jul, 2023 1 commit
-
-
Illia Silin authored
-
- 19 Jun, 2023 1 commit
-
-
rocking authored
* Add maxpool f32 kernel and example * Revise copyright * Add device pool bwd device op * Support f16 and bf16 * Add compute datatype for reference code. Prevent error in bf16 * Fix type error * Remove layout * Fix bf16 error * Add f16 and bf16 example * Add more operations * Implement IsSupportedArgument * Add changelog * Add comment * Add comment * Remove useless header * Move initialize of workspace to the run * Move set din zero to the device operator * Save din_length_raw * Remove useless header * Calculate gridsize according to the number of CU * Calculate gridSize according to the number of CU. Remove useless header * Add put example * Remove useless header * Fix CI fail
-
- 09 Mar, 2023 1 commit
-
-
Illia Silin authored
* enable building on Nav31 * fix syntax * replace GPU_TARGETS with offload-arch * add gfx1102 rachitecture * fix typo * update changelog
-
- 22 Feb, 2023 1 commit
-
-
Rostyslav Geyyer authored
* Add DeviceOp and examples * Format DeviceOp template arguments * Remove bf16 example * Format * Format * Update MakeABCGridDescriptor_A_K0_M_K1_B_K0_N_K1_C_M_N * Refactor argument preparation * Update conv_bwd_weight_dl to grouped_conv_bwd_weight_dl * Rename device op file * Update include directive in the example file * Update descriptor preparation for grouped op * Update the argument * Update batch handling * Add gridwise gemm supporting batched input * Update blockwise indexing, working version * Update copyright year * Update check if argument is supported * Refactor and make consistent with xdl examples * Update check if argument is supported * Add changelog entry * Added comments on Dl op split_k>1 support --------- Co-authored-by:
Rosty Geyyer <rosty.geyyer@amd.com> Co-authored-by:
zjing14 <zhangjing14@gmail.com>
-
- 15 Feb, 2023 1 commit
-
-
rocking5566 authored
* Sync the order of type string with template parameter * Add more instances * Check the vector size and remove redundant var * Extract var to static, prepare to separate sweep once kernel * Separate sweeponce flow and optimize the flow * 1. Rename AccDatatype in normalization to computeData 2. Rename AccElementwiseOperation to YElementwiseOperation in normalization * Remove useless code * Update naive variance kernel * Refine string * Fix typo * Support naive variance for device_normalization * Check the blocksize * Share the VGPR of x and y * Share the VGPR of gamma and beta * Add more instances * Support fp16 sqrt for experiment * Add CHANGELOG * Fix typo * clang-format
-
- 08 Feb, 2023 1 commit
-
-
Illia Silin authored
* adding the first draft of changelog * second draft of changelog
-