- 12 Sep, 2023 2 commits
-
-
Rostyslav Geyyer authored
-
Bartlomiej Wroblewski authored
-
- 11 Sep, 2023 4 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Sam Wu authored
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
-
- 08 Sep, 2023 3 commits
-
-
Rostyslav Geyyer authored
-
Bartlomiej Wroblewski authored
-
Haocong WANG authored
* fix wmma gemm int8; add grouped conv int8 example
* Add int8 gemm-bilinear instances
* compile sanity check unknown
* Sanity pass + clang-format
* add int8 conv profiler instances
* solve merge conflict
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
-
- 07 Sep, 2023 2 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
- 06 Sep, 2023 4 commits
-
-
Rostyslav Geyyer authored
-
Bartlomiej Wroblewski authored
* Redesign the DPP8 GEMM kernel to use warp-wise component
* Review: Improve error messages
* Review: Remove unnecessary empty lines
* Review: Fix M, N per thread names
* Review: Rename mfma_input_type to dpp_input_type
* Review: Fix tensor adaptor; remove unnecessary element
* Review: Remove calls to dpp_gemm's MakeCDescriptor
* Review: Add blockwise doc, change function names to include dimension names
* Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file
* Review: Add __restrict__ keywords
* Review: Use MatrixPadder for padding A, B, C matrices
* Review: Remove hardcoded datatypes
* Review: Change names from FloatX to XDataType
* Review: Introduce AK0 and BK0 instead of a single K0
* Review: Remove construction of dpp_datatypes object
* Review: Rename DppInstrRunner to DppLanegroupGemm
-
zjing14 authored
* added kpad support into v2r3
* add generic instances
* fixed comments
* fixed mnk padding
* Update device_batched_gemm_xdl.hpp
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
-
zjing14 authored
* add generic instances; fixed initi with fp8
* fixed comment
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
-
- 05 Sep, 2023 6 commits
-
-
Illia Silin authored
-
Bartlomiej Wroblewski authored
Add contribution guidelines to the documentation
-
Illia Silin authored
-
Bartłomiej Kocot authored
* Add image to column kernel
* Add instances, tests, profiler, example
* Add client example
* Several fixes of image to column
* Fix variable name in device_image_to_column_impl
* Several fixes of image to column profiler
* Fix num_btype calculation
* Make new measurements for correct bytes calculation
-
Bartłomiej Kocot authored
-
Bartłomiej Kocot authored
* Fix K padding calculation for grouped conv data
* Restore previous padd for 1x1 specialization
-
- 04 Sep, 2023 1 commit
-
-
Lauren Wrubleski authored
-
- 01 Sep, 2023 1 commit
-
-
Rostyslav Geyyer authored
-
- 31 Aug, 2023 7 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
zjing14 authored
* move all arguments into device
* add b2c_tile_map
* add examples
* add SetDeviceKernelArgs
* dedicated fixed_nk solution
* init client api
* add grouped_gemm_bias example
* add a instance
* add instances
* formatting
* fixed cmake
* Update EnableCompilerWarnings.cmake
* Update cmake-ck-dev.sh
* clean; fixed comments
* fixed comment
* add instances for fp32 output
* add instances for fp32 output
* add fp32 out client example
* fixed CI
* init commit for kbatch
* add splitk gridwise
* format
* fixed
* clean deviceop
* clean code
* finish splitk
* fixed instances
* change m_loops to tile_loops
* add setkbatch
* clean code
* add splitK+bias
* add instances
* opt mk_nk instances
* clean examples
* fixed CI
* remove zero
* finished non-zero
* clean
* clean code
* optimized global_barrier
* fixed ci
* fixed CI
* removed AddBias
* format
* fixed CI
* fixed CI
* move 20_grouped_gemm to 21_grouped_gemm
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
-
rocking authored
* Add maxpool instances
* Rename index pool to max pool.
* Add maxpool bwd bf16 instances
* Add avg pool bwd instances
* Rename avgpool and maxpool to avg_pool3d and max_pool
* Add bf16 pool fwd instances
* Add max pool bwd to ckProfiler
* Add avg pool3d bwd to ckProfiler
* Add avg pool bwd test
* Fix bug of reference pool fwd (dilation)
* Fix bug of max pool bwd (dilation and initZero)
* Support bf16 compute data type
* Force compute type be f32. Because atomicAdd only support f32
* Add max pool bwd test
* Rename folder
* Rename pool
* Add max pool bwd client example
* Add avg pool bwd client example
* Add missing workspace
* clang format
* Rename macro
* remove useless header
* remove useless layout
-
Illia Silin authored
-
- 30 Aug, 2023 6 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Bartłomiej Kocot authored
-
- 29 Aug, 2023 4 commits
-
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-
Rostyslav Geyyer authored
-