- 07 Jun, 2023 3 commits
-
-
Alan Turner authored
-
Alan Turner authored
-
Alan Turner authored
-
- 06 Jun, 2023 1 commit
-
-
Alan Turner authored
-
- 02 Jun, 2023 1 commit
-
-
Paul authored
-
- 01 Jun, 2023 2 commits
-
-
Paul Fultz II authored
* Move functions to cpp file * Move another function to cpp file * Fix semicolon * Move solution to common.hpp * Fix compile errors * Use enum for data types * Remove -Werror * Fix header install * Fix relative path * Fix header path * Install all headers
-
Alan Turner authored
No commit message
-
- 31 May, 2023 2 commits
-
-
Illia Silin authored
-
Paul authored
-
- 30 May, 2023 2 commits
-
-
Adam Osewski authored
* Add license header. * Reduce number of logged output. Add constant initialization. * Add functional tests for grouped_gemm with different kbatch value. * Add debug log information + remove unused code. * Don't pass kbatch to CalculateKPadded. * Turn on logging in grouped gemm and gemm splitk profiler * Debug: limit number of test cases to run; * Log more information and initialize with constant value. * Turn on DEBUG_LOG * Add more debug log information. * Limit the number of instances to compile. * Use GridwiseGemmPipeline * Use KBatch to calculate K0 * Multiple DebugLog messages. * Unit tests for multiple KBatch values. * Refactoring * Disable logging * extract out of if statement KBatch update. * Uncomment instances. * Disable DebugLog. * Use Kbatch when calculating KPadded. * Fix CGridDesc padding. * Use available helper functions. * Uncomment code commented for debugging. * Remove unnecessary debug log messages. * Uncomment previously commented code for debug purposes. * Add KBatch info to profiler output summary log. * Add gtests for gemm splitk using ckProfiler API. * Add more test-cases for different data layout. * Add more test cases for gemm splitk * Remove old test. * Unit tests for MKNK ggemm interface. * Fix and add more unit-tests. * Constexpr everything! * Increase error threshold for fp16 and splitk. Since we're using fp16 atomic add for splitk there's a known precision loss. --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
Bartłomiej Kocot authored
* Add instances for fp16/int8 Gemm kernels (Navi21) * Extend instances with smaller tiles * Fix SrcVectorTensor for km_kn_mn int8
-
- 25 May, 2023 11 commits
- 24 May, 2023 2 commits
-
-
Alan Turner authored
-
rocking authored
* Expand the base class of pool2d, prepare to share base class with pool3d * Add pool3d device op * Add pool3d f16 example * Refactor the base class. implement generic pooling in the future * clang format * get original index in max pooling * Add outputindex to base class * Fix dimension * Add pooling instance * Use indexType instead * Remove useless header * Extract IndexDataType to template * Extract pooling reference code * clang format * clang format * Fix typo * Add tensor stride * Add missing header * Add index stride and output stride * Refine naming * Add type to base class * Rename file * Use proper size * Fix typo * Refine naming * Modify the argument into vector. * Add max pool profiler * Refine naming * Support f32 pool * Fix typo * Add avg pool2d fwd in profiler * clang format * Rename AccDatatype to ComputeDatatype * Fix init * test pool * Extract variable * Add client example * Check the pooling dim * clang format * Connect argv and arg_parser * Add found check * Remove useless header * Refine naming * Adjust the order of device_pool_fwd
-
- 25 Apr, 2023 1 commit
-
-
Alan Turner authored
-
- 24 Apr, 2023 4 commits
-
-
Adam Osewski authored
* simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * B2C with 3D grid for KSplit * Remove unused code. * Use default B2C (3D grid) in grid gemm v2r4r2. * Device gemm splitk use B2C map. * Device GroupedGemmXdlSplitKCShuffle * Example for GroupedGemm Xdl SplitK * Introduce Device GroupedGemmSplitK * Fix updating kbatch size. * Add instance mk-nk-mn * Enable set kbatch in profiler. * Add GGemmSplitK mk-kn-mn instances * Add more instances & split into multiple files. * minor fix * tuning * clean * disabled failed instances * use pipe v2 * Ignore arg on not supported arch. * fix warning --------- Co-authored-by: carlushuang <carlus.huang@amd.com> Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com> Co-authored-by: Jing Zhang <jizhan@amd.com> Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
-
Alan Turner authored
-
Alan Turner authored
-
rocking authored
* [What] Remove pure conv int8 instance [Why] We will never use pure int8 conv in AI, use int8 quantization instead * Change layout * Share the kernel parameter * Support more type of NHWGC for group conv * Revise client example of conv 2d, use NHWGC layout * Add instance to cmake * Revise layout of group conv quantization instance * Revise layout of external api of group conv quantization * Revise layout of group conv quantization client example * Fix clang format * Add comment to describe meaning of each parameter
-
- 22 Apr, 2023 1 commit
-
-
Illia Silin authored
* simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout --------- Co-authored-by: carlushuang <carlus.huang@amd.com>
-
- 17 Apr, 2023 1 commit
-
-
rocking5566 authored
-
- 10 Apr, 2023 1 commit
-
-
rocking5566 authored
* Rename to proper naming * Add example of groupnorm + swish * Extract duplicate code in example * Add groupnorm + swish instances * Refactor instance generation, split into multiple cpp files * Add external api and client example * Refine profiler message * Use ck math version of exp * Refine problem size in example * Add host version of exp
-
- 07 Apr, 2023 1 commit
-
- 30 Mar, 2023 2 commits
-
-
zjing14 authored
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
-
carlushuang authored
* simplify karg in device/grid split-k op * fix mk_kn_mn instances * add more instances * use name from tensor layout
-
- 29 Mar, 2023 1 commit
-
-
rocking5566 authored
* Rename file. Prepare to support another activation * Add comment for quantization * Extract out_elementop * Add tanh example * Add conv + bias + tanh quantization instance * Add missing parameter * Refine cmake * Add external api and client example * Extract variable in example * Fix the comment --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
- 20 Mar, 2023 1 commit
-
-
ltqin authored
* add workaround 637 * format * change id --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
- 15 Mar, 2023 1 commit
-
-
rocking5566 authored
* Add conv perlayer quantization * Add gemm_dlops quantization * Support int8 for innerproduct * Refine gemm dlops int8 kernel parameter * Support gfx908(MI100) and gfx90a(MI200) * clang-format * Rename example number * Support different layout for d tensor * Add conv dlops perchannel quantization example * Move to example 40 * Extract the common code for different platform (dlops and xdlops) * Move to subfolder. Prepare to add other op of quantization * Refine the quantization instance library * Add conv dl instances and client example * Remove unnecessary type * Add gemm quantization instance * Add external api and client example * Refine num_bytes * Separate different layout to different cpp * Add more xdl instances * Revert "Remove unnecessary type" This reverts commit 82086918. * Remove CShuffleDataType in dlops Let acc and CShuffleDataType be the same in xdlops --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
- 08 Mar, 2023 1 commit
-
-
Adam Osewski authored
* Grouped gemm + Gelu instances. * Device Instance Factory for GroupedGemm+Gelu * Client example * Rangify fill helper functions. * Fix name clash. * Profiler for grouped_gemm+gelu * No need to use full namespace name. * Add check for MRaw divisible by vector load. * Ugly fix for big errors. * Add grouped_gemm+gelu to profiler CMakelists. * Store in argument additional info. * Information about Mraw, Nraw, Kraw values. * Use FastGelu instead of Gelu. * Change client ex to use FastGelu * Remove relaxed error precision. * Remove duplicate output elementwise-op --------- Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
- 15 Feb, 2023 1 commit
-
-
rocking5566 authored
* Sync the order of type string with template parameter * Add more instances * Check the vector size and remove redundant var * Extract var to static, prepare to separate sweep once kernel * Separate sweeponce flow and optimize the flow * 1. Rename AccDatatype in normalization to computeData 2. Rename AccElementwiseOperation to YElementwiseOperation in normalization * Remove useless code * Update naive variance kernel * Refine string * Fix typo * Support naive variance for device_normalization * Check the blocksize * Share the VGPR of x and y * Share the VGPR of gamma and beta * Add more instances * Support fp16 sqrt for experiment * Add CHANGELOG * Fix typo * clang-format
-