- 11 Oct, 2023 2 commits
-
-
Adam Osewski authored
* Introduce LocalBlockToCTileMap.
* Change the signature of CalculateBottomIndex(), which no longer accepts any arguments. The B2C map, which is already passed as an argument to the kernel Run function, now calculates the block's local ID outside, at the kernel entry point (the __global__ function). The LocalB2C map stores the local block ID as a member.
* Use LocalBlockToCTile map in device ops.
* First draft of tile loop work distribution.
* Fix typo.
* Simplify kernel arguments. Calculate descriptors & B2C maps on the device.
* Use looping kernel.
* Fix B2C constructor.
* Fix Navi21 errors.
* Calculate tile start/end in device kernel.
* Change Run API to accept user provided workspace buffer.
* Add new line at EOF.
* Move Gemm KernelArguments to device op interface.
* Remove unused code.
* Update API.
* Launch grid size which is min of occupancy vs tile count
* Get back to use constant memory for gemm descriptors.
* Remove unused code.
* Add default virtual method implementation.
* Update comments to conform with doxygen style.
* Fix doc style and unused parameters.
* Add thread cluster lengths to kernel name.
* Remove old splitk impl and replace it with tile looping one.
* Modify instances.
* set KPerBlock to 64
* maximize vector load size wherever possible.
* Fix instances cluster lengths.
* Change comment style.
* Use 128b store where possible in instances.
* Update test cases, since KPerBlock has doubled.
* Update output stream operator for Sequence.
* Add pipeline version to GroupedGEMM device op type string.
* Fix pipeline version type logging.
* Fix input tensors type after merge.
* Fix compiler error.
* Fix output stream operator for Pipeline version.
* Store using 128b.
* Set of instances with kpb 32/64
* Limit number of instances
* Remove commented out instances.
* Fix function name.
* Limit the number of instances. Add pipeline version to the regular instances
* Change thr cluster layout for reading B tensor.
* disabled failed instances
---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Jing Zhang <jizha@amd.com>
- 31 May, 2023 1 commit
-
-
Illia Silin authored
-
- 07 Jul, 2022 1 commit
-
-
Chao Liu authored
* adding contraction
* add contraction example
* update example
* update example
* format
* update readme
* clean header
* clean header
* contraction with multiple D
* rename
* fix naming issue; add instances for contraction+bilinear
* change assumed virtual layout of contraction; add client example
* update example
* update
* contraction+scale
* use type_convert
* rename
-
- 25 Jun, 2022 1 commit
-
-
Chao Liu authored
-
- 19 Jun, 2022 1 commit
-
-
Chao Liu authored
* add gelu and fast_gelu
* added GeLU and fast GeLU
* clean up
* add gemm+fastgelu example
* add gemm+gelu instances
* update profiler
* clean up
* clean up
* adding gemm+bias+activation
* clean
* adding bias
* clean
* adding gemm multiple d
* debugging
* add gemm bias add fastgelu
* rename, clean
* refactoring; add readme
* refactor
* refactor
* refactor
* refactor
* refactor
* refactor
* fix
* fix
* update example
* update example
* rename
* update example
* add ckProfiler
* clean
* clean
* clean
* clean
* add comment
* use type_convert
* clean
* clean element wise op
-
- 22 Mar, 2022 1 commit
-
-
Qianfeng authored
* Use thread cluster descriptor and explicit M_K 2d descriptor to simplify Blockwise Reduction
* Change by replacing ReduceDims by NumReduceDims as Device Reduce interface template parameter
* Rename the folder name for the pool2d and reduce examples
* Update to reduction test scripts
* Add Readme for pool2d_fwd and reduce_blockwise examples
* Add support for int8_t reduction (ADD/AVG, MIN/MAX/AMAX)
* Tiny fix in reduce profiler and tiny update in reduce testing scripts
* Tiny fix in testing script profile_reduce_no_index.sh
* Tiny fix in testing script profile_reduce_no_index.sh
* Add support for bfp16 reduction (using bhalf_t = ushort)
* Tiny fix in amd_buffer_addressing.hpp
* Tiny change in script/profile_reduce_with_index.sh
* Use AccDataType for Beta value and use element_wise::PassThrough
* Use type_convert for type converting in host layer reduction
* Renaming and refining in Reduction profiler/device layer/examples
* Renaming and refining in Reduction profiler/device layer/examples
* Renaming all NumReduceDims to NumReduceDim
* Fix the leaked type_convert in ThreadwiseTensorSliceTransfer_v2
* Update to testing scripts to add bf16 support
* added more static_assert
* Remove buggy tunable configurations defined in device_reduce_instance_xxx.hpp
* Add static_assert to give compile-time warning for incorrect thread slice-size/vector-size configurations
* minor change
* Refine and fix (in GetWorkspaceSizeInBytes of MultiBlockPartialReduce) to make int8 completely pass
* Tiny renaming in gridwise_2d_reduction_multiblock_partial_reduce.hpp
* Tiny fix in script/profile_reduce_no_index.sh
* Refine in DeviceReduce layer with regard to using NumInvariantDim/NumReduceDim or InvariantDims/ReduceDims
* Generic renaming in host reduction and DeviceReduce layer
* Add support for 4-d all dimension reduction in the profiler and add_device_reduce_xxx instances
* Use multi-thread and simplification for host Reduction implementation
* Add ctest for reduction
* Update to clarify the use of data init method in produce_reduce/example_reduce/test_reduce/
* Update to the reduce CTest executables to enable default testing behavior when no command argument
* Renaming
Co-authored-by: Jianfeng yan <jfyan008@gmail.com>
-
- 09 Mar, 2022 1 commit
-
-
Chao Liu authored
* delete obsolete files
* move files
* build
* update cmake
* update cmake
* fix build
* reorg examples
* update cmake for example and test
-
- 19 Aug, 2021 1 commit
-
-
Chao Liu authored
* Squashed 'src/composable_kernel/' content from commit f6edda61
  git-subtree-dir: src/composable_kernel
  git-subtree-split: f6edda61
* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
* Squashed 'src/composable_kernel/' changes from f6edda61..5781adf5
  5781adf5 Update develop (#5) (#6)
  97e6d514 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
  7b1ec41e refactor
  49c33aae refactor
  54b3e73d rename
  git-subtree-dir: src/composable_kernel
  git-subtree-split: 5781adf5
* fix
* refactor
* remove online compilation from CK
* refactor
* fix
* add ctest
* add c-style pointer cast
* vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
* fix clang warning suppression
* tidy
* suppress cppcheck
* fix enum issue
* revert changes to hip build
* fix kernel filename
* update CK build script
* rename
* rename
* make inner product compatible on gfx900
* Update src/include/miopen/solver/ck_utility_common.hpp
  Co-authored-by: JD <Jehandad.Khan@amd.com>
* compiler parameter use stream
* use int instead of index_t in kernel wrapper
* DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
* refactor
* refactor
* change cmakelist
* change ck common utility
* fix
Co-authored-by: JD <Jehandad.Khan@amd.com>
-
- 09 Aug, 2021 1 commit
-
-
Chao Liu authored
-
- 25 Mar, 2021 1 commit
-
-
Chao Liu authored
* support dynamic tensor descriptor
* use buffer load OOB feature for padding case
* add navi support
* add int8x4 inference kernel
Co-authored-by: Chao Liu <chao@ixt-rack-81.local.lan>
Co-authored-by: Jing Zhang <jizhan@amd.com>
-
- 26 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 25 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 24 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 22 Sep, 2019 1 commit
-
-
Chao Liu authored
WIP: explicitly separate offset component into compile-time, block-invariant and per-thread components
-
- 21 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 11 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 10 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 09 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 05 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 06 Aug, 2019 2 commits
- 03 Aug, 2019 1 commit
-
-
Chao Liu authored
-
- 29 Jul, 2019 1 commit
-
-
Chao Liu authored
-
- 20 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 18 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 17 Jun, 2019 2 commits
- 13 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 12 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 11 Jun, 2019 2 commits
- 07 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 06 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 05 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 04 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 30 May, 2019 1 commit
-
-
Chao Liu authored
-
- 24 May, 2019 1 commit
-
-
Chao Liu authored
-
- 23 May, 2019 1 commit
-
-
Chao Liu authored
-
- 21 May, 2019 1 commit
-
-
Chao Liu authored
-