- 25 Jun, 2022 1 commit
-
-
Chao Liu authored
-
- 19 Jun, 2022 1 commit
-
-
Chao Liu authored
* add gelu and fast_gelu
* added GeLU and fast GeLU
* clean up
* add gemm+fastgelu example
* add gemm+gelu instances
* update profiler
* clean up
* clean up
* adding gemm+bias+activation
* clean
* adding bias
* clean
* adding gemm multiple d
* debugging
* add gemm bias add fastgelu
* rename, clean
* refactoring; add readme
* refactor
* refactor
* refactor
* refactor
* refactor
* refactor
* fix
* fix
* update example
* update example
* rename
* update example
* add ckProfiler
* clean
* clean
* clean
* clean
* add comment
* use type_convert
* clean
* clean element wise op
-
- 22 Mar, 2022 1 commit
-
-
Qianfeng authored
* Use thread cluster descriptor and explicit M_K 2d descriptor to simplify Blockwise Reduction
* Change by replacing ReduceDims by NumReduceDims as Device Reduce interface template parameter
* Rename the folder name for the pool2d and reduce examples
* Update to reduction test scripts
* Add Readme for pool2d_fwd and reduce_blockwise examples
* Add support for int8_t reduction (ADD/AVG, MIN/MAX/AMAX)
* Tiny fix in reduce profiler and tiny update in reduce testing scripts
* Tiny fix in testing script profile_reduce_no_index.sh
* Tiny fix in testing script profile_reduce_no_index.sh
* Add support for bfp16 reduction (using bhalf_t = ushort)
* Tiny fix in amd_buffer_addressing.hpp
* Tiny change in script/profile_reduce_with_index.sh
* Use AccDataType for Beta value and use element_wise::PassThrough
* Use type_convert for type converting in host layer reduction
* Renaming and refining in Reduction profiler/device layer/examples
* Renaming and refining in Reduction profiler/device layer/examples
* Renaming all NumReduceDims to NumReduceDim
* Fix the leaked type_convert in ThreadwiseTensorSliceTransfer_v2
* Update to testing scripts to add bf16 support
* added more static_assert
* Remove buggy tunable configurations defined in device_reduce_instance_xxx.hpp
* Add static_assert to give compile-time warning for incorrect thread slice-size/vector-size configurations
* minor change
* Refine and fix (in GetWorkspaceSizeInBytes of MultiBlockPartialReduce) to make int8 completely pass
* Tiny renaming in gridwise_2d_reduction_multiblock_partial_reduce.hpp
* Tiny fix in script/profile_reduce_no_index.sh
* Refine in DeviceReduce layer with regard to using NumInvariantDim/NumReduceDim or InvariantDims/ReduceDims
* Generic renaming in host reduction and DeviceReduce layer
* Add support for 4-d all dimension reduction in the profiler and add_device_reduce_xxx instances
* Use multi-thread and simplification for host Reduction implementation
* Add ctest for reduction
* Update to clarify the use of data init method in produce_reduce/example_reduce/test_reduce/
* Update to the reduce CTest executables to enable default testing behavior when no command argument
* Renaming
Co-authored-by: Jianfeng yan <jfyan008@gmail.com>
-
- 09 Mar, 2022 1 commit
-
-
Chao Liu authored
* delete obsolete files
* move files
* build
* update cmake
* update cmake
* fix build
* reorg examples
* update cmake for example and test
-
- 19 Aug, 2021 1 commit
-
-
Chao Liu authored
* Squashed 'src/composable_kernel/' content from commit f6edda61 git-subtree-dir: src/composable_kernel git-subtree-split: f6edda61
* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
* Squashed 'src/composable_kernel/' changes from f6edda61..5781adf5 5781adf5 Update develop (#5) (#6) 97e6d514 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile 7b1ec41e refactor 49c33aae refactor 54b3e73d rename git-subtree-dir: src/composable_kernel git-subtree-split: 5781adf5
* fix
* refactor
* remove online compilation from CK
* refactor
* fix
* add ctest
* add c-style pointer cast
* vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
* fix clang warning suppression
* tidy
* suppress cppcheck
* fix enum issue
* revert changes to hip build
* fix kernel filename
* update CK build script
* rename
* rename
* make inner product compatible on gfx900
* Update src/include/miopen/solver/ck_utility_common.hpp Co-authored-by: JD <Jehandad.Khan@amd.com>
* compiler parameter use stream
* use int instead of index_t in kernel wrapper
* DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
* refactor
* refactor
* change cmakelist
* change ck common utility
* fix
Co-authored-by: JD <Jehandad.Khan@amd.com>
-
- 09 Aug, 2021 1 commit
-
-
Chao Liu authored
-
- 25 Mar, 2021 1 commit
-
-
Chao Liu authored
* support dynamic tensor descriptor
* use buffer load OOB feature for padding case
* add navi support
* add int8x4 inference kernel
Co-authored-by: Chao Liu <chao@ixt-rack-81.local.lan>
Co-authored-by: Jing Zhang <jizhan@amd.com>
-
- 26 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 25 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 24 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 22 Sep, 2019 1 commit
-
-
Chao Liu authored
WIP: explicitly separate offset component into compile-time, block-invariant and per-thread components
-
- 21 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 11 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 10 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 09 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 05 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 06 Aug, 2019 2 commits
- 03 Aug, 2019 1 commit
-
-
Chao Liu authored
-
- 29 Jul, 2019 1 commit
-
-
Chao Liu authored
-
- 20 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 18 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 17 Jun, 2019 2 commits
- 13 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 12 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 11 Jun, 2019 2 commits
- 07 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 06 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 05 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 04 Jun, 2019 1 commit
-
-
Chao Liu authored
-
- 30 May, 2019 1 commit
-
-
Chao Liu authored
-
- 24 May, 2019 1 commit
-
-
Chao Liu authored
-
- 23 May, 2019 1 commit
-
-
Chao Liu authored
-
- 21 May, 2019 1 commit
-
-
Chao Liu authored
-
- 19 May, 2019 1 commit
-
-
Chao Liu authored
-
- 17 May, 2019 2 commits
- 16 May, 2019 1 commit
-
-
Chao Liu authored
-