1. 21 Oct, 2021 1 commit
  2. 19 Oct, 2021 2 commits
    • bug fix (#39) · c3018794
      Chao Liu authored
    • add nchw atomic, nhwc and nhwc atomic method for backward weight (#30) · fd49ff80
      ltqin authored
      
      * add new algorithm from v4r4r2
      
      * program once issue
      
      * add split k function
      
      * redefine code
      
      * add a matrix unmerge
      
      * add b matrix unmerge k0
      
      * trans a and b to gridwise gemm
      
      * nhwc init
      
      * no hacks and vector load
      
      * add hacks
      
      * modify some parameter
      
      * fix tuning parameter for fp32
      
      * fix tuning parameter for fp16
      
      * start change gridwise k split
      
      * init ok
      
      * remove a b matrix k0mk1 desc in grid
      
      * rewrite calculate gridsize
      
      * add kbatch to CalculateBottomIndex
      
      * remove some unused function
      
      * add clear data function before call kernel
      
      * out hacks
      
      * in hacks
      
      * rename device convolution file and function name
      
      * modify kBatch value
      
      * fix some tuning code
      
      * start from v4r4 nhwc
      
      * nhwc atomic is able to run
      
      * just for fp32
      
      * enable nchw atomic
      
      * tweak
      
      * tweak
      
      * re-arrange gridwise gemm hot loop for wrw
      
      * add wrw v4r5
      
      * v4r4r5 fp16
      
      * v4r4r4 fp16
      
      * v4r4r2 fp16
      
      * V4R4R4XDLNHWC fp16
      
      * V4R4R2XDLATOMICNCHW fp16
      
      * adjust for fp16
      
      * input gridsize
      
      * change kbatch to gridsize
      
      * testing wrw
      
      * clean up
      
      * k_batch to gridsize
      
      * fix bug
      
      * wrw v4r4r4 kbatch change to grid size

      * wrw v4r4r2 kbatch change to grid size
      
      * after merge, change gridwise gemm v2r4
      
      * change MakeCBlockClusterAdaptor
      
      * other method use new gridwise gemm
      
      * clean up
      
      * change pad method to make_right_pad_transform
      
      * kbatch out from transform function
      
      * clean up and fix bug
      
      * fix bug
      
      * using function type reduce template parameters
      
      * using auto to replace defined function type
      
      * clean up
      Co-authored-by: ltqin <letaoqin@amd.com>
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      Co-authored-by: Jing Zhang <jizhan@amd.com>
  3. 06 Oct, 2021 3 commits
    • [MIOpen Downstream] Fix Reduction Kernel (#34) · b2dc55f8
      Qianfeng authored
      
      * Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel
      
      * Fix with regard to implementing GetZeroVal() in both kernel and host
      
      * Avoid converting to compType from dstDataType before writing the output value
      
      * Add half_t support to NumericLimits and make constexpr GetZeroVal() of binary operator
      
      * Add CONSTANT decorator for descriptor read buffer
      
      * Use get_thread_local_1d_id() for thread local Id
      
      * Rename GetZeroVal() to GetReductionZeroVal() in the kernels
      
      * Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp
      
      * Occasional tiny simplification and update in the kernel files
      
      * Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers
      
      * Update to remove OpenCL tidy checking failures
      
      * Update for better readability
      
      * Remove unused codes and not-needed template parameters in the kernel wrappers
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
    • Tweak GEMM kernel (#38) · b3e8d57d
      Chao Liu authored
      * add parameters
      
      * tweak gemm
      
      * tweak
      
      * update conv
      
      * update script
      
      * adding bwd 1x1
      
      * update script
      
      * adding 1x1 bwd
      
      * debugging bwd 1x1 failure
      
      * update script
      
      * update script
      
      * test
      
      * test v100
      
      * clean up
    • Add VectorType support into StaticBuffer (#27) · 846f462b
      zjing14 authored
      
      * init StaticBufferV2
      
      * clean
      
      * adopt old output stage for staticBufferV2
      
      * clean
      
      * remove hack
      
      * clean
      
      * clean
      
      * clean code
      
      * move c_buffer alloc into blockwise gemm
      
      * add adaptors for m/n_thread_data_on_grid
      
      * adjust blockwise_gemm_xdlops
      
      * reorder ops in GEMM hot loop
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  4. 21 Sep, 2021 3 commits
  5. 05 Sep, 2021 2 commits
  6. 31 Aug, 2021 1 commit
    • Backward weight v4r4r2 with xdlops (#18) · 627d8ef3
      ltqin authored
      
      * start
      
      * modify transformat
      
      * modify device convolution
      
      * modify host
      
      * added host conv bwd and wrw
      
      * remove bwd, separate wrw
      
      * clean
      
      * hacall k to zero
      
      * out log
      
      * fixed
      
      * fixed
      
      * change to (out in wei)
      
      * input hack
      
      * hack to out
      
      * format
      
      * fix by comments
      
      * change wei hacks (wei transform has not been merged)
      
      * fix program once issue
      
      * fix review comment
      
      * fix vector load issue
      
      * tweak
      Co-authored-by: ltqin <letaoqin@amd.com>
      Co-authored-by: Jing Zhang <jizhan@amd.com>
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  7. 27 Aug, 2021 2 commits
    • Misc fixes (#24) · 10bb8110
      Chao Liu authored
      * use cast_pointer_to_generic_address_space() in v6r1 kernel wrapper, DynamicBuffer and buffer_load take customized invalid-element-value, add buffer_load/store for fp64
      
      * use remove_cvref_t
    • [SWDEV-281541][MSRCHA-100] Implementation of Dynamic Generic Reduction (#1108) · 9e80cdce
      Qianfeng authored
      
      * add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
      
      * make inner product compatible on gfx900
      
      * Update src/include/miopen/solver/ck_utility_common.hpp
      
      * compiler parameter use stream
      
      * use int instead of index_t in kernel wrapper
      
      * DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
      
      * Add dynamic generic reduction kernel layer (kernel wrappers, kernel implementations and utilities)
      
      * Some updates to dynamic composable kernel facility for the need of dynamic generic reduction
      
      * Update to generic reduction C++ host interface layer to support dynamic generic reduction
      
      * Update to remove tidy complaints in host interface layer
      
      * Change the unary operator form from void op(T &x) to T op(T x)
      
      * Update to pass single workspace pointer for all kernels (fix for OpenCL backend)
      
      * Use cppcheck-suppress to prevent some strange warnings
      
      * Re-use operator [] and () for DynamicBuffer and update to depending codes
      
      * Remove useless codes in first call threadwise/warpwise/blockwise kernel wrappers
      
      * [performance] Remove un-needed local buffer initialization
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      Co-authored-by: JD <Jehandad.Khan@amd.com>
  8. 25 Aug, 2021 1 commit
  9. 23 Aug, 2021 2 commits
  10. 19 Aug, 2021 3 commits
    • Composable kernel init integration v3 (#1097) · 6fe3627a
      Chao Liu authored
      * Squashed 'src/composable_kernel/' content from commit f6edda61
      
      git-subtree-dir: src/composable_kernel
      git-subtree-split: f6edda61
      
      * add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
      
      * Squashed 'src/composable_kernel/' changes from f6edda61..5781adf5
      
      5781adf5 Update develop (#5) (#6)
      97e6d514 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
      7b1ec41e refactor
      49c33aae refactor
      54b3e73d rename
      
      git-subtree-dir: src/composable_kernel
      git-subtree-split: 5781adf5
      
      * fix
      
      * refactor
      
      * remove online compilation from CK
      
      * refactor
      
      * fix
      
      * add ctest
      
      * add c-style pointer cast
      
      * vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
      
      * fix clang warning suppression
      
      * tidy
      
      * suppress cppcheck
      
      * fix enum issue
      
      * revert changes to hip build
      
      * fix kernel filename
      
      * update CK build script
      
      * rename
      
      * rename
      
      * make inner product compatible on gfx900
      
      * Update src/include/miopen/solver/ck_utility_common.hpp
      Co-authored-by: JD <Jehandad.Khan@amd.com>
      
      * compiler parameter use stream
      
      * use int instead of index_t in kernel wrapper
      
      * DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
      
      * refactor
      
      * refactor
      
      * change cmakelist
      
      * change ck common utility
      
      * fix
      Co-authored-by: JD <Jehandad.Khan@amd.com>
    • refactor dynamic xdlops iGemm (#13) · a2ad6d35
      zjing14 authored
      * xdlops refactor
      
      * fixed comment
      
      * clean xdlops_gemm
      
      * add make c into xdlops-gemm
      
      * change mfma_info
      
      * refactor xdlops, hide c desc
      
      * clean
      
      * clean
      
      * clean
      
      * apply hacks changes to v4r4r4_nhwc
      
      * rename hacks and use single stage adapter
      
      * enable fp16 mfma
    • Added host_conv_wrw for verification (#15) · ba6f79a7
      zjing14 authored
      * added host conv wrw
  11. 18 Aug, 2021 1 commit
  12. 16 Aug, 2021 4 commits
  13. 13 Aug, 2021 3 commits
  14. 11 Aug, 2021 2 commits
  15. 10 Aug, 2021 8 commits
  16. 09 Aug, 2021 2 commits