1. 23 Feb, 2022 1 commit
    • Jianfeng Yan's avatar
      Conv3d new (#94) · 6dfb92bb
      Jianfeng Yan authored
      
      
      * conv3d compiles but has memory error
      
      * conv3d works
      
      * fix performance issue by using __builtin_amdgc_readfirstlane
      
      * change MakeBlock2CTileMap to MakeDefaultBlock2CTileMap; change c_blockid_to* to cblockid_to*
      
      * clang-format
      
      * remove CK_EXPERIMENTAL_PASS_TENSOR_DECRIPTOR_BY_*; moved wrapper into DeviceConv3d
      
      * format
      
      * remove useless marc
      
      * add comment
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      6dfb92bb
  2. 14 Nov, 2021 1 commit
    • Chao Liu's avatar
      ckProfiler and device-level XDL GEMM operator (#48) · e823d518
      Chao Liu authored
      * add DeviceGemmXdl
      
      * update script
      
      * fix naming issue
      
      * fix comment
      
      * output HostTensorDescriptor
      
      * rename
      
      * padded GEMM for fwd v4r4r4 nhwc
      
      * refactor
      
      * refactor
      
      * refactor
      
      * adding ckProfiler
      
      * adding ckProfiler
      
      * refactor
      
      * fix tuning parameter bug
      
      * add more gemm instances
      
      * add more fp16 GEMM instances
      
      * fix profiler driver
      
      * fix bug in tuning parameter
      
      * add fp32 gemm instances
      
      * small fix
      
      * refactor
      
      * rename
      
      * refactor gemm profiler; adding DeviceConv and conv profiler
      
      * refactor
      
      * fix
      
      * add conv profiler
      
      * refactor
      
      * adding more GEMM and Conv instance
      
      * Create README.md
      
      Add build instruction for ckProfiler
      
      * Create README.md
      
      Add Readme for gemm_xdl example
      
      * Update README.md
      
      Remove build instruction from top most folder
      
      * Update README.md
      
      * clean up
      e823d518