1. 04 Mar, 2022 2 commits
    • rocking5566's avatar
      [Bf16 & int8] [example & ckprofiler] (#100) · 7e9a9d32
      rocking5566 authored
      
      
      * Add int8 of mk_nk_mn to the ckProfiler
      
      * Add example of int8 gemm
      
      * Fix typo, use ushort instead of half_t for bfloat16
      
      * replace ushortXXX_t to bhalfXXX_t
      
      * rename ushort to bhalf_t
      
      * Add bf16 example
      
      * Add bf16 gemm to ckProfiler
      
      * Fix alignment
      
      * Fix typo
      
      * Add unit test for gemm_xdl int8
      
      * Add gemm_xdl fp32 unit test
      
      * Add gemm_xdl bf16 unit test
      
      * fix build
      
      * fix build issue due to merge conflict
      
      * Fix build
      
      * Fix build error
      Co-authored-by: default avatarrocking <chunylai@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      7e9a9d32
    • Jianfeng Yan's avatar
      Refactor threadwise copy using sfcurve (#101) · 0619ebf7
      Jianfeng Yan authored
      
      
      * add space_filling_curve
      
      * cleanup and move space_filling_curve into test
      
      * WIP: start refactoring threadwise_transfer_v1r3
      
      * threadwise_copy works but needs further refactoring
      
      * add some comments
      
      * add SpaceFillingCurve::GetIndices()
      
      * minor changes
      
      * removed GetIndices; refactored GetDstCoordinateResetStep
      
      * add DynamicBuffer::Transfer, but Add is not tested
      
      * rebased agaist develop
      
      * threadwise_copy_v6r1/v6r2/v6r3 using space-filling curve start to work
      
      * minor changes
      
      * refactored threadcopy v3r1, v2; removed old implementations
      
      * clang-format
      
      * cleanup
      
      * fix a typo in v6r3
      
      * format
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      0619ebf7
  2. 03 Mar, 2022 1 commit
    • JD's avatar
      Update test CMakeLists to add new tests automatically and add Jenkins stage for tests (#88) · 992f71e3
      JD authored
      
      
      * add docker file and make default target buildable
      
      * add Jenkinsfile
      
      * remove empty env block
      
      * fix package stage
      
      * remove render group from docker run
      
      * clean up Jenkins file
      
      * add cppcheck as dev dependency
      
      * update cmake file
      
      * Add profiler build stage
      
      * add hip_version config file for reduction operator
      
      * correct jenkins var name
      
      * Build release instead of debug
      
      * Update test CMakeLists.txt
      reorg test dir
      add test stage
      
      * reduce compile threads to prevent compiler crash
      
      * add optional debug stage, update second test
      
      * remove old test target
      
      * fix tests to return proper results and self review
      
      * Fix package name and make test run without args
      
      * change Dockerfile to ues rocm4.3.1
      
      * remove parallelism from build
      
      * Lower paralellism
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      992f71e3
  3. 23 Feb, 2022 1 commit
    • Jianfeng Yan's avatar
      Conv3d new (#94) · 6dfb92bb
      Jianfeng Yan authored
      
      
      * conv3d compiles but has memory error
      
      * conv3d works
      
      * fix performance issue by using __builtin_amdgc_readfirstlane
      
      * change MakeBlock2CTileMap to MakeDefaultBlock2CTileMap; change c_blockid_to* to cblockid_to*
      
      * clang-format
      
      * remove CK_EXPERIMENTAL_PASS_TENSOR_DECRIPTOR_BY_*; moved wrapper into DeviceConv3d
      
      * format
      
      * remove useless marc
      
      * add comment
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      6dfb92bb
  4. 12 Feb, 2022 1 commit
    • ltqin's avatar
      NHWC conv 2d: fwd bfp16/int8, Device level tuning and host API (#73) · 880fbee9
      ltqin authored
      
      
      * add fwd bf16 conv
      
      * change tunning parametor
      
      * add int8 for conv fwd
      
      * remove comments
      
      * change tunning parametor for int8
      
      * change init int8 example
      
      * add test for conv2d fwd
      
      * change device operation file pos because merge develop
      
      * fwd int8 use reference
      
      * test_conv_fwd use reference
      
      * add braket for if statement
      
      * rename fwd example name
      
      * remove StaticBufferOfVectorTypeV2
      
      * tweak example
      Co-authored-by: default avatarltqin <letaoqin@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      880fbee9