1. 25 Feb, 2022 1 commit
    • Jianfeng Yan's avatar
      Space filling curve (#96) · bdedf64b
      Jianfeng Yan authored
      * add space_filling_curve
      
      * cleanup and move space_filling_curve into test
      
      * add functions for backward and forward step; hard coded results in unit test
      
      * minor changes
      bdedf64b
  2. 23 Feb, 2022 2 commits
    • Adam Osewski's avatar
      Unify Convolution FWD XDL 1D/2D implementation. (#93) · 756a7617
      Adam Osewski authored
      
      
      * Convolution ND
      
      * Code unification across dimensions for generating tensor descriptors.
      * Example
      * Instances
      
      * Move convnd f32 instance file to comply with repo structure.
      
      * Conv 1D tensor layouts.
      
      * Formatting and use ReferenceConv
      
      * Reference ConvFwd supporting 1D and 2D convolution.
      
      * Debug printing TensorLayout name.
      
      * Conv fwd 1D instance f32
      
      * Refactor conv ND example.
      
      Needed to support various conv dimensio.
      
      Needed to support various conv dimensions
      
      * Rename conv nd example director to prevent conflicts.
      
      * Refactor some common utility to single file.
      
      Plus some tests.
      
      * Refactor GetHostTensorDescriptor + UT.
      
      * Add 1D test case.
      
      * Test reference convolution 1d/2d
      
      * Remove some leftovers.
      
      * Fix convolution example error for 1D
      
      * Refactor test check errors utility function.
      
      * Test Conv2D Fwd XDL
      
      * More UT for 1D case.
      
      * Parameterize input & weight initializers.
      
      * Rename example to prevent conflicts.
      
      * Split convnd instance into separate files for 1d/2d
      
      * Address review comments.
      
      * Fix data type for flops/gbytes calculations.
      
      * Assign example number 11.
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      756a7617
    • Jianfeng Yan's avatar
      Conv3d new (#94) · 6dfb92bb
      Jianfeng Yan authored
      
      
      * conv3d compiles but has memory error
      
      * conv3d works
      
      * fix performance issue by using __builtin_amdgc_readfirstlane
      
      * change MakeBlock2CTileMap to MakeDefaultBlock2CTileMap; change c_blockid_to* to cblockid_to*
      
      * clang-format
      
      * remove CK_EXPERIMENTAL_PASS_TENSOR_DECRIPTOR_BY_*; moved wrapper into DeviceConv3d
      
      * format
      
      * remove useless marc
      
      * add comment
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      6dfb92bb
  3. 12 Feb, 2022 1 commit
    • ltqin's avatar
      NHWC conv 2d: fwd bfp16/int8, Device level tuning and host API (#73) · 880fbee9
      ltqin authored
      
      
      * add fwd bf16 conv
      
      * change tunning parametor
      
      * add int8 for conv fwd
      
      * remove comments
      
      * change tunning parametor for int8
      
      * change init int8 example
      
      * add test for conv2d fwd
      
      * change device operation file pos because merge develop
      
      * fwd int8 use reference
      
      * test_conv_fwd use reference
      
      * add braket for if statement
      
      * rename fwd example name
      
      * remove StaticBufferOfVectorTypeV2
      
      * tweak example
      Co-authored-by: default avatarltqin <letaoqin@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      880fbee9
  4. 03 Feb, 2022 1 commit
    • ltqin's avatar
      add split-k GEMM (#59) · 4be7f019
      ltqin authored
      
      
      * add DeviceGemmSplitKXdl
      
      * add file device_gemm_splitk_xdl.hpp
      
      * set c matrix zero
      
      * using atomic
      
      * add all tuning parameter to f32 mkkn
      
      * grid size change to 720
      
      * add tunning parameter for NT
      
      * add tunning parameter for TN
      
      * add tunning parameter for TT
      
      * add m=96tunning parameter
      
      * add lost config
      
      * add element wise operation
      
      * fixed MPerBlock=96
      
      * remove marco for slpitk swtich
      
      * add test
      
      * add new line at the end of device_gemm_xdl_instance.hpp
      
      * remove step hack
      
      * seperate split-k instance files
      
      * add tunning parameters
      
      * change disired grid size to parameters
      
      * remove slice length
      
      * add desiredgridsize parameter to ckProfiler
      
      * add losting file device_gemm_xdl_splitk_instance.hpp
      
      * change desired gride size to kbatch
      
      * format
      
      * format
      
      * clean up
      
      * add selection of device_instances
      
      * clean code
      
      * fix build issue
      Co-authored-by: default avatarltqin <letaoqin@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      Co-authored-by: default avatarJing Zhang <jizhan@amd.com>
      4be7f019
  5. 30 Nov, 2021 1 commit