    add nchw atomic, nhwc and nhwc atomic methods for backward weight (#30) · fd49ff80
    ltqin authored
    
    
    * add new algorithm from v4r4r2
    
    * program once issue
    
    * add split k function
    
    * redefine code
    
    * add a matrix unmerge
    
    * add b matrix unmerge k0
    
    * trans a and b to grid gemm
    
    * nhwc init
    
    * no hacks and vector load
    
    * add hacks
    
    * modify some parameter
    
    * fix tuning parameter for fp32
    
    * fix tuning parameter for fp16
    
    * start change gridwise k split
    
    * init ok
    
    * remove a b matrix k0mk1 desc in grid
    
    * rewrite calculate gridsize
    
    * add kbatch to CalculateBottomIndex
    
    * remove some unused function
    
    * add clear data function before call kernel
    
    * out hacks
    
    * in hacks
    
    * rename device convolution file and function name
    
    * modify kBatch value
    
    * fix some tuning code
    
    * start from v4r4 nhwc
    
    * nhwc atomic is able to run
    
    * just for fp32
    
    * enable nchw atomic
    
    * tweak
    
    * tweak
    
    * re-arrange gridwise gemm hot loop for wrw
    
    * add wrw v4r5
    
    * v4r4r5 fp16
    
    * v4r4r4 fp16
    
    * v4r4r2 fp16
    
    * V4R4R4XDLNHWC fp16
    
    * V4R4R2XDLATOMICNCHW fp16
    
    * adjust for fp16
    
    * input gridsize
    
    * change kbatch to gridsize
    
    * testing wrw
    
    * clean up
    
    * k_batch to gridsize
    
    * fix bug
    
    * wrw v4r4r4 kbatch change to grid size
    
    * wrw v4r4r2 kbatch change to grid size
    
    * after merge, change gridwise gemm v2r4
    
    * change MakeCBlockClusterAdaptor
    
    * other method use new gridwise gemm
    
    * clean up
    
    * change pad method to make_right_pad_transform
    
    * kbatch out from transform function
    
    * clean up and fix bug
    
    * fix bug
    
    * using function type to reduce template parameters
    
    * using auto to replace defined function type
    
    * clean up
    Co-authored-by: ltqin <letaoqin@amd.com>
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
    Co-authored-by: Jing Zhang <jizhan@amd.com>
device.hpp 1.81 KB