    v5r1 fusion kernels for inference (#49) · 970fa3e9
    zjing14 authored
    
    
    * init
    
    * refactor for 1x1
    
    * rename e0_e1
    
    * add e1 with bugs
    
    * debug
    
    * fixed
    
    * fixed e1
    
    * add timer
    
    * improve threadwise gemm with dot2
    
    * add e2
    
    * tuning
    
    * separate c2
    
    * add nhwc
    
    * restore nchwc
    
    * clean
    
    * opt
    
    * fixed; tuning
    
    * add BGlobalMoveSliceWindowStepHacks{}
    
    * tuning
    
    * repeat running
    
    * adjust
    
    * merge v5r1 nchwc
    
    * add adaptors
    
    * split k0 k1 in c_thread_grid
    
    * split h and w
    
    * remove v5r1 nhwc
    
    * clean for pr
    
    * remove host_conv_add
    
    * clean code
    
    * clean
    
    * add dynamic support
    
    * static mode
    
    * test static
    
    * add conv+add fusion
    
    * fixed validation
    
    * naming fix
    
    * use activ_enum
    
    * make static
    
    * refactor conv_add for InMem::add
    
    * add bias
    
    * add conv_out
    
    * add configurable makeddesc
    
    * add maxpool fusion
    
    * add maxpool host for validation
    
    * enable static desc
    
    * conv-only use v5r1_add
    
    * test
    
    * test
    
    * for binary dumps
    
    * fixed incorrect results due to typo
    
    * clean
    
    * debugging maxpool
    
    * workaround with offset trick
    
    * clean code
    
    * modularize ops of fusion
    
    * add gridwise_gemm_v3
    
    * create separate fusion fun
    
    * enable dynamic mode of conv and conv+resize_add
    
    * add dynamic mode of maxpool
    
    * add pass by pointer
    
    * add activ_type as arguments
    
    * merge develop
    
    * clean
    
    * reset config to old default
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
CMakeLists.txt 2.16 KB