    Fusion Conv+Bias+ReLU(+Add) (#62) · acbd7bd7
    Chao Liu authored
    * fix relu
    
    * clean up
    
    * clean up
    
    * adding 1x1 conv
    
    * adding 1x1 conv
    
    * added 1x1 conv
    
    * refactor
    
    * refactor
    
    * refactor
    
    * added profiler for conv+bias+relu+add
    
    * clean up
    
    * adding conv+bias+relu
    
    * adding conv+bias+relu
    
    * added conv+bias+relu
    
    * Update README.md
    
    * update cpu verification
    
    * adding c shuffle
    
    * update static_tensor for dealing with invalid element
    
    * adding c shuffle
    
    * debugging
    
    * fix bug
    
    * convert to fp16 before shuffle
    
    * shuffle more than one M/NRepeat
    
    * clean up
    
    * remove coordinate step hack from GridwiseGemm_k0mk1_k0nk1_mn_xdlops_v3r1
    
    * clean up
    
    * remove coordinate step hack from all gridwise gemm xdl
    
    * clean up coordinate step hack
    
    * clean up coordinate step hack
    
    * ThreadwiseTensorSliceTransfer_v3r2 support pointwise op on both src and dst
    
    * adding output shuffle in conv+bias+relu+add
    
    * update
    
    * added conv+bias+relu+add with c shuffle
    
    * added conv+bias+relu+add with c shuffle
    
    * fix forward_sweep bugs in threadwise copy
    
    * clean up
    
    * refactor
    
    * clean up
    
    * clean up
    
    * added conv_c_shuffle+bias_relu
    
    * clean up
    
    * added conv+bias+relu+atomic_add
    
    * clean up
    
    * clean up
    
    * clean up
    
    * clean up
    
    * clean up
    
    * clean up
    
    * misc fixes; add 1x1 specialization
    
    * clean up
    
    * delete unused device op
    
    * clean up
    
    * add support for odd C value
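The squashed history above centers on a fused forward convolution whose epilogue adds a per-channel bias, applies ReLU and then adds a second tensor. One bullet also mentions updating the CPU verification. The C++ below is a hedged sketch of what such a host-side reference could compute. It assumes NHWC input, KYXC weights, NHoWoK output and residual, and the order out = relu(conv + bias) + residual suggested by the commit title. `ConvParams` and `conv_bias_relu_add_ref` are hypothetical names, not the profiler's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical convolution parameters (assumed NHWC / KYXC / NHoWoK layouts).
struct ConvParams {
    int N, C, Hi, Wi, K, Y, X;
    int stride_h, stride_w, dilation_h, dilation_w, pad_h, pad_w;
    int Ho() const { return (Hi + 2 * pad_h - dilation_h * (Y - 1) - 1) / stride_h + 1; }
    int Wo() const { return (Wi + 2 * pad_w - dilation_w * (X - 1) - 1) / stride_w + 1; }
};

// Naive reference: out = max(conv(in, wei) + bias, 0) + residual.
std::vector<float> conv_bias_relu_add_ref(const ConvParams& p,
                                          const std::vector<float>& in,
                                          const std::vector<float>& wei,
                                          const std::vector<float>& bias,
                                          const std::vector<float>& residual) {
    const int Ho = p.Ho(), Wo = p.Wo();
    const std::size_t out_size = static_cast<std::size_t>(p.N) * Ho * Wo * p.K;
    // Check that every buffer matches the assumed layout.
    assert(in.size() == static_cast<std::size_t>(p.N) * p.Hi * p.Wi * p.C);
    assert(wei.size() == static_cast<std::size_t>(p.K) * p.Y * p.X * p.C);
    assert(bias.size() == static_cast<std::size_t>(p.K));
    assert(residual.size() == out_size);

    std::vector<float> out(out_size);
    for (int n = 0; n < p.N; ++n)
        for (int ho = 0; ho < Ho; ++ho)
            for (int wo = 0; wo < Wo; ++wo)
                for (int k = 0; k < p.K; ++k) {
                    float acc = 0.f;
                    for (int y = 0; y < p.Y; ++y) {
                        const int hi = ho * p.stride_h + y * p.dilation_h - p.pad_h;
                        if (hi < 0 || hi >= p.Hi) continue;  // padded rows contribute zero
                        for (int x = 0; x < p.X; ++x) {
                            const int wi = wo * p.stride_w + x * p.dilation_w - p.pad_w;
                            if (wi < 0 || wi >= p.Wi) continue;  // padded columns contribute zero
                            for (int c = 0; c < p.C; ++c)
                                acc += in[((static_cast<std::size_t>(n) * p.Hi + hi) * p.Wi + wi) * p.C + c] *
                                       wei[((static_cast<std::size_t>(k) * p.Y + y) * p.X + x) * p.C + c];
                        }
                    }
                    const std::size_t o = ((static_cast<std::size_t>(n) * Ho + ho) * Wo + wo) * p.K + k;
                    // Fused epilogue: bias, ReLU and add applied before the single store.
                    out[o] = std::max(acc + bias[k], 0.f) + residual[o];
                }
    return out;
}
```

Applying bias, ReLU and the add in the same loop that produces the accumulator mirrors the point of fusion: the convolution result is written once instead of being stored and re-read by separate elementwise passes.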
profiler.cpp 1.44 KB
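One bullet in the log adds a 1x1 specialization. A common reason this case is worth special-casing (an assumption here, not a description of what this commit does) is that with a 1x1 filter, unit stride and zero padding, an NHWC convolution is exactly a GEMM: the input is an (N*Ho*Wo) x C matrix A, the KYXC weight is a K x C matrix B, and the NHWK output is A * B^T. The helpers `is_1x1_stride1_pad0` and `gemm_abt` below are hypothetical and only illustrate that equivalence.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True when the convolution reduces to a plain GEMM over NHWC data
// (dilation is irrelevant for a 1x1 filter).
bool is_1x1_stride1_pad0(int Y, int X, int stride_h, int stride_w, int pad_h, int pad_w) {
    return Y == 1 && X == 1 && stride_h == 1 && stride_w == 1 && pad_h == 0 && pad_w == 0;
}

// out[m][k] = sum_c a[m][c] * b[k][c], i.e. A (M x C) times B^T (C x K),
// where M = N * Ho * Wo and b is the K x C weight matrix.
std::vector<float> gemm_abt(const std::vector<float>& a, const std::vector<float>& b,
                            int M, int K, int C) {
    assert(a.size() == static_cast<std::size_t>(M) * C);
    assert(b.size() == static_cast<std::size_t>(K) * C);
    std::vector<float> out(static_cast<std::size_t>(M) * K, 0.f);
    for (int m = 0; m < M; ++m)
        for (int k = 0; k < K; ++k) {
            float acc = 0.f;
            for (int c = 0; c < C; ++c)
                acc += a[static_cast<std::size_t>(m) * C + c] *
                       b[static_cast<std::size_t>(k) * C + c];
            out[static_cast<std::size_t>(m) * K + k] = acc;  // row-major M x K = NHWK
        }
    return out;
}
```

Because the NHWC input and KYXC weight are already contiguous along C, no im2col-style gather is needed in this case; the GEMM can read both operands in place.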