    CUAEV backward (#554) · 23c9816c
    Jinze Xue authored
    
    
    * preparation
    
    * radial preparation 30%
    
    * radial backward kernel done
    
    * reuse Gmr (exp part) result for gradient
    
    * radial kernel: every block runs in column-major order, to avoid atomicAdd waiting
    
    * apply code review
    
    * static_cast
    
    * implicit cast
    
    * format
    
    * angular preparation
    
    * angular backward works but is slow; atomicAdd should be avoided
    
    * angular optimization: use shared memory to avoid atomicAdd
    
    * format
    
    * equation optimization
    
    * remove unnecessary shared mem for atomi
    
    * remove a lot of (warpsize * nbr) unnecessary shared mem for atomj
    
    * format
    
    * update
    
    * clean
    
    * fix
    
    * fix
    
    * test file
    
    * fix
    Co-authored-by: Gao, Xiang <qasdfgtyuiop@gmail.com>
test_cuaev.py 9.86 KB