CUAEV backward (#554)
* preparation
* radial preparation 30%
* radial backward kernel done
* reuse Gmr (exp part) result for gradient
* radial kernel: every block runs in column-major order, to avoid waiting on atomicAdd
* apply code review
* static_cast
* implicit cast
* format
* angular preparation
* angular backward works, but is slow; atomicAdd should be avoided
* angular optimization: use shared memory to avoid atomicAdd
* format
* equation optimization
* remove unnecessary shared mem for atomi
* remove a lot of (warpsize * nbr) unnecessary shared mem for atomj
* format
* update
* clean
* fix
* fix
* test file
* fix
Co-authored-by: Gao, Xiang <qasdfgtyuiop@gmail.com>