Jinze Xue authored
* preparation
* radial preparation 30%
* radial backward kernel done
* reuse Gmr (exp part) result for gradient
* radial kernel: every block runs in column-major order, to avoid atomicAdd waiting
* apply code review
* static_cast
* implicit cast
* format
* angular preparation
* angular backward works, but slow; atomicAdd should be avoided
* angular optimization: use shared memory to avoid atomicAdd
* format
* equation optimization
* remove unnecessary shared mem for atom i
* remove a lot (warpsize * nbr) of unnecessary shared mem for atom j
* format
* update
* clean
* fix
* fix
* test file
* fix

Co-authored-by: Gao, Xiang <qasdfgtyuiop@gmail.com>
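The "use shared memory to avoid atomicAdd" commits refer to a standard CUDA pattern: instead of every thread issuing an atomicAdd into global memory (which serializes under contention), threads stage partial gradients in shared memory, reduce within the block, and issue a single atomic per block. A minimal hedged sketch of that pattern follows — kernel and variable names (`reduce_grad`, `contrib`, `grad_out`) are illustrative, not taken from the actual backward kernels in this PR:

```cuda
// Hypothetical sketch of the shared-memory reduction pattern.
// Each thread writes its gradient contribution into shared memory,
// the block performs a tree reduction, and only thread 0 touches
// global memory with an atomicAdd: one atomic per block instead of
// one per thread.
__global__ void reduce_grad(const float* __restrict__ contrib,
                            float* __restrict__ grad_out, int n) {
    extern __shared__ float partial[];           // one slot per thread
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n) ? contrib[i] : 0.0f;  // stage in shared memory
    __syncthreads();

    // In-block tree reduction; no atomics needed on shared memory
    // because each step has disjoint writers.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) partial[tid] += partial[tid + s];
        __syncthreads();
    }

    if (tid == 0) atomicAdd(grad_out, partial[0]);
}
```

Launched as `reduce_grad<<<blocks, threads, threads * sizeof(float)>>>(...)` with a power-of-two block size. The same idea motivates the column-major traversal commit: assigning each block a distinct output column means different blocks rarely target the same accumulator, so the remaining atomics seldom contend.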
23c9816c