Optimize FWD kernel: reduce tail effect
* Added a new CSR array, psi_row_index, containing the "ho" values sorted in descending order of CSR row length. The kernel uses it to process the (ho, wo) points that correspond to longer rows before those with shorter rows, which improves overlap and reduces the tail effect.
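
As a rough illustration of the ordering described above, the following host-side sketch builds such an index from a CSR row-offsets array. The names `row_offsets` and `build_row_index` are hypothetical and do not come from this commit. The sketch assumes that row i spans `row_offsets[i]` to `row_offsets[i + 1]`; how the actual code builds the index may differ.

```python
def build_row_index(row_offsets):
    """Return row ids ordered by descending CSR row length.

    row_offsets is assumed to be a standard CSR offsets array of
    length n_rows + 1 (hypothetical name, for illustration only).
    """
    n_rows = len(row_offsets) - 1
    # Length of row i is the difference of consecutive offsets.
    row_len = [row_offsets[i + 1] - row_offsets[i] for i in range(n_rows)]
    # sorted() is stable, so rows of equal length keep their original order.
    return sorted(range(n_rows), key=lambda i: -row_len[i])


# Four rows with lengths 2, 5, 1, 5.
row_offsets = [0, 2, 7, 8, 13]
psi_row_index = build_row_index(row_offsets)
print(psi_row_index)  # [1, 3, 0, 2]
```

A kernel could then map its work-item index through this array (for example `ho = psi_row_index[i]`), so that the most expensive rows are scheduled first and only short rows remain when the last work items run.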