Optimize bwd kernel: incremental qdot_max and alpha/integral/etc
Apply the same qdotk_max "trick" to the backward kernel. This removes one loop and improves performance by about 20%.
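
For illustration only, here is a minimal Python sketch of how an incrementally updated maximum can remove a loop. It assumes the "trick" is the usual running-max rescaling used for numerically stable softmax accumulation; the commit itself does not spell this out. The function names, and the reading of `alpha` as the normalizer and `integral` as the weighted sum, are hypothetical and not taken from the kernel source.

```python
import math

def two_pass(scores, values):
    """Reference: one loop to find the max, a second loop to accumulate."""
    m = max(scores)                      # loop 1: global max for stability
    alpha = 0.0                          # assumed meaning: softmax normalizer
    integral = 0.0                       # assumed meaning: weighted sum
    for s, v in zip(scores, values):     # loop 2: accumulate with fixed max
        w = math.exp(s - m)
        alpha += w
        integral += w * v
    return integral / alpha

def one_pass(scores, values):
    """Single loop: update the max incrementally and rescale the sums."""
    m = -math.inf                        # running max (hypothetical qdotk_max)
    alpha = 0.0
    integral = 0.0
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new)      # rescales sums built with the old max
        w = math.exp(s - m_new)
        alpha = alpha * scale + w
        integral = integral * scale + w * v
        m = m_new
    return integral / alpha
```

Both functions return the same value up to floating-point rounding for finite scores; the second one visits each element once, which is the kind of saving described above.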