Use LERP to implement EMA
Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/493

Currently the EMA implementation first does the multiplication and then does the addition, which requires two round trips to HBM. With the lerp operator, a single kernel can do both. This change uses LERP to compute EMA instead, reducing the GPU EMA computation time by 40%.

Reviewed By: newstzpz

Differential Revision: D43525938

fbshipit-source-id: ca1e14453bdfda958d3c412a52ff48efa65b3dd4
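The equivalence behind this change can be sketched in plain Python. This is an illustration only, not the d2go code: it assumes the usual EMA convention `ema = decay * ema + (1 - decay) * param`, and the helper names `ema_mul_add`, `lerp`, and `ema_lerp` are hypothetical. The `lerp` helper follows the formula `start + weight * (end - start)`, which is what PyTorch's `torch.lerp(input, end, weight)` computes.

```python
import math

def ema_mul_add(ema, param, decay):
    """Two-step update: scale the running average, then add the new value."""
    scaled = [decay * e for e in ema]  # first pass over the EMA state
    return [s + (1.0 - decay) * p for s, p in zip(scaled, param)]  # second pass

def lerp(start, end, weight):
    """Linear interpolation: start + weight * (end - start), in one pass."""
    return [s + weight * (e - s) for s, e in zip(start, end)]

def ema_lerp(ema, param, decay):
    """Same update as ema_mul_add, expressed as a single interpolation."""
    return lerp(ema, param, 1.0 - decay)

# Hypothetical values, chosen only to compare the two formulations.
ema = [1.0, 2.0, 3.0]
param = [2.0, 0.0, 3.5]
decay = 0.9

a = ema_mul_add(ema, param, decay)
b = ema_lerp(ema, param, decay)
assert all(math.isclose(x, y, rel_tol=1e-12, abs_tol=1e-12) for x, y in zip(a, b))
```

On tensors, the in-place form of the same idea would be something like `ema_tensor.lerp_(param_tensor, 1.0 - decay)`, which reads and writes the EMA state once instead of once per operation.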