Fix attention mask type for Flash Attention + CP + THD (#1354)
* always have padding mask type for both flash and fused attentions

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* remove a redundant assert

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

---------

Signed-off-by: Xiaowei Ren <xren@nvidia.com>
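The change described above amounts to forcing a padding-aware mask type whenever the packed THD layout is in use, since packed sequences always carry per-sequence padding boundaries. A minimal sketch of that idea, assuming a hypothetical helper (the function name and the `qkv_format` parameter here are illustrative, not the actual Transformer Engine API):

```python
# Hypothetical sketch of the fix: under the THD (packed) layout with
# context parallelism, map non-padding mask types to their padding
# variants so both flash and fused attention backends see a padding
# mask. Names are illustrative, not Transformer Engine's real API.

def normalize_mask_type_for_thd(attn_mask_type: str, qkv_format: str) -> str:
    """Return a padding mask type when using the THD layout."""
    if qkv_format != "thd":
        # Non-packed layouts (e.g. "bshd", "sbhd") keep their mask type.
        return attn_mask_type
    mapping = {
        "no_mask": "padding",
        "causal": "padding_causal",
        "causal_bottom_right": "padding_causal_bottom_right",
    }
    # Mask types already of a padding variant pass through unchanged.
    return mapping.get(attn_mask_type, attn_mask_type)
```

For example, `normalize_mask_type_for_thd("causal", "thd")` yields `"padding_causal"`, while the same mask type under a non-packed layout is returned unchanged.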