fix: fixes multi head attention for context parallel: rotary embedding to use padded cu_seq_lens (#2077)
Fixes multi-head attention (MHA) to use the padded cu_seq_lens during context parallelism (CP).
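The fix concerns how cumulative sequence lengths feed rotary position ids when sequences are packed and sharded across context-parallel ranks: positions must be derived from the *padded* offsets, or ranks holding later shards compute mis-aligned rotary angles. The repository's actual implementation is not shown here; the following is a minimal sketch of the idea, with `pad_cu_seqlens` and `positions_from_cu_seqlens` as hypothetical helper names.

```python
import numpy as np

def pad_cu_seqlens(cu_seqlens, multiple):
    # Hypothetical helper: round each packed sequence length up to a
    # multiple (e.g. 2 * cp_size) so every sequence splits evenly
    # across context-parallel ranks.
    lengths = np.diff(np.asarray(cu_seqlens))
    padded = ((lengths + multiple - 1) // multiple) * multiple
    return np.concatenate([[0], np.cumsum(padded)])

def positions_from_cu_seqlens(cu_seqlens):
    # Per-token rotary position ids for a packed (varlen) batch:
    # each sequence restarts at position 0.
    pos = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        pos.extend(range(end - start))
    return np.array(pos)

# Two sequences of lengths 3 and 4, padded to a multiple of 4 so each
# can be split evenly across CP ranks; rotary positions are then
# computed against the padded offsets, not the raw ones.
padded = pad_cu_seqlens([0, 3, 7], multiple=4)   # → array([0, 4, 8])
pos = positions_from_cu_seqlens(padded)          # → [0 1 2 3 0 1 2 3]
```

Using the unpadded `[0, 3, 7]` here would assign positions `[0, 1, 2, 0, 1, 2, 3]`, which no longer lines up token-for-token with the padded buffer that each CP rank actually holds.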
Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>