[performance] ensure `causal_mask` is created directly on device (#22378)
* ensure causal_mask is created directly on device
* add copy tag to opt, update bart implementation
* add device to all _make_causal_mask copies
* formatting fixes
* more manual fixes due to unlinked versions of _prepare_decoder_attention_mask
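
For illustration, a minimal sketch of the idea behind this change, not the exact code from the commit. The function name `make_causal_mask_on_device` and its signature are hypothetical. It assumes PyTorch and passes `device` to every tensor constructor, so the mask is never allocated on the CPU and then copied over:

```python
import torch


def make_causal_mask_on_device(input_shape, dtype, device, past_key_values_length=0):
    """Build an additive causal mask, allocating every tensor on `device`.

    Hypothetical helper for illustration; name and signature are assumptions.
    """
    bsz, tgt_len = input_shape
    # Start fully masked: every entry holds the most negative value of `dtype`.
    mask = torch.full(
        (tgt_len, tgt_len), torch.finfo(dtype).min, dtype=dtype, device=device
    )
    # Zero out (unmask) positions j <= i, i.e. the lower triangle and diagonal.
    positions = torch.arange(tgt_len, device=device)
    mask.masked_fill_(positions < (positions + 1).view(tgt_len, 1), 0)
    if past_key_values_length > 0:
        # Cached past positions are always visible, so prepend zero columns.
        past = torch.zeros(tgt_len, past_key_values_length, dtype=dtype, device=device)
        mask = torch.cat([past, mask], dim=-1)
    # Broadcast to (bsz, 1, tgt_len, tgt_len + past) without copying data.
    return mask[None, None, :, :].expand(
        bsz, 1, tgt_len, tgt_len + past_key_values_length
    )


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    mask = make_causal_mask_on_device((2, 3), torch.float32, device)
    print(mask.shape, mask.device)
```

Passing the target device at creation time avoids an intermediate CPU allocation and the host-to-device copy that a later `.to(device)` call would trigger.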