only init encoder_attention_mask if stack is decoder
We currently initialize `encoder_attention_mask` when it is `None`, whether the stack is that of an encoder or a decoder. Since this may lead to bugs that are difficult to tracks down, I added a condition that assesses whether the current stack is a decoder.
Showing
Please register or sign in to comment