Update arguments checks.
The hidden_size % attention_heads == 0 check is already handled above, where kv_channels is dealt with. Add a check for decoder sequence length.
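
For illustration only, here is a minimal sketch of how the two checks described above might fit together. The function name `validate_args`, the attribute names, and the assumption that the decoder sequence length is compared against `max_position_embeddings` are all hypothetical and not taken from this commit.

```python
from types import SimpleNamespace


def validate_args(args):
    """Hypothetical argument validation; not the repository's actual code."""
    # The divisibility check lives next to the kv_channels default,
    # so it does not need to be repeated further down.
    if args.kv_channels is None:
        if args.hidden_size % args.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        args.kv_channels = args.hidden_size // args.num_attention_heads

    # Assumed form of the decoder sequence length check: the decoder
    # sequence must fit within the available position embeddings.
    if args.decoder_seq_length is not None:
        if args.decoder_seq_length > args.max_position_embeddings:
            raise ValueError("decoder_seq_length exceeds max_position_embeddings")

    return args


# Example usage with made-up values.
example = validate_args(SimpleNamespace(
    hidden_size=1024,
    num_attention_heads=16,
    kv_channels=None,
    decoder_seq_length=128,
    max_position_embeddings=512,
))
```

With these example values, `kv_channels` is derived as 1024 // 16 = 64 and the decoder sequence length check passes.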