-
Jared Casper authored
hidden_size % attention_heads == 0 is handled above when dealing with kv_channels. Adding check for decoder sequence length.
8044c7b4
hidden_size % attention_heads == 0 is handled above when dealing with kv_channels. Adding check for decoder sequence length.