[mistral] Support passing `head_dim` through config (and do not require `head_dim * num_heads == hidden_size`) (#32050)

* Allow `head_dim` to be set in Mistral config
* Add docstring
* Do not require `head_dim * num_heads == hidden_size`
* [run-slow] mistral
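Why dropping the `head_dim * num_heads == hidden_size` check is safe can be illustrated with a minimal sketch. It assumes that when `head_dim` is not set, it falls back to `hidden_size // num_attention_heads`, and that the attention projections are sized from `num_heads * head_dim` rather than from `hidden_size`. `AttentionShapes` and `projection_shapes` are hypothetical names for illustration only; they are not the transformers API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttentionShapes:
    """Hypothetical helper (not part of transformers) mirroring the head_dim fallback."""

    hidden_size: int
    num_attention_heads: int
    num_key_value_heads: int
    head_dim: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumed fallback: derive head_dim the old way only when it is not given.
        if self.head_dim is None:
            self.head_dim = self.hidden_size // self.num_attention_heads

    def projection_shapes(self) -> dict:
        # (in_features, out_features) for each linear layer in the attention block.
        q_out = self.num_attention_heads * self.head_dim
        kv_out = self.num_key_value_heads * self.head_dim
        return {
            "q_proj": (self.hidden_size, q_out),
            "k_proj": (self.hidden_size, kv_out),
            "v_proj": (self.hidden_size, kv_out),
            # o_proj maps the concatenated heads back to hidden_size,
            # so num_heads * head_dim does not have to equal hidden_size.
            "o_proj": (q_out, self.hidden_size),
        }


# Default: head_dim is derived, so num_heads * head_dim == hidden_size.
default = AttentionShapes(hidden_size=4096, num_attention_heads=32, num_key_value_heads=8)
print(default.head_dim, default.projection_shapes()["q_proj"])

# Explicit head_dim: 32 * 128 = 4096 != 5120, which the old check would have rejected.
custom = AttentionShapes(hidden_size=5120, num_attention_heads=32, num_key_value_heads=8, head_dim=128)
print(custom.projection_shapes()["o_proj"])
```

With an explicit `head_dim`, the only requirement is that `o_proj` takes `num_heads * head_dim` inputs and returns `hidden_size` outputs; the residual stream width and the per-head width become independent settings.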