[Attn,KV-cache] Use per-head scales in the attention selector (#34281)
Signed-off-by: Your Name <you@example.com>
Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
Co-authored-by: Eldar Kurtic <research@neuralmagic.com>
Co-authored-by: Your Name <you@example.com>