Move query quantization to attention layer for Flashinfer & Triton. (#26534)
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Adrian Abeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>