[Perf] Mem align KV caches for CUDA devices (MLA perf improvement) (#12676)
Signed-off-by:simon-mo <xmo@berkeley.edu> Signed-off-by:
Lucas Wilkinson <lcwilkins@redhat.com> Signed-off-by:
Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by:
Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by:
simon-mo <xmo@berkeley.edu>
Showing
Please register or sign in to comment