[Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
(cherry picked from commit ef2c4f77)