[Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442)

Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>