SM100 Cutlass MLA decode with unrestricted num_heads (< 128) for DeepSeek TP (#20769)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>