[Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2 (#34917)

Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
csrc/concat_mla_q.cuh
0 → 100644