[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>