[Perf] Optimize batch invariant with fused rms norm, 2.1% E2E latency improvement (#40413)
Signed-off-by: yewentao256 <zhyanwentao@126.com>