Reduce kernel overhead when the number of active LoRAs is smaller than max...
Reduce kernel overhead when the number of active LoRAs is smaller than the maximum number of LoRAs. Multiple CUDA graphs are captured, one for each number of active LoRAs. (#32005)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
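
The idea in the message above can be illustrated with a minimal, pure-Python sketch. It is not the code from this commit: `MAX_LORAS`, `capture_graph`, `graphs`, and `run` are hypothetical names, and a plain callable stands in for a captured CUDA graph. The sketch only shows the dispatch pattern, assuming one specialized graph is captured per possible count of active LoRAs so that replay does work proportional to the active count rather than the maximum.

```python
from typing import Callable, Dict, List

MAX_LORAS = 4  # hypothetical upper bound on concurrently active adapters

Graph = Callable[[List[float]], List[float]]


def capture_graph(num_active: int) -> Graph:
    """Stand-in for graph capture: returns a replayable callable
    specialized for exactly `num_active` adapters."""

    def replay(x: List[float]) -> List[float]:
        out = list(x)
        # Work scales with num_active, not with MAX_LORAS.
        for lora_id in range(num_active):
            out = [v + (lora_id + 1) for v in out]
        return out

    return replay


# Capture once, ahead of time, for every possible count 0..MAX_LORAS.
graphs: Dict[int, Graph] = {n: capture_graph(n) for n in range(MAX_LORAS + 1)}


def run(x: List[float], num_active: int) -> List[float]:
    """Replay the graph that matches the current number of active adapters."""
    return graphs[num_active](x)
```

With two active adapters, `run([0.0], 2)` replays the graph specialized for a count of two and skips the iterations that a graph sized for `MAX_LORAS` would otherwise perform.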