"vllm/vscode:/vscode.git/clone" did not exist on "0565f1fdec86dd0f38438d50db2219246f0e196d"
-
yugong333 authored
Reduce kernel overhead when the number of active LoRAs is smaller than max loras. CUDA graphs are captured for each number of active LoRAs. (#32005)

Signed-off-by: Yu Gong <yu3.gong@gmail.com>
ffe1fc7a
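
The idea in the commit message can be illustrated with a minimal sketch. This is not vLLM's implementation: the class `PerCountGraphCache`, its methods, and the placeholder "work" are all hypothetical, and plain Python callables stand in for captured CUDA graphs. The sketch only shows the dispatch pattern, assuming one specialization is prepared per possible number of active LoRAs so that a step with few active adapters does not pay for all `max_loras` slots.

```python
from typing import Callable, Dict, List, Sequence

Replay = Callable[[List[float]], List[float]]


class PerCountGraphCache:
    """Hypothetical cache holding one 'captured graph' per active-LoRA count."""

    def __init__(self, max_loras: int) -> None:
        self.max_loras = max_loras
        self.graphs: Dict[int, Replay] = {}

    def capture_all(self) -> None:
        # Prepare one specialization for every count from 0 to max_loras.
        for num_active in range(self.max_loras + 1):
            self.graphs[num_active] = self._capture(num_active)

    @staticmethod
    def _capture(num_active: int) -> Replay:
        # Stand-in for CUDA graph capture (assumption): the returned callable
        # does work proportional to num_active, not to max_loras.
        def replay(x: List[float]) -> List[float]:
            out = list(x)
            for _ in range(num_active):
                out = [v + 1.0 for v in out]  # placeholder per-adapter work
            return out

        return replay

    def run(self, x: List[float], active_lora_ids: Sequence[int]) -> List[float]:
        # Dispatch on the number of distinct adapters active in this step.
        num_active = len(set(active_lora_ids))
        if num_active > self.max_loras:
            raise ValueError("more active LoRAs than max_loras")
        return self.graphs[num_active](x)


cache = PerCountGraphCache(max_loras=4)
cache.capture_all()
result = cache.run([0.0], active_lora_ids=[7, 7, 3])  # two distinct adapters
```

The trade-off in this pattern is more capture time and memory up front (one specialization per count) in exchange for less per-step work when few adapters are active.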