vllm/engine/arg_utils.py · ffe1fc7a28841973135b981fb68ce515b409a236 · OpenDAS / vllm_cscc

yugong333 authored Feb 02, 2026


Reduce the kernel overhead when the number of active LoRAs is smaller than max LoRAs. A separate CUDA graph is captured for each number of active LoRAs. (#32005)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
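A minimal sketch of the dispatch idea described in the commit message, in plain Python with no GPU code. Everything here is hypothetical and is not vLLM's implementation: `CapturedGraph` stands in for a captured CUDA graph, and `capture_all` / `select_graph` are illustrative names. It assumes one graph is captured per active-LoRA count from 1 up to max LoRAs, and that replay cost scales with the number of LoRA slots a graph was captured for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapturedGraph:
    # Hypothetical placeholder for a graph captured with a fixed LoRA slot count.
    num_lora_slots: int

    def replay(self) -> int:
        # Stand-in "cost": assume the captured kernels iterate over every slot.
        return self.num_lora_slots


def capture_all(max_loras: int) -> dict[int, CapturedGraph]:
    # Assumption: one graph per possible active-LoRA count, 1..max_loras.
    return {n: CapturedGraph(n) for n in range(1, max_loras + 1)}


def select_graph(graphs: dict[int, CapturedGraph], num_active_loras: int) -> CapturedGraph:
    # Replay the graph captured for exactly this many active adapters,
    # instead of always replaying the graph sized for max_loras.
    if num_active_loras not in graphs:
        raise ValueError(f"no graph captured for {num_active_loras} active LoRAs")
    return graphs[num_active_loras]


if __name__ == "__main__":
    graphs = capture_all(max_loras=8)
    print(select_graph(graphs, 2).replay())  # 2 slots of work instead of 8
```

With 8 LoRA slots configured and only 2 adapters active, the selected graph covers 2 slots rather than 8, which is the kind of saving the commit targets. The trade-off in this sketch is capturing and holding several graphs instead of one.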

arg_utils.py 86.6 KB