tests/lora/test_punica_ops.py · ffe1fc7a28841973135b981fb68ce515b409a236 · OpenDAS / vllm_cscc

yugong333 authored Feb 02, 2026


Reduce kernel overhead when the number of active LoRAs is smaller than the maximum number of LoRAs. Multiple CUDA graphs are captured for each number of active LoRAs. (#32005)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>

test_punica_ops.py 11 KB
