tests/lora/test_punica_ops.py · 15d76f74e2fdb12a95ea00f0ca283acf6219a2b7 · OpenDAS / vllm_cscc


Reduce the kernel overhead when num of active loras is smaller than max... · ffe1fc7a

yugong333 authored Feb 02, 2026


Reduce the kernel overhead when the number of active LoRAs is smaller than max LoRAs. Multiple CUDA graphs are captured for each number of active LoRAs. (#32005)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>


test_punica_ops.py 11 KB
