[misc] Do not allow to use lora with chunked prefill. (#5538)

Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

Commit e691918e (unverified), authored Jun 15, 2024 by SangBin Cho, committed by GitHub on Jun 15, 2024

Showing with 2 additions and 0 deletions

vllm/config.py +2 -0

--- a/vllm/config.py
+++ b/vllm/config.py
@@ -1092,6 +1092,8 @@ class LoRAConfig:
                "Due to limitations of the custom LoRA CUDA kernel, "
                "max_num_batched_tokens must be <= 65528 when "
                "LoRA is enabled.")
+        if scheduler_config.chunked_prefill_enabled:
+            raise ValueError("LoRA is not supported with chunked prefill yet.")


 @dataclass
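
The effect of the added check can be sketched in isolation. The snippet below is a minimal, self-contained illustration and not the actual vLLM code: `SchedulerConfig` and `LoRAConfig` are simplified stand-ins that carry only the fields visible in the diff, the method name `verify_with_scheduler_config` is an assumption (the hunk header shows only the enclosing class), the `> 65528` condition is inferred from the error message in the diff context, and `check` is a hypothetical helper added for demonstration.

```python
from dataclasses import dataclass


@dataclass
class SchedulerConfig:
    # Simplified stand-in: only the two fields the checks below read.
    max_num_batched_tokens: int
    chunked_prefill_enabled: bool = False


@dataclass
class LoRAConfig:
    # Simplified stand-in with a single illustrative field.
    max_lora_rank: int = 16

    # Method name is an assumption; the diff shows only the class name.
    def verify_with_scheduler_config(
            self, scheduler_config: SchedulerConfig) -> None:
        # Condition inferred from the error message in the diff context.
        if scheduler_config.max_num_batched_tokens > 65528:
            raise ValueError(
                "Due to limitations of the custom LoRA CUDA kernel, "
                "max_num_batched_tokens must be <= 65528 when "
                "LoRA is enabled.")
        # The two lines added by this commit: reject the combination
        # up front instead of failing later at runtime.
        if scheduler_config.chunked_prefill_enabled:
            raise ValueError(
                "LoRA is not supported with chunked prefill yet.")


def check(scheduler_config: SchedulerConfig) -> str:
    """Hypothetical helper: return "ok" or the validation error text."""
    try:
        LoRAConfig().verify_with_scheduler_config(scheduler_config)
        return "ok"
    except ValueError as exc:
        return str(exc)


if __name__ == "__main__":
    print(check(SchedulerConfig(2048)))
    print(check(SchedulerConfig(2048, chunked_prefill_enabled=True)))
```

Raising a `ValueError` during config validation means an unsupported LoRA plus chunked prefill setup is rejected when the configuration is built, before any request is scheduled.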