Unverified commit e691918e authored by SangBin Cho, committed by GitHub

[misc] Do not allow to use lora with chunked prefill. (#5538)


Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
parent 81fbb365
@@ -1092,6 +1092,8 @@ class LoRAConfig:
                 "Due to limitations of the custom LoRA CUDA kernel, "
                 "max_num_batched_tokens must be <= 65528 when "
                 "LoRA is enabled.")
+        if scheduler_config.chunked_prefill_enabled:
+            raise ValueError("LoRA is not supported with chunked prefill yet.")
 
 
 @dataclass
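The hunk adds a second guard next to the existing token-limit check. Both guards read a scheduler config and reject combinations that LoRA cannot handle. The sketch below reproduces that validation logic in isolation. It assumes that the existing check fires when `max_num_batched_tokens` exceeds 65528, which is inferred from the error message. `SchedulerConfigStub` and `verify_lora_with_scheduler` are hypothetical stand-ins, not vLLM's real classes or method names. Only the field names `max_num_batched_tokens` and `chunked_prefill_enabled` and the two error messages come from the diff.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the scheduler config. It models only the two
# fields the diff reads; this is NOT vLLM's actual SchedulerConfig.
@dataclass
class SchedulerConfigStub:
    max_num_batched_tokens: int
    chunked_prefill_enabled: bool = False


# Upper bound quoted in the diff's error message for the custom LoRA kernel.
MAX_LORA_BATCHED_TOKENS = 65528


def verify_lora_with_scheduler(scheduler_config: SchedulerConfigStub) -> None:
    """Hypothetical sketch of the LoRA/scheduler compatibility checks."""
    # Pre-existing check (context lines in the hunk): token budget limit.
    if scheduler_config.max_num_batched_tokens > MAX_LORA_BATCHED_TOKENS:
        raise ValueError(
            "Due to limitations of the custom LoRA CUDA kernel, "
            "max_num_batched_tokens must be <= 65528 when "
            "LoRA is enabled.")
    # Check added by this commit: chunked prefill is rejected outright.
    if scheduler_config.chunked_prefill_enabled:
        raise ValueError("LoRA is not supported with chunked prefill yet.")


if __name__ == "__main__":
    # Accepted: within the token limit and chunked prefill disabled.
    verify_lora_with_scheduler(SchedulerConfigStub(max_num_batched_tokens=4096))

    # Rejected by the new guard.
    try:
        verify_lora_with_scheduler(
            SchedulerConfigStub(max_num_batched_tokens=4096,
                                chunked_prefill_enabled=True))
    except ValueError as exc:
        print(exc)  # LoRA is not supported with chunked prefill yet.
```

This sketch uses fail-fast validation. An unsupported LoRA plus chunked-prefill combination is rejected with a clear message when the configuration is checked, rather than surfacing later as a harder-to-diagnose runtime failure.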