OpenDAS / vllm_cscc / Commits / 48abee9e

Commit 48abee9e (unverified), authored Aug 08, 2024 by Cherilyn Buren; committed by GitHub on Aug 08, 2024.

[Frontend] remove max_num_batched_tokens limit for lora (#7288)

Parent: 74670964
Showing 1 changed file with 0 additions and 5 deletions.

vllm/config.py (+0 -5)
```diff
@@ -1377,11 +1377,6 @@ class LoRAConfig:
                            model_config.quantization)

     def verify_with_scheduler_config(self, scheduler_config: SchedulerConfig):
-        if scheduler_config.max_num_batched_tokens > 65528:
-            raise ValueError(
-                "Due to limitations of the custom LoRA CUDA kernel, "
-                "max_num_batched_tokens must be <= 65528 when "
-                "LoRA is enabled.")
         if scheduler_config.chunked_prefill_enabled:
             raise ValueError(
                 "LoRA is not supported with chunked prefill yet.")
```
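The effect of this commit on validation can be illustrated with a minimal sketch. `SchedulerConfig` and `LoRAConfig` below are hypothetical stand-ins that model only the fields and the single check visible in the diff; they are not vLLM's real classes. After the change, a `max_num_batched_tokens` above 65528 is no longer rejected, and chunked prefill remains the only scheduler setting that `verify_with_scheduler_config` refuses.

```python
from dataclasses import dataclass


@dataclass
class SchedulerConfig:
    # Hypothetical stand-in: only the two fields the LoRA check reads.
    # The real vLLM SchedulerConfig has many more fields.
    max_num_batched_tokens: int
    chunked_prefill_enabled: bool = False


class LoRAConfig:
    # Hypothetical minimal stand-in; only the scheduler check is modelled.

    def verify_with_scheduler_config(self, scheduler_config: SchedulerConfig) -> None:
        # Post-commit behaviour: no upper bound on max_num_batched_tokens.
        # The only remaining scheduler-level restriction is chunked prefill.
        if scheduler_config.chunked_prefill_enabled:
            raise ValueError(
                "LoRA is not supported with chunked prefill yet.")


if __name__ == "__main__":
    lora = LoRAConfig()
    # Rejected before this commit (> 65528); accepted after it.
    lora.verify_with_scheduler_config(SchedulerConfig(max_num_batched_tokens=131072))
    try:
        lora.verify_with_scheduler_config(
            SchedulerConfig(max_num_batched_tokens=4096, chunked_prefill_enabled=True))
    except ValueError as exc:
        print(exc)  # prints "LoRA is not supported with chunked prefill yet."
```

Before the commit, the first call would have raised the "Due to limitations of the custom LoRA CUDA kernel" error shown in the removed lines.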