OpenDAS / vllm_cscc
Commit 3f52738d (unverified), authored Aug 13, 2025 by 633WHU; committed by GitHub on Aug 13, 2025
[Doc] Add max_lora_rank configuration guide (#22782)
Signed-off-by: chiliu <cliu_whu@yeah.net>
Parent: a01e0018
Showing 1 changed file with 19 additions and 0 deletions
docs/features/lora.md (+19, -0)
@@ -351,3 +351,22 @@ vllm serve ibm-granite/granite-speech-3.3-2b \
```
Note: Default multimodal LoRAs are currently only available for `.generate` and chat completions.

## Usage Tips

### Configuring `max_lora_rank`

The `--max-lora-rank` parameter controls the maximum rank allowed for LoRA adapters. This setting affects memory allocation and performance:

- **Set it to the maximum rank** among all LoRA adapters you plan to use.
- **Avoid setting it too high**: a value much larger than needed wastes memory and can cause performance issues.

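To give a rough sense of why an oversized value wastes memory, the sketch below counts the bytes in one pair of LoRA matrices (A with shape rank x d_in, B with shape d_out x rank). It assumes buffers are sized for the configured maximum rank rather than each adapter's actual rank; the 4096 x 4096 layer shape, the 16-bit precision, and the helper name `lora_pair_bytes` are hypothetical, chosen only for illustration.

```python
def lora_pair_bytes(rank: int, d_in: int, d_out: int, bytes_per_param: int = 2) -> int:
    """Bytes for one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out) * bytes_per_param


# Hypothetical 4096 x 4096 projection stored in 16-bit precision.
needed = lora_pair_bytes(64, 4096, 4096)
oversized = lora_pair_bytes(256, 4096, 4096)

print(f"rank 64:  {needed / 2**20:.1f} MiB per layer")
print(f"rank 256: {oversized / 2**20:.1f} MiB per layer ({oversized // needed}x)")
```

The size grows linearly with rank, so under these assumptions a limit of 256 reserves four times the space that rank-64 adapters need, for every adapted layer and adapter slot.
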
For example, if your LoRA adapters have ranks [16, 32, 64], use `--max-lora-rank 64` rather than 256:

```bash
# Good: matches actual maximum rank
vllm serve model --enable-lora --max-lora-rank 64

# Bad: unnecessarily high, wastes memory
vllm serve model --enable-lora --max-lora-rank 256
```
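
If you are unsure which ranks your adapters use, one way to find the value to pass is to read it from each adapter's configuration. The following is a minimal sketch, assuming the adapters are stored in PEFT format, where `adapter_config.json` records the rank under the key `r`; the helper name `max_lora_rank` is hypothetical and not part of vLLM.

```python
import json
from pathlib import Path


def max_lora_rank(adapter_dirs):
    """Return the largest rank `r` across the given adapter directories.

    Assumes each directory contains a PEFT-style adapter_config.json.
    """
    ranks = []
    for adapter_dir in adapter_dirs:
        config_path = Path(adapter_dir) / "adapter_config.json"
        with config_path.open() as f:
            ranks.append(int(json.load(f)["r"]))
    return max(ranks)
```

For adapters with ranks 16, 32, and 64, the function returns 64, which is the value to pass to `--max-lora-rank`.
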