Unverified commit 3f52738d authored by 633WHU, committed by GitHub

[Doc] Add max_lora_rank configuration guide (#22782)


Signed-off-by: chiliu <cliu_whu@yeah.net>
parent a01e0018
@@ -351,3 +351,22 @@ vllm serve ibm-granite/granite-speech-3.3-2b \
```
Note: Default multimodal LoRAs are currently only available for `.generate` and chat completions.
## Usage Tips
### Configuring `max_lora_rank`
The `--max-lora-rank` parameter sets the largest LoRA rank the server accepts; adapters with a higher rank cannot be loaded. Because LoRA weight buffers are preallocated at this rank, the setting also affects memory usage and performance:
- **Set it to the maximum rank** among all LoRA adapters you plan to use.
- **Avoid setting it too high**: a value much larger than needed wastes GPU memory and can degrade performance.

For example, if your LoRA adapters have ranks 16, 32, and 64, use `--max-lora-rank 64` rather than 256:
```bash
# Good: matches actual maximum rank
vllm serve model --enable-lora --max-lora-rank 64
# Bad: unnecessarily high, wastes memory
vllm serve model --enable-lora --max-lora-rank 256
```
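If your adapters are stored in the Hugging Face PEFT format, each one ships an `adapter_config.json` whose `r` field (plus any per-module overrides in `rank_pattern`) gives its rank. The minimal Python sketch below assumes that layout and uses hypothetical helper names (`adapter_rank`, `suggested_max_lora_rank`, `lora_buffer_elements`, `make_demo_adapters`). It scans a list of adapter directories and reports the smallest `--max-lora-rank` that covers all of them. It also gives a rough estimate of what overshooting costs: assuming the reserved A and B buffers for one linear layer hold `rank × (d_in + d_out)` elements per adapter slot (`max_loras` here mirrors the `--max-loras` setting), memory grows linearly with the configured rank, so 256 reserves about 4× what 64 does.

```python
import json
import tempfile
from pathlib import Path


def adapter_rank(adapter_dir: str) -> int:
    """Largest rank declared in a PEFT-style adapter_config.json.

    Assumes the PEFT layout: a global "r" plus an optional
    "rank_pattern" dict of per-module rank overrides.
    """
    config = json.loads((Path(adapter_dir) / "adapter_config.json").read_text())
    ranks = [int(config["r"])]
    ranks.extend(int(r) for r in (config.get("rank_pattern") or {}).values())
    return max(ranks)


def suggested_max_lora_rank(adapter_dirs: list[str]) -> int:
    """Smallest --max-lora-rank value that still admits every adapter."""
    return max(adapter_rank(d) for d in adapter_dirs)


def lora_buffer_elements(rank: int, d_in: int, d_out: int, max_loras: int) -> int:
    """Rough element count reserved for one linear layer (an assumption, not
    vLLM's exact accounting): an A buffer (rank x d_in) and a B buffer
    (d_out x rank) for each adapter slot."""
    return max_loras * rank * (d_in + d_out)


def make_demo_adapters(root: str, ranks: dict[str, int]) -> list[str]:
    """Hypothetical helper: write minimal adapter_config.json files for a demo."""
    dirs = []
    for name, r in ranks.items():
        d = Path(root) / name
        d.mkdir()
        (d / "adapter_config.json").write_text(json.dumps({"r": r}))
        dirs.append(str(d))
    return dirs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dirs = make_demo_adapters(root, {"a": 16, "b": 32, "c": 64})
        rank = suggested_max_lora_rank(dirs)
        print(f"--max-lora-rank {rank}")  # prints "--max-lora-rank 64"

    # Hypothetical 4096 x 4096 layer with 4 adapter slots, for illustration only.
    over = lora_buffer_elements(256, 4096, 4096, max_loras=4)
    fit = lora_buffer_elements(rank, 4096, 4096, max_loras=4)
    print(f"256 reserves {over / fit:.0f}x the buffer memory of {rank}")
```

With real adapter directories in place of the demo ones, the printed flag is what you would pass to `vllm serve` alongside `--enable-lora`. Some vLLM versions accept only a fixed set of values for this flag; if the computed rank is rejected, round up to the next accepted value.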