Commit a8683102
Authored Feb 28, 2024 by Ganesh Jagadeesan
Committed by GitHub, Feb 27, 2024
multi-lora documentation fix (#3064)
Parent: 71bcaf99
Showing 1 changed file with 13 additions and 1 deletion
docs/source/models/lora.rst
@@ -58,7 +58,7 @@ LoRA adapted models can also be served with the Open-AI compatible vLLM server.
 .. code-block:: bash

-    python -m vllm.entrypoints.api_server \
+    python -m vllm.entrypoints.openai.api_server \
         --model meta-llama/Llama-2-7b-hf \
         --enable-lora \
         --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
@@ -89,3 +89,15 @@ with its base model:
 Requests can specify the LoRA adapter as if it were any other model via the ``model`` request parameter. The requests will be
 processed according to the server-wide LoRA configuration (i.e. in parallel with base model requests, and potentially other
 LoRA adapter requests if they were provided and ``max_loras`` is set high enough).
+
+The following is an example request:
+
+.. code-block:: bash
+
+    curl http://localhost:8000/v1/completions \
+        -H "Content-Type: application/json" \
+        -d '{
+            "model": "sql-lora",
+            "prompt": "San Francisco is a",
+            "max_tokens": 7,
+            "temperature": 0
+        }' | jq
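
Reviewer note: for readers who prefer Python over curl, the request added in this diff can be sketched with the standard library alone. This is a minimal sketch, not part of the commit. It assumes the server is reachable at ``http://localhost:8000`` and that an adapter was registered as ``sql-lora``, both taken from the diff above; the helper names ``build_completion_request`` and ``send`` are hypothetical.

```python
import json
import urllib.request

# Assumption: host and port match the curl example in the diff; adjust as needed.
BASE_URL = "http://localhost:8000"


def build_completion_request(model, prompt, max_tokens=7, temperature=0,
                             base_url=BASE_URL):
    """Build (but do not send) a POST request for the /v1/completions endpoint.

    `model` can be the base model name or the name of a LoRA adapter
    registered with --lora-modules (for example "sql-lora").
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the parsed JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a server running, ``send(build_completion_request("sql-lora", "San Francisco is a"))`` would issue the same request as the curl command. Passing the base model name as ``model`` instead would target the base model without the adapter.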