LoRA adapted models can also be served with the Open-AI compatible vLLM server.
.. code-block:: bash

    python -m vllm.entrypoints.openai.api_server \
        --model meta-llama/Llama-2-7b-hf \
        --enable-lora \
        --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
with its base model:
Requests can specify the LoRA adapter as if it were any other model via the ``model`` request parameter. The requests will be
processed according to the server-wide LoRA configuration (i.e. in parallel with base model requests, and potentially other
LoRA adapter requests if they were provided and ``max_loras`` is set high enough).
The following is an example request:

.. code-block:: bash

    curl http://localhost:8000/v1/completions \
        -H "Content-Type: application/json" \
        -d '{
            "model": "sql-lora",
            "prompt": "San Francisco is a",
            "max_tokens": 7,
            "temperature": 0
        }' | jq
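The same request can be issued from Python. The sketch below is a minimal, hypothetical equivalent of the ``curl`` call above using only the standard library; the helper names (``build_completion_request``, ``send_completion``) are our own, not part of the vLLM API, and it assumes the server is running locally on port 8000 as started earlier.

.. code-block:: python

    import json
    import urllib.request

    def build_completion_request(model, prompt, max_tokens=7, temperature=0):
        """Build the JSON payload for an OpenAI-style /v1/completions call.

        ``model`` may name a LoRA adapter (e.g. "sql-lora") registered via
        ``--lora-modules``, or the base model itself.
        """
        return {
            "model": model,
            "prompt": prompt,
            "max_tokens": max_tokens,
            "temperature": temperature,
        }

    def send_completion(payload, url="http://localhost:8000/v1/completions"):
        """POST the payload to the server and return the decoded JSON response."""
        req = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

With a server running, ``send_completion(build_completion_request("sql-lora", "San Francisco is a"))`` returns the same JSON body that the ``curl`` command prints.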