Unverified commit c095bd85, authored by Ziqi Fan, committed by GitHub

docs: add start vllm + KVBM using dynamo into runbook (#3213)


Signed-off-by: Ziqi Fan <ziqif@nvidia.com>
parent fb12b67f
@@ -43,8 +43,13 @@ export DYN_KVBM_CPU_CACHE_GB=4
 # 8 means 8GB of disk would be used
 export DYN_KVBM_DISK_CACHE_GB=8
-# serve an example LLM model
-vllm serve --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv_both", "kv_connector_module_path": "dynamo.llm.vllm_integration.connector"}' deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+# [DYNAMO] start dynamo frontend
+python -m dynamo.frontend --http-port 8000 &
+# [DYNAMO] serve an LLM model using KVBM with dynamo
+python -m dynamo.vllm \
+--model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
+--connector kvbm &
 # make a call to LLM
 curl localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
@@ -60,6 +65,11 @@ curl localhost:8000/v1/chat/completions -H "Content-Type: application/json"
 }'
 ```
+Alternatively, can use "vllm serve" with KVBM by replacing the above two [DYNAMO] cmds with below:
+```bash
+vllm serve --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv_both", "kv_connector_module_path": "dynamo.llm.vllm_integration.connector"}' deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+```
 ## Enable and View KVBM Metrics
 Follow below steps to enable metrics collection and view via Grafana dashboard:
......
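
Both [DYNAMO] commands in the first hunk are launched in the background with `&`, so the `curl` call that follows them may run before the frontend is listening or the model has finished loading. One possible way to guard against that is a small polling helper like the sketch below. The function name `wait_for_http`, its default timeout and interval, and the choice of readiness URL are assumptions made for illustration; none of them come from this commit or from Dynamo itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Dynamo or vLLM:
# poll a URL with curl until it returns a successful HTTP status,
# or give up once the timeout (in seconds) has elapsed.
wait_for_http() {
  local url timeout_s interval_s deadline
  url="$1"
  timeout_s="${2:-300}"   # assumed default: wait up to 5 minutes
  interval_s="${3:-2}"    # assumed default: retry every 2 seconds
  deadline=$(( $(date +%s) + timeout_s ))

  # --fail makes curl exit non-zero on HTTP errors (4xx/5xx),
  # --max-time bounds each individual attempt.
  until curl --silent --fail --output /dev/null --max-time 5 "$url"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep "$interval_s"
  done
  return 0
}
```

A call such as `wait_for_http http://localhost:8000/v1/models` placed between the two background commands and the `curl` request would delay the request until the endpoint answers. This assumes the frontend exposes an OpenAI-style `/v1/models` route on the port given by `--http-port`. A successful response from that route does not by itself prove the vLLM worker has registered its model, so the first chat completion may still need a retry.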