OpenDAS / dynamo
Commit c75bb583 (Unverified)
feat: approx kv router deployment example (#5037)
Authored Dec 19, 2025 by Biswa Panda; committed by GitHub Dec 19, 2025
Parent: 6caac575
Showing 1 changed file with 52 additions and 0 deletions
examples/backends/vllm/deploy/agg_router_kv_approx.yaml (new file, 0 → 100644)
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
## This example demonstrates KV-aware routing with the --no-kv-events flag.
## Instead of receiving KV events from workers, the router predicts cache state
## locally based on routing decisions with TTL-based expiration and pruning.
## Note: This mode does not require NATS or JetStream during dynamo platform deployment.
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vllm-agg-router-kv-approx
spec:
  services:
    Frontend:
      dynamoNamespace: vllm-agg-router-kv-approx
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
          command:
            - python3
          args:
            - -m
            - dynamo.frontend
            - --router-mode
            - kv
            - --no-kv-events
      envs:
        - name: DYN_ROUTER_MODE
          value: kv
    VllmDecodeWorker:
      envFromSecret: hf-token-secret
      dynamoNamespace: vllm-agg-router-kv-approx
      componentType: worker
      replicas: 2
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
          workingDir: /workspace/examples/backends/vllm
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model
            - Qwen/Qwen3-0.6B
            - --kv-events-config
            - '{"enable_kv_cache_events": false}'
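The header comments in this file say that with --no-kv-events the router predicts cache state from its own routing decisions, with TTL-based expiration and pruning. The following is a minimal Python sketch of that idea, not Dynamo's actual implementation: the class `ApproxKvIndex`, its method names, the block size, the TTL, and the per-worker cap are all hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict


class ApproxKvIndex:
    """Hypothetical router-side guess at which worker caches which prefix blocks."""

    def __init__(self, block_size=4, ttl_s=120.0, max_blocks_per_worker=1024):
        self.block_size = block_size          # tokens per KV block (assumed value)
        self.ttl_s = ttl_s                    # how long a prediction stays valid
        self.max_blocks = max_blocks_per_worker
        self.blocks = defaultdict(dict)       # worker id -> {block hash: expiry time}

    def _block_hashes(self, tokens):
        """Chained hashes over full blocks, so each hash identifies a whole prefix."""
        hashes, prev = [], b""
        full = len(tokens) - len(tokens) % self.block_size
        for i in range(0, full, self.block_size):
            chunk = ",".join(map(str, tokens[i:i + self.block_size])).encode()
            prev = hashlib.sha256(prev + chunk).digest()
            hashes.append(prev)
        return hashes

    def record_route(self, worker, tokens, now):
        """Assume the chosen worker now caches every full block of this request."""
        entries = self.blocks[worker]
        for h in self._block_hashes(tokens):
            entries[h] = now + self.ttl_s
        self.prune(worker, now)

    def prune(self, worker, now):
        """Drop expired entries, then the soonest-to-expire ones if over the cap."""
        entries = self.blocks[worker]
        for h in [h for h, exp in entries.items() if exp <= now]:
            del entries[h]
        overflow = len(entries) - self.max_blocks
        if overflow > 0:
            for h in sorted(entries, key=entries.get)[:overflow]:
                del entries[h]

    def overlap(self, worker, tokens, now):
        """Count leading blocks predicted to still be cached on the worker."""
        entries = self.blocks.get(worker, {})
        count = 0
        for h in self._block_hashes(tokens):
            if entries.get(h, 0.0) <= now:    # unknown or expired: prefix match ends
                break
            count += 1
        return count

    def pick_worker(self, workers, tokens, now):
        """Prefer the worker with the longest predicted prefix overlap."""
        return max(workers, key=lambda w: self.overlap(w, tokens, now))


if __name__ == "__main__":
    index = ApproxKvIndex(block_size=2, ttl_s=10.0)
    index.record_route("worker-0", [1, 2, 3, 4, 5], now=0.0)
    print(index.pick_worker(["worker-1", "worker-0"], [1, 2, 3, 4, 9, 9], now=1.0))
```

The sketch scores workers on predicted prefix overlap alone. A production router would presumably also weigh worker load, and its predictions can drift from the workers' real cache contents, which is the trade-off for not consuming KV events.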