OpenDAS / dynamo
Commit 5f7c1f7e (Unverified)
Authored Oct 09, 2025 by Ziqi Fan
Committed by GitHub Oct 09, 2025
chore: add sample KVBM related k8s deployment manifests (#3523)
Signed-off-by: Ziqi Fan <ziqif@nvidia.com>
parent e9a71009
Showing 4 changed files with 295 additions and 0 deletions
components/backends/vllm/deploy/agg_kvbm.yaml +50 -0
components/backends/vllm/deploy/disagg_kvbm.yaml +77 -0
components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml +81 -0
components/backends/vllm/deploy/disagg_kvbm_tp2.yaml +87 -0
components/backends/vllm/deploy/agg_kvbm.yaml
0 → 100644
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vllm-agg-kvbm
spec:
  services:
    Frontend:
      dynamoNamespace: vllm-agg-kvbm
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
    VllmDecodeWorker:
      envFromSecret: hf-token-secret
      dynamoNamespace: vllm-agg-kvbm
      componentType: worker
      replicas: 1
      resources:
        requests:
          gpu: "1"
          memory: "200Gi"
        limits:
          gpu: "1"
          memory: "250Gi"
      envs:
        - name: DYN_KVBM_CPU_CACHE_GB
          value: "100"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
          workingDir: /workspace/components/backends/vllm
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model
            - Qwen/Qwen3-8B
            - --gpu-memory-utilization
            - "0.45"
            - --disable-log-requests
            - --max-model-len
            - "32000"
            - --enforce-eager
            - --connector
            - kvbm
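Note on sizing: the worker above requests 200Gi of memory and sets DYN_KVBM_CPU_CACHE_GB to "100". A minimal sketch of the arithmetic follows, assuming the cache size is expressed in decimal gigabytes (10**9 bytes); the commit does not state the unit, and the helper names `parse_gi` and `cpu_cache_headroom_gib` are hypothetical, not part of Dynamo.

```python
# Hypothetical helpers for illustration only; not part of this commit or of Dynamo.

def parse_gi(quantity: str) -> float:
    """Convert a Kubernetes quantity such as "200Gi" to GiB.

    Only the "Gi" suffix is handled in this sketch.
    """
    if not quantity.endswith("Gi"):
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return float(quantity[:-2])


def cpu_cache_headroom_gib(cache_gb: str, memory_request: str) -> float:
    """Return the GiB left in the memory request once the CPU cache is allocated.

    ASSUMPTION: DYN_KVBM_CPU_CACHE_GB is in decimal gigabytes (10**9 bytes).
    """
    cache_gib = float(cache_gb) * 10**9 / 2**30
    return parse_gi(memory_request) - cache_gib


# Values taken from agg_kvbm.yaml: envs and resources.requests.memory.
headroom = cpu_cache_headroom_gib("100", "200Gi")
print(f"headroom: {headroom:.1f} GiB")
```

Under that assumption, roughly half of the requested memory remains for the model process and everything else in the pod.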
components/backends/vllm/deploy/disagg_kvbm.yaml
0 → 100644
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vllm-disagg-kvbm
spec:
  services:
    Frontend:
      dynamoNamespace: vllm-disagg-kvbm
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
    VllmDecodeWorker:
      dynamoNamespace: vllm-disagg-kvbm
      envFromSecret: hf-token-secret
      componentType: worker
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
          workingDir: /workspace/components/backends/vllm
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model
            - Qwen/Qwen3-8B
            - --gpu-memory-utilization
            - "0.3"
            - --disable-log-requests
            - --max-model-len
            - "32000"
            - --enforce-eager
    VllmPrefillWorker:
      dynamoNamespace: vllm-disagg-kvbm
      envFromSecret: hf-token-secret
      componentType: worker
      replicas: 1
      resources:
        requests:
          gpu: "1"
          memory: "200Gi"
        limits:
          gpu: "1"
          memory: "250Gi"
      envs:
        - name: DYN_KVBM_CPU_CACHE_GB
          value: "100"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
          workingDir: /workspace/components/backends/vllm
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model
            - Qwen/Qwen3-8B
            - --is-prefill-worker
            - --gpu-memory-utilization
            - "0.3"
            - --disable-log-requests
            - --max-model-len
            - "32000"
            - --enforce-eager
            - --connector
            - kvbm
            - nixl
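Note on `command` and `args`: in a Kubernetes container spec, `command` replaces the image entrypoint and `args` is appended to it, each list item becoming one argv entry. The sketch below joins the prefill worker's two lists from disagg_kvbm.yaml to show the command line the container would run; `container_argv` is a hypothetical helper name used only for illustration.

```python
import shlex

# Lists copied from the VllmPrefillWorker in disagg_kvbm.yaml.
command = ["python3", "-m", "dynamo.vllm"]
args = [
    "--model", "Qwen/Qwen3-8B",
    "--is-prefill-worker",
    "--gpu-memory-utilization", "0.3",
    "--disable-log-requests",
    "--max-model-len", "32000",
    "--enforce-eager",
    "--connector", "kvbm", "nixl",
]


def container_argv(command: list[str], args: list[str]) -> list[str]:
    """Hypothetical helper: the argv a container starts with is command + args."""
    return list(command) + list(args)


argv = container_argv(command, args)
# shlex.join renders the argv as a shell-style command line for reading.
print(shlex.join(argv))
```

Written this way, `kvbm` and `nixl` are two separate arguments following `--connector`, which is why they appear as separate list items in the manifest.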
components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml
0 → 100644
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vllm-disagg-kvbm-2p2d
spec:
  services:
    Frontend:
      dynamoNamespace: vllm-disagg-kvbm-2p2d
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
    VllmDecodeWorker:
      dynamoNamespace: vllm-disagg-kvbm-2p2d
      envFromSecret: hf-token-secret
      componentType: worker
      replicas: 2
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
          workingDir: /workspace/components/backends/vllm
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model
            - Qwen/Qwen3-8B
            - --gpu-memory-utilization
            - "0.3"
            - --disable-log-requests
            - --max-model-len
            - "32000"
            - --enforce-eager
    VllmPrefillWorker:
      dynamoNamespace: vllm-disagg-kvbm-2p2d
      envFromSecret: hf-token-secret
      componentType: worker
      replicas: 2
      resources:
        requests:
          gpu: "1"
          memory: "200Gi"
        limits:
          gpu: "1"
          memory: "250Gi"
      envs:
        - name: DYN_KVBM_CPU_CACHE_GB
          value: "100"
        - name: DYN_KVBM_BARRIER_ID_PREFIX
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
          workingDir: /workspace/components/backends/vllm
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model
            - Qwen/Qwen3-8B
            - --is-prefill-worker
            - --gpu-memory-utilization
            - "0.3"
            - --disable-log-requests
            - --max-model-len
            - "32000"
            - --enforce-eager
            - --connector
            - kvbm
            - nixl
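Note on DYN_KVBM_BARRIER_ID_PREFIX: the `fieldRef` to `metadata.name` is the Kubernetes Downward API, which sets the variable to the pod's own name. With `replicas: 2`, each prefill pod therefore receives a different value. The sketch below only simulates that injection; the pod names are made up, and how KVBM consumes the prefix is not described in this commit, so the uniqueness check is an assumption about why the variable is set per pod.

```python
# Simulation only: in a cluster the kubelet performs this injection.

def inject_barrier_prefix(pod_names: list[str]) -> dict[str, dict[str, str]]:
    """Return the env each pod would see when the value comes from metadata.name."""
    return {name: {"DYN_KVBM_BARRIER_ID_PREFIX": name} for name in pod_names}


def prefixes_are_unique(envs: dict[str, dict[str, str]]) -> bool:
    """True when no two pods share the same barrier ID prefix."""
    prefixes = [env["DYN_KVBM_BARRIER_ID_PREFIX"] for env in envs.values()]
    return len(prefixes) == len(set(prefixes))


# Hypothetical pod names for the two prefill replicas.
pods = ["vllm-prefill-worker-aaaaa", "vllm-prefill-worker-bbbbb"]
envs = inject_barrier_prefix(pods)
print(prefixes_are_unique(envs))
```

A fixed `value:` string in place of `valueFrom` would give every replica the same prefix, which is the case the second helper is meant to flag.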
components/backends/vllm/deploy/disagg_kvbm_tp2.yaml
0 → 100644
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vllm-disagg-kvbm-tp2
spec:
  services:
    Frontend:
      dynamoNamespace: vllm-disagg-kvbm-tp2
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
    VllmDecodeWorker:
      dynamoNamespace: vllm-disagg-kvbm-tp2
      envFromSecret: hf-token-secret
      componentType: worker
      replicas: 1
      resources:
        requests:
          gpu: "2"
        limits:
          gpu: "2"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
          workingDir: /workspace/components/backends/vllm
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model
            - Qwen/Qwen3-8B
            - --gpu-memory-utilization
            - "0.23"
            - --disable-log-requests
            - --max-model-len
            - "32000"
            - --enforce-eager
            - --tensor-parallel-size
            - "2"
    VllmPrefillWorker:
      dynamoNamespace: vllm-disagg-kvbm-tp2
      envFromSecret: hf-token-secret
      componentType: worker
      replicas: 1
      resources:
        requests:
          gpu: "2"
          memory: "200Gi"
        limits:
          gpu: "2"
          memory: "250Gi"
      envs:
        - name: DYN_KVBM_CPU_CACHE_GB
          value: "100"
        - name: DYN_KVBM_BARRIER_ID_PREFIX
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
          workingDir: /workspace/components/backends/vllm
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model
            - Qwen/Qwen3-8B
            - --is-prefill-worker
            - --gpu-memory-utilization
            - "0.23"
            - --disable-log-requests
            - --max-model-len
            - "32000"
            - --enforce-eager
            - --connector
            - kvbm
            - nixl
            - --tensor-parallel-size
            - "2"
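Note on the TP2 variant: both workers ask for two GPUs and pass `--tensor-parallel-size "2"`, so the GPU count and the tensor-parallel degree agree. A small consistency check could look like the sketch below; the function names are hypothetical, and the default of 1 when the flag is absent reflects vLLM's default tensor-parallel size.

```python
# Hypothetical lint-style check; not part of this commit.

def tensor_parallel_size(args: list[str]) -> int:
    """Return the value after --tensor-parallel-size, or 1 when the flag is absent."""
    if "--tensor-parallel-size" in args:
        return int(args[args.index("--tensor-parallel-size") + 1])
    return 1


def gpus_match_tp(gpu_limit: str, args: list[str]) -> bool:
    """True when the GPU limit equals the tensor-parallel degree requested in args."""
    return int(gpu_limit) == tensor_parallel_size(args)


# Decode worker values from disagg_kvbm_tp2.yaml.
tp2_args = ["--model", "Qwen/Qwen3-8B", "--tensor-parallel-size", "2"]
print(gpus_match_tp("2", tp2_args))

# Decode worker values from disagg_kvbm.yaml (no tensor-parallel flag, one GPU).
tp1_args = ["--model", "Qwen/Qwen3-8B", "--enforce-eager"]
print(gpus_match_tp("1", tp1_args))
```

The same check applied to a manifest with `gpu: "1"` and `--tensor-parallel-size "2"` would return False, which is the mismatch worth catching before deployment.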