Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
Menu
Open sidebar
OpenDAS
dynamo
Commits
0d9c8994
Unverified
Commit
0d9c8994
authored
Sep 05, 2025
by
julienmancuso
Committed by
GitHub
Sep 05, 2025
Browse files
feat: add trtllm example with config (#2895)
Signed-off-by:
Julien Mancuso
<
jmancuso@nvidia.com
>
parent
357efee3
Changes
3
Hide whitespace changes
Inline
Side-by-side
Showing
3 changed files
with
77 additions
and
0 deletions
+77
-0
.pre-commit-config.yaml
.pre-commit-config.yaml
+1
-0
components/backends/trtllm/deploy/README.md
components/backends/trtllm/deploy/README.md
+8
-0
components/backends/trtllm/deploy/agg-with-config.yaml
components/backends/trtllm/deploy/agg-with-config.yaml
+68
-0
No files found.
.pre-commit-config.yaml
View file @
0d9c8994
...
...
@@ -57,6 +57,7 @@ repos:
-
id
:
check-toml
-
id
:
check-yaml
exclude
:
^.*/templates/.*\.yaml$
#ignore all yaml files in helm chart templates
args
:
[
'
--allow-multiple-documents'
]
-
id
:
check-shebang-scripts-are-executable
-
id
:
end-of-file-fixer
types_or
:
[
c
,
c++
,
cuda
,
proto
,
textproto
,
java
,
python
]
...
...
components/backends/trtllm/deploy/README.md
View file @
0d9c8994
...
...
@@ -34,6 +34,14 @@ Advanced disaggregated deployment with KV cache routing capabilities.
-
`TRTLLMDecodeWorker`
: Specialized decode-only worker
-
`TRTLLMPrefillWorker`
: Specialized prefill-only worker (2 replicas for load balancing)
### 5. **Aggregated Deployment with Config** (`agg-with-config.yaml`)
Aggregated deployment with custom configuration.
**Architecture:**
-
`nvidia-config`
: ConfigMap containing a custom trtllm configuration
-
`Frontend`
: OpenAI-compatible API server (with kv router mode disabled)
-
`TRTLLMWorker`
: Single worker handling both prefill and decode with custom configuration mounted from the configmap
## CRD Structure
All templates use the
**DynamoGraphDeployment**
CRD:
...
...
components/backends/trtllm/deploy/agg-with-config.yaml
0 → 100644
View file @
0d9c8994
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
# configmap that contains the custom trtllm configuration
apiVersion
:
v1
kind
:
ConfigMap
metadata
:
name
:
nvidia-config
data
:
agg.yaml
:
|
tensor_parallel_size: 1
moe_expert_parallel_size: 1
enable_attention_dp: false
max_num_tokens: 8192
max_batch_size: 16
trust_remote_code: true
backend: pytorch
enable_chunked_prefill: true
disable_overlap_scheduler: true
kv_cache_config:
free_gpu_memory_fraction: 0.95
cuda_graph_config:
max_batch_size: 16
---
# dynamo graph deployment which uses the custom configuration contained in the configmap
apiVersion
:
nvidia.com/v1alpha1
kind
:
DynamoGraphDeployment
metadata
:
name
:
trtllm-agg
spec
:
services
:
Frontend
:
dynamoNamespace
:
trtllm-agg
componentType
:
frontend
replicas
:
1
extraPodSpec
:
mainContainer
:
image
:
nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.4.1
TRTLLMWorker
:
envFromSecret
:
hf-token-secret
dynamoNamespace
:
trtllm-agg
componentType
:
worker
replicas
:
1
resources
:
limits
:
gpu
:
"
1"
extraPodSpec
:
# declare the configmap as a volume
volumes
:
-
name
:
nvidia-config
configMap
:
name
:
nvidia-config
mainContainer
:
image
:
nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.4.1
workingDir
:
/workspace/components/backends/trtllm
# mount the configmap as a volume
volumeMounts
:
-
name
:
nvidia-config
mountPath
:
/workspace/components/backends/trtllm/engine_configs
readOnly
:
true
command
:
-
/bin/sh
-
-c
args
:
-
>-
python3 -m dynamo.trtllm
--model-path Qwen/Qwen3-0.6B
--served-model-name Qwen/Qwen3-0.6B
--extra-engine-args engine_configs/agg.yaml
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment