Commit 0d9c8994 (unverified), authored by julienmancuso and committed by GitHub

feat: add trtllm example with config (#2895)


Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
parent 357efee3
@@ -57,6 +57,7 @@ repos:
 - id: check-toml
 - id: check-yaml
   exclude: ^.*/templates/.*\.yaml$ #ignore all yaml files in helm chart templates
+  args: ['--allow-multiple-documents']
 - id: check-shebang-scripts-are-executable
 - id: end-of-file-fixer
   types_or: [c, c++, cuda, proto, textproto, java, python]
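
The new `args` entry matters because the manifest added in this commit (`agg-with-config.yaml`) holds two YAML documents, a ConfigMap and a DynamoGraphDeployment, separated by `---`, and `check-yaml` rejects multi-document files unless `--allow-multiple-documents` is passed. A minimal, stdlib-only sketch of that document boundary follows. `split_documents` is a hypothetical helper written for illustration; it is not part of pre-commit and it does not parse or validate YAML.

```python
import re

# Hypothetical helper for illustration only: splits a YAML stream on
# document-separator lines ("---"). It is not a YAML parser.
def split_documents(stream: str) -> list[str]:
    parts = re.split(r"^---[ \t]*$", stream, flags=re.MULTILINE)
    return [part.strip() for part in parts if part.strip()]

# Abbreviated stand-in for the two documents in agg-with-config.yaml.
manifest = """\
apiVersion: v1
kind: ConfigMap
---
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
"""

docs = split_documents(manifest)
print(len(docs))  # number of documents found in the stream
```

A single-document loader would stop with an error at the second document, which is the failure the added flag avoids.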
...
@@ -34,6 +34,14 @@ Advanced disaggregated deployment with KV cache routing capabilities.
 - `TRTLLMDecodeWorker`: Specialized decode-only worker
 - `TRTLLMPrefillWorker`: Specialized prefill-only worker (2 replicas for load balancing)
 
+### 5. **Aggregated Deployment with Config** (`agg-with-config.yaml`)
+Aggregated deployment with custom configuration.
+
+**Architecture:**
+- `nvidia-config`: ConfigMap containing a custom trtllm configuration
+- `Frontend`: OpenAI-compatible API server (with kv router mode disabled)
+- `TRTLLMWorker`: Single worker handling both prefill and decode with custom configuration mounted from the configmap
+
 ## CRD Structure
 
 All templates use the **DynamoGraphDeployment** CRD:
...
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
# configmap that contains the custom trtllm configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-config
data:
  agg.yaml: |
    tensor_parallel_size: 1
    moe_expert_parallel_size: 1
    enable_attention_dp: false
    max_num_tokens: 8192
    max_batch_size: 16
    trust_remote_code: true
    backend: pytorch
    enable_chunked_prefill: true
    disable_overlap_scheduler: true
    kv_cache_config:
      free_gpu_memory_fraction: 0.95
    cuda_graph_config:
      max_batch_size: 16
---
# dynamo graph deployment which uses the custom configuration contained in the configmap
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: trtllm-agg
spec:
  services:
    Frontend:
      dynamoNamespace: trtllm-agg
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.4.1
    TRTLLMWorker:
      envFromSecret: hf-token-secret
      dynamoNamespace: trtllm-agg
      componentType: worker
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        # declare the configmap as a volume
        volumes:
          - name: nvidia-config
            configMap:
              name: nvidia-config
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.4.1
          workingDir: /workspace/components/backends/trtllm
          # mount the configmap as a volume
          volumeMounts:
            - name: nvidia-config
              mountPath: /workspace/components/backends/trtllm/engine_configs
              readOnly: true
          command:
            - /bin/sh
            - -c
          args:
            - >-
              python3 -m dynamo.trtllm
              --model-path Qwen/Qwen3-0.6B
              --served-model-name Qwen/Qwen3-0.6B
              --extra-engine-args engine_configs/agg.yaml
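
Two details of the worker spec are easy to miss: the `>-` folded scalar collapses the four argument lines into one shell command, and the relative `--extra-engine-args` path is resolved against `workingDir`, which is why it lands on the `agg.yaml` key mounted under `mountPath`. The sketch below checks that reasoning with the Python standard library. It assumes the folded lines are equally indented with no blank lines between them (so folding is a plain join with spaces) and uses `shlex.split` as an approximation of the shell's word splitting; it does not invoke Kubernetes or Dynamo.

```python
import posixpath
import shlex

# Values copied from the manifest above.
working_dir = "/workspace/components/backends/trtllm"
mount_path = "/workspace/components/backends/trtllm/engine_configs"
configmap_key = "agg.yaml"

# Lines of the ">-" folded scalar. For equally indented lines with no blank
# lines between them, folding replaces each line break with a single space,
# and the "-" indicator strips the trailing newline.
folded_lines = [
    "python3 -m dynamo.trtllm",
    "--model-path Qwen/Qwen3-0.6B",
    "--served-model-name Qwen/Qwen3-0.6B",
    "--extra-engine-args engine_configs/agg.yaml",
]
command = " ".join(folded_lines)

# /bin/sh -c receives the command as one string; shlex.split approximates
# how the shell breaks it into words.
argv = shlex.split(command)
relative_config = argv[argv.index("--extra-engine-args") + 1]

# The relative path is resolved against the container's working directory.
resolved = posixpath.join(working_dir, relative_config)

# Each ConfigMap key appears as a file directly under the mount path.
mounted_file = posixpath.join(mount_path, configmap_key)

print(resolved)
print(resolved == mounted_file)
```

If `workingDir` or `mountPath` changes, the relative path passed to `--extra-engine-args` has to change with it, or an absolute path can be used instead.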