OpenDAS / dynamo · Commit 34be4418

docs: update dynamo serve trtllm agg example yaml files (#600)

Authored Apr 10, 2025 by Ziqi Fan, committed by GitHub on Apr 10, 2025
Parent commit: bb4e819c
Showing 4 changed files with 47 additions and 14 deletions (+47 −14)
examples/tensorrt_llm/configs/agg.yaml (+2 −3)
examples/tensorrt_llm/configs/agg_router.yaml (+2 −3)
examples/tensorrt_llm/configs/llm_api_config.yaml (+4 −8)
examples/tensorrt_llm/configs/llm_api_config_router.yaml (+39 −0)
examples/tensorrt_llm/configs/agg.yaml

```diff
@@ -20,13 +20,12 @@ Frontend:
 Processor:
   engine_args: "configs/llm_api_config.yaml"
   block-size: 64
-  router: round-robin

 TensorRTLLMWorker:
   engine_args: "configs/llm_api_config.yaml"
-  router: random
+  router: round-robin
   ServiceArgs:
     workers: 1
     resources:
-      gpu: 1
\ No newline at end of file
+      gpu: 1
```
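The policy change above swaps the worker's router from `random` to `round-robin`. As a purely illustrative sketch (not Dynamo's router implementation; the worker names are made up), the difference between the two policies is:

```python
# Illustrative sketch of the two router policies named in the config:
# "round-robin" cycles through workers deterministically, while
# "random" picks a worker independently for each request.
import itertools
import random

workers = ["TensorRTLLMWorker-0", "TensorRTLLMWorker-1", "TensorRTLLMWorker-2"]

# round-robin: each successive request goes to the next worker in order
rr = itertools.cycle(workers)
rr_picks = [next(rr) for _ in range(6)]  # workers repeated in order

# random: each request picks independently; no ordering guarantee
rng = random.Random(0)  # seeded here only to keep the sketch reproducible
rand_picks = [rng.choice(workers) for _ in range(6)]
```

Round-robin gives an even request spread across identical workers, which is why it is a reasonable default for this aggregated (non-KV-aware) example.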
examples/tensorrt_llm/configs/agg_router.yaml

```diff
@@ -20,7 +20,6 @@ Frontend:
 Processor:
   engine_args: "configs/llm_api_config.yaml"
   block-size: 64
-  router: kv

 Router:
@@ -28,9 +27,9 @@ Router:
   min-workers: 1

 TensorRTLLMWorker:
-  engine_args: "configs/llm_api_config.yaml"
+  engine_args: "configs/llm_api_config_router.yaml"
   router: kv
   ServiceArgs:
     workers: 1
     resources:
-      gpu: 1
\ No newline at end of file
+      gpu: 1
```
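The worker in this file keeps `router: kv` and now points at the new router-specific engine config. As a hedged sketch of the idea behind KV-aware routing (not Dynamo's actual KV router; worker names and cache contents are made up), a request is preferentially sent to the worker that already holds KV-cache blocks for its token prefix, using the same 64-token granularity as `block-size: 64`:

```python
# Illustrative sketch of KV-aware routing: route each request to the
# worker whose KV cache already overlaps the request's token prefix
# the most, counted in fixed-size blocks.
BLOCK_SIZE = 64  # mirrors "block-size: 64" in the config above

def prefix_blocks(tokens, block_size=BLOCK_SIZE):
    """Hashable IDs for each full block of the token prefix."""
    return {
        tuple(tokens[i : i + block_size])
        for i in range(0, len(tokens) - block_size + 1, block_size)
    }

def route(tokens, worker_caches):
    """Pick the worker with the largest cached-prefix overlap."""
    req = prefix_blocks(tokens)
    return max(worker_caches, key=lambda w: len(req & worker_caches[w]))

tokens = list(range(200))  # a 200-token request -> 3 full blocks
caches = {
    "worker-0": prefix_blocks(list(range(64))),   # caches block 0
    "worker-1": prefix_blocks(list(range(128))),  # caches blocks 0 and 1
}
best = route(tokens, caches)  # worker-1 overlaps on 2 of 3 blocks
```

Reusing cached prefix blocks avoids recomputing prefill for shared prefixes, which is the payoff of enabling KV-cache events in the new router config.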
examples/tensorrt_llm/configs/llm_api_config.yaml

```diff
@@ -22,19 +22,15 @@ model_path: null
 tensor_parallel_size: 1
 moe_expert_parallel_size: 1
 enable_attention_dp: false
-max_num_tokens: 10240
+max_num_tokens: 8192
 max_batch_size: 16
 trust_remote_code: true
 backend: pytorch
+enable_chunked_prefill: true

 kv_cache_config:
   free_gpu_memory_fraction: 0.95
-  # Uncomment to enable kv cache event collection
-  #event_buffer_max_size: 1024
-  #enable_block_reuse: true

 pytorch_backend_config:
-  enable_overlap_scheduler: false
-  use_cuda_graph: false
-  # Uncomment to enable iter perf stats
-  #enable_iter_perf_stats: true
\ No newline at end of file
+  enable_overlap_scheduler: true
+  use_cuda_graph: true
```
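The config keeps `free_gpu_memory_fraction: 0.95`, which reserves 95% of the free GPU memory for the KV-cache pool. As a back-of-the-envelope sketch (the formula and all model numbers below — layer count, KV heads, head dim, fp16 — are illustrative assumptions, not TRT-LLM's internal accounting), that fraction translates into token capacity roughly like this:

```python
# Rough sketch: how free_gpu_memory_fraction turns free GPU memory
# into KV-cache token capacity. All model shape numbers are assumed.
def kv_cache_token_capacity(free_bytes, fraction=0.95,
                            num_layers=32, num_kv_heads=8,
                            head_dim=128, bytes_per_elem=2):
    # K and V per token per layer: 2 * kv_heads * head_dim * element size
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return int(free_bytes * fraction) // bytes_per_token

# e.g. with 40 GiB of free device memory:
capacity = kv_cache_token_capacity(40 * 1024**3)
```

A high fraction like 0.95 maximizes cacheable tokens but assumes the engine's weights and activations are already accounted for outside this pool.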
examples/tensorrt_llm/configs/llm_api_config_router.yaml (new file, mode 100644)

```yaml
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# In the case of disaggregated deployment, this config will apply to each server
# and will be overwritten by the disaggregated config file
model_name: "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
model_path: null
tensor_parallel_size: 1
moe_expert_parallel_size: 1
enable_attention_dp: false
max_num_tokens: 8192
max_batch_size: 16
trust_remote_code: true
backend: pytorch
enable_chunked_prefill: true

kv_cache_config:
  free_gpu_memory_fraction: 0.95
  event_buffer_max_size: 1024
  enable_block_reuse: true

pytorch_backend_config:
  enable_overlap_scheduler: true
  use_cuda_graph: true
  enable_iter_perf_stats: true
```
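Unlike the base config, this router variant enables KV-cache event collection with a bounded buffer (`event_buffer_max_size: 1024`). As a minimal sketch of what a bounded event buffer implies (an assumption for illustration, not TRT-LLM's implementation), once the buffer is full the oldest events are dropped so memory use stays capped:

```python
# Sketch of a bounded KV-cache event buffer: when producers outpace
# the consumer, only the newest EVENT_BUFFER_MAX_SIZE events survive.
from collections import deque

EVENT_BUFFER_MAX_SIZE = 1024  # mirrors event_buffer_max_size in the config

buffer = deque(maxlen=EVENT_BUFFER_MAX_SIZE)
for i in range(1500):  # emit more events than the buffer can hold
    buffer.append({"event_id": i, "kind": "block_stored"})

# Only the newest 1024 events remain; the earliest ones were evicted.
oldest = buffer[0]["event_id"]
```

The practical consequence for a KV-aware router is that a too-small buffer can silently lose cache events under load, so the router's view of worker caches would drift until refreshed.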