OpenDAS / dynamo
Commit 84454ab4 (Unverified)
Authored Jun 17, 2025 by Tanmay Verma; committed by GitHub, Jun 17, 2025
fix: Fix message truncation in disagg flow (#1572)
parent 4abab20f
Showing 2 changed files with 4 additions and 12 deletions (+4 -12)
examples/tensorrt_llm/configs/llmapi_disagg_configs/single_node_config.yaml (+2 -6)
examples/tensorrt_llm/configs/llmapi_disagg_router_configs/single_node_config.yaml (+2 -6)
examples/tensorrt_llm/configs/llmapi_disagg_configs/single_node_config.yaml
...
...
@@ -30,9 +30,7 @@ context_servers:
   max_batch_size: 16
   enable_chunked_prefill: false
   kv_cache_config:
-    free_gpu_memory_fraction: 0.40
-  cache_transceiver_config:
-    max_num_tokens: 10240
+    free_gpu_memory_fraction: 0.75
   # NOTE: pytorch_backend_config section flattened since: https://github.com/NVIDIA/TensorRT-LLM/pull/4603
   # NOTE: This field is called 'enable_overlap_scheduler' in older TRTLLM versions
   # Overlap scheduler not currently supported in context-only
...
...
@@ -47,9 +45,7 @@ generation_servers:
   max_num_tokens: 256
   max_batch_size: 256
   kv_cache_config:
-    free_gpu_memory_fraction: 0.40
-  cache_transceiver_config:
-    max_num_tokens: 256
+    free_gpu_memory_fraction: 0.75
   # NOTE: pytorch_backend_config section flattened since: https://github.com/NVIDIA/TensorRT-LLM/pull/4603
   # NOTE: This field is called 'enable_overlap_scheduler' in older TRTLLM versions
   disable_overlap_scheduler: false
...
...
examples/tensorrt_llm/configs/llmapi_disagg_router_configs/single_node_config.yaml
...
...
@@ -30,11 +30,9 @@ context_servers:
   max_batch_size: 16
   enable_chunked_prefill: false
   kv_cache_config:
-    free_gpu_memory_fraction: 0.40
+    free_gpu_memory_fraction: 0.75
     event_buffer_max_size: 1024
     enable_block_reuse: true
-  cache_transceiver_config:
-    max_num_tokens: 10240
   # NOTE: pytorch_backend_config section flattened since: https://github.com/NVIDIA/TensorRT-LLM/pull/4603
   # NOTE: This field is called 'enable_overlap_scheduler' in older TRTLLM versions
   # Overlap scheduler not currently supported in context-only
...
...
@@ -50,11 +48,9 @@ generation_servers:
   max_num_tokens: 256
   max_batch_size: 256
   kv_cache_config:
-    free_gpu_memory_fraction: 0.40
+    free_gpu_memory_fraction: 0.75
     event_buffer_max_size: 1024
     enable_block_reuse: true
-  cache_transceiver_config:
-    max_num_tokens: 256
   # NOTE: pytorch_backend_config section flattened since: https://github.com/NVIDIA/TensorRT-LLM/pull/4603
   # NOTE: This field is called 'enable_overlap_scheduler' in older TRTLLM versions
   disable_overlap_scheduler: false
...
...
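Both files receive the same two changes in each server section: the `cache_transceiver_config` block is dropped and `kv_cache_config.free_gpu_memory_fraction` goes from 0.40 to 0.75. The following is a minimal sketch of a check that could flag a config still carrying the old settings. It is not part of the commit. The function name `find_stale_settings` is hypothetical, the input is assumed to be the YAML already parsed into a plain dict, and `cache_transceiver_config` is assumed to sit directly under each server section (the page does not preserve the original indentation).

```python
from typing import Any, Dict, List

# Section names taken from the hunk headers in this diff.
SERVER_SECTIONS = ("context_servers", "generation_servers")

# Value the commit sets in both files; treating it as the expected value is an assumption.
EXPECTED_FRACTION = 0.75


def find_stale_settings(config: Dict[str, Any]) -> List[str]:
    """Return findings for settings that this commit removed or changed."""
    findings: List[str] = []
    for section in SERVER_SECTIONS:
        server = config.get(section) or {}
        # Assumed nesting: cache_transceiver_config is a direct child of the section.
        if "cache_transceiver_config" in server:
            findings.append(f"{section}: cache_transceiver_config is still set")
        kv_cache = server.get("kv_cache_config") or {}
        fraction = kv_cache.get("free_gpu_memory_fraction")
        if fraction is not None and fraction != EXPECTED_FRACTION:
            findings.append(
                f"{section}: free_gpu_memory_fraction is {fraction}, "
                f"expected {EXPECTED_FRACTION}"
            )
    return findings


if __name__ == "__main__":
    # Pre-commit shape of one section, reduced to the keys the diff touches.
    old_config = {
        "context_servers": {
            "kv_cache_config": {"free_gpu_memory_fraction": 0.40},
            "cache_transceiver_config": {"max_num_tokens": 10240},
        }
    }
    for finding in find_stale_settings(old_config):
        print(finding)
```

The check works on a dict rather than a file path so that it stays independent of any particular YAML loader.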