OpenDAS / dynamo
Commit 84454ab4
Authored Jun 17, 2025 by Tanmay Verma; committed by GitHub on Jun 17, 2025
fix: Fix message truncation in disagg flow (#1572)
Parent: 4abab20f
Showing 2 changed files with 4 additions and 12 deletions:
examples/tensorrt_llm/configs/llmapi_disagg_configs/single_node_config.yaml (+2 -6)
examples/tensorrt_llm/configs/llmapi_disagg_router_configs/single_node_config.yaml (+2 -6)
examples/tensorrt_llm/configs/llmapi_disagg_configs/single_node_config.yaml
@@ -30,9 +30,7 @@ context_servers:
   max_batch_size: 16
   enable_chunked_prefill: false
   kv_cache_config:
-    free_gpu_memory_fraction: 0.40
-  cache_transceiver_config:
-    max_num_tokens: 10240
+    free_gpu_memory_fraction: 0.75
   # NOTE: pytorch_backend_config section flattened since: https://github.com/NVIDIA/TensorRT-LLM/pull/4603
   # NOTE: This field is called 'enable_overlap_scheduler' in older TRTLLM versions
   # Overlap scheduler not currently supported in context-only
@@ -47,9 +45,7 @@ generation_servers:
   max_num_tokens: 256
   max_batch_size: 256
   kv_cache_config:
-    free_gpu_memory_fraction: 0.40
-  cache_transceiver_config:
-    max_num_tokens: 256
+    free_gpu_memory_fraction: 0.75
   # NOTE: pytorch_backend_config section flattened since: https://github.com/NVIDIA/TensorRT-LLM/pull/4603
   # NOTE: This field is called 'enable_overlap_scheduler' in older TRTLLM versions
   disable_overlap_scheduler: false
examples/tensorrt_llm/configs/llmapi_disagg_router_configs/single_node_config.yaml
@@ -30,11 +30,9 @@ context_servers:
   max_batch_size: 16
   enable_chunked_prefill: false
   kv_cache_config:
-    free_gpu_memory_fraction: 0.40
+    free_gpu_memory_fraction: 0.75
     event_buffer_max_size: 1024
     enable_block_reuse: true
-  cache_transceiver_config:
-    max_num_tokens: 10240
   # NOTE: pytorch_backend_config section flattened since: https://github.com/NVIDIA/TensorRT-LLM/pull/4603
   # NOTE: This field is called 'enable_overlap_scheduler' in older TRTLLM versions
   # Overlap scheduler not currently supported in context-only
@@ -50,11 +48,9 @@ generation_servers:
   max_num_tokens: 256
   max_batch_size: 256
   kv_cache_config:
-    free_gpu_memory_fraction: 0.40
+    free_gpu_memory_fraction: 0.75
     event_buffer_max_size: 1024
     enable_block_reuse: true
-  cache_transceiver_config:
-    max_num_tokens: 256
   # NOTE: pytorch_backend_config section flattened since: https://github.com/NVIDIA/TensorRT-LLM/pull/4603
   # NOTE: This field is called 'enable_overlap_scheduler' in older TRTLLM versions
   disable_overlap_scheduler: false
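Both files receive the same change: the `cache_transceiver_config` block with its `max_num_tokens` limit is removed from `context_servers` and `generation_servers`, and `free_gpu_memory_fraction` rises from 0.40 to 0.75. As a rough illustration only, the sketch below flags server sections that still carry the removed limit. The helper name `leftover_transceiver_limits` is hypothetical and not part of dynamo or TensorRT-LLM. Plain dicts stand in for the parsed YAML, and the nesting of `cache_transceiver_config` directly under each server section is an assumption based on the hunks above. The commit does not say why the limit caused truncation, so the sketch only checks for the key's presence.

```python
from typing import Any

# Hypothetical helper for illustration; not part of dynamo or TensorRT-LLM.
SERVER_SECTIONS = ("context_servers", "generation_servers")


def leftover_transceiver_limits(config: dict[str, Any]) -> list[str]:
    """Return server sections that still set cache_transceiver_config.max_num_tokens."""
    flagged = []
    for section in SERVER_SECTIONS:
        server = config.get(section) or {}
        transceiver = server.get("cache_transceiver_config") or {}
        if "max_num_tokens" in transceiver:
            flagged.append(section)
    return flagged


# Plain dicts stand in for parsed YAML; values mirror the hunks above.
# The nesting is an assumption inferred from the diff.
before = {
    "context_servers": {
        "kv_cache_config": {"free_gpu_memory_fraction": 0.40},
        "cache_transceiver_config": {"max_num_tokens": 10240},
    },
    "generation_servers": {
        "kv_cache_config": {"free_gpu_memory_fraction": 0.40},
        "cache_transceiver_config": {"max_num_tokens": 256},
    },
}
after = {
    "context_servers": {"kv_cache_config": {"free_gpu_memory_fraction": 0.75}},
    "generation_servers": {"kv_cache_config": {"free_gpu_memory_fraction": 0.75}},
}

print(leftover_transceiver_limits(before))
print(leftover_transceiver_limits(after))
```

A check like this could run over a locally maintained copy of either `single_node_config.yaml` after loading it with a YAML parser of your choice.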