OpenDAS / dynamo
Commit d537378a (Unverified)
Authored Aug 04, 2025 by Ryan McCormick
Committed by GitHub on Aug 05, 2025
fix: Update disagg configs for trtllm 1.0.0rc4 changes (main) (#2278) (#2282)
Parent: 4b8a748f
Showing 12 changed files with 42 additions and 12 deletions
components/backends/trtllm/engine_configs/decode.yaml  +1 -1
components/backends/trtllm/engine_configs/deepseek_r1/mtp/mtp_decode.yaml  +4 -1
components/backends/trtllm/engine_configs/deepseek_r1/mtp/mtp_prefill.yaml  +3 -0
components/backends/trtllm/engine_configs/deepseek_r1/simple/decode.yaml  +3 -0
components/backends/trtllm/engine_configs/deepseek_r1/simple/prefill.yaml  +4 -1
components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_decode.yaml  +3 -0
components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml  +4 -1
components/backends/trtllm/engine_configs/llama4/eagle/eagle_decode.yaml  +5 -2
components/backends/trtllm/engine_configs/llama4/eagle/eagle_prefill.yaml  +6 -3
components/backends/trtllm/engine_configs/llama4/eagle_one_model/eagle_agg.yml  +1 -1
components/backends/trtllm/engine_configs/llama4/eagle_one_model/eagle_decode.yaml  +4 -1
components/backends/trtllm/engine_configs/llama4/eagle_one_model/eagle_prefill.yaml  +4 -1
components/backends/trtllm/engine_configs/decode.yaml
@@ -28,4 +28,4 @@ kv_cache_config:
   free_gpu_memory_fraction: 0.95
 cache_transceiver_config:
-  backend: default
\ No newline at end of file
+  backend: default
components/backends/trtllm/engine_configs/deepseek_r1/mtp/mtp_decode.yaml
@@ -51,4 +51,7 @@ cuda_graph_config:
   - 128
   - 256
-print_iter_log: true
\ No newline at end of file
+print_iter_log: true
+cache_transceiver_config:
+  backend: default
components/backends/trtllm/engine_configs/deepseek_r1/mtp/mtp_prefill.yaml
@@ -36,3 +36,6 @@ disable_overlap_scheduler: true
 speculative_config:
   decoding_type: MTP
   num_nextn_predict_layers: 1
+cache_transceiver_config:
+  backend: default
components/backends/trtllm/engine_configs/deepseek_r1/simple/decode.yaml
@@ -55,3 +55,6 @@ cuda_graph_config:
   - 256
 print_iter_log: true
+cache_transceiver_config:
+  backend: default
components/backends/trtllm/engine_configs/deepseek_r1/simple/prefill.yaml
@@ -33,4 +33,7 @@ kv_cache_config:
 # config field from 'enable_overlap_scheduler' to 'disable_overlap_scheduler':
 # https://github.com/NVIDIA/TensorRT-LLM/commit/b4e5df0ee0024eda3eeb83a6ba822245a30ab428
 disable_overlap_scheduler: true
-print_iter_log: true
\ No newline at end of file
+print_iter_log: true
+cache_transceiver_config:
+  backend: default
components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_decode.yaml
@@ -61,3 +61,6 @@ cuda_graph_config:
 print_iter_log: true
+cache_transceiver_config:
+  backend: default
components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml
@@ -38,4 +38,7 @@ kv_cache_config:
 # config field from 'enable_overlap_scheduler' to 'disable_overlap_scheduler':
 # https://github.com/NVIDIA/TensorRT-LLM/commit/b4e5df0ee0024eda3eeb83a6ba822245a30ab428
 disable_overlap_scheduler: true
-print_iter_log: true
\ No newline at end of file
+print_iter_log: true
+cache_transceiver_config:
+  backend: default
components/backends/trtllm/engine_configs/llama4/eagle/eagle_decode.yaml
@@ -21,13 +21,13 @@ max_num_tokens: 512
 # 8704 = 8192 ISL + 512 OSL
 max_seq_len: 8704
 disable_overlap_scheduler: true
-autotuner_enabled: false
+enable_autotuner: false
 # Enable Speculative Decoding in the model engine
 speculative_config:
   decoding_type: Eagle
   max_draft_len: 1
-  pytorch_weights_path: nvidia/Llama-4-Maverick-17B-128E-Eagle3
+  speculative_model_dir: nvidia/Llama-4-Maverick-17B-128E-Eagle3
   eagle3_one_model: false
 kv_cache_config:
@@ -49,3 +49,6 @@ cuda_graph_config:
   - 256
 print_iter_log: true
+cache_transceiver_config:
+  backend: default
components/backends/trtllm/engine_configs/llama4/eagle/eagle_prefill.yaml
@@ -20,17 +20,20 @@ max_batch_size: 1
 max_num_tokens: 8192
 max_seq_len: 8192
 print_iter_log: true
-kv_cache_dtype: fp8
 disable_overlap_scheduler: true
-autotuner_enabled: false
+enable_autotuner: false
 # Enable Speculative Decoding in the model engine
 speculative_config:
   decoding_type: Eagle
   max_draft_len: 1
-  pytorch_weights_path: nvidia/Llama-4-Maverick-17B-128E-Eagle3
+  speculative_model_dir: nvidia/Llama-4-Maverick-17B-128E-Eagle3
   eagle3_one_model: false
 kv_cache_config:
   free_gpu_memory_fraction: 0.5
   enable_block_reuse: false
+  dtype: fp8
+cache_transceiver_config:
+  backend: default
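The hunk above combines every kind of key change that appears in this commit: two renames, one key moved under `kv_cache_config`, and the new `cache_transceiver_config` block. As a rough illustration only, the same changes could be applied to an already-parsed config dictionary as sketched below. The function name `migrate_engine_config` and its `disaggregated` flag are hypothetical and are not part of dynamo or TensorRT-LLM; the key names are taken from the diff.

```python
import copy


def migrate_engine_config(old: dict, disaggregated: bool = True) -> dict:
    """Return a copy of `old` with the key changes seen in this commit applied.

    Hypothetical helper: it only mirrors the edits visible in the diff.
    """
    cfg = copy.deepcopy(old)

    # Top-level rename: autotuner_enabled -> enable_autotuner
    if "autotuner_enabled" in cfg:
        cfg["enable_autotuner"] = cfg.pop("autotuner_enabled")

    # Rename inside speculative_config: pytorch_weights_path -> speculative_model_dir
    spec = cfg.get("speculative_config")
    if isinstance(spec, dict) and "pytorch_weights_path" in spec:
        spec["speculative_model_dir"] = spec.pop("pytorch_weights_path")

    # Top-level kv_cache_dtype moves to kv_cache_config.dtype
    if "kv_cache_dtype" in cfg:
        cfg.setdefault("kv_cache_config", {})["dtype"] = cfg.pop("kv_cache_dtype")

    # The commit adds this block to the prefill/decode configs only
    # (eagle_agg.yml does not receive it), hence the flag.
    if disaggregated:
        cfg.setdefault("cache_transceiver_config", {}).setdefault("backend", "default")

    return cfg


old = {
    "kv_cache_dtype": "fp8",
    "autotuner_enabled": False,
    "speculative_config": {
        "decoding_type": "Eagle",
        "pytorch_weights_path": "nvidia/Llama-4-Maverick-17B-128E-Eagle3",
    },
    "kv_cache_config": {"free_gpu_memory_fraction": 0.5},
}
new = migrate_engine_config(old)
```

The helper works on a deep copy, so the input dictionary is left unchanged and the old and new configs can be compared side by side.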
components/backends/trtllm/engine_configs/llama4/eagle_one_model/eagle_agg.yml
@@ -24,7 +24,7 @@ disable_overlap_scheduler: true # disable_overlap_scheduler is having acc issue
 speculative_config:
   decoding_type: Eagle
   max_draft_len: 3
-  pytorch_weights_path: nvidia/Llama-4-Maverick-17B-128E-Eagle3
+  speculative_model_dir: nvidia/Llama-4-Maverick-17B-128E-Eagle3
   eagle3_one_model: true
 kv_cache_config:
components/backends/trtllm/engine_configs/llama4/eagle_one_model/eagle_decode.yaml
@@ -26,7 +26,7 @@ disable_overlap_scheduler: true
 speculative_config:
   decoding_type: Eagle
   max_draft_len: 3
-  pytorch_weights_path: nvidia/Llama-4-Maverick-17B-128E-Eagle3
+  speculative_model_dir: nvidia/Llama-4-Maverick-17B-128E-Eagle3
   eagle3_one_model: True
 kv_cache_config:
@@ -38,3 +38,6 @@ cuda_graph_config:
   max_batch_size: 256
 print_iter_log: true
+cache_transceiver_config:
+  backend: default
components/backends/trtllm/engine_configs/llama4/eagle_one_model/eagle_prefill.yaml
@@ -26,9 +26,12 @@ disable_overlap_scheduler: true
 speculative_config:
   decoding_type: Eagle
   max_draft_len: 3
-  pytorch_weights_path: nvidia/Llama-4-Maverick-17B-128E-Eagle3
+  speculative_model_dir: nvidia/Llama-4-Maverick-17B-128E-Eagle3
   eagle3_one_model: True
 kv_cache_config:
   free_gpu_memory_fraction: 0.5
   enable_block_reuse: false
+cache_transceiver_config:
+  backend: default
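Every prefill and decode config touched by this commit ends with a `cache_transceiver_config` block, and none of them keeps the old key names. A small check along the following lines could flag configs that were missed. This is a sketch under assumptions: `find_config_issues` is a hypothetical name, the list of legacy keys is inferred from this diff alone, and it should not be read as the validation TensorRT-LLM itself performs.

```python
# Legacy top-level keys, inferred from the lines this commit removes.
LEGACY_TOP_LEVEL = ("autotuner_enabled", "kv_cache_dtype")


def find_config_issues(cfg: dict, disaggregated: bool = True) -> list[str]:
    """List leftovers from the old key names and a missing transceiver backend."""
    issues = []

    for key in LEGACY_TOP_LEVEL:
        if key in cfg:
            issues.append(f"legacy key: {key}")

    # `or {}` tolerates a missing section as well as an explicit null.
    if "pytorch_weights_path" in (cfg.get("speculative_config") or {}):
        issues.append("legacy key: speculative_config.pytorch_weights_path")

    # Only the disaggregated prefill/decode configs gained this block.
    if disaggregated and "backend" not in (cfg.get("cache_transceiver_config") or {}):
        issues.append("missing: cache_transceiver_config.backend")

    return issues


stale = {
    "autotuner_enabled": False,
    "speculative_config": {"pytorch_weights_path": "some/model"},
}
updated = {
    "enable_autotuner": False,
    "speculative_config": {"speculative_model_dir": "some/model"},
    "cache_transceiver_config": {"backend": "default"},
}
stale_issues = find_config_issues(stale)
updated_issues = find_config_issues(updated)
```

Here `stale_issues` holds three entries and `updated_issues` is empty. For an aggregated config such as `eagle_agg.yml`, passing `disaggregated=False` skips the transceiver check.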