OpenDAS / dynamo
Commit b43a131c (signature unverified)
Authored by Yunzhou Liu on Feb 03, 2026; committed by GitHub on Feb 03, 2026.
docs: update Qwen3-235B-A22B-FP8 recipes (#5254)
Signed-off-by: Elnifio <elnifio0519@gmail.com>
Parent: 84b5e9b5

Showing 2 changed files with 11 additions and 16 deletions.
recipes/qwen3-235b-a22b-fp8/trtllm/agg/deploy.yaml (+3 -4)
recipes/qwen3-235b-a22b-fp8/trtllm/disagg/deploy.yaml (+8 -12)

recipes/qwen3-235b-a22b-fp8/trtllm/agg/deploy.yaml @ b43a131c
@@ -13,10 +13,6 @@ data:
 moe_tensor_parallel_size: 1
 enable_attention_dp: false
 enable_chunked_prefill: true
-build_config:
-  max_batch_size: 128
-  max_num_tokens: 8192
-  max_seq_len: 8192
 kv_cache_config:
   enable_block_reuse: true
   free_gpu_memory_fraction: 0.8
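This hunk removes the `build_config` block from the engine config, and the next hunk passes the same three limits to `python3 -m dynamo.trtllm` as `--max-batch-size`, `--max-num-tokens` and `--max-seq-len`. For anyone porting a custom engine config the same way, here is a minimal bash sketch of that key-to-flag mapping. `build_config_to_flags` is a hypothetical helper, not part of Dynamo. It only handles flat `key: value` lines, and the mapping is read off this diff rather than taken from Dynamo's documentation.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Dynamo): translate the max_* keys of a
# removed build_config block into the dynamo.trtllm flags this commit uses.
# Assumes simple "key: value" lines; real YAML nesting is not parsed.
set -euo pipefail

build_config_to_flags() {
  local key value
  # IFS=': ' splits "  max_batch_size: 128" into key=max_batch_size and
  # value=128; leading indentation is trimmed because space is in IFS.
  while IFS=': ' read -r key value; do
    case "$key" in
      max_batch_size) printf '%s %s\n' --max-batch-size "$value" ;;
      max_num_tokens) printf '%s %s\n' --max-num-tokens "$value" ;;
      max_seq_len)    printf '%s %s\n' --max-seq-len "$value" ;;
      *) ;;  # the "build_config:" header and any other keys are skipped
    esac
  done
}

# Usage with the agg values removed above; prints one flag per line:
#   --max-batch-size 128
#   --max-num-tokens 8192
#   --max-seq-len 8192
build_config_to_flags <<'EOF'
build_config:
  max_batch_size: 128
  max_num_tokens: 8192
  max_seq_len: 8192
EOF
```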
@@ -91,6 +87,9 @@ spec:
   python3 -m dynamo.trtllm \
     --model-path "${MODEL_PATH}" \
     --served-model-name "Qwen/Qwen3-235B-A22B-FP8" \
+    --max-batch-size 128 \
+    --max-num-tokens 8192 \
+    --max-seq-len 8192 \
     --extra-engine-args "${ENGINE_ARGS}"
 image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.0
 workingDir: /workspace/components/backends/trtllm
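After this commit the aggregated worker reads its batch, token and sequence-length limits from the command line rather than the engine YAML. The dry-run sketch below builds that command in a bash array and prints it with shell quoting, so it can be inspected before launch. The `MODEL_PATH` and `ENGINE_ARGS` defaults are placeholder assumptions; the recipe supplies the real values. Nothing is executed.

```bash
#!/usr/bin/env bash
# Dry-run sketch: build the aggregated worker command as it reads after this
# commit and print it instead of running it.
# The MODEL_PATH / ENGINE_ARGS defaults are placeholders, not recipe values.
set -euo pipefail

MODEL_PATH="${MODEL_PATH:-/path/to/Qwen3-235B-A22B-FP8}"
ENGINE_ARGS="${ENGINE_ARGS:-/path/to/engine_config.yaml}"

# The three --max-* flags replace the build_config block removed above.
agg_cmd=(
  python3 -m dynamo.trtllm
  --model-path "$MODEL_PATH"
  --served-model-name "Qwen/Qwen3-235B-A22B-FP8"
  --max-batch-size 128
  --max-num-tokens 8192
  --max-seq-len 8192
  --extra-engine-args "$ENGINE_ARGS"
)

# %q quotes each argument so the printed line can be pasted into a shell.
printf '%q ' "${agg_cmd[@]}"
printf '\n'
```

Replacing the two `printf` lines with `"${agg_cmd[@]}"` would run the command instead. Keeping the arguments in an array avoids the trailing `\` continuations, which otherwise have to be kept consistent whenever a flag is added or removed.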

recipes/qwen3-235b-a22b-fp8/trtllm/disagg/deploy.yaml @ b43a131c
@@ -13,10 +13,6 @@ data:
 moe_expert_parallel_size: 1
 enable_attention_dp: false
 enable_chunked_prefill: false
-build_config:
-  max_batch_size: 2
-  max_num_tokens: 8192
-  max_seq_len: 8192
 kv_cache_config:
   enable_block_reuse: true
   free_gpu_memory_fraction: 0.7
@@ -42,10 +38,6 @@ data:
 moe_tensor_parallel_size: 1
 enable_attention_dp: false
 enable_chunked_prefill: false
-build_config:
-  max_batch_size: 512
-  max_num_tokens: 1024
-  max_seq_len: 8192
 kv_cache_config:
   enable_block_reuse: false
   free_gpu_memory_fraction: 0.95
@@ -127,9 +119,11 @@ spec:
   python3 -m dynamo.trtllm \
     --model-path "${MODEL_PATH}" \
     --served-model-name "Qwen/Qwen3-235B-A22B-FP8" \
+    --max-batch-size 2 \
+    --max-num-tokens 8192 \
+    --max-seq-len 8192 \
     --extra-engine-args "${ENGINE_ARGS}" \
-    --disaggregation-mode prefill \
-    --disaggregation-strategy prefill_first
+    --disaggregation-mode prefill
 volumeMounts:
   - name: prefill-config
     mountPath: /engine_configs
@@ -180,9 +174,11 @@ spec:
   python3 -m dynamo.trtllm \
     --model-path "${MODEL_PATH}" \
     --served-model-name "Qwen/Qwen3-235B-A22B-FP8" \
+    --max-batch-size 512 \
+    --max-num-tokens 1024 \
+    --max-seq-len 8192 \
     --extra-engine-args "${ENGINE_ARGS}" \
-    --disaggregation-mode decode \
-    --disaggregation-strategy prefill_first
+    --disaggregation-mode decode
 volumeMounts:
   - name: decode-config
     mountPath: /engine_configs
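Across both files the commit makes the same two kinds of change. Every `build_config` block leaves the engine configs in favor of `--max-*` flags, and the disaggregated prefill and decode workers drop `--disaggregation-strategy prefill_first`. Below is a minimal bash sketch of a reviewer-side check for a manifest. `check_recipe` is a hypothetical name, and the grep patterns assume the flags appear literally in the file, as they do in these recipes.

```bash
#!/usr/bin/env bash
# Hypothetical post-change check for a recipe manifest (not part of Dynamo).
# Returns non-zero and prints the reason if the file still uses the old layout.
set -euo pipefail

check_recipe() {
  local file="$1" flag
  # Limits must no longer live in the engine config...
  if grep -q 'build_config:' "$file"; then
    echo "$file: still contains build_config" >&2
    return 1
  fi
  # ...and must be passed on the command line instead.
  for flag in --max-batch-size --max-num-tokens --max-seq-len; do
    if ! grep -q -- "$flag" "$file"; then
      echo "$file: missing $flag" >&2
      return 1
    fi
  done
  # This commit removes the disaggregation strategy flag.
  if grep -q -- '--disaggregation-strategy' "$file"; then
    echo "$file: still passes --disaggregation-strategy" >&2
    return 1
  fi
}

# Usage (paths from this commit; run from the repository root):
#   check_recipe recipes/qwen3-235b-a22b-fp8/trtllm/agg/deploy.yaml
#   check_recipe recipes/qwen3-235b-a22b-fp8/trtllm/disagg/deploy.yaml
```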