OpenDAS / vllm_cscc
Commit 633f943e (Unverified)
Authored Sep 26, 2025 by Cyrus Leung
Committed by GitHub, Sep 26, 2025

[Doc] Update Batch-level DP docs (#25757)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Parent: b03b1b97
Showing 1 changed file with 6 additions and 5 deletions

docs/configuration/optimization.md (+6 -5)
@@ -139,9 +139,9 @@ there is relatively little gain from TP. On the other hand, TP incurs significan
 overhead because of all-reduce being performed after every layer.
 
 Given this, it may be advantageous to instead shard the batched input data using TP, essentially
-performing batch-level DP. This has been shown to improve the throughput by around 10% for
+performing batch-level DP. This has been shown to improve the throughput and TTFT by around 10% for
 `tensor_parallel_size=8`. For vision encoders that use hardware-unoptimized Conv3D operations,
-batch-level DP can provide another 40% increase to throughput compared to regular TP.
+batch-level DP can provide another 40% improvement compared to regular TP.
 
 Nevertheless, since the weights of the multi-modal encoder are replicated across each TP rank,
 there will be a minor increase in memory consumption and may cause OOM if you can barely fit the model already.
@@ -172,14 +172,15 @@ Batch-level DP needs to be implemented on a per-model basis,
 and enabled by setting `supports_encoder_tp_data = True` in the model class.
 Regardless, you need to set `mm_encoder_tp_mode="data"` in engine arguments to use this feature.
 
-Known supported models:
+Known supported models (with corresponding benchmarks):
 
-- GLM-4.5V GLM-4.1V (<gh-pr:23168>)
+- dots_ocr (<gh-pr:25466>)
+- GLM-4.1V or above (<gh-pr:23168>)
 - InternVL (<gh-pr:23909>)
 - Kimi-VL (<gh-pr:23817>)
 - Llama4 (<gh-pr:18368>)
 - MiniCPM-V-2.5 or above (<gh-pr:23327>, <gh-pr:23948>)
-- Qwen2.5-VL (<gh-pr:22742>)
+- Qwen2-VL or above (<gh-pr:22742>, <gh-pr:24955>, <gh-pr:25445>)
 - Step3 (<gh-pr:22697>)
 
 ## Input Processing
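
Note on the change above: the idea behind batch-level DP, as the edited paragraph describes it, is that each TP rank holds a full copy of the multi-modal encoder and processes only its own slice of the batched inputs. The following is a minimal sketch of that sharding idea in plain Python, not vLLM's implementation. The names `shard_batch`, `encode_batch_level_dp`, and `toy_encoder` are hypothetical and introduced only for illustration, and the ranks are simulated sequentially rather than run in parallel.

```python
from typing import Callable, List, Sequence


def shard_batch(items: Sequence, num_ranks: int) -> List[list]:
    """Split a batch into num_ranks contiguous shards of near-equal size."""
    base, extra = divmod(len(items), num_ranks)
    shards, start = [], 0
    for rank in range(num_ranks):
        # The first `extra` ranks take one additional item each.
        size = base + (1 if rank < extra else 0)
        shards.append(list(items[start:start + size]))
        start += size
    return shards


def encode_batch_level_dp(items: Sequence, num_ranks: int,
                          encoder: Callable) -> list:
    """Run a replicated encoder on each shard, then gather outputs in order.

    Hypothetical stand-in: a real system would run the shards on separate
    ranks in parallel; here they are processed one after another.
    """
    shards = shard_batch(items, num_ranks)
    per_rank_outputs = [[encoder(x) for x in shard] for shard in shards]
    return [y for outputs in per_rank_outputs for y in outputs]


def toy_encoder(x: int) -> int:
    """Placeholder for a vision encoder forward pass."""
    return x * 2


images = list(range(10))  # stand-in for 10 batched image inputs
shards = shard_batch(images, 4)
outputs = encode_batch_level_dp(images, 4, toy_encoder)
```

Because every rank needs the whole encoder to process its shard, the encoder weights are replicated rather than split, which is the source of the extra memory consumption the documentation warns about. In vLLM itself, the documented way to turn this on is to pass `mm_encoder_tp_mode="data"` in the engine arguments for a model that supports it.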