OpenDAS / vllm_cscc
Commit 633f943e, authored Sep 26, 2025 by Cyrus Leung; committed by GitHub Sep 26, 2025
[Doc] Update Batch-level DP docs (#25757)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
parent b03b1b97
Showing 1 changed file with 6 additions and 5 deletions
docs/configuration/optimization.md @ 633f943e
@@ -139,9 +139,9 @@ there is relatively little gain from TP. On the other hand, TP incurs significan
 overhead because of all-reduce being performed after every layer.
 Given this, it may be advantageous to instead shard the batched input data using TP, essentially
-performing batch-level DP. This has been shown to improve the throughput by around 10% for
+performing batch-level DP. This has been shown to improve the throughput and TTFT by around 10% for
 `tensor_parallel_size=8`. For vision encoders that use hardware-unoptimized Conv3D operations,
-batch-level DP can provide another 40% increase to throughput compared to regular TP.
+batch-level DP can provide another 40% improvement compared to regular TP.
 Nevertheless, since the weights of the multi-modal encoder are replicated across each TP rank,
 there will be a minor increase in memory consumption and may cause OOM if you can barely fit the model already.
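
Reviewer note: the idea in this hunk (every TP rank holds a full copy of the encoder and processes only its slice of the batch) can be illustrated with a small, self-contained sketch. This is not vLLM's implementation; `shard_batch`, `batch_level_dp`, and `toy_encoder` are hypothetical names, and the ranks are simulated sequentially rather than running on separate GPUs.

```python
from typing import Callable, List, Sequence


def shard_batch(items: Sequence, num_ranks: int) -> List[list]:
    """Split a batch into num_ranks contiguous shards of near-equal size."""
    base, extra = divmod(len(items), num_ranks)
    shards, start = [], 0
    for rank in range(num_ranks):
        # The first `extra` ranks take one additional item each.
        size = base + (1 if rank < extra else 0)
        shards.append(list(items[start:start + size]))
        start += size
    return shards


def batch_level_dp(items: Sequence, encoder: Callable, num_ranks: int) -> list:
    """Run a replicated encoder on each rank's shard, then gather in order.

    In a real deployment each shard would run on its own device in parallel;
    here the ranks are simulated one after another.
    """
    shards = shard_batch(items, num_ranks)
    per_rank_outputs = [[encoder(x) for x in shard] for shard in shards]
    # Gather: concatenate rank outputs back into the original batch order.
    return [out for rank_out in per_rank_outputs for out in rank_out]


def toy_encoder(image_id: int) -> int:
    # Hypothetical stand-in for a multi-modal encoder forward pass.
    return image_id * 2


images = list(range(10))
outputs = batch_level_dp(images, toy_encoder, num_ranks=8)
```

Because the encoder weights are replicated rather than sharded, no per-layer all-reduce is needed inside the encoder; the cost is the extra memory for the replicas that the hunk mentions.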

@@ -172,14 +172,15 @@ Batch-level DP needs to be implemented on a per-model basis,
 and enabled by setting `supports_encoder_tp_data = True` in the model class.
 Regardless, you need to set `mm_encoder_tp_mode="data"` in engine arguments to use this feature.
-Known supported models:
+Known supported models (with corresponding benchmarks):
-- GLM-4.5V GLM-4.1V (<gh-pr:23168>)
+- dots_ocr (<gh-pr:25466>)
+- GLM-4.1V or above (<gh-pr:23168>)
 - InternVL (<gh-pr:23909>)
 - Kimi-VL (<gh-pr:23817>)
 - Llama4 (<gh-pr:18368>)
 - MiniCPM-V-2.5 or above (<gh-pr:23327>, <gh-pr:23948>)
-- Qwen2.5-VL (<gh-pr:22742>)
+- Qwen2-VL or above (<gh-pr:22742>, <gh-pr:24955>, <gh-pr:25445>)
 - Step3 (<gh-pr:22697>)
 ## Input Processing
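
Reviewer note: a minimal sketch of how the engine argument named in this hunk might be passed through vLLM's Python entry point. The model name and `tensor_parallel_size=4` are illustrative assumptions, not values taken from this commit, and `build_llm` is a hypothetical helper. The vLLM import is deferred so the snippet loads on machines without vLLM or GPUs.

```python
# Engine arguments that request batch-level DP for the multi-modal encoder.
engine_kwargs = {
    # Assumption: any model from the supported list would do; this one is
    # only an example of a Qwen2-VL-or-above checkpoint.
    "model": "Qwen/Qwen2.5-VL-72B-Instruct",
    # Assumption: the TP size is illustrative; the language model still
    # shards its weights across these ranks.
    "tensor_parallel_size": 4,
    # From the docs in this diff: shard the encoder's input batch across
    # TP ranks instead of sharding the encoder weights.
    "mm_encoder_tp_mode": "data",
}


def build_llm(**overrides):
    """Create a vllm.LLM from the kwargs above (needs vLLM and GPUs)."""
    from vllm import LLM  # imported lazily so the sketch loads without vLLM

    return LLM(**{**engine_kwargs, **overrides})
```

Calling `build_llm()` on a suitable multi-GPU host would construct the engine. If the model class does not set `supports_encoder_tp_data = True`, this setting is not expected to take effect, per the text above.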