Unverified commit 633f943e, authored by Cyrus Leung, committed by GitHub

[Doc] Update Batch-level DP docs (#25757)


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
parent b03b1b97
@@ -139,9 +139,9 @@ there is relatively little gain from TP. On the other hand, TP incurs significan
 overhead because of all-reduce being performed after every layer.
 Given this, it may be advantageous to instead shard the batched input data using TP, essentially
-performing batch-level DP. This has been shown to improve the throughput by around 10% for
+performing batch-level DP. This has been shown to improve the throughput and TTFT by around 10% for
 `tensor_parallel_size=8`. For vision encoders that use hardware-unoptimized Conv3D operations,
-batch-level DP can provide another 40% increase to throughput compared to regular TP.
+batch-level DP can provide another 40% improvement compared to regular TP.
 Nevertheless, since the weights of the multi-modal encoder are replicated across each TP rank,
 there will be a minor increase in memory consumption and may cause OOM if you can barely fit the model already.
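Reviewer note: the idea in the hunk above (replicate the encoder weights on every TP rank, shard only the input batch) can be illustrated with a minimal, framework-free sketch. Everything here is hypothetical and for illustration only: `shard_batch`, `toy_encoder`, and `run_batch_level_dp` are not vLLM functions, and the "ranks" are simulated in a single process rather than on separate devices.

```python
from typing import Callable, List, Sequence


def shard_batch(batch: Sequence[float], world_size: int) -> List[List[float]]:
    """Split a batch into `world_size` contiguous shards of near-equal size."""
    base, extra = divmod(len(batch), world_size)
    shards, start = [], 0
    for rank in range(world_size):
        # The first `extra` ranks take one additional item each.
        size = base + (1 if rank < extra else 0)
        shards.append(list(batch[start:start + size]))
        start += size
    return shards


def toy_encoder(items: Sequence[float]) -> List[float]:
    """Stand-in for an encoder whose full weights every rank holds (hypothetical)."""
    return [2.0 * x + 1.0 for x in items]


def run_batch_level_dp(
    batch: Sequence[float],
    world_size: int,
    encoder: Callable[[Sequence[float]], List[float]] = toy_encoder,
) -> List[float]:
    # Each simulated rank runs the complete encoder on its own shard only.
    per_rank_outputs = [encoder(shard) for shard in shard_batch(batch, world_size)]
    # One gather at the end stands in for the per-layer all-reduce of regular TP.
    return [y for out in per_rank_outputs for y in out]


batch = [float(i) for i in range(10)]
sharded_result = run_batch_level_dp(batch, world_size=8)
# Sharding the batch must not change the result, only where the work happens.
assert sharded_result == toy_encoder(batch)
```

The sketch also makes the memory trade-off in the text visible: every simulated rank calls the same full `toy_encoder`, which corresponds to each TP rank holding a complete copy of the encoder weights.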
@@ -172,14 +172,15 @@ Batch-level DP needs to be implemented on a per-model basis,
 and enabled by setting `supports_encoder_tp_data = True` in the model class.
 Regardless, you need to set `mm_encoder_tp_mode="data"` in engine arguments to use this feature.
-Known supported models:
+Known supported models (with corresponding benchmarks):
-- GLM-4.5V GLM-4.1V (<gh-pr:23168>)
+- dots_ocr (<gh-pr:25466>)
+- GLM-4.1V or above (<gh-pr:23168>)
 - InternVL (<gh-pr:23909>)
 - Kimi-VL (<gh-pr:23817>)
 - Llama4 (<gh-pr:18368>)
 - MiniCPM-V-2.5 or above (<gh-pr:23327>, <gh-pr:23948>)
-- Qwen2.5-VL (<gh-pr:22742>)
+- Qwen2-VL or above (<gh-pr:22742>, <gh-pr:24955>, <gh-pr:25445>)
 - Step3 (<gh-pr:22697>)
 ## Input Processing
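Reviewer note: the hunk above says the feature is turned on by passing `mm_encoder_tp_mode="data"` in the engine arguments. A minimal sketch of what that could look like from Python follows. It assumes that vLLM's `LLM` class forwards these keyword arguments to the engine arguments; the helper names `build_engine_kwargs` and `make_llm` are hypothetical, and the model name is only an illustrative pick from the Qwen family listed above.

```python
from typing import Any, Dict


def build_engine_kwargs(model: str, tp_size: int) -> Dict[str, Any]:
    """Collect engine arguments that request batch-level DP for the encoder."""
    return {
        "model": model,
        "tensor_parallel_size": tp_size,
        # Shard the encoder's input batch across TP ranks instead of its weights.
        "mm_encoder_tp_mode": "data",
    }


def make_llm(model: str, tp_size: int):
    # Imported lazily so the helper above works without vLLM installed.
    from vllm import LLM

    return LLM(**build_engine_kwargs(model, tp_size))


# Illustrative model name; substitute any model from the supported list.
kwargs = build_engine_kwargs("Qwen/Qwen2.5-VL-72B-Instruct", tp_size=8)
```

Calling `make_llm(...)` requires a vLLM installation and enough GPUs for the chosen `tensor_parallel_size`. Because the encoder weights are replicated on each rank, it is worth checking memory headroom first, as the documentation text warns.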