Unverified commit 5cc54f7c, authored by Cyrus Leung, committed by GitHub

[Doc] Fix batch-level DP example (#23325)


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
parent 0c6e40bb
    @@ -153,13 +153,14 @@ from vllm import LLM
     llm = LLM(
         model="Qwen/Qwen2.5-VL-72B-Instruct",
    -    # Create two EngineCore instances, one per DP rank
    -    data_parallel_size=2,
    -    # Within each EngineCore instance:
    -    # The vision encoder uses TP=4 (not DP=2) to shard the input data
    -    # The language decoder uses TP=4 to shard the weights as usual
         tensor_parallel_size=4,
    +    # When mm_encoder_tp_mode="data",
    +    # the vision encoder uses TP=4 (not DP=1) to shard the input data,
    +    # so the TP size becomes the effective DP size.
    +    # Note that this is independent of the DP size for language decoder which is used in expert parallel setting.
         mm_encoder_tp_mode="data",
    +    # The language decoder uses TP=4 to shard the weights regardless
    +    # of the setting of mm_encoder_tp_mode
     )
     ```
......
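The added comments say that with `mm_encoder_tp_mode="data"` the vision encoder shards the input data across the TP ranks, so the TP size acts as the effective DP size for the encoder. As a rough illustration of that idea only, the sketch below splits a batch of inputs into one shard per rank. The helper `shard_batch` and the image placeholders are hypothetical names introduced here; this is not vLLM's implementation, which may assign inputs to ranks differently.

```python
def shard_batch(items, tp_size):
    """Split a batch into tp_size contiguous, near-equal shards (illustrative only)."""
    if tp_size < 1:
        raise ValueError("tp_size must be at least 1")
    base, extra = divmod(len(items), tp_size)
    shards = []
    start = 0
    for rank in range(tp_size):
        # The first `extra` ranks each take one additional item.
        size = base + (1 if rank < extra else 0)
        shards.append(items[start:start + size])
        start += size
    return shards


# Ten hypothetical image placeholders split across tensor_parallel_size=4.
images = [f"image_{i}" for i in range(10)]
shards = shard_batch(images, tp_size=4)
for rank, shard in enumerate(shards):
    print(rank, shard)
```

Each rank would then run the full encoder on its own shard, which is why no encoder weights need to be split in this mode; the language decoder still shards its weights with TP=4.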