Unverified commit 32e84fa1, authored by Isotr0py, committed by GitHub

[CI/Build] Investigate torchrun distributed tests hanging issue (#33650)


Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
parent fd9c83d0
@@ -32,6 +32,9 @@ llm = LLM(
     gpu_memory_utilization=random.uniform(0.7, 0.9),
     swap_space=random.randint(1, 4),
     seed=0,
+    # FIXME(Isotr0py): async scheduling causes deadlock
+    # on torchrun with PP, need to investigate further.
+    async_scheduling=False,
 )
 outputs = llm.generate(prompts, sampling_params)
......
@@ -39,6 +39,9 @@ llm = LLM(
     gpu_memory_utilization=random.uniform(0.7, 0.9),
     swap_space=random.randint(1, 4),
     seed=0,
+    # FIXME(Isotr0py): async scheduling causes deadlock
+    # on torchrun with PP, need to investigate further.
+    async_scheduling=False,
 )
 outputs = llm.generate(prompts, sampling_params)
......
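
Note: both hunks work around the deadlock by setting async_scheduling=False; the root cause is still marked FIXME. While that investigation is open, a hang like this can be reported as a test failure instead of stalling the CI job. The sketch below is not part of this commit. It uses only the Python standard library, and the helper name run_with_timeout and its status strings are hypothetical. It assumes the test is launched as a child process (for example via torchrun) and that a fixed wall-clock limit is an acceptable signal for "hung".

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the result as 'ok', 'failed', or 'hung'.

    Hypothetical helper for illustration only. Returns (status, returncode);
    returncode is None when the command exceeded timeout_s seconds.
    """
    try:
        # subprocess.run kills the direct child when the timeout expires.
        proc = subprocess.run(
            cmd, timeout=timeout_s, capture_output=True, text=True
        )
    except subprocess.TimeoutExpired:
        return ("hung", None)
    status = "ok" if proc.returncode == 0 else "failed"
    return (status, proc.returncode)


if __name__ == "__main__":
    # Stand-in for a distributed test command; sleeps longer than the limit.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 1))
```

Only the direct child is killed on timeout. A launcher that spawns its own worker processes may leave those behind, so a real CI setup would likely need process-group cleanup as well.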