Unverified commit a116f969 authored by sbeurnier, committed by GitHub

[V1] Remove pin_memory() in async_copy_to_gpu to fix sporadic stalls (#37006)


Signed-off-by: Sebastien Beurnier <sbeurnier@together.ai>
parent 092ace9e
...
@@ -27,12 +27,10 @@ def async_copy_to_gpu(
     assert device is not None
     out = torch.empty_like(x, device=device)
-    # CPU-to-CPU copy
-    tmp = x.pin_memory()
-    assert tmp is not x
-    # CPU-to-GPU copy
-    return out.copy_(tmp, non_blocking=True)
+    # Copy directly to GPU — explicit pin_memory() causes sporadic stalls
+    # under high concurrency due to CUDA driver contention. The driver
+    # handles the transfer efficiently without manual pinning.
+    return out.copy_(x, non_blocking=True)
 class UvaBuffer:
...
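The change can be read as a standalone sketch of the helper after this commit. It is an illustration under stated assumptions, not vLLM's actual code: only the allocation and `copy_` lines mirror the diff, while the `Optional` signature and the `device` default (CUDA when available, otherwise CPU so the sketch runs on machines without a GPU) are hypothetical additions.

```python
from typing import Optional, Union

import torch


def async_copy_to_gpu(
    x: torch.Tensor,
    device: Optional[Union[torch.device, str]] = None,
) -> torch.Tensor:
    """Sketch of the post-change copy path: no explicit pin_memory() staging."""
    if device is None:
        # Hypothetical default for this sketch; the real helper may differ.
        device = "cuda" if torch.cuda.is_available() else "cpu"
    assert device is not None
    # Allocate the destination tensor on the target device.
    out = torch.empty_like(x, device=device)
    # Copy straight from the source tensor without a pin_memory() step.
    # If x lives in pageable host memory, the CUDA driver stages the data
    # through its own pinned buffer, so the host may block until staging ends.
    return out.copy_(x, non_blocking=True)
```

The trade-off this implies: with a pageable source tensor, `non_blocking=True` may not give full host/device overlap. The commit accepts that in exchange for dropping the explicit `pin_memory()` call it identifies as the source of the sporadic stalls.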