[Disagg][Perf] Use CUDA event sync instead of blocking `tolist` to avoid unintentional copy ops blocking across different CUDA streams, improving disagg TTIT/TTFT (#22760)

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Zijing Liu <liuzijing2014@users.noreply.github.com>
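The pattern named in the title, replacing a blocking `tolist()` on a device tensor with an explicit wait on a CUDA event, can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the code from this change: the helper name `to_list_with_event` is hypothetical, and it assumes the tensor was produced on the current CUDA stream.

```python
import torch


def to_list_with_event(t: torch.Tensor) -> list:
    """Hypothetical helper: copy a tensor to host and return it as a Python list,
    waiting on a CUDA event instead of calling `.tolist()` on the device tensor."""
    if not t.is_cuda:
        # CPU tensors need no device synchronization; convert directly.
        return t.tolist()
    # Page-locked (pinned) host buffer so the device-to-host copy can be enqueued asynchronously.
    host_buf = torch.empty(t.shape, dtype=t.dtype, device="cpu", pin_memory=True)
    host_buf.copy_(t, non_blocking=True)
    # Record an event on the current stream right after the copy is enqueued.
    copy_done = torch.cuda.Event()
    copy_done.record()
    # Block the host only until work recorded before this event has finished.
    copy_done.synchronize()
    return host_buf.tolist()
```

The host thread still waits for the result. The difference is that the wait is scoped to an event recorded on the current stream after an asynchronous copy into pinned memory, rather than depending on whatever synchronization the implicit device-to-host copy inside `tolist()` performs.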