Unverified commit 4e9c2c39, authored by Alp Dener, committed by GitHub

[PyTorch] Fix cuBLAS workspace leak in applications that initialize+destroy Userbuffers more than once (#1715)

Safeguarded the cuBLAS workspace expansion in initialize_ub() to avoid exponential growth across repeated initializations.
Signed-off-by: Alp Dener <adener@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
parent 8ace813c
@@ -230,9 +230,15 @@ def initialize_ub(
         flush=True,
     )
-    # Increase the workspace by the number of maximum concurrent streams
+    # Allocate cuBLAS workspace with expanded size for chunking in overlapping GEMM calls
+    global _cublas_workspace
+    if _cublas_workspace is None:
+        _cublas_workspace = get_workspace().repeat(_NUM_MAX_UB_STREAMS)
+    elif _cublas_workspace.numel() != get_cublas_workspace_size_bytes() * _NUM_MAX_UB_STREAMS:
+        # This ensures we don't do `.repeat()` on an already expanded workspace
+        _cublas_workspace = torch.empty(
+            get_cublas_workspace_size_bytes(), dtype=torch.uint8, device="cuda"
+        ).repeat(_NUM_MAX_UB_STREAMS)
     # Default buffer precision: AllGather buffers use fp8 when using fp8 recipe
     layers_all_gather_overlap = [
......