OpenDAS / vllm_cscc
Commit da554f93 (Unverified)
Authored Oct 01, 2025 by Wentao Ye
Committed by GitHub, Oct 01, 2025

[Bug] Fix Negative Cuda Memory Usage (#25683)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

Parent: aac622e0

Showing 1 changed file with 4 additions and 2 deletions

vllm/v1/worker/gpu_model_runner.py (+4 -2), viewed at da554f93

@@ -3517,7 +3517,6 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
         compilation_counter.num_gpu_runner_capture_triggers += 1
 
         start_time = time.perf_counter()
-        start_free_gpu_memory = torch.cuda.mem_get_info()[0]
 
         @contextmanager
         def freeze_gc():
@@ -3540,6 +3539,7 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
         # can reuse the memory pool allocated for the large shapes.
         set_cudagraph_capturing_enabled(True)
         with freeze_gc(), graph_capture(device=self.device):
+            start_free_gpu_memory = torch.cuda.mem_get_info()[0]
             cudagraph_mode = self.compilation_config.cudagraph_mode
             assert cudagraph_mode is not None
             if cudagraph_mode.mixed_mode() != CUDAGraphMode.NONE:
@@ -3568,6 +3568,9 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
                     cudagraph_runtime_mode=CUDAGraphMode.FULL,
                     uniform_decode=True)
 
+            torch.cuda.synchronize()
+            end_free_gpu_memory = torch.cuda.mem_get_info()[0]
+
         # Disable cudagraph capturing globally, so any unexpected cudagraph
         # capturing will be detected and raise an error after here.
         # Note: We don't put it into graph_capture context manager because
@@ -3576,7 +3579,6 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
         set_cudagraph_capturing_enabled(False)
 
         end_time = time.perf_counter()
-        end_free_gpu_memory = torch.cuda.mem_get_info()[0]
         elapsed_time = end_time - start_time
         cuda_graph_size = start_free_gpu_memory - end_free_gpu_memory
         # This usually takes 5~20 seconds.
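
Note on the change: the diff moves both free-memory samples inside the `graph_capture` block and adds a `torch.cuda.synchronize()` before the second sample, so `cuda_graph_size` now brackets only the capture work. The commit does not state why the old placement could go negative. The sketch below illustrates one possible cause under a loud assumption: that something between the old start sample and the capture (here, entering the scope) returns memory to the device. `FakeDevice`, `capture_scope`, `capture_graphs`, and the byte counts are all hypothetical stand-ins, not vLLM or PyTorch APIs.

```python
from contextlib import contextmanager


class FakeDevice:
    """Hypothetical stand-in for a GPU that only tracks free bytes."""

    def __init__(self, free_bytes):
        self.free_bytes = free_bytes

    def mem_get_free(self):
        # Plays the role of torch.cuda.mem_get_info()[0].
        return self.free_bytes

    def synchronize(self):
        # Plays the role of torch.cuda.synchronize(); nothing to wait for here.
        pass


@contextmanager
def capture_scope(device):
    # ASSUMPTION for illustration: entering the scope frees 300 bytes.
    device.free_bytes += 300
    yield


def capture_graphs(device):
    # ASSUMPTION for illustration: the captured graphs consume 200 bytes.
    device.free_bytes -= 200


def measure_old(device):
    """Sample before entering the scope and after leaving it."""
    start = device.mem_get_free()
    with capture_scope(device):
        capture_graphs(device)
    end = device.mem_get_free()
    return start - end


def measure_new(device):
    """Sample inside the scope, synchronizing before the end sample."""
    with capture_scope(device):
        start = device.mem_get_free()
        capture_graphs(device)
        device.synchronize()
        end = device.mem_get_free()
    return start - end


if __name__ == "__main__":
    print("old placement:", measure_old(FakeDevice(1000)))
    print("new placement:", measure_new(FakeDevice(1000)))
```

With these made-up numbers the old placement reports a negative size, because the memory freed on scope entry is counted against the capture, while the new placement reports only what the capture itself consumed.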