Commit 9dc30b70, authored by 22quinn, committed by GitHub

[Bugfix] Add strong reference to CUDA pluggable allocator callbacks (#23477)


Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Eric Marcus <eric.marcus@kaiko.ai>
Co-authored-by: youkaichao <youkaichao@gmail.com>
parent 053278a5
@@ -152,8 +152,13 @@ class CuMemAllocator:
         self.pointer_to_data: dict[int, AllocationData] = {}
         self.current_tag: str = CuMemAllocator.default_tag
         self.allocator_and_pools: dict[str, Any] = {}
+        # Creating strong references to the two callbacks here to prevent
+        # these ephemeral bound-method objects being garbage collected.
+        # See discussions in https://github.com/vllm-project/vllm/pull/22724
+        self.python_malloc_callback = self._python_malloc_callback
+        self.python_free_callback = self._python_free_callback
 
-    def python_malloc_callback(self, allocation_handle: HandleType) -> None:
+    def _python_malloc_callback(self, allocation_handle: HandleType) -> None:
         """
         Internal method to store the allocation data
         when memory is allocated in the memory pool."""
@@ -162,7 +167,7 @@ class CuMemAllocator:
             allocation_handle, self.current_tag)
         return
 
-    def python_free_callback(self, ptr: int) -> HandleType:
+    def _python_free_callback(self, ptr: int) -> HandleType:
         """
         Internal method to look up the allocation data
         when memory is freed in the memory pool."""
@@ -212,9 +217,9 @@ class CuMemAllocator:
     def wake_up(self, tags: Optional[list[str]] = None) -> None:
         """
         Wake up the allocator from sleep mode.
         All data that is previously offloaded will be loaded back to GPU
         memory, and the rest of the data will have empty memory.
 
         :param tags: The tags of the memory allocation that will be loaded
             back to GPU memory. If None, all memory allocation will be loaded
             back to GPU memory.
......
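The code comment in the first hunk describes bound methods as ephemeral objects. The following is a minimal sketch of that behavior in plain Python, not the vLLM code itself: the classes `WithoutStrongRef` and `WithStrongRef` and the helper `probe` are hypothetical names made up for illustration, and the weak-reference result relies on CPython's reference counting. It shows that accessing a method through an instance builds a fresh bound-method object each time, while assigning the bound method to an instance attribute in `__init__` (the pattern this commit adds) keeps one object alive for the lifetime of the instance.

```python
import gc
import weakref


class WithoutStrongRef:
    """Hypothetical class: the callback is only reachable as a plain method."""

    def callback(self) -> str:
        return "called"


class WithStrongRef:
    """Hypothetical class mirroring the pattern added in this commit."""

    def __init__(self) -> None:
        # Store the bound method on the instance so that the same object
        # is returned on every access and lives as long as the instance.
        self.callback = self._callback

    def _callback(self) -> str:
        return "called"


def probe(obj) -> tuple[bool, bool]:
    """Return (same object on repeated access, weak reference still alive)."""
    same_object = obj.callback is obj.callback
    # The temporary bound method passed here has no other owner unless the
    # instance itself holds a strong reference to it.
    ref = weakref.ref(obj.callback)
    gc.collect()
    return same_object, ref() is not None


a = WithoutStrongRef()
b = WithStrongRef()
result_without = probe(a)  # (False, False) on CPython
result_with = probe(b)     # (True, True)
```

In the first case, any consumer that keeps only a non-owning reference to `a.callback` is left pointing at an object that has already been freed. Storing the bound method on the instance does create a reference cycle between the instance and the method, which Python's cyclic garbage collector can reclaim once the instance is no longer reachable.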