[Bugfix] Add strong reference to CUDA pluggable allocator callbacks (#23477)

Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Eric Marcus <eric.marcus@kaiko.ai>
Co-authored-by: youkaichao <youkaichao@gmail.com>

Commit 9dc30b70, authored Aug 23, 2025 by 22quinn, committed by GitHub Aug 24, 2025

Showing 1 changed file with 9 additions and 4 deletions

vllm/device_allocator/cumem.py +9 -4

--- a/vllm/device_allocator/cumem.py
+++ b/vllm/device_allocator/cumem.py
@@ -152,8 +152,13 @@ class CuMemAllocator:
        self.pointer_to_data: dict[int, AllocationData] = {}
        self.current_tag: str = CuMemAllocator.default_tag
        self.allocator_and_pools: dict[str, Any] = {}
+        # Creating strong references to the two callbacks here to prevent
+        # these ephemeral bound-method objects being garbage collected.
+        # See discussions in https://github.com/vllm-project/vllm/pull/22724
+        self.python_malloc_callback = self._python_malloc_callback
+        self.python_free_callback = self._python_free_callback

-    def python_malloc_callback(self, allocation_handle: HandleType) -> None:
+    def _python_malloc_callback(self, allocation_handle: HandleType) -> None:
        """
        Internal method to store the allocation data
        when memory is allocated in the memory pool."""
@@ -162,7 +167,7 @@ class CuMemAllocator:
            allocation_handle, self.current_tag)
        return

-    def python_free_callback(self, ptr: int) -> HandleType:
+    def _python_free_callback(self, ptr: int) -> HandleType:
        """
        Internal method to look up the allocation data
        when memory is freed in the memory pool."""
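
Note on the change: the comment added in `__init__` calls bound-method objects "ephemeral". The sketch below illustrates that behavior in plain Python. It is not vLLM code: `Unpinned`, `Pinned`, `bound_method_is_stable`, and `weakref_survives` are hypothetical names, and the weak reference only stands in for a consumer that does not keep its own strong reference to the callback. The immediate-collection result relies on CPython reference counting.

```python
import weakref


class Unpinned:
    """Hypothetical class: the callback is an ordinary method."""

    def callback(self) -> str:
        return "called"


class Pinned:
    """Hypothetical class: mirrors the pattern in the diff above."""

    def __init__(self) -> None:
        # Store one bound-method object on the instance so that it lives
        # as long as the instance does.
        self.callback = self._callback

    def _callback(self) -> str:
        return "called"


def bound_method_is_stable(obj) -> bool:
    """Return True if two attribute lookups yield the very same object."""
    return obj.callback is obj.callback


def weakref_survives(obj) -> bool:
    """Weakly reference the callback, then check whether it is still alive.

    The weak reference models a holder that does not own the callback.
    """
    ref = weakref.ref(obj.callback)
    return ref() is not None


if __name__ == "__main__":
    for cls in (Unpinned, Pinned):
        obj = cls()
        print(cls.__name__,
              "stable:", bound_method_is_stable(obj),
              "survives:", weakref_survives(obj))
```

On CPython, `Unpinned` creates a fresh bound-method object on every attribute access, so the weak reference is dead as soon as the temporary is released, while `Pinned` returns the same stored object each time. One trade-off of pinning: the instance now refers to a bound method that refers back to the instance, so the pair forms a reference cycle that is reclaimed by the cyclic garbage collector rather than by reference counting alone.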