"ssh:/git@developer.sourcefind.cn:2222/OpenDAS/vllm_cscc.git" did not exist on "862f2ef893d9751db0a92bd2d4ae0e3d9677872f"
Commit 4e0f6076, authored by Kebe, committed by GitHub

[Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. (#14948)


Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
parent 726efc6a
@@ -24,7 +24,7 @@ This document describes how vLLM deals with these challenges.
 [Python multiprocessing methods](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) include:
 - `spawn` - spawn a new Python process. This will be the default as of Python
-  3.14.
+  3.14. In macOS, this is already the default.
 - `fork` - Use `os.fork()` to fork the Python interpreter. This is the default
   in Python versions prior to 3.14.
@@ -34,7 +34,7 @@ This document describes how vLLM deals with these challenges.
 ### Tradeoffs
 `fork` is the fastest method, but is incompatible with dependencies that use
-threads.
+threads. If you are under macOS, using `fork` may cause the process to crash.
 `spawn` is more compatible with dependencies, but can be problematic when vLLM
 is used as a library. If the consuming code does not use a `__main__` guard (`if
......
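To see which start method applies on a given machine, the interpreter can be queried directly. The snippet below is a minimal sketch using only the standard `multiprocessing` module; the helper name `start_method_summary` is hypothetical and is not part of vLLM or of this commit.

```python
import multiprocessing as mp


def start_method_summary() -> dict:
    """Report the platform's default start method and the ones available.

    Hypothetical helper for illustration only. No child process is started.
    """
    return {
        # The default depends on the OS and the Python version.
        "default": mp.get_start_method(),
        # For example, `fork` is not offered on Windows.
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    # The guard matters under `spawn`, where the child re-imports this module.
    print(start_method_summary())
```

Calling `mp.get_context("spawn")` instead of relying on the default is one way to make the choice explicit in library code without changing the process-wide setting.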
@@ -125,8 +125,13 @@ class ShmRingBuffer:
                        lambda *args, **kwargs: None):
                 try:
                     self.shared_memory = shared_memory.SharedMemory(name=name)
-                    assert (
-                        self.shared_memory.size == self.total_bytes_of_buffer)
+                    # See https://docs.python.org/3/library/multiprocessing.shared_memory.html # noqa
+                    # Some platforms allocate memory based on page size,
+                    # so the shared memory block size may be larger or equal
+                    # to the requested size. The size parameter is ignored
+                    # when attaching to an existing block.
+                    assert (self.shared_memory.size
+                            >= self.total_bytes_of_buffer)
                 except FileNotFoundError:
                     # we might deserialize the object in a different node
                     # in this case, this object is not used,
......
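The relaxed assertion in the hunk above can be reproduced outside vLLM. The following is a minimal sketch using the standard `multiprocessing.shared_memory` module; the function name `attach_and_check` is hypothetical. It creates a block, attaches to it by name, and returns both sizes so the caller can compare them.

```python
from multiprocessing import shared_memory


def attach_and_check(requested_size: int) -> tuple:
    """Create a block, attach to it by name, return (requested, attached size).

    Hypothetical helper for illustration only.
    """
    owner = shared_memory.SharedMemory(create=True, size=requested_size)
    try:
        # No size is passed when attaching; the existing block defines it.
        attached = shared_memory.SharedMemory(name=owner.name)
        try:
            return requested_size, attached.size
        finally:
            attached.close()
    finally:
        owner.close()
        owner.unlink()  # Only the creating side removes the block.


if __name__ == "__main__":
    requested, actual = attach_and_check(100)
    # Platforms that round up to the page size report actual > requested.
    print(requested, actual, actual >= requested)
```

An equality check would fail on any platform that rounds the allocation up, which is why `>=` is the portable comparison here.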
 # SPDX-License-Identifier: Apache-2.0
 import os
+import sys
 from typing import TYPE_CHECKING, Optional
 import psutil
@@ -148,6 +149,13 @@ class CpuPlatform(Platform):
         # To hint IPEX uses shared memory based AllReduce
         os.environ["LOCAL_WORLD_SIZE"] = str(
             vllm_config.parallel_config.tensor_parallel_size)
+        if sys.platform == "darwin" and \
+                envs.VLLM_WORKER_MULTIPROC_METHOD == "fork":
+            if os.environ.get('VLLM_WORKER_MULTIPROC_METHOD', None) is None:
+                logger.warning(
+                    "Default to spawn method on MacOS. If this is not desired,"
+                    " set VLLM_WORKER_MULTIPROC_METHOD to fork explicitly.")
+                os.environ['VLLM_WORKER_MULTIPROC_METHOD'] = 'spawn'
     @classmethod
     def is_pin_memory_available(cls) -> bool:
......
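The macOS fallback added in `CpuPlatform` can be read as a small decision rule: an explicitly set `VLLM_WORKER_MULTIPROC_METHOD` always wins, and only an unset value that would resolve to `fork` is switched to `spawn` on `darwin`. The sketch below restates that rule as a pure function. The function name `resolve_worker_start_method` and the `default="fork"` parameter are assumptions made for illustration; the real code mutates `os.environ` and reads the resolved value from `envs`.

```python
from typing import Mapping


def resolve_worker_start_method(platform: str,
                                environ: Mapping[str, str],
                                default: str = "fork") -> str:
    """Return the start method the workers would use.

    Hypothetical restatement of the diff's logic. `default` stands in for
    the value resolved when the variable is unset (assumed to be "fork").
    """
    explicit = environ.get("VLLM_WORKER_MULTIPROC_METHOD")
    if explicit is not None:
        # An explicit choice, including "fork" on macOS, is respected.
        return explicit
    if platform == "darwin" and default == "fork":
        # Unset on macOS: fall back to spawn, as the warning announces.
        return "spawn"
    return default


if __name__ == "__main__":
    import os
    import sys
    print(resolve_worker_start_method(sys.platform, os.environ))
```

Keeping the rule free of side effects makes it straightforward to check each platform and environment combination without touching the real process environment.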