OpenDAS / vllm_cscc
Commit fd3bfe74 (Unverified)
Authored Mar 05, 2026 by Michael Yao; committed by GitHub, Mar 04, 2026
[Docs] Update design/multiprocessing.md (#30677)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
Parent: bfdb512f
Showing 1 changed file with 13 additions and 17 deletions
docs/design/multiprocessing.md
@@ -12,9 +12,8 @@ page for information on known issues and how to solve them.
 The use of Python multiprocessing in vLLM is complicated by:
-- The use of vLLM as a library and the inability to control the code using vLLM
-- Varying levels of incompatibilities between multiprocessing methods and vLLM
-  dependencies
+- using vLLM as a library, which limits control over its internal code;
+- incompatibilities between certain multiprocessing methods and vLLM dependencies.
 This document describes how vLLM deals with these challenges.
@@ -22,11 +21,9 @@ This document describes how vLLM deals with these challenges.
 [Python multiprocessing methods](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) include:
-- `spawn` - spawn a new Python process. The default on Windows and macOS.
+- `spawn` - Spawn a new Python process. The default on Windows and macOS.
 - `fork` - Use `os.fork()` to fork the Python interpreter. The default on
   Linux for Python versions prior to 3.14.
 - `forkserver` - Spawn a server process that will fork a new process on request.
   The default on Linux for Python version 3.14 and newer.
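The start methods listed in this hunk can be inspected from the standard library. The following is a minimal sketch, assuming only Python's `multiprocessing` module; the helper name `start_method_of` is hypothetical and not part of vLLM.

```python
import multiprocessing as mp


def start_method_of(method: str) -> str:
    """Return the start method bound to an explicit context.

    Hypothetical helper for illustration; not part of vLLM.
    mp.get_context(name) returns a context bound to that method without
    changing the process-wide default start method.
    """
    ctx = mp.get_context(method)  # raises ValueError if the method is unavailable
    return ctx.get_start_method()


# Start methods supported on the current platform (for example, `fork` is
# not offered on Windows).
available = mp.get_all_start_methods()
```

Requesting an explicit context, rather than calling `multiprocessing.set_start_method()`, is one way for library code to avoid changing global state owned by the calling application.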
@@ -36,8 +33,8 @@ This document describes how vLLM deals with these challenges.
 threads. If you are under macOS, using `fork` may cause the process to crash.
 `spawn` is more compatible with dependencies, but can be problematic when vLLM
-is used as a library. If the consuming code does not use a `__main__` guard (`if
-__name__ == "__main__":`), the code will be inadvertently re-executed when vLLM
+is used as a library. If the consuming code does not use a `__main__` guard
+(`if __name__ == "__main__":`), the code will be inadvertently re-executed when vLLM
 spawns a new process. This can lead to infinite recursion, among other problems.
 `forkserver` will spawn a new server process that will fork new processes on
@@ -57,8 +54,7 @@ Multiple vLLM dependencies indicate either a preference or requirement for using
 - <https://pytorch.org/docs/stable/multiprocessing.html#sharing-cuda-tensors>
 - <https://docs.habana.ai/en/latest/PyTorch/Getting_Started_with_PyTorch_and_Gaudi/Getting_Started_with_PyTorch.html?highlight=multiprocessing#torch-multiprocessing-for-dataloaders>
-It is perhaps more accurate to say that there are known problems with using
-`fork` after initializing these dependencies.
+Known issues exist when using `fork` after initializing these dependencies.
 ## Current State (v0)
@@ -66,8 +62,8 @@ The environment variable `VLLM_WORKER_MULTIPROC_METHOD` can be used to control w
 - <https://github.com/vllm-project/vllm/blob/d05f88679bedd73939251a17c3d785a354b2946c/vllm/envs.py#L339-L342>
-When we know we own the process because the `vllm` command was used, we use
-`spawn` because it's the most widely compatible.
+If the main process is controlled via the `vllm` command,
+`spawn` is used because it's the most widely compatible.
 - <https://github.com/vllm-project/vllm/blob/d05f88679bedd73939251a17c3d785a354b2946c/vllm/scripts.py#L123-L140>
@@ -104,8 +100,8 @@ dependencies and code using vLLM as a library.
 ### Changes Made in v1
 There is not an easy solution with Python's `multiprocessing` that will work
-everywhere. As a first step, we can get v1 into a state where it does "best
-effort" choice of multiprocessing method to maximize compatibility.
+everywhere. As a first step, we can get v1 into a state where it does
+"best effort" choice of multiprocessing method to maximize compatibility.
 - Default to `fork`.
 - Use `spawn` when we know we control the main process (`vllm` was executed).
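The two rules visible in this hunk can be sketched as a small selection function. This is an illustration under stated assumptions, not vLLM's implementation: the function name `choose_start_method` and the `owns_main_process` flag are hypothetical, and letting an explicitly set `VLLM_WORKER_MULTIPROC_METHOD` take precedence is an assumption based on the statement that the variable controls the method.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "VLLM_WORKER_MULTIPROC_METHOD"


def choose_start_method(
    owns_main_process: bool,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a multiprocessing start method on a best-effort basis.

    Hypothetical sketch; names and precedence are assumptions.
    """
    env = os.environ if environ is None else environ

    # Assumption: an explicit user setting wins over any heuristic.
    explicit = env.get(ENV_VAR)
    if explicit:
        return explicit

    # Rule from the hunk: use `spawn` when the main process is under our
    # control (for example, the `vllm` command was executed).
    if owns_main_process:
        return "spawn"

    # Rule from the hunk: otherwise default to `fork`.
    return "fork"
```

Passing the environment as a parameter keeps the function free of global state, which makes the precedence easy to check in isolation.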
@@ -154,8 +150,8 @@ RuntimeError:
 ### Detect if a `__main__` guard is present
 It has been suggested that we could behave better if we could detect whether
-code using vLLM as a library has a `__main__` guard in place. This [post on
-stackoverflow](https://stackoverflow.com/questions/77220442/multiprocessing-pool-in-a-python-class-without-name-main-guard)
+code using vLLM as a library has a `__main__` guard in place. This
+[post on Stack Overflow](https://stackoverflow.com/questions/77220442/multiprocessing-pool-in-a-python-class-without-name-main-guard)
 was from a library author facing the same question.
 It is possible to detect whether we are in the original, `__main__` process, or
@@ -192,4 +188,4 @@ that works around these challenges.
 2. We can explore other libraries that may better suit our needs. Examples to
    consider:
-   - <https://github.com/joblib/loky>
+   - <https://github.com/joblib/loky>