Commit 96ae75ad authored by zhuwenwen's avatar zhuwenwen

Merge tag 'v0.6.6.post1' into v0.6.6.post1-dev

parents f9f4a735 2339d59f
...@@ -2,13 +2,14 @@ ...@@ -2,13 +2,14 @@
## Debugging ## Debugging
Please see the [Debugging Please see the [Debugging Tips](#debugging-python-multiprocessing)
Tips](https://docs.vllm.ai/en/latest/getting_started/debugging.html#python-multiprocessing)
page for information on known issues and how to solve them. page for information on known issues and how to solve them.
## Introduction ## Introduction
*Note that source code references are to the state of the code at the time of writing in December, 2024.* ```{important}
The source code references are to the state of the code at the time of writing in December, 2024.
```
The use of Python multiprocessing in vLLM is complicated by: The use of Python multiprocessing in vLLM is complicated by:
...@@ -20,7 +21,7 @@ This document describes how vLLM deals with these challenges. ...@@ -20,7 +21,7 @@ This document describes how vLLM deals with these challenges.
## Multiprocessing Methods ## Multiprocessing Methods
[Python multiprocessing methods](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) include: [Python multiprocessing methods](https://docs.python.org/3/library/multiprocessing.html.md#contexts-and-start-methods) include:
- `spawn` - spawn a new Python process. This will be the default as of Python - `spawn` - spawn a new Python process. This will be the default as of Python
3.14. 3.14.
...@@ -82,7 +83,7 @@ There are other miscellaneous places hard-coding the use of `spawn`: ...@@ -82,7 +83,7 @@ There are other miscellaneous places hard-coding the use of `spawn`:
Related PRs: Related PRs:
- <https://github.com/vllm-project/vllm/pull/8823> - <gh-pr:8823>
## Prior State in v1 ## Prior State in v1
...@@ -96,7 +97,7 @@ engine core. ...@@ -96,7 +97,7 @@ engine core.
- <https://github.com/vllm-project/vllm/blob/d05f88679bedd73939251a17c3d785a354b2946c/vllm/v1/engine/llm_engine.py#L93-L95> - <https://github.com/vllm-project/vllm/blob/d05f88679bedd73939251a17c3d785a354b2946c/vllm/v1/engine/llm_engine.py#L93-L95>
- <https://github.com/vllm-project/vllm/blob/d05f88679bedd73939251a17c3d785a354b2946c/vllm/v1/engine/llm_engine.py#L70-L77> - <https://github.com/vllm-project/vllm/blob/d05f88679bedd73939251a17c3d785a354b2946c/vllm/v1/engine/llm_engine.py#L70-L77>
- https://github.com/vllm-project/vllm/blob/d05f88679bedd73939251a17c3d785a354b2946c/vllm/v1/engine/core_client.py#L44-L45 - <https://github.com/vllm-project/vllm/blob/d05f88679bedd73939251a17c3d785a354b2946c/vllm/v1/engine/core_client.py#L44-L45>
It was off by default for all the reasons mentioned above - compatibility with It was off by default for all the reasons mentioned above - compatibility with
dependencies and code using vLLM as a library. dependencies and code using vLLM as a library.
...@@ -119,8 +120,8 @@ instruct users to either add a `__main__` guard or to disable multiprocessing. ...@@ -119,8 +120,8 @@ instruct users to either add a `__main__` guard or to disable multiprocessing.
If that known-failure case occurs, the user will see two messages that explain If that known-failure case occurs, the user will see two messages that explain
what is happening. First, a log message from vLLM: what is happening. First, a log message from vLLM:
``` ```console
WARNING 12-11 14:50:37 multiproc_worker_utils.py:281] CUDA was previously WARNING 12-11 14:50:37 multiproc_worker_utils.py:281] CUDA was previously
initialized. We must use the `spawn` multiprocessing start method. Setting initialized. We must use the `spawn` multiprocessing start method. Setting
VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See
https://docs.vllm.ai/en/latest/getting_started/debugging.html#python-multiprocessing https://docs.vllm.ai/en/latest/getting_started/debugging.html#python-multiprocessing
...@@ -129,7 +130,7 @@ what is happening. First, a log message from vLLM: ...@@ -129,7 +130,7 @@ what is happening. First, a log message from vLLM:
Second, Python itself will raise an exception with a nice explanation: Second, Python itself will raise an exception with a nice explanation:
``` ```console
RuntimeError: RuntimeError:
An attempt has been made to start a new process before the An attempt has been made to start a new process before the
current process has finished its bootstrapping phase. current process has finished its bootstrapping phase.
......
(plugin-system)=
# vLLM's Plugin System
The community frequently requests the ability to extend vLLM with custom features. To facilitate this, vLLM includes a plugin system that allows users to add custom features without modifying the vLLM codebase. This document explains how plugins work in vLLM and how to create a plugin for vLLM.
## How Plugins Work in vLLM
Plugins are user-registered code that vLLM executes. Given vLLM's architecture (see [](#arch-overview)), multiple processes may be involved, especially when using distributed inference with various parallelism techniques. To enable plugins successfully, every process created by vLLM needs to load the plugin. This is done by the [load_general_plugins](https://github.com/vllm-project/vllm/blob/c76ac49d266e27aa3fea84ef2df1f813d24c91c7/vllm/plugins/__init__.py#L16) function in the `vllm.plugins` module. This function is called for every process created by vLLM before it starts any work.
## How vLLM Discovers Plugins
vLLM's plugin system uses the standard Python `entry_points` mechanism. This mechanism allows developers to register functions in their Python packages for use by other packages. An example of a plugin:
```python
# inside `setup.py` file
from setuptools import setup

setup(name='vllm_add_dummy_model',
      version='0.1',
      packages=['vllm_add_dummy_model'],
      entry_points={
          'vllm.general_plugins':
          ["register_dummy_model = vllm_add_dummy_model:register"]
      })

# inside `vllm_add_dummy_model.py` file
def register():
    from vllm import ModelRegistry

    if "MyLlava" not in ModelRegistry.get_supported_archs():
        ModelRegistry.register_model("MyLlava",
                                     "vllm_add_dummy_model.my_llava:MyLlava")
```
For more information on adding entry points to your package, please check the [official documentation](https://setuptools.pypa.io/en/latest/userguide/entry_point.html).
Every plugin has three parts:
1. **Plugin group**: The name of the entry point group. vLLM uses the entry point group `vllm.general_plugins` to register general plugins. This is the key of `entry_points` in the `setup.py` file. Always use `vllm.general_plugins` for vLLM's general plugins.
2. **Plugin name**: The name of the plugin. This is the left-hand side of the entry point string listed under the group in `entry_points`. In the example above, the plugin name is `register_dummy_model`. Plugins can be filtered by their names using the `VLLM_PLUGINS` environment variable. To load only a specific plugin, set `VLLM_PLUGINS` to the plugin name.
3. **Plugin value**: The fully qualified name of the function to register in the plugin system. In the example above, the plugin value is `vllm_add_dummy_model:register`, which refers to a function named `register` in the `vllm_add_dummy_model` module.
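The three parts above map directly onto the standard library's entry point objects. The following is a minimal sketch of how a host application could discover and filter entry points in a group; it is not vLLM's actual `load_general_plugins` implementation. The helper names `parse_allowed`, `select_plugins`, and `load_plugins` are hypothetical, and treating `VLLM_PLUGINS` as a comma-separated list is an assumption made for illustration.

```python
import os
from importlib.metadata import EntryPoint, entry_points

GROUP = "vllm.general_plugins"  # the plugin group


def parse_allowed(raw):
    """Turn a VLLM_PLUGINS-style value into a set of names (None = no filter).

    Assumption: several plugin names may be separated by commas.
    """
    if raw is None:
        return None
    return {name.strip() for name in raw.split(",") if name.strip()}


def select_plugins(eps, allowed):
    """Keep only the entry points whose plugin name is allowed."""
    if allowed is None:
        return list(eps)
    return [ep for ep in eps if ep.name in allowed]


def load_plugins():
    """Discover installed entry points in GROUP and call each one."""
    allowed = parse_allowed(os.environ.get("VLLM_PLUGINS"))
    # entry_points(group=...) requires Python 3.10 or newer.
    for ep in select_plugins(entry_points(group=GROUP), allowed):
        register = ep.load()  # imports the module and resolves the function
        register()


# Two hand-built entry points, used only to demonstrate the filtering step.
demo = [
    EntryPoint("register_dummy_model", "vllm_add_dummy_model:register", GROUP),
    EntryPoint("other_plugin", "other_pkg:register", GROUP),
]
selected = select_plugins(demo, parse_allowed("register_dummy_model"))
print([ep.name for ep in selected])
```

Here `ep.name` corresponds to the plugin name and `ep.value` to the plugin value; `ep.load()` is what turns the value string into a callable.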
## What Can Plugins Do?
Currently, the primary use case for plugins is to register custom, out-of-tree models into vLLM. This is done by calling `ModelRegistry.register_model` to register the model. In the future, the plugin system may be extended to support more features, such as swapping in custom implementations for certain classes in vLLM.
## Guidelines for Writing Plugins
- **Being re-entrant**: The function specified in the entry point should be re-entrant, meaning it can be called multiple times without causing issues. This is necessary because the function might be called multiple times in some processes.
## Compatibility Guarantee
vLLM guarantees that the documented plugin interfaces, such as `ModelRegistry.register_model`, will always be available for plugins to register models. However, it is the responsibility of plugin developers to ensure their plugins are compatible with the version of vLLM they are targeting. For example, `"vllm_add_dummy_model.my_llava:MyLlava"` should be compatible with the version of vLLM that the plugin targets. The interface for the model may change during vLLM's development.
AsyncLLMEngine # AsyncLLMEngine
=================================
```{eval-rst}
.. autoclass:: vllm.AsyncLLMEngine .. autoclass:: vllm.AsyncLLMEngine
:members: :members:
:show-inheritance: :show-inheritance:
```
LLMEngine # LLMEngine
=================================
```{eval-rst}
.. autoclass:: vllm.LLMEngine .. autoclass:: vllm.LLMEngine
:members: :members:
:show-inheritance: :show-inheritance:
```
LLM Class # LLM Class
=========
```{eval-rst}
.. autoclass:: vllm.LLM .. autoclass:: vllm.LLM
:members: :members:
:show-inheritance: :show-inheritance:
```