Merge tag 'v0.6.6.post1' into v0.6.6.post1-dev

Commit 96ae75ad authored Jan 04, 2025 by zhuwenwen
20 changed files
--- a/docs/source/design/multiprocessing.md
+++ b/docs/source/design/multiprocessing.md
@@ -2,13 +2,14 @@

 ## Debugging

-Please see the [Debugging
-Tips](https://docs.vllm.ai/en/latest/getting_started/debugging.html#python-multiprocessing)
+Please see the [Debugging Tips](#debugging-python-multiprocessing)
 page for information on known issues and how to solve them.

 ## Introduction

-*Note that source code references are to the state of the code at the time of writing in December, 2024.*
+```{important}
+The source code references are to the state of the code at the time of writing in December, 2024.
+```

 The use of Python multiprocessing in vLLM is complicated by:

@@ -20,7 +21,7 @@ This document describes how vLLM deals with these challenges.

 ## Multiprocessing Methods

-[Python multiprocessing methods](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) include:
+[Python multiprocessing methods](https://docs.python.org/3/library/multiprocessing.html.md#contexts-and-start-methods) include:

 - `spawn` - spawn a new Python process. This will be the default as of Python
  3.14.
@@ -82,7 +83,7 @@ There are other miscellaneous places hard-coding the use of `spawn`:

 Related PRs:

- <https://github.com/vllm-project/vllm/pull/8823>
+- <gh-pr:8823>

 ## Prior State in v1

@@ -96,7 +97,7 @@ engine core.

 - <https://github.com/vllm-project/vllm/blob/d05f88679bedd73939251a17c3d785a354b2946c/vllm/v1/engine/llm_engine.py#L93-L95>
 - <https://github.com/vllm-project/vllm/blob/d05f88679bedd73939251a17c3d785a354b2946c/vllm/v1/engine/llm_engine.py#L70-L77>
- https://github.com/vllm-project/vllm/blob/d05f88679bedd73939251a17c3d785a354b2946c/vllm/v1/engine/core_client.py#L44-L45
+- <https://github.com/vllm-project/vllm/blob/d05f88679bedd73939251a17c3d785a354b2946c/vllm/v1/engine/core_client.py#L44-L45>

 It was off by default for all the reasons mentioned above - compatibility with
 dependencies and code using vLLM as a library.
@@ -119,17 +120,17 @@ instruct users to either add a `__main__` guard or to disable multiprocessing.
 If that known-failure case occurs, the user will see two messages that explain
 what is happening. First, a log message from vLLM:

-```
-    WARNING 12-11 14:50:37 multiproc_worker_utils.py:281] CUDA was previously
-      initialized. We must use the `spawn` multiprocessing start method. Setting
-      VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See
-      https://docs.vllm.ai/en/latest/getting_started/debugging.html#python-multiprocessing
-      for more information.
+```console
+WARNING 12-11 14:50:37 multiproc_worker_utils.py:281] CUDA was previously
+    initialized. We must use the `spawn` multiprocessing start method. Setting
+    VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See
+    https://docs.vllm.ai/en/latest/getting_started/debugging.html#python-multiprocessing
+    for more information.
 ```

 Second, Python itself will raise an exception with a nice explanation:

-```
+```console
 RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

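The start-method fallback that the warning above describes can be sketched in plain Python. This is a minimal, hypothetical sketch and not vLLM's implementation. Only the environment variable name `VLLM_WORKER_MULTIPROC_METHOD` and the "force `spawn` once CUDA is initialized" behavior come from the log message. The helper name `resolve_worker_context`, its `cuda_initialized` flag, and the `fork` fallback when the variable is unset are assumptions made for illustration.

```python
import logging
import multiprocessing
import os

logger = logging.getLogger("mp_method_sketch")

# Variable name taken from the warning above; everything else is hypothetical.
ENV_VAR = "VLLM_WORKER_MULTIPROC_METHOD"


def resolve_worker_context(cuda_initialized: bool, default: str = "fork"):
    """Hypothetical helper: choose the worker start method and return a
    multiprocessing context for it. Assumes `default` when ENV_VAR is unset."""
    method = os.environ.get(ENV_VAR, default)
    if cuda_initialized and method != "spawn":
        # CUDA cannot be re-initialized in a forked child once the parent has
        # initialized it, so switch to spawn and record the choice in the
        # environment, mirroring the log message above.
        logger.warning(
            "CUDA was previously initialized. We must use the `spawn` "
            "multiprocessing start method. Setting %s to 'spawn'.",
            ENV_VAR,
        )
        os.environ[ENV_VAR] = "spawn"
        method = "spawn"
    # Returns a context object; no process is started here.
    return multiprocessing.get_context(method)


if __name__ == "__main__":
    ctx = resolve_worker_context(cuda_initialized=True)
    print(ctx.get_start_method())  # spawn
```

With `spawn`, each child re-imports the main module, so any code that starts processes from the returned context (for example `ctx.Pool(...)` or `ctx.Process(...).start()`) should sit under an `if __name__ == "__main__":` guard; without it, the re-imported module tries to start processes again and Python raises the bootstrapping `RuntimeError` shown above.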
--- a/docs/source/design/plugin_system.md
+++ b/docs/source/design/plugin_system.md
--- a/docs/source/design/plugin_system.rst
+++ b/docs/source/design/plugin_system.rst
--- a/docs/source/dev/engine/async_llm_engine.rst
+++ b/docs/source/dev/engine/async_llm_engine.rst
-AsyncLLMEngine
-=================================
+# AsyncLLMEngine

+```{eval-rst}
 .. autoclass:: vllm.AsyncLLMEngine
    :members:
    :show-inheritance:
+```
--- a/docs/source/dev/engine/engine_index.rst
+++ b/docs/source/dev/engine/engine_index.rst
--- a/docs/source/dev/engine/llm_engine.rst
+++ b/docs/source/dev/engine/llm_engine.rst
-LLMEngine
-=================================
+# LLMEngine

+```{eval-rst}
 .. autoclass:: vllm.LLMEngine
    :members:
    :show-inheritance:
+```
--- a/docs/source/dev/offline_inference/llm.rst
+++ b/docs/source/dev/offline_inference/llm.rst
--- a/docs/source/dev/offline_inference/llm_inputs.rst
+++ b/docs/source/dev/offline_inference/llm_inputs.rst
--- a/docs/source/dev/offline_inference/offline_index.md
+++ b/docs/source/dev/offline_inference/offline_index.md
--- a/docs/source/dev/offline_inference/offline_index.rst
+++ b/docs/source/dev/offline_inference/offline_index.rst
--- a/docs/source/dev/pooling_params.rst
+++ b/docs/source/dev/pooling_params.rst
--- a/docs/source/dev/sampling_params.rst
+++ b/docs/source/dev/sampling_params.rst
--- a/docs/source/generate_examples.py
+++ b/docs/source/generate_examples.py
--- a/docs/source/getting_started/amd-installation.md
+++ b/docs/source/getting_started/amd-installation.md
--- a/docs/source/getting_started/arm-installation.rst
+++ b/docs/source/getting_started/arm-installation.rst
--- a/docs/source/getting_started/cpu-installation.md
+++ b/docs/source/getting_started/cpu-installation.md
--- a/docs/source/getting_started/cpu-installation.rst
+++ b/docs/source/getting_started/cpu-installation.rst
--- a/docs/source/getting_started/debugging.md
+++ b/docs/source/getting_started/debugging.md
--- a/docs/source/getting_started/debugging.rst
+++ b/docs/source/getting_started/debugging.rst
--- a/docs/source/getting_started/examples/examples_index.template.md
+++ b/docs/source/getting_started/examples/examples_index.template.md