[Docs] Update PP docs (#6598)

45ceb85a · Murali Andoorveedu · GitHub · 4cc24f01 · 45ceb85a
Unverified commit 45ceb85a, authored Jul 19, 2024 by Murali Andoorveedu; committed by GitHub Jul 19, 2024.

Showing with 2 additions and 3 deletions

docs/source/serving/distributed_serving.rst +2 -3

--- a/docs/source/serving/distributed_serving.rst
+++ b/docs/source/serving/distributed_serving.rst
@@ -44,11 +44,10 @@ You can also additionally specify :code:`--pipeline-parallel-size` to enable pip

    $ vllm serve gpt2 \
    $     --tensor-parallel-size 4 \
-    $     --pipeline-parallel-size 2 \
-    $     --distributed-executor-backend ray
+    $     --pipeline-parallel-size 2

 .. note::
-    Pipeline parallel is a beta feature. It is only supported for online serving and the ray backend for now, as well as LLaMa and GPT2 style models.
+    Pipeline parallel is a beta feature. It is only supported for online serving as well as LLaMa, GPT2, and Mixtral style models.

 To scale vLLM beyond a single machine, install and start a `Ray runtime <https://docs.ray.io/en/latest/ray-core/starting-ray.html>`_ via CLI before running vLLM:
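The example command in this diff combines `--tensor-parallel-size 4` with `--pipeline-parallel-size 2`, so it needs 4 × 2 = 8 GPU workers: two pipeline stages, each sharded across four tensor-parallel ranks. The Python sketch below illustrates that arithmetic and one possible rank layout. The helper names `world_size` and `rank_layout` are hypothetical, and the assumption that each tensor-parallel group occupies consecutive global ranks is illustrative only. The diff itself does not state it.

```python
# Hypothetical sketch: how the 4 x 2 layout from the example `vllm serve gpt2`
# command could map global worker ranks to pipeline stages and TP shards.
# ASSUMPTION: each tensor-parallel group occupies consecutive global ranks.


def world_size(tensor_parallel_size: int, pipeline_parallel_size: int) -> int:
    """Total GPU workers: one per (pipeline stage, tensor-parallel shard) pair."""
    return tensor_parallel_size * pipeline_parallel_size


def rank_layout(
    tensor_parallel_size: int, pipeline_parallel_size: int
) -> dict[int, tuple[int, int]]:
    """Map each global rank to (pipeline_stage, tensor_parallel_rank)."""
    layout = {}
    for rank in range(world_size(tensor_parallel_size, pipeline_parallel_size)):
        # Consecutive ranks fill one stage's TP group before moving on.
        stage, tp_rank = divmod(rank, tensor_parallel_size)
        layout[rank] = (stage, tp_rank)
    return layout


if __name__ == "__main__":
    tp, pp = 4, 2  # values from the example command
    print(world_size(tp, pp))
    for rank, (stage, tp_rank) in rank_layout(tp, pp).items():
        print(f"rank {rank}: pipeline stage {stage}, tensor-parallel rank {tp_rank}")
```

Under this assumed ordering, ranks 0 to 3 form the first pipeline stage and ranks 4 to 7 form the second.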