Unverified commit 45ceb85a authored by Murali Andoorveedu, committed by GitHub

[Docs] Update PP docs (#6598)

parent 4cc24f01
@@ -44,11 +44,10 @@ You can also additionally specify :code:`--pipeline-parallel-size` to enable pip
 $ vllm serve gpt2 \
 $ --tensor-parallel-size 4 \
-$ --pipeline-parallel-size 2 \
-$ --distributed-executor-backend ray
+$ --pipeline-parallel-size 2
 .. note::
-Pipeline parallel is a beta feature. It is only supported for online serving and the ray backend for now, as well as LLaMa and GPT2 style models.
+Pipeline parallel is a beta feature. It is only supported for online serving as well as LLaMa, GPT2, and Mixtral style models.
 To scale vLLM beyond a single machine, install and start a `Ray runtime <https://docs.ray.io/en/latest/ray-core/starting-ray.html>`_ via CLI before running vLLM:
...
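A rough sketch of what the two flags in the updated `vllm serve` example mean for hardware: the model's layers are split into `--pipeline-parallel-size` stages, and each stage is sharded across `--tensor-parallel-size` GPUs, so the example with 4 and 2 occupies 4 × 2 = 8 GPUs. The helper names `required_gpus`, `split_layers`, and `serve_command` are hypothetical, and the even, contiguous layer split is an illustrative assumption; vLLM's own partitioning logic may differ.

```python
from typing import List, Tuple


def required_gpus(tensor_parallel_size: int, pipeline_parallel_size: int) -> int:
    """Total GPUs one deployment occupies: PP stages x TP shards per stage."""
    if tensor_parallel_size < 1 or pipeline_parallel_size < 1:
        raise ValueError("parallel sizes must be positive integers")
    return tensor_parallel_size * pipeline_parallel_size


def split_layers(num_layers: int, pipeline_parallel_size: int) -> List[Tuple[int, int]]:
    """Hypothetical even split of transformer layers into contiguous
    [start, end) ranges, one per pipeline stage. Earlier stages absorb
    any remainder. Illustrative assumption, not vLLM's actual code."""
    if pipeline_parallel_size < 1 or pipeline_parallel_size > num_layers:
        raise ValueError("need between 1 and num_layers pipeline stages")
    base, extra = divmod(num_layers, pipeline_parallel_size)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for stage in range(pipeline_parallel_size):
        size = base + (1 if stage < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def serve_command(model: str, tp: int, pp: int) -> List[str]:
    """Build the argv for the `vllm serve` invocation shown in the diff."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tp),
        "--pipeline-parallel-size", str(pp),
    ]


if __name__ == "__main__":
    tp, pp = 4, 2
    print(required_gpus(tp, pp))  # 8
    # The GPT-2 (small) checkpoint has 12 transformer blocks.
    print(split_layers(12, pp))  # [(0, 6), (6, 12)]
    print(" ".join(serve_command("gpt2", tp, pp)))
```

Keeping each stage's layers contiguous means activations cross GPUs only once per stage boundary, which is the property pipeline parallelism relies on; tensor parallelism then divides the work inside each stage.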