Commit c2462129 authored by youkaichao, committed by GitHub

[doc][faq] add warning to download models for every nodes (#5783)

parent edd5fe5f
@@ -36,3 +36,6 @@ To scale vLLM beyond a single machine, install and start a `Ray runtime <https:/ ...
$ ray start --address=<ray-head-address>
After that, you can run inference and serving on multiple machines by launching the vLLM process on the head node by setting :code:`tensor_parallel_size` to the number of GPUs to be the total number of GPUs across all machines.
.. warning::
    Please make sure you downloaded the model to all the nodes, or the model is downloaded to some distributed file system that is accessible by all nodes.
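
The two requirements in this hunk (setting :code:`tensor_parallel_size` to the cluster-wide GPU count, and having the model present on each node) can be illustrated with a minimal sketch. The helper names ``total_gpus``, ``model_present`` and ``launch``, the per-node GPU list, and the local model directory are assumptions made for illustration and are not part of this commit. The presence check only inspects the node it runs on, so it would have to be run on every node (or pointed at a shared file system) to cover the warning fully.

```python
import os


def total_gpus(gpus_per_node):
    """Sum the GPU counts of all nodes (hypothetical input, e.g. [8, 8])."""
    if not gpus_per_node or any(n <= 0 for n in gpus_per_node):
        raise ValueError("expected a non-empty list of positive GPU counts")
    return sum(gpus_per_node)


def model_present(model_dir):
    """Return True if model_dir exists on THIS node and is not empty."""
    return os.path.isdir(model_dir) and bool(os.listdir(model_dir))


def launch(model_dir, gpus_per_node):
    """Start vLLM on the head node once the local model copy is confirmed."""
    if not model_present(model_dir):
        raise FileNotFoundError(f"model not found on this node: {model_dir}")
    # Imported lazily so the helpers above can be used without vLLM installed.
    from vllm import LLM
    return LLM(model=model_dir, tensor_parallel_size=total_gpus(gpus_per_node))
```

For example, with two machines of 4 GPUs each, ``total_gpus([4, 4])`` yields 8, which is the value passed as ``tensor_parallel_size``.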