Unverified Commit c2462129 authored by youkaichao's avatar youkaichao Committed by GitHub
Browse files

[doc][faq] add warning to download models for every nodes (#5783)

parent edd5fe5f
......@@ -35,4 +35,7 @@ To scale vLLM beyond a single machine, install and start a `Ray runtime <https:/
$ # On worker nodes
$ ray start --address=<ray-head-address>
After that, you can run inference and serving on multiple machines by launching the vLLM process on the head node by setting :code:`tensor_parallel_size` to the number of GPUs to be the total number of GPUs across all machines.
\ No newline at end of file
After that, you can run inference and serving on multiple machines by launching the vLLM process on the head node by setting :code:`tensor_parallel_size` to the number of GPUs to be the total number of GPUs across all machines.
.. warning::
Please make sure you downloaded the model to all the nodes, or the model is downloaded to some distributed file system that is accessible by all nodes.
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment