You can provision Cloud TPUs using the `Cloud TPU API <https://cloud.google.com/tpu/docs/reference/rest>`_
or the `queued resources <https://cloud.google.com/tpu/docs/queued-resources>`_
API. This section shows how to create TPUs using the queued resource API. For
more information about using the Cloud TPU API, see `Create a Cloud TPU using the Create Node API <https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm#create-node-api>`_.
@@ -162,9 +175,11 @@ Run the Docker image with the following command:
...
.. note::

   Since TPU relies on XLA, which requires static shapes, vLLM bucketizes the
   possible input shapes and compiles an XLA graph for each shape. Compilation
   may take 20 to 30 minutes on the first run. Subsequent runs take about
   5 minutes because the XLA graphs are cached on disk (in
   :code:`VLLM_XLA_CACHE_PATH`, or :code:`~/.cache/vllm/xla_cache` by default).
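
The bucketing idea in the note above can be illustrated with a minimal sketch. This is not vLLM's actual implementation: the helper names (``make_buckets``, ``pick_bucket``, ``pad_to_bucket``) and the power-of-two bucket sizes are assumptions chosen for illustration. It only shows why padding inputs to a small set of fixed lengths bounds the number of distinct shapes, and therefore the number of XLA graphs that need compiling.

```python
import bisect


def make_buckets(min_len: int, max_len: int) -> list[int]:
    """Return doubling bucket sizes from min_len up to max_len (hypothetical scheme)."""
    buckets = []
    size = min_len
    while size < max_len:
        buckets.append(size)
        size *= 2
    # Always include max_len so the largest allowed input has a bucket.
    buckets.append(max_len)
    return buckets


def pick_bucket(length: int, buckets: list[int]) -> int:
    """Return the smallest bucket that can hold `length` tokens."""
    if length > buckets[-1]:
        raise ValueError(f"length {length} exceeds largest bucket {buckets[-1]}")
    return buckets[bisect.bisect_left(buckets, length)]


def pad_to_bucket(token_ids: list[int], buckets: list[int], pad_id: int = 0) -> list[int]:
    """Pad a token list to its bucket size so its shape is one of a fixed set."""
    target = pick_bucket(len(token_ids), buckets)
    return token_ids + [pad_id] * (target - len(token_ids))


buckets = make_buckets(16, 128)
padded = pad_to_bucket([101, 102, 103], buckets)
print(buckets, len(padded))
```

With only four bucket sizes in this sketch, at most four distinct input lengths ever reach the compiler, at the cost of some wasted computation on padding.
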
.. tip::
...
@@ -173,7 +188,8 @@ Run the Docker image with the following command:
...
.. code-block:: console

   from torch._C import * # noqa: F403
   ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory