[Docs] Misc updates to TPU installation instructions (#10165)

Commit 4f168f69, authored Nov 15, 2024 by Michael Green; committed by GitHub Nov 15, 2024

Showing with 35 additions and 19 deletions

docs/source/getting_started/tpu-installation.rst +35 -19

--- a/docs/source/getting_started/tpu-installation.rst
+++ b/docs/source/getting_started/tpu-installation.rst
@@ -44,15 +44,18 @@ Requirements
 Provision Cloud TPUs
 ====================
-You can provision Cloud TPUs using the `Cloud TPU API <https://cloud.google.com/tpu/docs/reference/rest>`_` 
+You can provision Cloud TPUs using the `Cloud TPU API <https://cloud.google.com/tpu/docs/reference/rest>`_ 
-or the `queued resources <https://cloud.google.com/tpu/docs/queued-resources>`_` 
+or the `queued resources <https://cloud.google.com/tpu/docs/queued-resources>`_ 
-API. This section shows how to create TPUs using the queued resource API. 
+API. This section shows how to create TPUs using the queued resource API. For 
-For more information about using the Cloud TPU API, see `Create a Cloud TPU using the Create Node API <https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm#create-node-api>`_. 
+more information about using the Cloud TPU API, see `Create a Cloud TPU using the Create Node API <https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm#create-node-api>`_. 
-`Queued resources <https://cloud.devsite.corp.google.com/tpu/docs/queued-resources>`_
+Queued resources enable you to request Cloud TPU resources in a queued manner. 
-enable you to request Cloud TPU resources in a queued manner. When you request 
+When you request queued resources, the request is added to a queue maintained by 
-queued resources, the request is added to a queue maintained by the Cloud TPU 
+the Cloud TPU service. When the requested resource becomes available, it's 
-service. When the requested resource becomes available, it's assigned to your 
+assigned to your Google Cloud project for your immediate exclusive use. 
-Google Cloud project for your immediate exclusive use. 
+.. note::
+   In all of the following commands, replace the ALL CAPS parameter names with 
+   appropriate values. See the parameter descriptions table for more information.
 Provision a Cloud TPU with the queued resource API
 --------------------------------------------------
@@ -68,6 +71,7 @@ Create a TPU v5e with 4 TPU chips:
    --runtime-version RUNTIME_VERSION \
    --service-account SERVICE_ACCOUNT
 .. list-table:: Parameter descriptions
    :header-rows: 1
@@ -81,12 +85,13 @@ Create a TPU v5e with 4 TPU chips:
    * - PROJECT_ID
      - Your Google Cloud project
    * - ZONE
-      - The `zone <https://cloud.google.com/tpu/docs/regions-zones>`_ where you 
+      - The GCP zone where you want to create your Cloud TPU. The value you use 
-        want to create your Cloud TPU.
+        depends on the version of TPUs you are using. For more information, see 
+        `TPU regions and zones <https://cloud.google.com/tpu/docs/regions-zones>`_.
    * - ACCELERATOR_TYPE
-      - The TPU version you want to use. Specify the TPU version, followed by a 
+      - The TPU version you want to use. Specify the TPU version; for example,
-        '-' and the number of TPU cores. For example `v5e-4` specifies a v5e TPU 
+        ``v5litepod-4`` specifies a v5e TPU with 4 cores. For more information,
-        with 4 cores. For more information, see `TPU versions <https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions>`_.
+        see `TPU versions <https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions>`_.
    * - RUNTIME_VERSION
      - The TPU VM runtime version to use. For more information see `TPU VM images <https://cloud.google.com/tpu/docs/runtimes>`_.
    * - SERVICE_ACCOUNT
@@ -98,7 +103,15 @@ Connect to your TPU using SSH:
 .. code-block:: bash
-    gcloud compute tpus tpu-vm ssh TPU_NAME
+    gcloud compute tpus tpu-vm ssh TPU_NAME --zone ZONE
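The updated SSH command takes two ALL CAPS placeholders. A minimal sketch of filling them in from shell variables follows. The values ``my-tpu`` and ``us-west4-a`` are hypothetical examples, not values from this change, and the snippet only prints the command so you can review it before running it.

```shell
# Hypothetical example values; substitute your own TPU name and zone.
TPU_NAME="my-tpu"
ZONE="us-west4-a"

# Assemble the SSH command shown in the diff and print it for review.
SSH_CMD="gcloud compute tpus tpu-vm ssh ${TPU_NAME} --zone ${ZONE}"
echo "${SSH_CMD}"
```

Keeping the values in variables means the same names can be reused in any later command that takes the same placeholders.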
+Install Miniconda
+.. code-block:: bash
+    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
+    bash Miniconda3-latest-Linux-x86_64.sh
+    source ~/.bashrc
 Create and activate a Conda environment for vLLM:
@@ -162,9 +175,11 @@ Run the Docker image with the following command:
 .. note::
-    Since TPU relies on XLA which requires static shapes, vLLM bucketizes the possible input shapes and compiles an XLA graph for each different shape.
+    Because TPUs rely on XLA, which requires static shapes, vLLM bucketizes the
-    The compilation time may take 20~30 minutes in the first run.
+    possible input shapes and compiles an XLA graph for each shape. Compilation
-    However, the compilation time reduces to ~5 minutes afterwards because the XLA graphs are cached in the disk (in :code:`VLLM_XLA_CACHE_PATH` or :code:`~/.cache/vllm/xla_cache` by default).
+    may take 20 to 30 minutes on the first run. Afterwards, it drops to about
+    5 minutes because the XLA graphs are cached on disk (in
+    :code:`VLLM_XLA_CACHE_PATH` or :code:`~/.cache/vllm/xla_cache` by default).
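Because the faster later runs depend on the cached XLA graphs, it can help to set the cache location explicitly before the first run. The sketch below assumes you want to keep any value already set for ``VLLM_XLA_CACHE_PATH`` and otherwise use the default path named in the note. Creating the directory up front is a convenience of this sketch, not a documented requirement.

```shell
# Keep an existing VLLM_XLA_CACHE_PATH if one is set; otherwise fall back to
# the default location named in the note.
export VLLM_XLA_CACHE_PATH="${VLLM_XLA_CACHE_PATH:-$HOME/.cache/vllm/xla_cache}"

# Create the directory ahead of time (an assumption of this sketch).
mkdir -p "$VLLM_XLA_CACHE_PATH"
echo "XLA cache directory: $VLLM_XLA_CACHE_PATH"
```

If vLLM runs in a container, the cache presumably only survives restarts when this directory sits on storage that outlives the container.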
 .. tip::
@@ -173,7 +188,8 @@ Run the Docker image with the following command:
    .. code-block:: console
        from torch._C import *  # noqa: F403
-        ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory
+        ImportError: libopenblas.so.0: cannot open shared object file: No such 
+        file or directory
    Install OpenBLAS with the following command: