[doc] fix "Other AI accelerators" getting started page (#19457)

Signed-off-by: David Xia <david@davidxia.com>

Commit 89b0f84e authored Jun 11, 2025 by David Xia, committed by GitHub Jun 11, 2025
3 changed files
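All three files get the same fix. The `# --8<-- [end:requirements]` marker moves up, so the "Configure a new environment" content becomes its own `configure-a-new-environment` section instead of sitting inside the requirements section. These are section markers for the `--8<--` snippet syntax of the pymdownx.snippets Markdown extension. That syntax lets a parent page include only the lines between a named start marker and end marker. The Python below is a simplified reimplementation, not the extension's own code. It extracts a named section and flags unbalanced markers, which shows which lines end up in which included section. The sample document is hypothetical.

```python
import re

# A section marker such as "# --8<-- [start:requirements]". The marker may sit
# inside a comment, so it is searched for anywhere on the line. This regex is a
# simplification, not the one pymdownx.snippets itself uses.
MARKER = re.compile(r"--8<--\s*\[(start|end):([\w-]+)\]")


def extract_section(text: str, name: str) -> str:
    """Return the lines between [start:name] and [end:name], dropping marker lines."""
    kept, inside, found = [], False, False
    for line in text.splitlines():
        match = MARKER.search(line)
        if match:
            kind, section = match.groups()
            if section == name:
                inside = kind == "start"
                found = found or inside
            continue  # marker lines never appear in the included output
        if inside:
            kept.append(line)
    if not found:
        raise KeyError(f"no [start:{name}] marker")
    return "\n".join(kept)


def check_markers(text: str) -> list[str]:
    """List problems such as an [end:...] with no open section or an unclosed section."""
    problems, open_sections = [], set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = MARKER.search(line)
        if not match:
            continue
        kind, section = match.groups()
        if kind == "start":
            if section in open_sections:
                problems.append(f"line {lineno}: [start:{section}] while already open")
            open_sections.add(section)
        elif section in open_sections:
            open_sections.remove(section)
        else:
            problems.append(f"line {lineno}: [end:{section}] without a matching start")
    problems.extend(f"unclosed section: {s}" for s in sorted(open_sections))
    return problems


# Hypothetical file laid out the way the fixed *.inc.md files are.
doc = """# --8<-- [start:requirements]
- AWS Neuron SDK 2.23
# --8<-- [end:requirements]
# --8<-- [start:configure-a-new-environment]
### Launch a Trn1/Trn2/Inf2 instance and verify Neuron dependencies
# --8<-- [end:configure-a-new-environment]"""

print(extract_section(doc, "requirements"))  # only the SDK bullet, not the setup steps
print(check_markers(doc))                    # [] when every start has a matching end
```

Before this commit, the configure steps came before `[end:requirements]`. In that layout, `extract_section(doc, "requirements")` would have returned the setup instructions along with the requirements.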
--- a/docs/getting_started/installation/ai_accelerator/hpu-gaudi.inc.md
+++ b/docs/getting_started/installation/ai_accelerator/hpu-gaudi.inc.md
@@ -19,7 +19,8 @@ to set up the execution environment. To achieve the best performance,
 please follow the methods outlined in the
 [Optimizing Training Platform Guide](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Optimization_in_Training_Platform.html).
-## Configure a new environment
+# --8<-- [end:requirements]
+# --8<-- [start:configure-a-new-environment]
 ### Environment verification
@@ -56,7 +57,7 @@ docker run \
  vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
 ```
-# --8<-- [end:requirements]
+# --8<-- [end:configure-a-new-environment]
 # --8<-- [start:set-up-using-python]
 # --8<-- [end:set-up-using-python]
@@ -183,7 +184,6 @@ Currently in vLLM for HPU we support four execution modes, depending on selected
 |                    0 |                 0 | torch.compile      |
 |                    0 |                 1 | PyTorch eager mode |
 |                    1 |                 0 | HPU Graphs         |
-  <figcaption>vLLM execution modes</figcaption>
 !!! warning
    In 1.18.0, all modes utilizing `PT_HPU_LAZY_MODE=0` are highly experimental and should be only used for validating functional correctness. Their performance will be improved in the next releases. For obtaining the best performance in 1.18.0, please use HPU Graphs, or PyTorch lazy mode.
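
The HPU execution-mode table is a lookup on two settings. The Python below is a minimal sketch of that lookup. It assumes the two columns are the `PT_HPU_LAZY_MODE` environment variable and vLLM's `enforce_eager` option, because the column headers are cut off in this diff. It also assumes that the fourth row, which is not shown here, is PyTorch lazy mode, as the warning implies.

```python
# Assumption: the table's columns are PT_HPU_LAZY_MODE (environment variable)
# and vLLM's enforce_eager option. The (1, True) row is not shown in this diff
# and is assumed to be PyTorch lazy mode.
EXECUTION_MODES = {
    (0, False): "torch.compile",
    (0, True): "PyTorch eager mode",
    (1, False): "HPU Graphs",
    (1, True): "PyTorch lazy mode",
}


def hpu_execution_mode(pt_hpu_lazy_mode: int, enforce_eager: bool) -> str:
    """Look up the execution mode for one combination of the two settings."""
    return EXECUTION_MODES[(int(pt_hpu_lazy_mode), bool(enforce_eager))]


def is_experimental_in_1_18(pt_hpu_lazy_mode: int) -> bool:
    """Per the warning, every mode with PT_HPU_LAZY_MODE=0 is highly experimental in 1.18.0."""
    return int(pt_hpu_lazy_mode) == 0


for lazy in (0, 1):
    for eager in (False, True):
        note = " (experimental in 1.18.0)" if is_experimental_in_1_18(lazy) else ""
        print(f"PT_HPU_LAZY_MODE={lazy}, enforce_eager={eager}: "
              f"{hpu_execution_mode(lazy, eager)}{note}")
```
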

--- a/docs/getting_started/installation/ai_accelerator/neuron.inc.md
+++ b/docs/getting_started/installation/ai_accelerator/neuron.inc.md
@@ -17,7 +17,8 @@
 - Accelerator: NeuronCore-v2 (in trn1/inf2 chips) or NeuronCore-v3 (in trn2 chips)
 - AWS Neuron SDK 2.23
-## Configure a new environment
+# --8<-- [end:requirements]
+# --8<-- [start:configure-a-new-environment]
 ### Launch a Trn1/Trn2/Inf2 instance and verify Neuron dependencies
@@ -37,7 +38,7 @@ for alternative setup instructions including using Docker and manually installin
    NxD Inference is the default recommended backend to run inference on Neuron. If you are looking to use the legacy [transformers-neuronx](https://github.com/aws-neuron/transformers-neuronx)
    library, refer to [Transformers NeuronX Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/setup/index.html).
-# --8<-- [end:requirements]
+# --8<-- [end:configure-a-new-environment]
 # --8<-- [start:set-up-using-python]
 # --8<-- [end:set-up-using-python]
@@ -102,8 +103,8 @@ Make sure to use <gh-file:docker/Dockerfile.neuron> in place of the default Dock
 ### Feature support through NxD Inference backend
 The current vLLM and Neuron integration relies on either the `neuronx-distributed-inference` (preferred) or `transformers-neuronx` backend
-    to perform most of the heavy lifting which includes PyTorch model initialization, compilation, and runtime execution. Therefore, most 
+to perform most of the heavy lifting which includes PyTorch model initialization, compilation, and runtime execution. Therefore, most
-    [features supported on Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html) are also available via the vLLM integration. 
+[features supported on Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html) are also available via the vLLM integration.
 To configure NxD Inference features through the vLLM entrypoint, use the `override_neuron_config` setting. Provide the configs you want to override
 as a dictionary (or JSON object when starting vLLM from the CLI). For example, to disable auto bucketing, include

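The Neuron diff stops partway through the `override_neuron_config` explanation. The Python below is a minimal sketch of the point it does make: the overrides are a dictionary in Python and a JSON object on the CLI. It turns a settings dict into CLI arguments. The `--override-neuron-config` flag spelling and the `enable_bucketing` key are assumptions and do not come from this diff.

```python
import json
import shlex


def neuron_override_cli_args(overrides: dict) -> list[str]:
    """Serialize override_neuron_config settings as a JSON object for the vLLM CLI."""
    return ["--override-neuron-config", json.dumps(overrides, separators=(",", ":"))]


# Hypothetical override: "enable_bucketing" is assumed to be the NxD Inference
# setting that controls auto bucketing; check the NxD Inference feature guide.
overrides = {"enable_bucketing": False}

# From Python, the same dict would be passed as override_neuron_config=overrides.
# From the CLI, it becomes a JSON string, quoted for the shell:
print(shlex.join(neuron_override_cli_args(overrides)))
```

`json.dumps` turns Python's `False` into JSON's `false`. This is why the CLI form needs a JSON object and not a Python literal.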
--- a/docs/getting_started/installation/ai_accelerator/tpu.inc.md
+++ b/docs/getting_started/installation/ai_accelerator/tpu.inc.md
@@ -58,11 +58,13 @@ assigned to your Google Cloud project for your immediate exclusive use.
 ### Provision Cloud TPUs with GKE
 For more information about using TPUs with GKE, see:
 - <https://cloud.google.com/kubernetes-engine/docs/how-to/tpus>
 - <https://cloud.google.com/kubernetes-engine/docs/concepts/tpus>
 - <https://cloud.google.com/kubernetes-engine/docs/concepts/plan-tpus>
-## Configure a new environment
+# --8<-- [end:requirements]
+# --8<-- [start:configure-a-new-environment]
 ### Provision a Cloud TPU with the queued resource API
@@ -70,23 +72,23 @@ Create a TPU v5e with 4 TPU chips:
 ```console
 gcloud alpha compute tpus queued-resources create QUEUED_RESOURCE_ID \
--node-id TPU_NAME \
+  --node-id TPU_NAME \
--project PROJECT_ID \
+  --project PROJECT_ID \
--zone ZONE \
+  --zone ZONE \
--accelerator-type ACCELERATOR_TYPE \
+  --accelerator-type ACCELERATOR_TYPE \
--runtime-version RUNTIME_VERSION \
+  --runtime-version RUNTIME_VERSION \
--service-account SERVICE_ACCOUNT
+  --service-account SERVICE_ACCOUNT
 ```
 | Parameter name     | Description                                                                                                                                                                                              |
 |--------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | QUEUED_RESOURCE_ID | The user-assigned ID of the queued resource request.                                                                                                                                                     |
-| TPU_NAME           | The user-assigned name of the TPU which is created when the queued                                                                                                                                       |
+| TPU_NAME           | The user-assigned name of the TPU which is created when the queued resource request is allocated.                                                                                                        |
 | PROJECT_ID         | Your Google Cloud project                                                                                                                                                                                |
-| ZONE               | The GCP zone where you want to create your Cloud TPU. The value you use                                                                                                                                  |
+| ZONE               | The GCP zone where you want to create your Cloud TPU. The value you use depends on the version of TPUs you are using. For more information, see [TPU regions and zones]                                  |
-| ACCELERATOR_TYPE   | The TPU version you want to use. Specify the TPU version, for example                                                                                                                                    |
+| ACCELERATOR_TYPE   | The TPU version you want to use. Specify the TPU version, for example `v5litepod-4` specifies a v5e TPU with 4 cores, `v6e-1` specifies a v6e TPU with 1 core. For more information, see [TPU versions]. |
-| RUNTIME_VERSION    | The TPU VM runtime version to use. For example, use `v2-alpha-tpuv6e` for a VM loaded with one or more v6e TPU(s). For more information see [TPU VM images](https://cloud.google.com/tpu/docs/runtimes). |
+| RUNTIME_VERSION    | The TPU VM runtime version to use. For example, use `v2-alpha-tpuv6e` for a VM loaded with one or more v6e TPU(s). For more information see [TPU VM images].                                             |
-  <figcaption>Parameter descriptions</figcaption>
+| SERVICE_ACCOUNT    | The email address for your service account. You can find it in the IAM Cloud Console under *Service Accounts*. For example: `tpu-service-account@<your_project_ID>.iam.gserviceaccount.com`              |
 Connect to your TPU using SSH:
@@ -94,7 +96,11 @@ Connect to your TPU using SSH:
 gcloud compute tpus tpu-vm ssh TPU_NAME --zone ZONE
 ```
-# --8<-- [end:requirements]
+[TPU versions]: https://cloud.google.com/tpu/docs/runtimes
+[TPU VM images]: https://cloud.google.com/tpu/docs/runtimes
+[TPU regions and zones]: https://cloud.google.com/tpu/docs/regions-zones
+# --8<-- [end:configure-a-new-environment]
 # --8<-- [start:set-up-using-python]
 # --8<-- [end:set-up-using-python]
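
The queued-resource command in the TPU diff takes one value per row of its parameter table. The Python below is a sketch for scripting that command and the SSH step. It builds both as argument lists. Only `v6e-1` and `v2-alpha-tpuv6e` come from the table. The resource ID, TPU name, project, zone, and service account are hypothetical placeholders, and the zone must be one that offers the chosen TPU version.

```python
import shlex


def queued_resource_create_cmd(queued_resource_id: str, tpu_name: str,
                               project_id: str, zone: str, accelerator_type: str,
                               runtime_version: str, service_account: str) -> list[str]:
    """argv for the queued-resources create command shown in the TPU diff."""
    return [
        "gcloud", "alpha", "compute", "tpus", "queued-resources", "create",
        queued_resource_id,
        "--node-id", tpu_name,
        "--project", project_id,
        "--zone", zone,
        "--accelerator-type", accelerator_type,
        "--runtime-version", runtime_version,
        "--service-account", service_account,
    ]


def tpu_ssh_cmd(tpu_name: str, zone: str) -> list[str]:
    """argv for connecting to the TPU VM over SSH."""
    return ["gcloud", "compute", "tpus", "tpu-vm", "ssh", tpu_name, "--zone", zone]


# Hypothetical values; v6e-1 and v2-alpha-tpuv6e are the examples from the
# parameter table, while the IDs, project, and zone are placeholders.
create = queued_resource_create_cmd(
    queued_resource_id="my-queued-resource",
    tpu_name="my-tpu",
    project_id="my-project",
    zone="us-east5-b",
    accelerator_type="v6e-1",
    runtime_version="v2-alpha-tpuv6e",
    service_account="tpu-service-account@my-project.iam.gserviceaccount.com",
)
print(shlex.join(create))
print(shlex.join(tpu_ssh_cmd("my-tpu", "us-east5-b")))
```

You can pass a list like `create` directly to `subprocess.run(create, check=True)`. This avoids shell quoting entirely. `shlex.join` is only used here to print a copy-pasteable command.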