[CPU Backend] [Doc]: Update Installation Docs for CPUs (#29868)

Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>

Unverified commit 1bb17ecb, authored Dec 03, 2025 by ioana ghiban, committed by GitHub Dec 03, 2025
5 changed files
--- a/docs/getting_started/installation/cpu.apple.inc.md
+++ b/docs/getting_started/installation/cpu.apple.inc.md
@@ -4,9 +4,6 @@ vLLM has experimental support for macOS with Apple Silicon. For now, users must
 Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
-!!! warning
-    There are no pre-built wheels or images for this device, so you must build vLLM from source.
 # --8<-- [end:installation]
 # --8<-- [start:requirements]
@@ -20,6 +17,8 @@ Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
 # --8<-- [end:set-up-using-python]
 # --8<-- [start:pre-built-wheels]
+Currently, there are no pre-built Apple silicon CPU wheels.
 # --8<-- [end:pre-built-wheels]
 # --8<-- [start:build-wheel-from-source]
@@ -78,6 +77,8 @@ uv pip install -e .
 # --8<-- [end:build-wheel-from-source]
 # --8<-- [start:pre-built-images]
+Currently, there are no pre-built Apple silicon CPU images.
 # --8<-- [end:pre-built-images]
 # --8<-- [start:build-image-from-source]

--- a/docs/getting_started/installation/cpu.arm.inc.md
+++ b/docs/getting_started/installation/cpu.arm.inc.md
 # --8<-- [start:installation]
-vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform.
+vLLM offers basic model inferencing and serving on the Arm CPU platform, with support for NEON and the FP32, FP16, and BF16 data types.
-ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
-!!! warning
-    There are no pre-built wheels or images for this device, so you must build vLLM from source.
 # --8<-- [end:installation]
 # --8<-- [start:requirements]
@@ -20,6 +15,23 @@ ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
 # --8<-- [end:set-up-using-python]
 # --8<-- [start:pre-built-wheels]
+Pre-built vLLM wheels for Arm are available since version 0.11.2. These wheels contain pre-compiled C++ binaries.
+Please replace `<version>` in the commands below with a specific version string (e.g., `0.11.2`).
+```bash
+uv pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
+```
+??? console "pip"
+    ```bash
+    pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
+    ```
+The `uv` approach works for vLLM `v0.6.6` and later. A unique feature of `uv` is that packages in `--extra-index-url` have [higher priority than the default index](https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes). If the latest public release is `v0.6.6.post1`, `uv`'s behavior allows installing a commit before `v0.6.6.post1` by specifying the `--extra-index-url`. In contrast, `pip` combines packages from `--extra-index-url` and the default index, choosing only the latest version, which makes it difficult to install a development version prior to the released version.
+!!! note
+    Nightly wheels (e.g., for bisecting a behavior change or performance regression) are currently unsupported for this architecture.
 # --8<-- [end:pre-built-wheels]
 # --8<-- [start:build-wheel-from-source]
@@ -69,6 +81,8 @@ Testing has been conducted on AWS Graviton3 instances for compatibility.
 # --8<-- [end:build-wheel-from-source]
 # --8<-- [start:pre-built-images]
+Currently, there are no pre-built Arm CPU images.
 # --8<-- [end:pre-built-images]
 # --8<-- [start:build-image-from-source]
 ```bash

--- a/docs/getting_started/installation/cpu.md
+++ b/docs/getting_started/installation/cpu.md
@@ -46,11 +46,25 @@ vLLM is a Python library that supports the following CPU variants. Select your C
 ### Pre-built wheels
-Please refer to the instructions for [pre-built wheels on GPU](./gpu.md#pre-built-wheels).
 When specifying the index URL, please make sure to use the `cpu` variant subdirectory.
 For example, the nightly build index is: `https://wheels.vllm.ai/nightly/cpu/`.
+=== "Intel/AMD x86"
+    --8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-wheels"
+=== "ARM AArch64"
+    --8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-wheels"
+=== "Apple silicon"
+    --8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-wheels"
+=== "IBM Z (S390X)"
+    --8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-wheels"
 ### Build wheel from source
 #### Set up using Python-only build (without compilation) {#python-only-build}
@@ -87,6 +101,18 @@ VLLM_USE_PRECOMPILED=1 VLLM_PRECOMPILED_WHEEL_VARIANT=cpu VLLM_TARGET_DEVICE=cpu
    --8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-images"
+=== "ARM AArch64"
+    --8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-images"
+=== "Apple silicon"
+    --8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-images"
+=== "IBM Z (S390X)"
+    --8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-images"
 ### Build image from source
 === "Intel/AMD x86"

--- a/docs/getting_started/installation/cpu.s390x.inc.md
+++ b/docs/getting_started/installation/cpu.s390x.inc.md
@@ -4,9 +4,6 @@ vLLM has experimental support for s390x architecture on IBM Z platform. For now,
 Currently, the CPU implementation for s390x architecture supports FP32 datatype only.
-!!! warning
-    There are no pre-built wheels or images for this device, so you must build vLLM from source.
 # --8<-- [end:installation]
 # --8<-- [start:requirements]
@@ -21,6 +18,8 @@ Currently, the CPU implementation for s390x architecture supports FP32 datatype
 # --8<-- [end:set-up-using-python]
 # --8<-- [start:pre-built-wheels]
+Currently, there are no pre-built IBM Z CPU wheels.
 # --8<-- [end:pre-built-wheels]
 # --8<-- [start:build-wheel-from-source]
@@ -69,6 +68,8 @@ Execute the following commands to build and install vLLM from source.
 # --8<-- [end:build-wheel-from-source]
 # --8<-- [start:pre-built-images]
+Currently, there are no pre-built IBM Z CPU images.
 # --8<-- [end:pre-built-images]
 # --8<-- [start:build-image-from-source]

--- a/docs/getting_started/installation/cpu.x86.inc.md
+++ b/docs/getting_started/installation/cpu.x86.inc.md
@@ -17,6 +17,8 @@ vLLM supports basic model inferencing and serving on x86 CPU platform, with data
 # --8<-- [end:set-up-using-python]
 # --8<-- [start:pre-built-wheels]
+Currently, there are no pre-built x86 CPU wheels.
 # --8<-- [end:pre-built-wheels]
 # --8<-- [start:build-wheel-from-source]
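
Note on the Arm wheel command added in `cpu.arm.inc.md`: the `<version>` placeholder appears twice, once in the package spec (`vllm==<version>+cpu`) and once in the index URL, where the `+` is percent-encoded as `%2B`. The following is a minimal sketch of that substitution, not part of the commit. The helper name `arm_cpu_wheel_command` and its `installer` parameter are hypothetical, and the URL layout is taken only from the command shown in the diff.

```python
from urllib.parse import quote


def arm_cpu_wheel_command(version: str, installer: str = "uv pip") -> str:
    """Build the install command for a pre-built CPU wheel (hypothetical helper)."""
    # The local version label "+cpu" is appended to the release version.
    local_version = f"{version}+cpu"
    # Inside the URL path, '+' has to be percent-encoded as %2B.
    index_url = f"https://wheels.vllm.ai/{quote(local_version, safe='')}/"
    return (
        f"{installer} install --pre vllm=={local_version} "
        f"--extra-index-url {index_url}"
    )


print(arm_cpu_wheel_command("0.11.2"))
```

Passing `installer="pip"` yields the plain `pip` variant of the same command.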