Commit 1bb17ecb authored by ioana ghiban, committed by GitHub

[CPU Backend] [Doc]: Update Installation Docs for CPUs (#29868)


Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
parent 15b1511a
@@ -4,9 +4,6 @@ vLLM has experimental support for macOS with Apple Silicon. For now, users must
Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
!!! warning
There are no pre-built wheels or images for this device, so you must build vLLM from source.
# --8<-- [end:installation]
# --8<-- [start:requirements]
@@ -20,6 +17,8 @@ Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Currently, there are no pre-built Apple silicon CPU wheels.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
@@ -78,6 +77,8 @@ uv pip install -e .
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
Currently, there are no pre-built Apple silicon CPU images.
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]
......
# --8<-- [start:installation]
vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform. vLLM offers basic model inference and serving on the Arm CPU platform, with NEON support and the FP32, FP16, and BF16 data types.
ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
!!! warning
There are no pre-built wheels or images for this device, so you must build vLLM from source.
# --8<-- [end:installation]
# --8<-- [start:requirements]
@@ -20,6 +15,23 @@ ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Pre-built vLLM wheels for Arm are available since version 0.11.2. These wheels contain pre-compiled C++ binaries.
Please replace `<version>` in the commands below with a specific version string (e.g., `0.11.2`).
```bash
uv pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
```
??? console "pip"
```bash
pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
```
The `uv` approach works for vLLM `v0.6.6` and later. A unique feature of `uv` is that packages in `--extra-index-url` have [higher priority than the default index](https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes). If the latest public release is `v0.6.6.post1`, `uv`'s behavior allows installing a commit before `v0.6.6.post1` by specifying the `--extra-index-url`. In contrast, `pip` combines packages from `--extra-index-url` and the default index, choosing only the latest version, which makes it difficult to install a development version prior to the released version.
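The `<version>` placeholder appears twice in the commands above: once in the requirement specifier with a literal `+cpu`, and once in the index URL, where `+` is percent-encoded as `%2B`. A minimal sketch of keeping the two in sync follows. The variable names `VLLM_VERSION`, `VLLM_REQUIREMENT`, and `VLLM_INDEX_URL` are hypothetical helpers for illustration and are not read by vLLM, `uv`, or `pip`.

```bash
# Hypothetical helper variable: set this to the release you actually want.
VLLM_VERSION="0.11.2"

# The requirement specifier uses the literal local version label "+cpu".
VLLM_REQUIREMENT="vllm==${VLLM_VERSION}+cpu"

# Inside the URL path, "+" is percent-encoded as "%2B".
VLLM_INDEX_URL="https://wheels.vllm.ai/${VLLM_VERSION}%2Bcpu/"

# Print the resulting command for review instead of running it directly.
echo "uv pip install --pre ${VLLM_REQUIREMENT} --extra-index-url ${VLLM_INDEX_URL}"
```

Printing the command first makes it easy to confirm that both occurrences of the version agree before installing anything.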
!!! note
Nightly wheels (useful, for example, to bisect a behavior change or performance regression) are currently unsupported for this architecture.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
@@ -69,6 +81,8 @@ Testing has been conducted on AWS Graviton3 instances for compatibility.
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
Currently, there are no pre-built Arm CPU images.
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]
```bash
......
@@ -46,11 +46,25 @@ vLLM is a Python library that supports the following CPU variants. Select your C
### Pre-built wheels
Please refer to the instructions for [pre-built wheels on GPU](./gpu.md#pre-built-wheels).
When specifying the index URL, please make sure to use the `cpu` variant subdirectory.
For example, the nightly build index is: `https://wheels.vllm.ai/nightly/cpu/`.
=== "Intel/AMD x86"
--8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-wheels"
=== "ARM AArch64"
--8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-wheels"
=== "Apple silicon"
--8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-wheels"
=== "IBM Z (S390X)"
--8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-wheels"
### Build wheel from source
#### Set up using Python-only build (without compilation) {#python-only-build}
@@ -87,6 +101,18 @@ VLLM_USE_PRECOMPILED=1 VLLM_PRECOMPILED_WHEEL_VARIANT=cpu VLLM_TARGET_DEVICE=cpu
--8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-images"
=== "ARM AArch64"
--8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-images"
=== "Apple silicon"
--8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-images"
=== "IBM Z (S390X)"
--8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-images"
### Build image from source
=== "Intel/AMD x86"
......
@@ -4,9 +4,6 @@ vLLM has experimental support for s390x architecture on IBM Z platform. For now,
Currently, the CPU implementation for s390x architecture supports FP32 datatype only.
!!! warning
There are no pre-built wheels or images for this device, so you must build vLLM from source.
# --8<-- [end:installation]
# --8<-- [start:requirements]
@@ -21,6 +18,8 @@ Currently, the CPU implementation for s390x architecture supports FP32 datatype
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Currently, there are no pre-built IBM Z CPU wheels.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
@@ -69,6 +68,8 @@ Execute the following commands to build and install vLLM from source.
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
Currently, there are no pre-built IBM Z CPU images.
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]
......
@@ -17,6 +17,8 @@ vLLM supports basic model inferencing and serving on x86 CPU platform, with data
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Currently, there are no pre-built x86 CPU wheels.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
......