Update docker docs with ARM CUDA cross-compile (#19037)

Signed-off-by: mgoin <michael@neuralmagic.com>

Commit 6d18ed2a (unverified), authored Jun 03, 2025 by Michael Goin, committed by GitHub Jun 03, 2025

Showing with 12 additions and 1 deletion

docs/deployment/docker.md +12 -1

--- a/docs/deployment/docker.md
+++ b/docs/deployment/docker.md
@@ -107,10 +107,21 @@ DOCKER_BUILDKIT=1 docker build . \
  -t vllm/vllm-gh200-openai:latest \
  --build-arg max_jobs=66 \
  --build-arg nvcc_threads=2 \
-  --build-arg torch_cuda_arch_list="9.0+PTX" \
+  --build-arg torch_cuda_arch_list="9.0 10.0+PTX" \
  --build-arg vllm_fa_cmake_gpu_arches="90-real"
 ```
+!!! note
+    If you are building the `linux/arm64` image on a non-ARM host (e.g., an x86_64 machine), you need to ensure your system is set up for cross-compilation using QEMU. This allows your host machine to emulate ARM64 execution.
+    Run the following command on your host machine to register QEMU user static handlers:
+    ```console
+    docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
+    ```
+    After setting up QEMU, you can use the `--platform "linux/arm64"` flag in your `docker build` command.
 ## Use the custom-built vLLM Docker image
 To run vLLM with the custom-built Docker image:
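
The cross-compile note added in this change can be sketched as a small dry-run script. This is only an illustration under stated assumptions: the helper names `needs_qemu` and `plan_arm64_build` are hypothetical, the script prints the commands rather than running them, and the `docker build` line contains only the flags visible in this diff plus the `--platform` flag from the note (the full command in `docs/deployment/docker.md` may carry additional flags not shown in the hunk).

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for an ARM64 build instead of running them.
# needs_qemu and plan_arm64_build are hypothetical helper names, not part of vLLM.

# Print "no" when the host can run linux/arm64 natively, "yes" otherwise.
needs_qemu() {
  case "$1" in
    aarch64|arm64) echo "no" ;;
    *) echo "yes" ;;
  esac
}

# Print the steps for building the linux/arm64 image on a host of the given architecture.
plan_arm64_build() {
  host_arch="$1"
  if [ "$(needs_qemu "$host_arch")" = "yes" ]; then
    # Command copied from the note in the diff: registers QEMU user static handlers.
    echo 'docker run --rm --privileged multiarch/qemu-user-static --reset -p yes'
  fi
  # Only flags visible in the diff are listed; the real docs command may include more.
  echo 'DOCKER_BUILDKIT=1 docker build . --platform "linux/arm64" -t vllm/vllm-gh200-openai:latest --build-arg max_jobs=66 --build-arg nvcc_threads=2 --build-arg torch_cuda_arch_list="9.0 10.0+PTX" --build-arg vllm_fa_cmake_gpu_arches="90-real"'
}

# Show the plan for the current machine.
plan_arm64_build "$(uname -m)"
```

On an x86_64 host this prints the QEMU registration command followed by the build command; on an ARM64 host it prints only the build command, since no emulation is needed there.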