Commit 6d18ed2a (unverified), authored by Michael Goin, committed by GitHub

Update docker docs with ARM CUDA cross-compile (#19037)


Signed-off-by: mgoin <michael@neuralmagic.com>
parent f32fcd94
@@ -107,10 +107,21 @@ DOCKER_BUILDKIT=1 docker build . \
   -t vllm/vllm-gh200-openai:latest \
   --build-arg max_jobs=66 \
   --build-arg nvcc_threads=2 \
-  --build-arg torch_cuda_arch_list="9.0+PTX" \
+  --build-arg torch_cuda_arch_list="9.0 10.0+PTX" \
   --build-arg vllm_fa_cmake_gpu_arches="90-real"
 ```
+!!! note
+    If you are building the `linux/arm64` image on a non-ARM host (e.g., an x86_64 machine), you need to ensure your system is set up for cross-compilation using QEMU. This allows your host machine to emulate ARM64 execution.
+    Run the following command on your host machine to register QEMU user static handlers:
+    ```console
+    docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
+    ```
+    After setting up QEMU, you can use the `--platform "linux/arm64"` flag in your `docker build` command.
 ## Use the custom-built vLLM Docker image
 To run vLLM with the custom-built Docker image:
......
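The note added in this hunk amounts to a two-step procedure: register the QEMU handlers when the host is not ARM, then pass `--platform "linux/arm64"` to `docker build`. The dry-run sketch below only prints the resulting command and runs nothing. It assumes a Linux host where `binfmt_misc` is mounted at `/proc/sys/fs/binfmt_misc` and where the registered handler is named `qemu-aarch64`. The helper names `is_arm_host`, `qemu_registered`, and `build_cmd` are hypothetical. Flags from the part of the documented command that this hunk does not show are not reproduced here and would need to be added from the full docs.

```shell
#!/bin/sh
# Dry-run sketch: prints a cross-build command, never invokes docker itself.

# True when the host is already ARM64, so no emulation is required.
is_arm_host() {
  case "$(uname -m)" in
    aarch64|arm64) return 0 ;;
    *) return 1 ;;
  esac
}

# True when an aarch64 QEMU handler is registered with binfmt_misc.
# Assumption: Linux host, default mount point, handler named qemu-aarch64.
qemu_registered() {
  [ -e /proc/sys/fs/binfmt_misc/qemu-aarch64 ]
}

# Print the build command using only the flags visible in the diff above,
# plus the --platform flag described in the note.
build_cmd() {
  printf '%s ' DOCKER_BUILDKIT=1 docker build . \
    --platform linux/arm64 \
    -t vllm/vllm-gh200-openai:latest \
    --build-arg max_jobs=66 \
    --build-arg nvcc_threads=2 \
    --build-arg 'torch_cuda_arch_list="9.0 10.0+PTX"' \
    --build-arg 'vllm_fa_cmake_gpu_arches="90-real"'
  printf '\n'
}

# Warn (on stderr) if emulation looks necessary but is not set up yet.
if ! is_arm_host && ! qemu_registered; then
  echo "No aarch64 QEMU handler found; register one first:" >&2
  echo "  docker run --rm --privileged multiarch/qemu-user-static --reset -p yes" >&2
fi

build_cmd
```

The printed line can be reviewed and pasted into a shell once the missing flags are filled in. Keeping the script print-only avoids starting a long emulated build by accident.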