fix gh200 tests on main (#11246)

Signed-off-by: youkaichao <youkaichao@gmail.com>

Commit 35bae114 (unverified), authored Dec 16, 2024 by youkaichao; committed by GitHub on Dec 16, 2024

Showing 2 changed files with 3 additions and 6 deletions

.buildkite/run-gh200-test.sh +2 -2

docs/source/serving/deploying_with_docker.rst +1 -4

--- a/.buildkite/run-gh200-test.sh
+++ b/.buildkite/run-gh200-test.sh
@@ -6,8 +6,8 @@ set -ex

 # Try building the docker image
 DOCKER_BUILDKIT=1 docker build . \
-  --target test \
-  -platform "linux/arm64" \
+  --target vllm-openai \
+  --platform "linux/arm64" \
  -t gh200-test \
  --build-arg max_jobs=66 \
  --build-arg nvcc_threads=2 \

--- a/docs/source/serving/deploying_with_docker.rst
+++ b/docs/source/serving/deploying_with_docker.rst
@@ -54,16 +54,13 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--
    # Example of building on Nvidia GH200 server. (Memory usage: ~12GB, Build time: ~1475s / ~25 min, Image size: 7.26GB)
    $ DOCKER_BUILDKIT=1 sudo docker build . \
      --target vllm-openai \
-      -platform "linux/arm64" \
+      --platform "linux/arm64" \
      -t vllm/vllm-gh200-openai:latest \
      --build-arg max_jobs=66 \
      --build-arg nvcc_threads=2 \
      --build-arg torch_cuda_arch_list="9.0+PTX" \
      --build-arg vllm_fa_cmake_gpu_arches="90-real"

-
-
-
 To run vLLM:

 .. code-block:: console