OpenDAS / vllm_cscc
Commit 35bae114 (unverified)
Authored Dec 16, 2024 by youkaichao
Committed by GitHub, Dec 16, 2024

fix gh200 tests on main (#11246)

Signed-off-by: youkaichao <youkaichao@gmail.com>

parent 88a412ed

Showing 2 changed files with 3 additions and 6 deletions

.buildkite/run-gh200-test.sh (+2 -2)
docs/source/serving/deploying_with_docker.rst (+1 -4)

.buildkite/run-gh200-test.sh
...
@@ -6,8 +6,8 @@ set -ex
 # Try building the docker image
 DOCKER_BUILDKIT=1 docker build . \
-  --target test \
-  -platform "linux/arm64" \
+  --target vllm-openai \
+  --platform "linux/arm64" \
   -t gh200-test \
   --build-arg max_jobs=66 \
   --build-arg nvcc_threads=2 \
...

docs/source/serving/deploying_with_docker.rst
...
@@ -54,16 +54,13 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--
     # Example of building on Nvidia GH200 server. (Memory usage: ~12GB, Build time: ~1475s / ~25 min, Image size: 7.26GB)
     $ DOCKER_BUILDKIT=1 sudo docker build . \
     --target vllm-openai \
-    -platform "linux/arm64" \
+    --platform "linux/arm64" \
     -t vllm/vllm-gh200-openai:latest \
     --build-arg max_jobs=66 \
     --build-arg nvcc_threads=2 \
     --build-arg torch_cuda_arch_list="9.0+PTX" \
     --build-arg vllm_fa_cmake_gpu_arches="90-real"
 To run vLLM:
 .. code-block:: console
...
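
Both hunks fix the same slip: `-platform` written with a single dash where `docker build` expects `--platform`. A check along the following lines could catch that kind of typo before it reaches CI. This is only a sketch: the function name `find_single_dash_long_opts` and the list of options it looks for are assumptions made for illustration and are not part of vLLM or Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vLLM or Docker): print the numbered lines
# of a file where one of a few long `docker build` options is written with a
# single leading dash, e.g. `-platform` instead of `--platform`.
find_single_dash_long_opts() {
    # The option must be preceded by line start or whitespace, so a correct
    # `--platform` does not match; it must be followed by whitespace, `=`,
    # or end of line, so longer words are not flagged.
    grep -nE '(^|[[:space:]])-(platform|target|build-arg)([[:space:]=]|$)' "$1"
}
```

Calling `find_single_dash_long_opts .buildkite/run-gh200-test.sh` prints any offending lines with their line numbers and returns a non-zero status when none are found, which is the usual `grep` convention.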