norm/vllm · commit c85b80c2 (unverified)

[Docker] Add cuda arch list as build option (#1950)

Authored by Simon Mo on Dec 08, 2023; committed via GitHub on Dec 08, 2023.
Parent: 2b981012

Showing 2 changed files with 13 additions and 1 deletion:

    Dockerfile                                      +5  -1
    docs/source/serving/deploying_with_docker.rst   +8  -0
Dockerfile

@@ -30,11 +30,15 @@ COPY requirements.txt requirements.txt
 COPY pyproject.toml pyproject.toml
 COPY vllm/__init__.py vllm/__init__.py
 
+ARG torch_cuda_arch_list='7.0 7.5 8.0 8.6 8.9 9.0+PTX'
+ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list}
+
 # max jobs used by Ninja to build extensions
-ENV MAX_JOBS=$max_jobs
+ARG max_jobs=2
+ENV MAX_JOBS=${max_jobs}
 # number of threads used by nvcc
 ARG nvcc_threads=8
 ENV NVCC_THREADS=$nvcc_threads
 
 RUN python3 setup.py build_ext --inplace
 
 # image to run unit testing suite
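As a usage sketch (not part of the commit itself): the new torch_cuda_arch_list build argument is overridden the same way as the existing max_jobs and nvcc_threads arguments. TORCH_CUDA_ARCH_LIST is the standard environment variable that PyTorch's extension builder reads when python3 setup.py build_ext compiles the CUDA kernels, so the value passed here determines which compute capabilities the image supports. For example, to build only for A100 (8.0) and H100 (9.0, keeping PTX for forward compatibility):

    $ DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai \
        --build-arg torch_cuda_arch_list='8.0 9.0+PTX' \
        --build-arg max_jobs=8

Trimming the list shortens compilation roughly in proportion to the number of architectures dropped and shrinks the compiled extensions, at the cost of an image that only runs on the listed GPUs.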
docs/source/serving/deploying_with_docker.rst

@@ -31,6 +31,14 @@ You can build and run vLLM from source via the provided dockerfile. To build vLL
 
     $ DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai # optionally specifies: --build-arg max_jobs=8 --build-arg nvcc_threads=2
 
+.. note::
+
+    By default vLLM will build for all GPU types for widest distribution. If you are just building for the
+    current GPU type the machine is running on, you can add the argument ``--build-arg torch_cuda_arch_list=""``
+    for vLLM to find the current GPU type and build for that.
+
 To run vLLM:
 
 .. code-block:: console
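To illustrate the note above (a sketch reusing the build command from these docs): passing an empty torch_cuda_arch_list makes the build detect the GPU present on the build machine and compile for that single architecture, assuming a GPU is visible during the build.

    $ DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai \
        --build-arg torch_cuda_arch_list=""

This relies on PyTorch's fallback behavior: when TORCH_CUDA_ARCH_LIST is unset or empty, the extension builder probes the visible devices for their compute capability. The resulting image builds faster and is smaller, but it will not run on GPUs of other architectures.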