.. _deploying_with_docker:

Deploying with Docker
============================

vLLM offers an official Docker image for deployment.
The image can be used to run an OpenAI-compatible server.
It is available on Docker Hub as `vllm/vllm-openai <https://hub.docker.com/r/vllm/vllm-openai/tags>`_.

.. code-block:: console

    $ docker run --runtime nvidia --gpus all \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
        -p 8000:8000 \
        --ipc=host \
        vllm/vllm-openai:latest \
        --model mistralai/Mistral-7B-v0.1


.. note::

        You can use either the ``--ipc=host`` flag or the ``--shm-size`` flag to allow the
        container to access the host's shared memory. vLLM uses PyTorch, which uses shared
        memory to share data between processes under the hood, particularly for tensor-parallel inference.
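
Once the container is running, the server can be queried over HTTP on the published port.
The following is a minimal sketch, not an official client. It assumes the server is reachable at
``http://localhost:8000`` (from ``-p 8000:8000``), that it serves the model passed via ``--model``,
and that it exposes the OpenAI-style ``/v1/completions`` route. The helper names
``build_completion_request`` and ``complete`` are hypothetical and exist only for illustration.

.. code-block:: python

    import json
    import urllib.request

    # Assumption: the container started above is listening on localhost:8000.
    BASE_URL = "http://localhost:8000"


    def build_completion_request(prompt, model="mistralai/Mistral-7B-v0.1", max_tokens=32):
        """Build a POST request for the OpenAI-style completions route."""
        payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
        return urllib.request.Request(
            f"{BASE_URL}/v1/completions",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )


    def complete(prompt):
        """Send the request and return the text of the first returned choice."""
        with urllib.request.urlopen(build_completion_request(prompt)) as resp:
            body = json.load(resp)
        return body["choices"][0]["text"]

Calling ``complete("San Francisco is a")`` should return the generated continuation once the model
has finished loading. Before that point the connection is likely to be refused.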


You can also build and run vLLM from source using the provided Dockerfile. To build vLLM:

.. code-block:: console

    $ DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai --build-arg max_jobs=8

To run vLLM:

.. code-block:: console

    $ docker run --runtime nvidia --gpus all \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        -p 8000:8000 \
        --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
        vllm/vllm-openai <args...>
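
Loading model weights can take a while, so a script that starts the container may need to wait
before sending requests. The sketch below polls the server until it answers. It assumes the
OpenAI-style ``/v1/models`` route on ``localhost:8000``. The function names, attempt count, and
delay are hypothetical illustration values, not part of vLLM.

.. code-block:: python

    import time
    import urllib.error
    import urllib.request


    def server_is_up(url="http://localhost:8000/v1/models", timeout=2.0):
        """Return True if the URL answers with HTTP 200, False on any connection error."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False


    def wait_until_ready(probe=server_is_up, attempts=60, delay=5.0, sleep=time.sleep):
        """Call probe() until it returns True or the attempts run out."""
        for _ in range(attempts):
            if probe():
                return True
            sleep(delay)  # back off before the next probe
        return False

``wait_until_ready()`` returns ``True`` as soon as the server responds and ``False`` if it never
does within the allotted attempts. The ``probe`` and ``sleep`` parameters are injectable so the
loop can be exercised without a running container.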