[Docs] Add information about using shared memory in docker (#1845)

Commit 0f621c2c, authored Nov 29, 2023 by Simon Mo; committed by GitHub Nov 29, 2023

Showing with 10 additions and 2 deletions

docs/source/models/adding_model.rst +1 -1

docs/source/serving/deploying_with_docker.rst +9 -1

--- a/docs/source/models/adding_model.rst
+++ b/docs/source/models/adding_model.rst
@@ -18,7 +18,7 @@ This document provides a high-level guide on integrating a `HuggingFace Transfor
 0. Fork the vLLM repository
 --------------------------------

-Start by forking our `GitHub <https://github.com/vllm-project/vllm/>`_ repository and then :ref:`build it from source <build_from_source>`.
+Start by forking our `GitHub`_ repository and then :ref:`build it from source <build_from_source>`.
 This gives you the ability to modify the codebase and test your model.



--- a/docs/source/serving/deploying_with_docker.rst
+++ b/docs/source/serving/deploying_with_docker.rst
@@ -11,12 +11,20 @@ The image is available on Docker Hub as `vllm/vllm-openai <https://hub.docker.co

    $ docker run --runtime nvidia --gpus all \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
-        -p 8000:8000 \
        --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
+        -p 8000:8000 \
+        --ipc=host \
        vllm/vllm-openai:latest \
        --model mistralai/Mistral-7B-v0.1


+.. note::
+
+        You can use either the ``--ipc=host`` flag or the ``--shm-size`` flag to give the
+        container enough shared memory. vLLM uses PyTorch, which relies on shared memory to
+        pass data between processes, particularly for tensor parallel inference.
+
+
 You can build and run vLLM from source via the provided dockerfile. To build vLLM:

 .. code-block:: console