Unverified commit 7d7e3b78 authored by Woosuk Kwon, committed by GitHub

Use `--ipc=host` in docker run for distributed inference (#1125)

parent f98b745a
@@ -46,4 +46,5 @@ You can also build and install vLLM from source:

 .. code-block:: console

     $ # Pull the Docker image with CUDA 11.8.
-    $ docker run --gpus all -it --rm --shm-size=8g nvcr.io/nvidia/pytorch:22.12-py3
+    $ # Use `--ipc=host` to make sure the shared memory is large enough.
+    $ docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:22.12-py3
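The change replaces a fixed `--shm-size=8g` with `--ipc=host`, which gives the container the host's full shared-memory segment; distributed inference workers (e.g. NCCL and multiprocessing queues) communicate through `/dev/shm`, and Docker's default of 64 MB is too small. As a quick sanity check (a sketch, not part of this commit), the shared memory actually available inside a running container can be inspected from Python:

```python
import os

# Inspect the shared-memory filesystem that inter-process communication
# relies on. With Docker defaults this reports ~64 MiB; with --ipc=host
# (or a larger --shm-size) it reflects the host's shared-memory segment.
stats = os.statvfs("/dev/shm")
shm_bytes = stats.f_blocks * stats.f_frsize
print(f"/dev/shm size: {shm_bytes / (1024 ** 2):.0f} MiB")
```

If this prints a small number (tens of MiB) inside the container, distributed runs are likely to fail with shared-memory errors, which is exactly what the `--ipc=host` flag avoids.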