Commit 0f621c2c, authored by Simon Mo, committed by GitHub

[Docs] Add information about using shared memory in docker (#1845)

parent a9e45742
......@@ -18,7 +18,7 @@ This document provides a high-level guide on integrating a `HuggingFace Transfor
0. Fork the vLLM repository
--------------------------------
Start by forking our `GitHub <https://github.com/vllm-project/vllm/>`_ repository and then :ref:`build it from source <build_from_source>`.
Start by forking our `GitHub`_ repository and then :ref:`build it from source <build_from_source>`.
This gives you the ability to modify the codebase and test your model.
......
......@@ -11,12 +11,20 @@ The image is available on Docker Hub as `vllm/vllm-openai <https://hub.docker.co
$ docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \
--env "HUGGING_FACE_HUB_TOKEN=<secret>" \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model mistralai/Mistral-7B-v0.1
.. note::
    You can use either the ``--ipc=host`` flag or the ``--shm-size`` flag to allow the
    container to access the host's shared memory. vLLM uses PyTorch, which relies on shared
    memory to pass data between processes, particularly for tensor-parallel inference.
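
As a sketch of the ``--shm-size`` alternative mentioned in the note, the same command can request a larger shared-memory allocation for the container instead of passing ``--ipc=host``. The ``10g`` value is an assumption chosen for illustration and does not come from this commit; size it for your host and model.

.. code-block:: console

    # Assumption: 10g is an illustrative size, not a documented recommendation.
    $ docker run --runtime nvidia --gpus all \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
        -p 8000:8000 \
        --shm-size=10g \
        vllm/vllm-openai:latest \
        --model mistralai/Mistral-7B-v0.1

Running ``df -h /dev/shm`` inside the container is one way to check how much shared memory it has been given.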
You can build and run vLLM from source via the provided dockerfile. To build vLLM:
.. code-block:: console
......