Commit 2cd96ec7 authored by Tanmay Verma, committed by GitHub

build: Add TensorRT-LLM to optional dependency and corresponding instructions (#2113)


Signed-off-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Graham King <grahamk@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
parent 412a12a8
@@ -79,6 +79,7 @@ uv pip install pip
 # Choose one
 uv pip install "ai-dynamo[sglang]"
 uv pip install "ai-dynamo[vllm]"
+uv pip install "ai-dynamo[trtllm]"
 uv pip install "ai-dynamo[llama_cpp]" # CPU, see later for GPU
``` ```
@@ -173,7 +174,38 @@ You can pass any sglang flags directly to this worker, see https://docs.sglang.a
 # TRT-LLM
-This currently requires a container TODO ADD THE DOCS PLZ THANK YOU
+It is recommended to use the [NGC PyTorch Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) for running the TensorRT-LLM engine.
+> [!Note]
+> Ensure that you select a PyTorch container image version that matches the version of TensorRT-LLM you are using.
+> For example, if you are using `tensorrt-llm==1.0.0rc4`, use the PyTorch container image version `25.05`.
+> To find the correct PyTorch container version for your desired `tensorrt-llm` release, visit the [TensorRT-LLM Dockerfile.multi](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docker/Dockerfile.multi) on GitHub. Switch to the branch that matches your `tensorrt-llm` version, and look for the `BASE_TAG` line to identify the recommended PyTorch container tag.
+> [!Important]
+> Launch the container with the following additional settings: `--shm-size=1g --ulimit memlock=-1`
+## Install prerequisites
+```
+# Optional step: Only required for Blackwell and Grace Hopper
+pip3 install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
+sudo apt-get -y install libopenmpi-dev
+```
+> [!Tip]
+> You can learn more about these prerequisites and known issues with the TensorRT-LLM pip-based installation [here](https://nvidia.github.io/TensorRT-LLM/installation/linux.html).
+## Install dynamo
+```
+uv pip install --upgrade pip setuptools && uv pip install "ai-dynamo[trtllm]"
+```
+Run the backend/worker like this:
+```
+python -m dynamo.trtllm --help
+```
+To specify which GPUs to use, set the environment variable `CUDA_VISIBLE_DEVICES`.
 # llama.cpp
......
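Reviewer note: the TRT-LLM instructions above ask the reader to look up `BASE_TAG` by hand and then launch the container with extra settings. A minimal sketch of both steps follows. It assumes the tag is declared on a line of the form `ARG BASE_TAG=<tag>`, that the image lives at `nvcr.io/nvidia/pytorch`, and that `--rm -it --gpus all` are wanted; none of these are stated in the diff. The two-line sample input is a hypothetical stand-in for the real `Dockerfile.multi`, and the helper name `base_tag` is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the value of BASE_TAG from a Dockerfile read on stdin.
# Assumption: the tag is declared as `ARG BASE_TAG=<tag>`.
base_tag() {
    sed -n 's/^ARG[[:space:]]\{1,\}BASE_TAG=//p' | head -n 1
}

# Hypothetical excerpt standing in for the real file. To read the real one,
# pipe it in instead, for example:
#   curl -fsSL "https://raw.githubusercontent.com/NVIDIA/TensorRT-LLM/<branch>/docker/Dockerfile.multi" | base_tag
tag="$(printf '%s\n' \
    'ARG BASE_IMAGE=nvcr.io/nvidia/pytorch' \
    'ARG BASE_TAG=25.05-py3' | base_tag)"

# Compose (but do not run) the launch command with the settings from the
# README note. The registry path and the --rm/-it/--gpus flags are assumptions.
image="nvcr.io/nvidia/pytorch:${tag}"
launch_cmd="docker run --rm -it --gpus all --shm-size=1g --ulimit memlock=-1 ${image}"
echo "${launch_cmd}"
```

Inside the container, the worker can then be limited to specific GPUs by prefixing the command, for example `CUDA_VISIBLE_DEVICES=0,1 python -m dynamo.trtllm --help`.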
@@ -94,6 +94,8 @@ TRTLLM_USE_NIXL_KVCACHE_EXPERIMENTAL="0"
 # TensorRT-LLM PyPI index URL
 TENSORRTLLM_INDEX_URL="https://pypi.python.org/simple"
+# TODO: Remove the version specification from here and use the ai-dynamo[trtllm] package.
+# Need to update the Dockerfile.tensorrt_llm to use the ai-dynamo[trtllm] package.
 DEFAULT_TENSORRTLLM_PIP_WHEEL="tensorrt-llm==1.0.0rc0"
 TENSORRTLLM_PIP_WHEEL=""
......
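Reviewer note: the hunk above keeps a pinned default wheel next to an empty override variable. The diff does not show how the script combines the two, so the following is only a sketch of one plausible rule, assuming an empty or unset `TENSORRTLLM_PIP_WHEEL` falls back to `DEFAULT_TENSORRTLLM_PIP_WHEEL`. The names `resolved_wheel` and `overridden_wheel` are hypothetical.

```shell
#!/bin/sh
# Values copied from the build-script hunk.
DEFAULT_TENSORRTLLM_PIP_WHEEL="tensorrt-llm==1.0.0rc0"
TENSORRTLLM_PIP_WHEEL=""

# Assumed resolution rule: an empty or unset override falls back to the default.
resolved_wheel="${TENSORRTLLM_PIP_WHEEL:-${DEFAULT_TENSORRTLLM_PIP_WHEEL}}"
echo "Would install: ${resolved_wheel}"

# An explicit override takes precedence over the default.
TENSORRTLLM_PIP_WHEEL="tensorrt-llm==1.0.0rc4"
overridden_wheel="${TENSORRTLLM_PIP_WHEEL:-${DEFAULT_TENSORRTLLM_PIP_WHEEL}}"
echo "Would install: ${overridden_wheel}"
```

Under this assumption the container build stays on `1.0.0rc0` unless overridden, while the `trtllm` extra pins `1.0.0rc4`, which is the mismatch the TODO comment points at.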
@@ -59,6 +59,11 @@ keywords = ["llm", "genai", "inference", "nvidia", "distributed", "dynamo"]
 Repository = "https://github.com/ai-dynamo/dynamo.git"
 [project.optional-dependencies]
+trtllm =[
+"uvloop",
+"tensorrt-llm==1.0.0rc4"
+]
 vllm = [
 "uvloop",
 "nixl",
......
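Reviewer note: because the new `trtllm` extra pins `tensorrt-llm==1.0.0rc4`, it can be useful to confirm what actually ended up in the environment after installing. The sketch below is not part of the commit. It assumes `python3` is on `PATH` and uses the standard-library `importlib.metadata` lookup; the variable names are illustrative.

```shell
#!/bin/sh
# Version pinned by the trtllm extra in the pyproject hunk.
expected="1.0.0rc4"

# Ask Python's package metadata for the installed version. Fall back to a
# placeholder when python3 or the package is missing.
installed="$(python3 -c 'import importlib.metadata as m; print(m.version("tensorrt-llm"))' 2>/dev/null || echo "not installed")"

if [ "${installed}" = "${expected}" ]; then
    status="match"
else
    status="mismatch"
fi
echo "tensorrt-llm: expected ${expected}, found ${installed} (${status})"
```

A `mismatch` result after `uv pip install "ai-dynamo[trtllm]"` would suggest that another requirement or a pre-installed wheel in the base container took precedence over the pin.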