Commit 7d035aff authored by Richard Huo, committed by GitHub

docs: add notes and instruction for latest trtllm kvbm disagg (#6055)

parent 00ea11ff
@@ -212,6 +212,30 @@ cd $DYNAMO_HOME/examples/backends/vllm
### Disaggregated Serving with TRT-LLM
> [!NOTE]
> The latest TensorRT-LLM release (1.3.0rc1) currently hangs on requests when running disaggregated serving with KVBM.
> To test KVBM with disaggregated serving, build the Dynamo TensorRT-LLM runtime image from TensorRT-LLM commit `18e611da773026a55d187870ebcfa95ff00c8482`.
```bash
# Build the Dynamo TensorRT-LLM container using commit ID 18e611da773026a55d187870ebcfa95ff00c8482. Note: This build can take a long time.
./container/build.sh --framework trtllm --tensorrtllm-commit 18e611da773026a55d187870ebcfa95ff00c8482 --tensorrtllm-git-url https://github.com/NVIDIA/TensorRT-LLM.git
# Launch the container
./container/run.sh --framework trtllm -it --mount-workspace --use-nixl-gds
```
> [!NOTE]
> Important: After logging into the Dynamo TensorRT-LLM runtime container, copy the Triton kernels into the container’s virtual environment as a separate Python module.
```bash
# Clone the TensorRT-LLM repo and copy the triton_kernels folder into the container as a Python module.
git clone https://github.com/NVIDIA/TensorRT-LLM.git /tmp/TensorRT-LLM && \
cd /tmp/TensorRT-LLM && \
git checkout 18e611da773026a55d187870ebcfa95ff00c8482 && \
cp -r triton_kernels /opt/dynamo/venv/lib/python3.12/site-packages/ && \
cd /workspace && \
rm -rf /tmp/TensorRT-LLM
```
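After the copy step, it can be worth confirming that the module is actually visible to the container's Python. The following is a minimal sketch, assuming the copied folder is importable under the name `triton_kernels` (taken from the folder name above) and that `python3` resolves to the virtual environment's interpreter. The `check_module` helper is hypothetical and not part of Dynamo or TensorRT-LLM.

```bash
# Hypothetical helper: succeeds if python3 can locate the named top-level module.
check_module() {
  python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# Print the active interpreter's site-packages directory, to compare with the
# python3.12 path used in the cp command above.
python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])'

# Assumption: the module name matches the copied folder name, triton_kernels.
if check_module triton_kernels; then
  echo "triton_kernels is importable"
else
  echo "triton_kernels not found; re-check the copy step" >&2
fi
```

If the printed site-packages directory differs from the path used in the `cp` command, the copy most likely landed in a location the active interpreter does not search.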
```bash
# Launch prefill worker with KVBM
python3 -m dynamo.trtllm \
......