docs: add notes and instruction for latest trtllm kvbm disagg (#6055)

Unverified commit 7d035aff, authored Feb 06, 2026 by Richard Huo; committed by GitHub Feb 06, 2026

Showing with 24 additions and 0 deletions

docs/components/kvbm/kvbm_guide.md +24 -0

--- a/docs/components/kvbm/kvbm_guide.md
+++ b/docs/components/kvbm/kvbm_guide.md
@@ -212,6 +212,30 @@ cd $DYNAMO_HOME/examples/backends/vllm
 ### Disaggregated Serving with TRT-LLM
+> [!NOTE]
+> The latest TensorRT-LLM release (1.3.0rc1) is currently experiencing a request hang when running disaggregated serving with KVBM.
+> Please include the TensorRT-LLM commit id `18e611da773026a55d187870ebcfa95ff00c8482` when building the Dynamo TensorRT-LLM runtime image to test the KVBM + disaggregated serving feature.
+```bash
+# Build the Dynamo TensorRT-LLM container using commit ID 18e611da773026a55d187870ebcfa95ff00c8482. Note: This build can take a long time.
+./container/build.sh --framework trtllm --tensorrtllm-commit 18e611da773026a55d187870ebcfa95ff00c8482 --tensorrtllm-git-url https://github.com/NVIDIA/TensorRT-LLM.git
+# Launch the container
+./container/run.sh --framework trtllm -it --mount-workspace --use-nixl-gds
+```
+> [!NOTE]
+> Important: After logging into the Dynamo TensorRT-LLM runtime container, copy the Triton kernels into the container’s virtual environment as a separate Python module.
+```bash
+# Clone the TensorRT-LLM repo and copy the triton_kernels folder into the container as a Python module.
+git clone https://github.com/NVIDIA/TensorRT-LLM.git /tmp/TensorRT-LLM && \
+cd /tmp/TensorRT-LLM && \
+git checkout 18e611da773026a55d187870ebcfa95ff00c8482 && \
+cp -r triton_kernels /opt/dynamo/venv/lib/python3.12/site-packages/ && \
+cd /workspace && \
+rm -rf /tmp/TensorRT-LLM
+```
 ```bash
 # Launch prefill worker with KVBM
 python3 -m dynamo.trtllm \