@@ -212,6 +212,30 @@ cd $DYNAMO_HOME/examples/backends/vllm
...
### Disaggregated Serving with TRT-LLM
> [!NOTE]
> The latest TensorRT-LLM release (1.3.0rc1) has a known issue in which requests hang during disaggregated serving with KVBM.
> To test KVBM with disaggregated serving, build the Dynamo TensorRT-LLM runtime image from TensorRT-LLM commit `18e611da773026a55d187870ebcfa95ff00c8482` instead.
```bash
# Build the Dynamo TensorRT-LLM container using TensorRT-LLM commit ID 18e611da773026a55d187870ebcfa95ff00c8482.
# Note: this build can take a long time.
```
> [!IMPORTANT]
> After you start the Dynamo TensorRT-LLM runtime container, copy the Triton kernels (`triton_kernels`) into the container's virtual environment as a separate Python module.
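
The note does not say where the container's virtual environment keeps its packages. As a sketch, assuming the virtual environment's interpreter is the first `python3` on `PATH` inside the container, you can ask Python for its pure-library (`site-packages`) directory, which is where a copied `triton_kernels` folder has to sit to be importable:

```bash
# Sketch (assumption): the venv's interpreter is the first python3 on PATH.
# sysconfig's "purelib" path is the directory Python imports third-party
# pure-Python packages from, i.e. the target for the triton_kernels folder.
SITE_PACKAGES="$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"
echo "Copy triton_kernels into: ${SITE_PACKAGES}"
```

After the copy, running `python3 -c 'import triton_kernels'` should exit without an `ImportError` if the folder landed in the right directory.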
```bash
# Clone the TensorRT-LLM repo and copy the triton_kernels folder into the container as a Python module.