Unverified commit 15d21760, authored by Richard Huo, committed by GitHub

chore: revert the kvbm workaround since trtllm v1.3.0rc3 is upgraded (#6495)

parent 80cac7c1
......@@ -204,30 +204,6 @@ cd $DYNAMO_HOME/examples/backends/vllm
### Disaggregated Serving with TRT-LLM
> [!NOTE]
> The latest TensorRT-LLM release (1.3.0rc1) is currently experiencing a request hang when running disaggregated serving with KVBM.
> Please include the TensorRT-LLM commit id `18e611da773026a55d187870ebcfa95ff00c8482` when building the Dynamo TensorRT-LLM runtime image to test the KVBM + disaggregated serving feature.
```bash
# Build the Dynamo TensorRT-LLM container using commit ID 18e611da773026a55d187870ebcfa95ff00c8482. Note: This build can take a long time.
./container/build.sh --framework trtllm --tensorrtllm-commit 18e611da773026a55d187870ebcfa95ff00c8482 --tensorrtllm-git-url https://github.com/NVIDIA/TensorRT-LLM.git
# Launch the container
./container/run.sh --framework trtllm -it --mount-workspace --use-nixl-gds
```
> [!NOTE]
> Important: After logging into the Dynamo TensorRT-LLM runtime container, copy the Triton kernels into the container's virtual environment as a separate Python module.
```bash
# Clone the TensorRT-LLM repo and copy the triton_kernels folder into the container as a Python module.
git clone https://github.com/NVIDIA/TensorRT-LLM.git /tmp/TensorRT-LLM && \
cd /tmp/TensorRT-LLM && \
git checkout 18e611da773026a55d187870ebcfa95ff00c8482 && \
cp -r triton_kernels /opt/dynamo/venv/lib/python3.12/site-packages/ && \
cd /workspace && \
rm -rf /tmp/TensorRT-LLM
```
```bash
# Launch prefill worker with KVBM
python3 -m dynamo.trtllm \
......
......@@ -551,10 +551,6 @@ def tester(llm_server):
class TestDeterminismDisagg(BaseTestDeterminism):
"""Test class for determinism validation."""
@pytest.mark.skipif(
check_module_available("tensorrt_llm"),
reason="Skipping test until the TRT-LLM disagg hang issue is fixed. (https://github.com/NVIDIA/TensorRT-LLM/pull/11247)",
)
@pytest.mark.parametrize(
"llm_server",
[
......