Unverified Commit 39d645e5 authored by Jonathan Tong, committed by GitHub

docs: migrate Fern docs from fern/ into docs/ (#6206)


Signed-off-by: Jont828 <jt572@cornell.edu>
parent d381e6ff
......@@ -201,6 +201,30 @@ cd $DYNAMO_HOME/examples/backends/vllm
### Disaggregated Serving with TRT-LLM
> [!NOTE]
> The latest TensorRT-LLM release (1.3.0rc1) currently hangs on requests when running disaggregated serving with KVBM.
> To test KVBM with disaggregated serving, build the Dynamo TensorRT-LLM runtime image against TensorRT-LLM commit `18e611da773026a55d187870ebcfa95ff00c8482`.
```bash
# Build the Dynamo TensorRT-LLM container using commit ID 18e611da773026a55d187870ebcfa95ff00c8482. Note: This build can take a long time.
./container/build.sh --framework trtllm --tensorrtllm-commit 18e611da773026a55d187870ebcfa95ff00c8482 --tensorrtllm-git-url https://github.com/NVIDIA/TensorRT-LLM.git
# Launch the container
./container/run.sh --framework trtllm -it --mount-workspace --use-nixl-gds
```
> [!NOTE]
> Important: After logging into the Dynamo TensorRT-LLM runtime container, copy the Triton kernels into the container's virtual environment as a separate Python module.
```bash
# Clone the TensorRT-LLM repo and copy the triton_kernels folder into the container as a Python module.
git clone https://github.com/NVIDIA/TensorRT-LLM.git /tmp/TensorRT-LLM && \
cd /tmp/TensorRT-LLM && \
git checkout 18e611da773026a55d187870ebcfa95ff00c8482 && \
cp -r triton_kernels /opt/dynamo/venv/lib/python3.12/site-packages/ && \
cd /workspace && \
rm -rf /tmp/TensorRT-LLM
```
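Before launching workers, it may help to confirm that the copy step above actually placed the folder where the virtual environment can find it. The following is a minimal sketch, not part of Dynamo or TensorRT-LLM: `has_module_dir` is a hypothetical helper, the `SITE_PACKAGES` variable is introduced here for illustration, and the default path is simply the one used in the copy command above.

```bash
# Hypothetical helper (not part of Dynamo or TensorRT-LLM): succeeds when
# a package directory named $2 exists under the site-packages path $1.
has_module_dir() {
    local site_packages="$1" module="$2"
    [ -d "${site_packages}/${module}" ]
}

# Default path taken from the copy step above; override SITE_PACKAGES
# if your virtual environment lives elsewhere.
SITE_PACKAGES="${SITE_PACKAGES:-/opt/dynamo/venv/lib/python3.12/site-packages}"

if has_module_dir "$SITE_PACKAGES" triton_kernels; then
    echo "triton_kernels found in $SITE_PACKAGES"
else
    echo "triton_kernels missing from $SITE_PACKAGES" >&2
fi
```

This only checks that the directory exists; it does not prove that the module imports cleanly, which depends on the rest of the environment.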
```bash
# Launch prefill worker with KVBM
python3 -m dynamo.trtllm \
......