Commit 6cd6033c authored by Tanmay Verma, committed by GitHub

docs: Fix the documentation for NIXL kv cache transfer backend (#4611)

parent 30aba52c
@@ -31,28 +31,25 @@ TensorRT-LLM also supports using **NIXL** (NVIDIA Inference Xfer Library) for KV
  ## Using NIXL for KV Cache Transfer
- **Note:** NIXL backend for TensorRT-LLM is currently only supported on AMD64 (x86_64) architecture. If you're running on ARM64, you'll need to use the default UCX method for KV cache transfer.
+ **Note:** NIXL version shipped with current dynamo is not supported by tensorrt-llm<=1.2.0rc2. In order to use NIXL backend for KV cache transfer, users are required to build container image with tensorrt-llm>=1.2.0rc3.
  To enable NIXL for KV cache transfer in disaggregated serving:
- 1. **Build the container with NIXL support:**
+ 1. **Build the container with NIXL support(tensorrt-llm==1.2.0rc3):**
- The TensorRT-LLM wheel must be built from source with NIXL support. The `./container/build.sh` script caches previously built TensorRT-LLM wheels to reduce build time. If you have previously built a TensorRT-LLM wheel without NIXL support, you must delete the cached wheel to force a rebuild with NIXL support.
- **Remove cached TensorRT-LLM wheel (only if previously built without NIXL support):**
- ```bash
- rm -rf /tmp/trtllm_wheel
- ```
- **Build the container with NIXL support:**
  ```bash
  ./container/build.sh --framework trtllm \
-   --tensorrtllm-git-url https://github.com/NVIDIA/TensorRT-LLM.git \
-   --tensorrtllm-commit v1.2.0rc2
+   --tensorrtllm-pip-wheel tensorrt-llm==1.2.0rc3
  ```
  2. **Run the containerized environment:**
  See [run container](./README.md#run-container) section to learn how to start the container image built in previous step.
+ Within container, unset `TRTLLM_USE_UCX_KVCACHE` variable so NIXL can be used instead of UCX.
+ ```bash
+ unset TRTLLM_USE_UCX_KVCACHE
+ ```
  3. **Start the disaggregated service:**
  See [disaggregated serving](./README.md#disaggregated-serving) to see how to start the deployment.
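Step 2 depends on `TRTLLM_USE_UCX_KVCACHE` actually being unset in the shell that launches the service. A minimal pre-flight sketch of that check follows. The function name `ucx_kvcache_is_unset` is hypothetical and not part of dynamo or TensorRT-LLM. It only inspects the current shell environment and assumes nothing about how TensorRT-LLM itself reads the variable.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of dynamo or TensorRT-LLM).
# Returns 0 when TRTLLM_USE_UCX_KVCACHE is not set at all,
# and 1 when it is set to any value, including the empty string.
ucx_kvcache_is_unset() {
  # ${VAR+x} expands to "x" only if VAR is set, even if it is empty.
  [ -z "${TRTLLM_USE_UCX_KVCACHE+x}" ]
}

if ucx_kvcache_is_unset; then
  echo "TRTLLM_USE_UCX_KVCACHE is unset"
else
  echo "TRTLLM_USE_UCX_KVCACHE is still set; run: unset TRTLLM_USE_UCX_KVCACHE" >&2
fi
```

Running this inside the container before starting the disaggregated service gives a quick signal that the `unset` from step 2 took effect in the current shell. It does not verify the installed tensorrt-llm version.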