@@ -31,28 +31,25 @@ TensorRT-LLM also supports using **NIXL** (NVIDIA Inference Xfer Library) for KV
...
## Using NIXL for KV Cache Transfer
**Note:** The NIXL backend for TensorRT-LLM is currently supported only on the AMD64 (x86_64) architecture. On ARM64, use the default UCX method for KV cache transfer.
**Note:** The NIXL version shipped with the current Dynamo release is not supported by `tensorrt-llm<=1.2.0rc2`. To use the NIXL backend for KV cache transfer, build the container image with `tensorrt-llm>=1.2.0rc3`.
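As a rough preflight check, the two constraints in the notes above (AMD64/x86_64 only, TensorRT-LLM 1.2.0rc3 or newer) can be expressed as a small Python function. This is a minimal sketch, not part of Dynamo or TensorRT-LLM: `parse_version`, `nixl_supported`, and `MIN_TRTLLM_FOR_NIXL` are hypothetical names, and the parser only handles version strings of the form `X.Y.Z` or `X.Y.ZrcN`.

```python
import platform
import re
from typing import Optional

# Hypothetical minimal parser for "X.Y.Z" and "X.Y.ZrcN" (not full PEP 440).
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?$")

MIN_TRTLLM_FOR_NIXL = "1.2.0rc3"        # from the note above
NIXL_ARCHES = {"x86_64", "amd64"}       # platform.machine() spellings of AMD64


def parse_version(v: str) -> tuple:
    # Drop a local version suffix such as "+abc123" before matching.
    core = v.strip().split("+", 1)[0]
    m = _VERSION_RE.match(core)
    if m is None:
        raise ValueError(f"unsupported version string: {v!r}")
    major, minor, patch, rc = m.groups()
    # A final release sorts after all of its release candidates,
    # so "no rc" is represented as infinity in the last slot.
    rc_key = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch), rc_key)


def nixl_supported(trtllm_version: str, machine: Optional[str] = None) -> bool:
    # Default to the current host architecture if none is given.
    arch = (machine or platform.machine()).lower()
    if arch not in NIXL_ARCHES:
        return False
    return parse_version(trtllm_version) >= parse_version(MIN_TRTLLM_FOR_NIXL)
```

For example, `nixl_supported("1.2.0rc2", "x86_64")` returns `False`, while `nixl_supported("1.2.0rc3", "x86_64")` returns `True`. The installed version could be obtained with `importlib.metadata.version("tensorrt_llm")`, assuming that is the distribution name in your image.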
To enable NIXL for KV cache transfer in disaggregated serving:
1. **Build the container with NIXL support (`tensorrt-llm==1.2.0rc3`):**
The TensorRT-LLM wheel must be built from source with NIXL support. The `./container/build.sh` script caches previously built TensorRT-LLM wheels to reduce build time. If you have previously built a TensorRT-LLM wheel without NIXL support, you must delete the cached wheel to force a rebuild with NIXL support.
**Remove cached TensorRT-LLM wheel (only if previously built without NIXL support):**