@@ -30,43 +30,11 @@ By default, TensorRT-LLM uses **NIXL** (NVIDIA Inference Xfer Library) with UCX
### Specify Backends for NIXL
NIXL supports multiple communication backends that can be configured via environment variables. By default, UCX is used if no backends are explicitly specified.
**Environment Variable Format:**
```bash
DYN_KVBM_NIXL_BACKEND_<BACKEND>=<value>
```
**Supported Backends:**
- `UCX` - Unified Communication X (default)
- `GDS` - GPU Direct Storage
**Examples:**
```bash
# Enable UCX backend (default behavior)
export DYN_KVBM_NIXL_BACKEND_UCX=true
# Enable GDS backend
export DYN_KVBM_NIXL_BACKEND_GDS=true
# Enable multiple backends
export DYN_KVBM_NIXL_BACKEND_UCX=true
export DYN_KVBM_NIXL_BACKEND_GDS=true
# Explicitly disable a backend
export DYN_KVBM_NIXL_BACKEND_GDS=false
```
**Valid Values:**
- `true`, `1`, `on`, `yes` - Enable the backend
- `false`, `0`, `off`, `no` - Disable the backend
> [!Note]
> If no `DYN_KVBM_NIXL_BACKEND_*` environment variables are set, UCX is used as the default backend.
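As a quick pre-flight check, the sketch below reports how each documented backend variable would be read from the current environment. It is only an illustration: `classify_backend_value` is a hypothetical helper, not part of Dynamo or NIXL, and it assumes exact, lowercase matches against the values listed above. How the runtime treats any other value is not specified here, so such values are simply reported as unrecognized.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of Dynamo or NIXL): classify a toggle value
# using only the values documented above, matched exactly.
classify_backend_value() {
  case "$1" in
    true|1|on|yes)  echo "enabled" ;;
    false|0|off|no) echo "disabled" ;;
    *)              echo "unrecognized" ;;
  esac
}

# Report the state of each documented backend variable.
for backend in UCX GDS; do
  var="DYN_KVBM_NIXL_BACKEND_${backend}"
  # printenv exits non-zero when the variable is not exported.
  if value="$(printenv "$var")"; then
    echo "${var}: $(classify_backend_value "$value")"
  else
    echo "${var}: unset"
  fi
done
```

If both variables are reported as unset, the default described in the note applies and UCX is used.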
TensorRT-LLM supports two NIXL communication backends: UCX and LIBFABRIC. UCX is used by default when no backend is explicitly specified. Dynamo currently supports only the UCX backend because LIBFABRIC support is still a work in progress, so do not change the NIXL backend in the Dynamo runtime image.
## Alternative Method: UCX
TensorRT-LLM can also leverage **UCX** (Unified Communication X) directly for KV cache transfer between prefill and decode workers. To enable UCX as the KV cache transfer backend, set `cache_transceiver_config.backend: UCX` in your engine configuration YAML file.
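The setting above can be sketched as a minimal YAML fragment written from the shell. The file path is hypothetical, and the nesting of `backend` under `cache_transceiver_config` is inferred from the dotted key name. In practice you would add these two lines to the engine configuration file your deployment already loads rather than create a new file.

```bash
#!/usr/bin/env bash

# Hypothetical path used for illustration only; this overwrites the file.
config_file="${TMPDIR:-/tmp}/engine_config.yaml"

# Write only the fragment relevant to KV cache transfer.
cat > "$config_file" <<'EOF'
cache_transceiver_config:
  backend: UCX
EOF

# Confirm the key is present before launching the workers.
if grep -q '^  backend: UCX$' "$config_file"; then
  echo "UCX backend set in ${config_file}"
fi
```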
> [!Note]
> The environment variable `TRTLLM_USE_UCX_KV_CACHE=1` with `cache_transceiver_config.backend: DEFAULT` does not enable UCX. You must explicitly set `backend: UCX` in the configuration.
> The environment variable `TRTLLM_USE_UCX_KVCACHE=1` with `cache_transceiver_config.backend: DEFAULT` does not enable UCX. You must explicitly set `backend: UCX` in the configuration.