Unverified commit 43d687e8, authored by Tanmay Verma and committed by GitHub

chore: Mark NIXL as beta in TRTLLM (#3633)


Signed-off-by: Laikh Tewari <ltewari@nvidia.com>
Co-authored-by: Laikh Tewari <ltewari@nvidia.com>
parent a7badb85
@@ -24,10 +24,10 @@ In disaggregated serving architectures, KV cache must be transferred between pre
 ## Default Method: UCX
 By default, TensorRT-LLM uses UCX (Unified Communication X) for KV cache transfer between prefill and decode workers. UCX provides high-performance communication optimized for GPU-to-GPU transfers.
-## Experimental Method: NIXL
+## Beta Method: NIXL
-TensorRT-LLM also provides experimental support for using **NIXL** (NVIDIA Inference Xfer Library) for KV cache transfer. [NIXL](https://github.com/ai-dynamo/nixl) is NVIDIA's high-performance communication library designed for efficient data transfer in distributed GPU environments.
+TensorRT-LLM also supports using **NIXL** (NVIDIA Inference Xfer Library) for KV cache transfer. [NIXL](https://github.com/ai-dynamo/nixl) is NVIDIA's high-performance communication library designed for efficient data transfer in distributed GPU environments.
-**Note:** NIXL support in TensorRT-LLM is experimental and is not suitable for production environments yet.
+**Note:** NIXL support in TensorRT-LLM is currently beta and may have some sharp edges.
 ## Using NIXL for KV Cache Transfer
@@ -61,4 +61,4 @@ To enable NIXL for KV cache transfer in disaggregated serving:
 4. **Send the request:**
 See [client](./README.md#client) section to learn how to send the request to deployment.
 **Important:** Ensure that ETCD and NATS services are running before starting the service.
\ No newline at end of file