Unverified commit 4f6996c7, authored by dagil-nvidia, committed by GitHub

docs: fix broken documentation links (#5330)

parent 475999cf
@@ -21,6 +21,10 @@ limitations under the License.
 In disaggregated serving architectures, KV cache must be transferred between prefill and decode workers. TensorRT-LLM supports two methods for this transfer:
+## Using NIXL for KV Cache Transfer
+Start the disaggregated service: See [Disaggregated Serving](./README.md#disaggregated) to learn how to start the deployment.
 ## Default Method: NIXL
 By default, TensorRT-LLM uses **NIXL** (NVIDIA Inference Xfer Library) with UCX (Unified Communication X) as backend for KV cache transfer between prefill and decode workers. [NIXL](https://github.com/ai-dynamo/nixl) is NVIDIA's high-performance communication library designed for efficient data transfer in distributed GPU environments.
......
@@ -79,7 +79,7 @@ For basic model registration without KV routing, you can use `--router-mode roun
 ## Disaggregated Serving (Prefill and Decode)
-Dynamo supports disaggregated serving where prefill (prompt processing) and decode (token generation) are handled by separate worker pools. When you register workers with `ModelType.Prefill` (see [Backend Guide](../development/backend-guide.md#model-types)), the frontend automatically detects them and activates an internal prefill router.
+Dynamo supports disaggregated serving where prefill (prompt processing) and decode (token generation) are handled by separate worker pools. When you register workers with `ModelType.Prefill` (see [Backend Guide](../development/backend-guide.md)), the frontend automatically detects them and activates an internal prefill router.
 ### Automatic Prefill Router Activation
......
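For readers of this diff, the behavior described in the second hunk (registering a prefill worker causes the frontend to activate an internal prefill router) can be pictured with a toy model. This is only a sketch under stated assumptions: `ModelType`, `Frontend`, `register_worker`, and `prefill_router_active` below are hypothetical stand-ins written for illustration, not Dynamo's actual API, and the activation rule (any registered prefill worker turns the router on) is an assumption drawn from the sentence in the diff.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class ModelType(Enum):
    # Hypothetical stand-in for the model types mentioned in the docs.
    PREFILL = auto()
    DECODE = auto()


@dataclass
class Frontend:
    """Toy frontend that tracks registered workers per pool (illustration only)."""

    pools: dict = field(default_factory=lambda: {t: [] for t in ModelType})

    def register_worker(self, name: str, model_type: ModelType) -> None:
        # Add the worker to the pool matching its declared model type.
        self.pools[model_type].append(name)

    @property
    def prefill_router_active(self) -> bool:
        # Assumed rule: the router is active once any prefill worker exists.
        return bool(self.pools[ModelType.PREFILL])


frontend = Frontend()
frontend.register_worker("decode-0", ModelType.DECODE)
before = frontend.prefill_router_active  # no prefill workers registered yet
frontend.register_worker("prefill-0", ModelType.PREFILL)
after = frontend.prefill_router_active   # a prefill worker is now present
print(before, after)
```

In this sketch, detection is derived from pool contents rather than stored as a separate flag, so the router state cannot drift out of sync with the set of registered workers.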