See the [Kubernetes Installation Guide](/docs/kubernetes/installation_guide.md#prerequisites) for detailed setup instructions and pre-deployment checks.
See the [Kubernetes Installation Guide](/docs/pages/kubernetes/installation-guide.md#prerequisites) for detailed setup instructions and pre-deployment checks.
> This deployment requires pre-deployment profiling to be completed first. See [Pre-Deployment Profiling](../../../../docs/components/profiler/profiler_guide.md) for detailed instructions.
> This deployment requires pre-deployment profiling to be completed first. See [Pre-Deployment Profiling](../../../../docs/pages/components/profiler/profiler-guide.md) for detailed instructions.
## CRD Structure
...
...
@@ -102,7 +102,7 @@ extraPodSpec:
Before using these templates, ensure you have:
1.**Dynamo Kubernetes Platform installed** - See [Quickstart Guide](../../../../docs/kubernetes/README.md)
1.**Dynamo Kubernetes Platform installed** - See [Quickstart Guide](../../../../docs/pages/kubernetes/README.md)
2.**Kubernetes cluster with GPU support**
3.**Container registry access** for TensorRT-LLM runtime images
4.**HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
...
...
@@ -155,7 +155,7 @@ args:
### 3. Deploy
See the [Create Deployment Guide](../../../../docs/kubernetes/deployment/create_deployment.md) to learn how to deploy the deployment file.
See the [Create Deployment Guide](../../../../docs/pages/kubernetes/deployment/create-deployment.md) to learn how to deploy the deployment file.
First, create a secret for the HuggingFace token.
```bash
...
...
@@ -219,7 +219,7 @@ TensorRT-LLM workers are configured through command-line arguments in the deploy
## Testing the Deployment
Send a test request to verify your deployment. See the [client section](../../../../docs/backends/vllm/README.md#client) for detailed instructions.
Send a test request to verify your deployment. See the [client section](../../../../docs/pages/backends/vllm/README.md#client) for detailed instructions.
**Note:** For multi-node deployments, target the node running `python3 -m dynamo.frontend <args>`.
...
...
@@ -241,11 +241,11 @@ TensorRT-LLM supports two methods for KV cache transfer in disaggregated serving
-**UCX** (default): Standard method for KV cache transfer
-**NIXL** (experimental): Alternative transfer method
For detailed configuration instructions, see the [KV cache transfer guide](../../../../docs/backends/trtllm/kv-cache-transfer.md).
For detailed configuration instructions, see the [KV cache transfer guide](../../../../docs/pages/backends/trtllm/kv-cache-transfer.md).
## Request Migration
You can enable [request migration](../../../../docs/fault_tolerance/request_migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations:
You can enable [request migration](../../../../docs/pages/fault-tolerance/request-migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations:
```yaml
args:
...
...
@@ -264,13 +264,13 @@ Configure the `model` name and `host` based on your deployment.
3.`post_process.py` - Scan the aiperf results to produce a json with entries to each config point.
4.`plot_performance_comparison.py` - Takes the json result file for disaggregated and/or aggregated configuration sweeps and plots a pareto line for better visualization.
For more finer grained details on how to launch TRTLLM backend workers with DeepSeek R1 on GB200 slurm, please refer [multinode-examples.md](../../../../docs/backends/trtllm/multinode/multinode-examples.md). This guide shares similar assumption to the multinode examples guide.
For more finer grained details on how to launch TRTLLM backend workers with DeepSeek R1 on GB200 slurm, please refer [multinode-examples.md](../../../../docs/pages/backends/trtllm/multinode/multinode-examples.md). This guide shares similar assumption to the multinode examples guide.
## Usage
...
...
@@ -49,7 +49,7 @@ For more finer grained details on how to launch TRTLLM backend workers with Deep
Before running the scripts, ensure you have:
1. Access to a SLURM cluster
2. Container image of Dynamo with TensorRT-LLM built using instructions from [here](https://github.com/ai-dynamo/dynamo/tree/main/docs/backends/trtllm/README.md#build-container).
2. Container image of Dynamo with TensorRT-LLM built using instructions from [here](https://github.com/ai-dynamo/dynamo/tree/main/docs/pages/backends/trtllm/README.md#build-container).
If using the SLA Planner deployment (`disagg_planner.yaml`), follow the [pre-deployment profiling guide](../../../../docs/components/profiler/profiler_guide.md) to run pre-deployment profiling.
If using the SLA Planner deployment (`disagg_planner.yaml`), follow the [pre-deployment profiling guide](../../../../docs/pages/components/profiler/profiler-guide.md) to run pre-deployment profiling.
## Usage
...
...
@@ -235,7 +235,7 @@ All templates use **Qwen/Qwen3-0.6B** as the default model, but you can use any
## Request Migration
You can enable [request migration](../../../../docs/fault_tolerance/request_migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations:
You can enable [request migration](../../../../docs/pages/fault-tolerance/request-migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations: