Commit 39d645e5 authored by Jonathan Tong, committed by GitHub

docs: migrate Fern docs from fern/ into docs/ (#6206)


Signed-off-by: Jont828 <jt572@cornell.edu>
parent d381e6ff
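
Most link updates in this diff follow one pattern: a `pages/` segment is inserted after `docs/`, and underscores in directory and file names become hyphens. The sketch below illustrates that pattern for repo-root-relative paths. The function name `migrate_doc_path` and the `SPECIAL_CASES` table are hypothetical and not part of the commit. The single special-case entry is copied from the hunks below, and other exceptions may exist that this sketch does not cover.

```python
# Hypothetical exceptions table: moves that do not follow the general rule.
# The single entry is taken from the link changes shown in this diff.
SPECIAL_CASES = {
    "docs/examples/README.md": "docs/pages/getting-started/examples.md",
}


def migrate_doc_path(old_path: str) -> str:
    """Map a pre-migration docs path to its assumed post-migration location."""
    # Split off an optional "#anchor" so it is carried over unchanged.
    path, sep, anchor = old_path.partition("#")
    if path in SPECIAL_CASES:
        return SPECIAL_CASES[path] + sep + anchor
    # Leave anything outside docs/, or already under docs/pages/, untouched.
    if not path.startswith("docs/") or path.startswith("docs/pages/"):
        return old_path
    rest = path[len("docs/"):]
    # Assumed general rule: insert "pages/" and replace underscores with hyphens.
    return "docs/pages/" + rest.replace("_", "-") + sep + anchor


if __name__ == "__main__":
    for old in (
        "docs/kubernetes/installation_guide.md#prerequisites",
        "docs/design_docs/disagg_serving.md",
        "docs/examples/README.md",
    ):
        print(old, "->", migrate_doc_path(old))
```

Relative prefixes such as `../../../../` are not handled here; a caller would need to strip and re-attach them around the call.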
---
orphan: true
---
# <Backend> Guide
Advanced deployment and configuration for the <Backend> backend.
## Deployment
### Single-Node Setup
<!-- Local deployment instructions -->
### Multi-Node Setup
<!-- Distributed deployment with TP/PP -->
### Kubernetes Deployment
```yaml
# Full DGDR example
```
## Configuration
### CLI Arguments
| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| <!-- arg --> | <!-- type --> | <!-- default --> | <!-- description --> |
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| <!-- var --> | <!-- default --> | <!-- description --> |
### Model Configuration
<!-- Model-specific settings, quantization -->
## Performance Tuning
### Memory Optimization
<!-- KV cache sizing, batch limits -->
### Throughput Optimization
<!-- Concurrency, prefill/decode settings -->
## Troubleshooting
### Common Issues
| Issue | Cause | Solution |
|-------|-------|----------|
| <!-- issue --> | <!-- cause --> | <!-- solution --> |
### Debug Mode
```bash
# Add debug command from existing docs
```
## See Also
| Document | Path |
|----------|------|
| `<Backend> Overview` | `./README.md` |
| Backend Comparison | `../README.md` |
<!-- Convert to links when using template -->
---
orphan: true
---
# <Backend> Backend
<!-- 2-3 sentence overview of this backend integration -->
## Feature Matrix
<!-- Copy actual feature matrix from existing backend docs -->
<!-- Example pattern (from vLLM README): -->
| Feature | Status | Notes |
|---------|--------|-------|
| Disaggregated Serving | ✅ | |
| KV-Aware Routing | ✅ | |
| SLA-Based Planner | ✅ | |
| Multimodal | ✅ | Vision models |
| LoRA | 🚧 | Experimental |
## Quick Start
### Prerequisites
- <!-- List prerequisites -->
### Usage
```bash
# Add minimal usage example from existing backend docs
# Example pattern (vLLM):
# python -m dynamo.vllm --model <model-name>
# Example pattern (SGLang):
# python -m dynamo.sglang --model <model-name>
```
### Kubernetes
```yaml
# Add DGDR example - use apiVersion: nvidia.com/v1alpha1
# See recipes/ folder for production examples
```
## Configuration
| Parameter | Default | Description |
|-----------|---------|-------------|
| <!-- param --> | <!-- default --> | <!-- description --> |
<!-- EXAMPLE: Filled-in Configuration for vLLM would look like:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--model` | required | Model path or HuggingFace ID |
| `--tensor-parallel-size` | `1` | Number of GPUs for tensor parallelism |
| `--max-model-len` | auto | Maximum sequence length |
-->
## Next Steps
| Document | Path | Description |
|----------|------|-------------|
| `<Backend> Guide` | `<backend>_guide.md` | Advanced configuration |
| Backend Comparison | `../README.md` | Compare backends |
<!-- Convert table rows to markdown links -->
---
orphan: true
---
# <Component> Design
Architecture and design decisions for the <Component>.
## Overview
<!-- High-level architecture description -->
## Design Goals
1. **Goal 1** - Description
2. **Goal 2** - Description
3. **Goal 3** - Description
## Architecture
### Components
<!-- Description of internal components -->
### Data Flow
```
┌─────────┐    ┌─────────┐    ┌─────────┐
│  Input  │───▶│ Process │───▶│ Output  │
└─────────┘    └─────────┘    └─────────┘
```
## Design Decisions
### Decision 1: <!-- Title -->
**Context:** <!-- What problem were we solving? -->
**Options Considered:**
1. Option A - Pros/Cons
2. Option B - Pros/Cons
**Decision:** <!-- What we chose and why -->
**Consequences:** <!-- Trade-offs accepted -->
## Algorithms
### <!-- Algorithm Name -->
<!-- Algorithm description -->
```
Pseudocode or formula
```
## Performance Considerations
<!-- Performance characteristics, bottlenecks, optimization opportunities -->
## Future Work
- <!-- Planned improvement 1 -->
- <!-- Planned improvement 2 -->
## References
- <!-- Related design docs -->
- <!-- External papers or resources -->
---
orphan: true
---
# <Component> Examples
Usage examples for the <Component>.
## Basic Examples
### Example 1: <!-- Title -->
```bash
# Add example from existing docs
```
### Example 2: <!-- Title -->
```python
# Add example from existing docs
```
## Kubernetes Examples
### Minimal Deployment
```yaml
# Add minimal DGDR from existing docs
```
### Production Deployment
```yaml
# Add production DGDR from existing docs
```
## Advanced Examples
### <!-- Advanced Use Case Title -->
<!-- Description -->
```bash
# Add example
```
## Sample Configurations
### config-minimal.yaml
```yaml
# Add from existing docs
```
---
orphan: true
---
# <Component> Guide
This guide covers deployment, configuration, and integration for the <Component>.
## Deployment
### Single-Node Setup
<!-- Instructions for local/single-node deployment -->
### Multi-Node Setup
<!-- Instructions for distributed deployment -->
### Kubernetes Deployment
```yaml
# Full DGDR example
```
## Configuration
### CLI Arguments
| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| <!-- arg --> | <!-- type --> | <!-- default --> | <!-- description --> |
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| <!-- var --> | <!-- default --> | <!-- description --> |
### Configuration File
```yaml
# Add config file example if applicable
```
## Integration
### With Router
<!-- How to integrate with Router -->
### With Planner
<!-- How to integrate with Planner -->
### With Observability
<!-- Metrics, logging, tracing integration -->
## Troubleshooting
### Common Issues
| Issue | Cause | Solution |
|-------|-------|----------|
| Error message | Root cause | Fix |
### Debug Mode
```bash
# Add debug command from existing docs
```
## See Also
| Document | Path |
|----------|------|
| `<Component> Examples` | `<component>_examples.md` |
| `<Component> Design` | `/docs/design_docs/<component>_design.md` |
<!-- Convert table rows to markdown links -->
---
orphan: true
---
# <Component>
<!-- 2-3 sentence overview of what this component does and its role in Dynamo -->
## Feature Matrix
| Feature | Status |
|---------|--------|
| Feature 1 | ✅ Supported |
| Feature 2 | 🚧 Experimental |
| Feature 3 | ❌ Not Supported |
## Quick Start
### Prerequisites
- <!-- List prerequisites -->
### Usage
```bash
# Add minimal usage example from existing docs
# Example pattern (from Router):
# python -m dynamo.frontend --router-mode kv --http-port 8000
```
### Kubernetes
```yaml
# Add DGDR example - use apiVersion: nvidia.com/v1alpha1
# Example pattern (from Router):
# apiVersion: nvidia.com/v1alpha1
# kind: DynamoGraphDeployment
# metadata:
#   name: <component>-deployment
# spec:
#   services:
#     ...
```
<!-- EXAMPLE: Filled-in Quick Start for Router would look like:
### Prerequisites
- Dynamo platform installed
- At least one backend worker running
### Usage
```bash
python -m dynamo.frontend --router-mode kv --http-port 8000
```
### Kubernetes
```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: router-example
spec:
  graphs:
    - name: frontend
      replicas: 1
```
-->
## Configuration
| Parameter | Default | Description |
|-----------|---------|-------------|
| <!-- param --> | <!-- default --> | <!-- description --> |
## Next Steps
| Document | Path | Description |
|----------|------|-------------|
| `<Component> Guide` | `<component>_guide.md` | Deployment and configuration |
| `<Component> Examples` | `<component>_examples.md` | Usage examples |
| `<Component> Design` | `/docs/design_docs/<component>_design.md` | Architecture |
<!-- Convert table rows to markdown links -->
---
orphan: true
---
# <Feature> with <Backend>
Using <Feature> with the <Backend> backend.
## Prerequisites
- <Backend> installed with <feature> support
- <!-- Other requirements -->
## Configuration
### CLI Arguments
| Argument | Default | Description |
|----------|---------|-------------|
| <!-- arg --> | <!-- default --> | <!-- description --> |
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| <!-- var --> | <!-- default --> | <!-- description --> |
## Examples
### Basic Usage
```python
# Add example from existing docs
```
### Kubernetes Deployment
```yaml
# Add DGDR example from existing docs
```
## Limitations
- <!-- Backend-specific limitations -->
## Troubleshooting
| Issue | Solution |
|-------|----------|
| <!-- issue --> | <!-- solution --> |
## See Also
| Document | Path |
|----------|------|
| `<Feature> Overview` | `./README.md` |
| `<Backend> Guide` | `/docs/backends/<backend>/README.md` |
<!-- Convert to links: [Multimodal Overview](./README.md) -->
---
orphan: true
---
# <Feature>
<!-- 2-3 sentence overview of this cross-cutting feature -->
## Backend Support
<!-- Copy actual backend support from existing feature docs -->
<!-- Example pattern (from Multimodal index.md): -->
| Backend | Status | Notes |
|---------|--------|-------|
| vLLM | ✅ | Full support |
| SGLang | ✅ | |
| TensorRT-LLM | 🚧 | Limited support |
See the Feature Matrix for full compatibility.
## Overview
<!-- How this feature works across backends -->
## Quick Start
<!-- Add minimal example from existing feature docs -->
## Backend-Specific Guides
| Backend | Guide |
|---------|-------|
| vLLM | `<feature>_vllm.md` |
| SGLang | `<feature>_sglang.md` |
| TensorRT-LLM | `<feature>_trtllm.md` |
<!-- Convert table rows to markdown links -->
## See Also
- <!-- Related features -->
- <!-- Related components -->
---
orphan: true
---
# Dynamo <Component>
<!-- One-sentence description -->
See `docs/components/<component>/` for full documentation.
<!-- When using this template, replace with actual link to component docs.
For backends, use: docs/backends/<backend>/
-->
---
orphan: true
---
# <Topic>
<!-- 2-3 sentence overview of this infrastructure topic. -->
## Quick Start
<!-- Minimal steps to get started -->
## Guides
| Guide | Path |
|-------|------|
| Guide 1 | `<subtopic1>.md` |
| Guide 2 | `<subtopic2>.md` |
## Reference
<!-- Links to reference material -->
## See Also
| Topic | Path |
|-------|------|
| Related topic 1 | `../related/` |
| Related topic 2 | `../other/` |
---
orphan: true
---
# <Integration> Integration
<!-- 2-3 sentence overview of this external integration -->
## Version Compatibility
| Dynamo | <Integration> | Notes |
|--------|---------------|-------|
| 0.9.x | 1.2.x | Recommended |
| 0.8.x | 1.1.x | |
## Backend Support
| Backend | Status | Notes |
|---------|--------|-------|
| vLLM | ✅ | |
| SGLang | 🚧 | |
| TensorRT-LLM | ❌ | |
## Quick Start
```bash
# Add installation and usage from existing integration docs
# Example pattern (LMCache):
# python -m dynamo.vllm --model <model> --connector lmcache
```
## Configuration
| Parameter | Default | Description |
|-----------|---------|-------------|
| <!-- param --> | <!-- default --> | <!-- description --> |
## Guides
| Document | Path | Description |
|----------|------|-------------|
| `<Integration> Setup` | `<integration>_setup.md` | Installation and configuration |
| `<Integration> with vLLM` | `<integration>_vllm.md` | vLLM-specific usage |
<!-- Convert table rows to markdown links -->
## External Resources
- [<Integration> Documentation](https://...)
- [<Integration> GitHub](https://github.com/...)
@@ -270,6 +270,8 @@ navigation:
 path: ../pages/backends/vllm/prometheus.md
 - page: Prompt Embeddings
 path: ../pages/backends/vllm/prompt-embeddings.md
+- page: vLLM-Omni
+path: ../pages/backends/vllm/vllm-omni.md
 - section: SGLang Details
 contents:
 - page: Expert Distribution (EPLB)
......
@@ -82,4 +82,4 @@ If you're running Kubernetes/cloud deployment examples (EKS, AKS, GKE), you'll a
 | **kubectl** | v1.24+ | [Install kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl) |
 | **Helm** | v3.0+ | [Install Helm](https://helm.sh/docs/intro/install/) |
-See the [Kubernetes Installation Guide](/docs/kubernetes/installation_guide.md#prerequisites) for detailed setup instructions and pre-deployment checks.
+See the [Kubernetes Installation Guide](/docs/pages/kubernetes/installation-guide.md#prerequisites) for detailed setup instructions and pre-deployment checks.
@@ -74,7 +74,7 @@ extraPodSpec:
 Before using these templates, ensure you have:
-1. **Dynamo Kubernetes Platform installed** - See [Installing Dynamo Kubernetes Platform](../../../../docs/kubernetes/installation_guide.md)
+1. **Dynamo Kubernetes Platform installed** - See [Installing Dynamo Kubernetes Platform](../../../../docs/pages/kubernetes/installation-guide.md)
 2. **Kubernetes cluster with GPU support**
 3. **Container registry access** for SGLang runtime images
 4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
@@ -144,10 +144,10 @@ All templates use **DeepSeek-R1-Distill-Llama-8B** as the default model. But you
 ## Further Reading
-- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/deployment/create_deployment.md)
-- **Quickstart**: [Deployment Quickstart](../../../../docs/kubernetes/README.md)
-- **Platform Setup**: [Dynamo Kubernetes Platform Installation](../../../../docs/kubernetes/installation_guide.md)
-- **Examples**: [Deployment Examples](../../../../docs/examples/README.md)
+- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/pages/kubernetes/deployment/create-deployment.md)
+- **Quickstart**: [Deployment Quickstart](../../../../docs/pages/kubernetes/README.md)
+- **Platform Setup**: [Dynamo Kubernetes Platform Installation](../../../../docs/pages/kubernetes/installation-guide.md)
+- **Examples**: [Deployment Examples](../../../../docs/pages/getting-started/examples.md)
 - **Kubernetes CRDs**: [Custom Resources Documentation](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
 ## Troubleshooting
@@ -159,4 +159,4 @@ Common issues and solutions:
 3. **Health check failures**: Review model loading logs and increase `initialDelaySeconds`
 4. **Out of memory**: Increase memory limits or reduce model batch size
-For additional support, refer to the [deployment guide](../../../../docs/kubernetes/README.md).
+For additional support, refer to the [deployment guide](../../../../docs/pages/kubernetes/README.md).
@@ -17,7 +17,7 @@ For this example, we will make some assumptions about your SLURM cluster:
 If your cluster supports similar container based plugins, you may be able to
 modify the template to use that instead.
 3. We assume you have already built a recent Dynamo+SGLang container image as
-described [here](../../../../docs/backends/sglang/README.md#using-docker-containers).
+described [here](../../../../docs/pages/backends/sglang/README.md#using-docker-containers).
 This is the image that can be passed to the `--container-image` argument in later steps.
 ## Scripts Overview
......
@@ -223,6 +223,6 @@ To add other backends (TensorRT, ONNX, Python, etc.), edit the Makefile's `build
 ## Related Documentation
-- [Dynamo Backend Guide](../../../docs/development/backend-guide.md)
+- [Dynamo Backend Guide](../../../docs/pages/development/backend-guide.md)
 - [Triton Inference Server](https://github.com/triton-inference-server/server)
 - [KServe Protocol](https://kserve.github.io/website/latest/modelserving/data_plane/v2_protocol/)
@@ -53,7 +53,7 @@ Advanced disaggregated deployment with SLA-based automatic scaling.
 - `TRTLLMPrefillWorker`: Specialized prefill-only worker
 > [!NOTE]
-> This deployment requires pre-deployment profiling to be completed first. See [Pre-Deployment Profiling](../../../../docs/components/profiler/profiler_guide.md) for detailed instructions.
+> This deployment requires pre-deployment profiling to be completed first. See [Pre-Deployment Profiling](../../../../docs/pages/components/profiler/profiler-guide.md) for detailed instructions.
 ## CRD Structure
@@ -102,7 +102,7 @@ extraPodSpec:
 Before using these templates, ensure you have:
-1. **Dynamo Kubernetes Platform installed** - See [Quickstart Guide](../../../../docs/kubernetes/README.md)
+1. **Dynamo Kubernetes Platform installed** - See [Quickstart Guide](../../../../docs/pages/kubernetes/README.md)
 2. **Kubernetes cluster with GPU support**
 3. **Container registry access** for TensorRT-LLM runtime images
 4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
@@ -155,7 +155,7 @@ args:
 ### 3. Deploy
-See the [Create Deployment Guide](../../../../docs/kubernetes/deployment/create_deployment.md) to learn how to deploy the deployment file.
+See the [Create Deployment Guide](../../../../docs/pages/kubernetes/deployment/create-deployment.md) to learn how to deploy the deployment file.
 First, create a secret for the HuggingFace token.
 ```bash
@@ -219,7 +219,7 @@ TensorRT-LLM workers are configured through command-line arguments in the deploy
 ## Testing the Deployment
-Send a test request to verify your deployment. See the [client section](../../../../docs/backends/vllm/README.md#client) for detailed instructions.
+Send a test request to verify your deployment. See the [client section](../../../../docs/pages/backends/vllm/README.md#client) for detailed instructions.
 **Note:** For multi-node deployments, target the node running `python3 -m dynamo.frontend <args>`.
@@ -241,11 +241,11 @@ TensorRT-LLM supports two methods for KV cache transfer in disaggregated serving
 - **UCX** (default): Standard method for KV cache transfer
 - **NIXL** (experimental): Alternative transfer method
-For detailed configuration instructions, see the [KV cache transfer guide](../../../../docs/backends/trtllm/kv-cache-transfer.md).
+For detailed configuration instructions, see the [KV cache transfer guide](../../../../docs/pages/backends/trtllm/kv-cache-transfer.md).
 ## Request Migration
-You can enable [request migration](../../../../docs/fault_tolerance/request_migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations:
+You can enable [request migration](../../../../docs/pages/fault-tolerance/request-migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations:
 ```yaml
 args:
@@ -264,13 +264,13 @@ Configure the `model` name and `host` based on your deployment.
 ## Further Reading
-- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/deployment/create_deployment.md)
-- **Quickstart**: [Deployment Quickstart](../../../../docs/kubernetes/README.md)
-- **Platform Setup**: [Dynamo Kubernetes Platform Installation](../../../../docs/kubernetes/installation_guide.md)
-- **Examples**: [Deployment Examples](../../../../docs/examples/README.md)
-- **Architecture Docs**: [Disaggregated Serving](../../../../docs/design_docs/disagg_serving.md), [KV-Aware Routing](../../../../docs/components/router/README.md)
-- **Multinode Deployment**: [Multinode Examples](../../../../docs/backends/trtllm/multinode/multinode-examples.md)
-- **Speculative Decoding**: [Llama 4 + Eagle Guide](../../../../docs/backends/trtllm/llama4_plus_eagle.md)
+- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/pages/kubernetes/deployment/create-deployment.md)
+- **Quickstart**: [Deployment Quickstart](../../../../docs/pages/kubernetes/README.md)
+- **Platform Setup**: [Dynamo Kubernetes Platform Installation](../../../../docs/pages/kubernetes/installation-guide.md)
+- **Examples**: [Deployment Examples](../../../../docs/pages/getting-started/examples.md)
+- **Architecture Docs**: [Disaggregated Serving](../../../../docs/pages/design-docs/disagg-serving.md), [KV-Aware Routing](../../../../docs/pages/components/router/README.md)
+- **Multinode Deployment**: [Multinode Examples](../../../../docs/pages/backends/trtllm/multinode/multinode-examples.md)
+- **Speculative Decoding**: [Llama 4 + Eagle Guide](../../../../docs/pages/backends/trtllm/llama4-plus-eagle.md)
 - **Kubernetes CRDs**: [Custom Resources Documentation](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
 ## Troubleshooting
@@ -285,4 +285,4 @@ Common issues and solutions:
 6. **Git LFS issues**: Ensure git-lfs is installed before building containers
 7. **ARM deployment**: Use `--platform linux/arm64` when building on ARM machines
-For additional support, refer to the [deployment troubleshooting guide](../../../../docs/kubernetes/README.md).
+For additional support, refer to the [deployment troubleshooting guide](../../../../docs/pages/kubernetes/README.md).
@@ -41,7 +41,7 @@ Please note that:
 3. `post_process.py` - Scan the aiperf results to produce a json with entries to each config point.
 4. `plot_performance_comparison.py` - Takes the json result file for disaggregated and/or aggregated configuration sweeps and plots a pareto line for better visualization.
-For more finer grained details on how to launch TRTLLM backend workers with DeepSeek R1 on GB200 slurm, please refer [multinode-examples.md](../../../../docs/backends/trtllm/multinode/multinode-examples.md). This guide shares similar assumption to the multinode examples guide.
+For more finer grained details on how to launch TRTLLM backend workers with DeepSeek R1 on GB200 slurm, please refer [multinode-examples.md](../../../../docs/pages/backends/trtllm/multinode/multinode-examples.md). This guide shares similar assumption to the multinode examples guide.
 ## Usage
@@ -49,7 +49,7 @@ For more finer grained details on how to launch TRTLLM backend workers with Deep
 Before running the scripts, ensure you have:
 1. Access to a SLURM cluster
-2. Container image of Dynamo with TensorRT-LLM built using instructions from [here](https://github.com/ai-dynamo/dynamo/tree/main/docs/backends/trtllm/README.md#build-container).
+2. Container image of Dynamo with TensorRT-LLM built using instructions from [here](https://github.com/ai-dynamo/dynamo/tree/main/docs/pages/backends/trtllm/README.md#build-container).
 3. Model files accessible on the cluster
 4. Required environment variables set
@@ -69,7 +69,7 @@ export SLURM_JOB_NAME=""
 # NOTE: IMAGE must be set manually for now
 # To build an iamge, see the steps here:
-# https://github.com/ai-dynamo/dynamo/tree/main/docs/backends/trtllm/README.md#build-container
+# https://github.com/ai-dynamo/dynamo/tree/main/docs/pages/backends/trtllm/README.md#build-container
 export IMAGE="<dynamo_trtllm_image>"
 # NOTE: In general, Deepseek R1 is very large, so it is recommended to
......
@@ -92,7 +92,7 @@ extraPodSpec:
 Before using these templates, ensure you have:
-1. **Dynamo Kubernetes Platform installed** - See [Quickstart Guide](../../../../docs/kubernetes/README.md)
+1. **Dynamo Kubernetes Platform installed** - See [Quickstart Guide](../../../../docs/pages/kubernetes/README.md)
 2. **Kubernetes cluster with GPU support**
 3. **Container registry access** for vLLM runtime images
 4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
@@ -110,7 +110,7 @@ docker build -f container/rendered.Dockerfile .
 ### Pre-Deployment Profiling (SLA Planner Only)
-If using the SLA Planner deployment (`disagg_planner.yaml`), follow the [pre-deployment profiling guide](../../../../docs/components/profiler/profiler_guide.md) to run pre-deployment profiling.
+If using the SLA Planner deployment (`disagg_planner.yaml`), follow the [pre-deployment profiling guide](../../../../docs/pages/components/profiler/profiler-guide.md) to run pre-deployment profiling.
 ## Usage
@@ -235,7 +235,7 @@ All templates use **Qwen/Qwen3-0.6B** as the default model, but you can use any
 ## Request Migration
-You can enable [request migration](../../../../docs/fault_tolerance/request_migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations:
+You can enable [request migration](../../../../docs/pages/fault-tolerance/request-migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations:
 ```yaml
 args:
@@ -245,12 +245,12 @@ args:
 ## Further Reading
-- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/deployment/create_deployment.md)
-- **Quickstart**: [Deployment Quickstart](../../../../docs/kubernetes/README.md)
-- **Platform Setup**: [Dynamo Kubernetes Platform Installation](../../../../docs/kubernetes/installation_guide.md)
-- **SLA Planner**: [SLA Planner Quickstart Guide](../../../../docs/components/planner/planner_guide.md)
-- **Examples**: [Deployment Examples](../../../../docs/examples/README.md)
-- **Architecture Docs**: [Disaggregated Serving](../../../../docs/design_docs/disagg_serving.md), [KV-Aware Routing](../../../../docs/components/router/README.md)
+- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/pages/kubernetes/deployment/create-deployment.md)
+- **Quickstart**: [Deployment Quickstart](../../../../docs/pages/kubernetes/README.md)
+- **Platform Setup**: [Dynamo Kubernetes Platform Installation](../../../../docs/pages/kubernetes/installation-guide.md)
+- **SLA Planner**: [SLA Planner Quickstart Guide](../../../../docs/pages/components/planner/planner-guide.md)
+- **Examples**: [Deployment Examples](../../../../docs/pages/getting-started/examples.md)
+- **Architecture Docs**: [Disaggregated Serving](../../../../docs/pages/design-docs/disagg-serving.md), [KV-Aware Routing](../../../../docs/pages/components/router/README.md)
 ## Troubleshooting
@@ -262,4 +262,4 @@ Common issues and solutions:
 4. **Out of memory**: Increase memory limits or reduce model batch size
 5. **Port forwarding issues**: Ensure correct pod UUID in port-forward command
-For additional support, refer to the [deployment troubleshooting guide](../../../../docs/kubernetes/README.md).
+For additional support, refer to the [deployment troubleshooting guide](../../../../docs/pages/kubernetes/README.md).
@@ -11,7 +11,7 @@ This deployment pattern enables dynamic LoRA adapter loading from S3-compatible
 - Kubernetes cluster with GPU support
 - Helm 3.x installed
 - `kubectl` configured to access your cluster
-- Dynamo Kubernetes Platform installed ([Installation Guide](../../../../../docs/kubernetes/installation_guide.md))
+- Dynamo Kubernetes Platform installed ([Installation Guide](../../../../../docs/pages/kubernetes/installation-guide.md))
 - HuggingFace token for downloading Base and LoRA adapters
 ## Files in This Directory
@@ -293,5 +293,5 @@ kubectl delete secret hf-token-secret -n ${NAMESPACE}
 ## Further Reading
 - [vLLM Deployment Guide](../README.md) - Other deployment patterns
-- [Dynamo Kubernetes Guide](../../../../../docs/kubernetes/README.md) - Platform setup
-- [Installation Guide](../../../../../docs/kubernetes/installation_guide.md) - Platform installation
+- [Dynamo Kubernetes Guide](../../../../../docs/pages/kubernetes/README.md) - Platform setup
+- [Installation Guide](../../../../../docs/pages/kubernetes/installation-guide.md) - Platform installation