Unverified commit ece08dc9, authored by Neal Vaidya, committed by GitHub

docs: restructure docs directory and move fern config to fern/ (#6700)


Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
parent 1412e44b
...@@ -82,4 +82,4 @@ If you're running Kubernetes/cloud deployment examples (EKS, AKS, GKE), you'll a ...@@ -82,4 +82,4 @@ If you're running Kubernetes/cloud deployment examples (EKS, AKS, GKE), you'll a
| **kubectl** | v1.24+ | [Install kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl) | | **kubectl** | v1.24+ | [Install kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl) |
| **Helm** | v3.0+ | [Install Helm](https://helm.sh/docs/intro/install/) | | **Helm** | v3.0+ | [Install Helm](https://helm.sh/docs/intro/install/) |
See the [Kubernetes Installation Guide](/docs/pages/kubernetes/installation-guide.md#prerequisites) for detailed setup instructions and pre-deployment checks. See the [Kubernetes Installation Guide](/docs/kubernetes/installation-guide.md#prerequisites) for detailed setup instructions and pre-deployment checks.
...@@ -74,7 +74,7 @@ extraPodSpec: ...@@ -74,7 +74,7 @@ extraPodSpec:
Before using these templates, ensure you have: Before using these templates, ensure you have:
1. **Dynamo Kubernetes Platform installed** - See [Installing Dynamo Kubernetes Platform](../../../../docs/pages/kubernetes/installation-guide.md) 1. **Dynamo Kubernetes Platform installed** - See [Installing Dynamo Kubernetes Platform](../../../../docs/kubernetes/installation-guide.md)
2. **Kubernetes cluster with GPU support** 2. **Kubernetes cluster with GPU support**
3. **Container registry access** for SGLang runtime images 3. **Container registry access** for SGLang runtime images
4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`) 4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
...@@ -144,10 +144,10 @@ All templates use **DeepSeek-R1-Distill-Llama-8B** as the default model. But you ...@@ -144,10 +144,10 @@ All templates use **DeepSeek-R1-Distill-Llama-8B** as the default model. But you
## Further Reading ## Further Reading
- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/pages/kubernetes/deployment/create-deployment.md) - **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/deployment/create-deployment.md)
- **Quickstart**: [Deployment Quickstart](../../../../docs/pages/kubernetes/README.md) - **Quickstart**: [Deployment Quickstart](../../../../docs/kubernetes/README.md)
- **Platform Setup**: [Dynamo Kubernetes Platform Installation](../../../../docs/pages/kubernetes/installation-guide.md) - **Platform Setup**: [Dynamo Kubernetes Platform Installation](../../../../docs/kubernetes/installation-guide.md)
- **Examples**: [Deployment Examples](../../../../docs/pages/getting-started/examples.md) - **Examples**: [Deployment Examples](../../../../docs/getting-started/examples.md)
- **Kubernetes CRDs**: [Custom Resources Documentation](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) - **Kubernetes CRDs**: [Custom Resources Documentation](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
## Troubleshooting ## Troubleshooting
...@@ -159,4 +159,4 @@ Common issues and solutions: ...@@ -159,4 +159,4 @@ Common issues and solutions:
3. **Health check failures**: Review model loading logs and increase `initialDelaySeconds` 3. **Health check failures**: Review model loading logs and increase `initialDelaySeconds`
4. **Out of memory**: Increase memory limits or reduce model batch size 4. **Out of memory**: Increase memory limits or reduce model batch size
For additional support, refer to the [deployment guide](../../../../docs/pages/kubernetes/README.md). For additional support, refer to the [deployment guide](../../../../docs/kubernetes/README.md).
...@@ -223,6 +223,6 @@ To add other backends (TensorRT, ONNX, Python, etc.), edit the Makefile's `build ...@@ -223,6 +223,6 @@ To add other backends (TensorRT, ONNX, Python, etc.), edit the Makefile's `build
## Related Documentation ## Related Documentation
- [Dynamo Backend Guide](../../../docs/pages/development/backend-guide.md) - [Dynamo Backend Guide](../../../docs/development/backend-guide.md)
- [Triton Inference Server](https://github.com/triton-inference-server/server) - [Triton Inference Server](https://github.com/triton-inference-server/server)
- [KServe Protocol](https://kserve.github.io/website/latest/modelserving/data_plane/v2_protocol/) - [KServe Protocol](https://kserve.github.io/website/latest/modelserving/data_plane/v2_protocol/)
...@@ -53,7 +53,7 @@ Advanced disaggregated deployment with SLA-based automatic scaling. ...@@ -53,7 +53,7 @@ Advanced disaggregated deployment with SLA-based automatic scaling.
- `TRTLLMPrefillWorker`: Specialized prefill-only worker - `TRTLLMPrefillWorker`: Specialized prefill-only worker
> [!NOTE] > [!NOTE]
> This deployment requires pre-deployment profiling to be completed first. See [Pre-Deployment Profiling](../../../../docs/pages/components/profiler/profiler-guide.md) for detailed instructions. > This deployment requires pre-deployment profiling to be completed first. See [Pre-Deployment Profiling](../../../../docs/components/profiler/profiler-guide.md) for detailed instructions.
## CRD Structure ## CRD Structure
...@@ -102,7 +102,7 @@ extraPodSpec: ...@@ -102,7 +102,7 @@ extraPodSpec:
Before using these templates, ensure you have: Before using these templates, ensure you have:
1. **Dynamo Kubernetes Platform installed** - See [Quickstart Guide](../../../../docs/pages/kubernetes/README.md) 1. **Dynamo Kubernetes Platform installed** - See [Quickstart Guide](../../../../docs/kubernetes/README.md)
2. **Kubernetes cluster with GPU support** 2. **Kubernetes cluster with GPU support**
3. **Container registry access** for TensorRT-LLM runtime images 3. **Container registry access** for TensorRT-LLM runtime images
4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`) 4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
...@@ -155,7 +155,7 @@ args: ...@@ -155,7 +155,7 @@ args:
### 3. Deploy ### 3. Deploy
See the [Create Deployment Guide](../../../../docs/pages/kubernetes/deployment/create-deployment.md) to learn how to deploy the deployment file. See the [Create Deployment Guide](../../../../docs/kubernetes/deployment/create-deployment.md) to learn how to deploy the deployment file.
First, create a secret for the HuggingFace token. First, create a secret for the HuggingFace token.
```bash ```bash
...@@ -219,7 +219,7 @@ TensorRT-LLM workers are configured through command-line arguments in the deploy ...@@ -219,7 +219,7 @@ TensorRT-LLM workers are configured through command-line arguments in the deploy
## Testing the Deployment ## Testing the Deployment
Send a test request to verify your deployment. See the [client section](../../../../docs/pages/backends/vllm/README.md#client) for detailed instructions. Send a test request to verify your deployment. See the [client section](../../../../docs/backends/vllm/README.md#client) for detailed instructions.
**Note:** For multi-node deployments, target the node running `python3 -m dynamo.frontend <args>`. **Note:** For multi-node deployments, target the node running `python3 -m dynamo.frontend <args>`.
...@@ -241,11 +241,11 @@ TensorRT-LLM supports two methods for KV cache transfer in disaggregated serving ...@@ -241,11 +241,11 @@ TensorRT-LLM supports two methods for KV cache transfer in disaggregated serving
- **UCX** (default): Standard method for KV cache transfer - **UCX** (default): Standard method for KV cache transfer
- **NIXL** (experimental): Alternative transfer method - **NIXL** (experimental): Alternative transfer method
For detailed configuration instructions, see the [KV cache transfer guide](../../../../docs/pages/backends/trtllm/kv-cache-transfer.md). For detailed configuration instructions, see the [KV cache transfer guide](../../../../docs/backends/trtllm/kv-cache-transfer.md).
## Request Migration ## Request Migration
You can enable [request migration](../../../../docs/pages/fault-tolerance/request-migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations: You can enable [request migration](../../../../docs/fault-tolerance/request-migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations:
```yaml ```yaml
args: args:
...@@ -264,13 +264,13 @@ Configure the `model` name and `host` based on your deployment. ...@@ -264,13 +264,13 @@ Configure the `model` name and `host` based on your deployment.
## Further Reading ## Further Reading
- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/pages/kubernetes/deployment/create-deployment.md) - **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/deployment/create-deployment.md)
- **Quickstart**: [Deployment Quickstart](../../../../docs/pages/kubernetes/README.md) - **Quickstart**: [Deployment Quickstart](../../../../docs/kubernetes/README.md)
- **Platform Setup**: [Dynamo Kubernetes Platform Installation](../../../../docs/pages/kubernetes/installation-guide.md) - **Platform Setup**: [Dynamo Kubernetes Platform Installation](../../../../docs/kubernetes/installation-guide.md)
- **Examples**: [Deployment Examples](../../../../docs/pages/getting-started/examples.md) - **Examples**: [Deployment Examples](../../../../docs/getting-started/examples.md)
- **Architecture Docs**: [Disaggregated Serving](../../../../docs/pages/design-docs/disagg-serving.md), [KV-Aware Routing](../../../../docs/pages/components/router/README.md) - **Architecture Docs**: [Disaggregated Serving](../../../../docs/design-docs/disagg-serving.md), [KV-Aware Routing](../../../../docs/components/router/README.md)
- **Multinode Deployment**: [Multinode Examples](../../../../docs/pages/backends/trtllm/multinode/multinode-examples.md) - **Multinode Deployment**: [Multinode Examples](../../../../docs/backends/trtllm/multinode/multinode-examples.md)
- **Speculative Decoding**: [Llama 4 + Eagle Guide](../../../../docs/pages/backends/trtllm/llama4-plus-eagle.md) - **Speculative Decoding**: [Llama 4 + Eagle Guide](../../../../docs/backends/trtllm/llama4-plus-eagle.md)
- **Kubernetes CRDs**: [Custom Resources Documentation](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) - **Kubernetes CRDs**: [Custom Resources Documentation](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
## Troubleshooting ## Troubleshooting
...@@ -285,4 +285,4 @@ Common issues and solutions: ...@@ -285,4 +285,4 @@ Common issues and solutions:
6. **Git LFS issues**: Ensure git-lfs is installed before building containers 6. **Git LFS issues**: Ensure git-lfs is installed before building containers
7. **ARM deployment**: Use `--platform linux/arm64` when building on ARM machines 7. **ARM deployment**: Use `--platform linux/arm64` when building on ARM machines
For additional support, refer to the [deployment troubleshooting guide](../../../../docs/pages/kubernetes/README.md). For additional support, refer to the [deployment troubleshooting guide](../../../../docs/kubernetes/README.md).
...@@ -41,7 +41,7 @@ Please note that: ...@@ -41,7 +41,7 @@ Please note that:
3. `post_process.py` - Scan the aiperf results to produce a json with entries to each config point. 3. `post_process.py` - Scan the aiperf results to produce a json with entries to each config point.
4. `plot_performance_comparison.py` - Takes the json result file for disaggregated and/or aggregated configuration sweeps and plots a pareto line for better visualization. 4. `plot_performance_comparison.py` - Takes the json result file for disaggregated and/or aggregated configuration sweeps and plots a pareto line for better visualization.
For more finer grained details on how to launch TRTLLM backend workers with DeepSeek R1 on GB200 slurm, please refer [multinode-examples.md](../../../../docs/pages/backends/trtllm/multinode/multinode-examples.md). This guide shares similar assumption to the multinode examples guide. For more finer grained details on how to launch TRTLLM backend workers with DeepSeek R1 on GB200 slurm, please refer [multinode-examples.md](../../../../docs/backends/trtllm/multinode/multinode-examples.md). This guide shares similar assumption to the multinode examples guide.
## Usage ## Usage
...@@ -49,7 +49,7 @@ For more finer grained details on how to launch TRTLLM backend workers with Deep ...@@ -49,7 +49,7 @@ For more finer grained details on how to launch TRTLLM backend workers with Deep
Before running the scripts, ensure you have: Before running the scripts, ensure you have:
1. Access to a SLURM cluster 1. Access to a SLURM cluster
2. Container image of Dynamo with TensorRT-LLM built using instructions from [here](https://github.com/ai-dynamo/dynamo/tree/main/docs/pages/backends/trtllm/README.md#build-container). 2. Container image of Dynamo with TensorRT-LLM built using instructions from [here](https://github.com/ai-dynamo/dynamo/tree/main/docs/backends/trtllm/README.md#build-container).
3. Model files accessible on the cluster 3. Model files accessible on the cluster
4. Required environment variables set 4. Required environment variables set
...@@ -69,7 +69,7 @@ export SLURM_JOB_NAME="" ...@@ -69,7 +69,7 @@ export SLURM_JOB_NAME=""
# NOTE: IMAGE must be set manually for now # NOTE: IMAGE must be set manually for now
# To build an iamge, see the steps here: # To build an iamge, see the steps here:
# https://github.com/ai-dynamo/dynamo/tree/main/docs/pages/backends/trtllm/README.md#build-container # https://github.com/ai-dynamo/dynamo/tree/main/docs/backends/trtllm/README.md#build-container
export IMAGE="<dynamo_trtllm_image>" export IMAGE="<dynamo_trtllm_image>"
# NOTE: In general, Deepseek R1 is very large, so it is recommended to # NOTE: In general, Deepseek R1 is very large, so it is recommended to
......
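Every hunk shown above makes the same substitution: the `docs/pages/` path prefix becomes `docs/`, in relative links, root-relative links, and full GitHub URLs alike. The following is a minimal sketch of that rewrite, not the tooling actually used for this commit. The function name `rewrite_doc_links` and the regular expression are hypothetical, and the sketch assumes every relocated page moved from `docs/pages/<rest>` to `docs/<rest>` with no other renames.

```python
import re

# Assumption: a doc path is any occurrence of "docs/pages/" that is not
# preceded by a word character or hyphen (so "mydocs/pages/" is left alone).
_OLD_PREFIX = re.compile(r"(?<![\w-])docs/pages/")


def rewrite_doc_links(text: str) -> str:
    """Return text with the old docs/pages/ prefix replaced by docs/."""
    return _OLD_PREFIX.sub("docs/", text)


if __name__ == "__main__":
    # Sample line taken from the diff above (old side).
    old_line = (
        "For additional support, refer to the "
        "[deployment guide](../../../../docs/pages/kubernetes/README.md)."
    )
    print(rewrite_doc_links(old_line))
```

Because the replacement removes the matched prefix, running the function a second time over already-updated text leaves it unchanged, which makes it safe to apply across a whole tree of Markdown files.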