Commit ece08dc9 authored by Neal Vaidya, committed by GitHub

docs: restructure docs directory and move fern config to fern/ (#6700)


Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
parent 1412e44b
@@ -101,4 +101,4 @@ request arrives.
 4. The KV router routes the speculative request to the same worker, warming its cache.
 5. When the real next-turn request arrives, the KV router sees high cache overlap on that worker and routes there, yielding a much lower TTFT.
-See also: [Agent Hints documentation](../../../../docs/pages/components/router/agent-hints.md)
+See also: [Agent Hints documentation](../../../../docs/components/router/agent-hints.md)
@@ -35,7 +35,7 @@ The Dynamo KVBM is a distributed KV-cache block management system designed for s
 pip install kvbm
 ```
-See the [support matrix](../../../docs/pages/reference/support-matrix.md) for version compatibility questions.
+See the [support matrix](../../../docs/reference/support-matrix.md) for version compatibility questions.
 ## Build from Source
@@ -115,7 +115,7 @@ DYN_KVBM_CPU_CACHE_GB=100 vllm serve \
 Qwen/Qwen3-8B
 ```
-For more detailed integration with dynamo, disaggregated serving support and benchmarking, please check [vllm-setup](../../../docs/pages/components/kvbm/kvbm-guide.md#run-kvbm-in-dynamo-with-vllm)
+For more detailed integration with dynamo, disaggregated serving support and benchmarking, please check [vllm-setup](../../../docs/components/kvbm/kvbm-guide.md#run-kvbm-in-dynamo-with-vllm)
 ### TensorRT-LLM
@@ -137,11 +137,11 @@ DYN_KVBM_CPU_CACHE_GB=100 trtllm-serve Qwen/Qwen3-8B \
 --extra_llm_api_options /tmp/kvbm_llm_api_config.yaml
 ```
-For more detailed integration with dynamo and benchmarking, please check [trtllm-setup](../../../docs/pages/components/kvbm/kvbm-guide.md#run-kvbm-in-dynamo-with-tensorrt-llm)
+For more detailed integration with dynamo and benchmarking, please check [trtllm-setup](../../../docs/components/kvbm/kvbm-guide.md#run-kvbm-in-dynamo-with-tensorrt-llm)
 ## 📚 Docs
-- [Architecture](../../../docs/pages/components/kvbm/README.md#architecture)
-- [Design Deepdive](../../../docs/pages/design-docs/kvbm-design.md)
+- [Architecture](../../../docs/components/kvbm/README.md#architecture)
+- [Design Deepdive](../../../docs/design-docs/kvbm-design.md)
 - [NIXL Overview](https://github.com/ai-dynamo/nixl/blob/main/docs/nixl.md)
@@ -50,7 +50,7 @@ maturin develop --uv
 ### Prerequisite
-See [README.md](../../../docs/pages/development/runtime-guide.md#prerequisites).
+See [README.md](../../../docs/development/runtime-guide.md#prerequisites).
 ### Hello World Example
......
@@ -36,7 +36,7 @@
 //! (`dynamo_component_inflight_requests`, `dynamo_component_requests_total`, etc.)
 //! via the system status server when `DYN_SYSTEM_PORT` is explicitly set.
 //!
-//! See also: `docs/pages/observability/metrics.md` (Router Metrics section).
+//! See also: `docs/observability/metrics.md` (Router Metrics section).
 use std::sync::{Arc, LazyLock, OnceLock};
 use std::time::Duration;
......
@@ -3,7 +3,7 @@
 Production-tested Kubernetes deployment recipes for LLM inference using NVIDIA Dynamo.
 > **Prerequisites:** This guide assumes you have already installed the Dynamo Kubernetes Platform.
-> If not, follow the **[Kubernetes Deployment Guide](../docs/pages/kubernetes/README.md)** first.
+> If not, follow the **[Kubernetes Deployment Guide](../docs/kubernetes/README.md)** first.
 ## Available Recipes
@@ -67,8 +67,8 @@ Each complete recipe follows this standard structure:
 The recipes require the Dynamo Kubernetes Platform to be installed. Follow the installation guide:
-- **[Kubernetes Deployment Guide](../docs/pages/kubernetes/README.md)** - Quickstart (~10 minutes)
-- **[Detailed Installation Guide](../docs/pages/kubernetes/installation-guide.md)** - Advanced options
+- **[Kubernetes Deployment Guide](../docs/kubernetes/README.md)** - Quickstart (~10 minutes)
+- **[Detailed Installation Guide](../docs/kubernetes/installation-guide.md)** - Advanced options
 **2. GPU Cluster Requirements**
@@ -289,18 +289,18 @@ image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:x.y.z
 - Review pod logs: `kubectl logs <pod-name> -n ${NAMESPACE}`
 **For more troubleshooting:**
-- [Kubernetes Deployment Guide](../docs/pages/kubernetes/README.md#troubleshooting)
-- [Observability Documentation](../docs/pages/kubernetes/observability/)
+- [Kubernetes Deployment Guide](../docs/kubernetes/README.md#troubleshooting)
+- [Observability Documentation](../docs/kubernetes/observability/)
 ## Related Documentation
-- **[Kubernetes Deployment Guide](../docs/pages/kubernetes/README.md)** - Platform installation and concepts
-- **[API Reference](../docs/pages/kubernetes/api-reference.md)** - DynamoGraphDeployment CRD specification
-- **[vLLM Backend Guide](../docs/pages/backends/vllm/README.md)** - vLLM-specific features
-- **[SGLang Backend Guide](../docs/pages/backends/sglang/README.md)** - SGLang-specific features
-- **[TensorRT-LLM Backend Guide](../docs/pages/backends/trtllm/README.md)** - TensorRT-LLM features
-- **[Observability](../docs/pages/kubernetes/observability/)** - Monitoring and logging
-- **[Benchmarking Guide](../docs/pages/benchmarks/benchmarking.md)** - Performance testing
+- **[Kubernetes Deployment Guide](../docs/kubernetes/README.md)** - Platform installation and concepts
+- **[API Reference](../docs/kubernetes/api-reference.md)** - DynamoGraphDeployment CRD specification
+- **[vLLM Backend Guide](../docs/backends/vllm/README.md)** - vLLM-specific features
+- **[SGLang Backend Guide](../docs/backends/sglang/README.md)** - SGLang-specific features
+- **[TensorRT-LLM Backend Guide](../docs/backends/trtllm/README.md)** - TensorRT-LLM features
+- **[Observability](../docs/kubernetes/observability/)** - Monitoring and logging
+- **[Benchmarking Guide](../docs/benchmarks/benchmarking.md)** - Performance testing
 ## Contributing
......
@@ -13,7 +13,7 @@ Production-ready deployments for **DeepSeek-R1** (671B MoE) across multiple back
 ## Prerequisites
-1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
 2. **GPU cluster** with H200 or GB200 GPUs matching the configuration requirements
 3. **HuggingFace token** with access to DeepSeek models
 4. **High-bandwidth networking** — InfiniBand or RoCE recommended for multi-node deployments
......
@@ -13,7 +13,7 @@ This recipe deploys DeepSeek-R1 using vLLM in a disaggregated prefill/decode set
 ### 0) Prerequisites: Install the platform
 Follow the Kubernetes deployment guide to install the Dynamo platform and prerequisites (CRDs/operator, etc.):
-- `docs/pages/kubernetes/README.md`
+- `docs/kubernetes/README.md`
 Ensure you have a GPU-enabled cluster with sufficient capacity (32x H100/H200 "Hopper" across 4 nodes), and that the NVIDIA GPU Operator is healthy.
......
@@ -12,7 +12,7 @@ Production-ready deployments for **Llama-3.3-70B-Instruct** using vLLM with FP8
 ## Prerequisites
-1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
 2. **GPU cluster** with H100 or H200 GPUs matching the configuration requirements
 3. **HuggingFace token** with access to Llama models
......
@@ -11,7 +11,7 @@ Production-ready deployments for **Qwen3-235B-A22B** (MoE model with 22B active
 ## Prerequisites
-1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
 2. **GPU cluster** with H100/H200 GPUs (high memory recommended)
 3. **HuggingFace token** with access to Qwen models
......
@@ -11,7 +11,7 @@ Production-ready deployments for **Qwen3-32B** with FP8 quantization using Tenso
 ## Prerequisites
-1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
 2. **GPU cluster** with H100/H200/A100 GPUs
 3. **HuggingFace token** with access to Qwen models
......
@@ -40,7 +40,7 @@ This workload is ideal for KV-aware routing—with 36.64% cache efficiency, requ
 ## Prerequisites
-1. **Dynamo Platform installed** - See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** - See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
 2. **16x H200 GPUs** across 2 nodes
 3. **HuggingFace token** configured:
 ```bash
......
@@ -655,7 +655,7 @@ graph LR
 ### Install Dynamo Platform
-Follow the [instructions](../../../docs/pages/kubernetes/installation-guide.md) to install `Dynamo` in your Kubernetes cluster.
+Follow the [instructions](../../../docs/kubernetes/installation-guide.md) to install `Dynamo` in your Kubernetes cluster.
 ### Mount Workspace and Kube Config
......
@@ -23,7 +23,7 @@ Use the pre-configured test deployment with sample profiling data, we provide th
 ### Option B: Use Your Own Profiling Results
-1. Run pre-deployment profiling for your specific setup. See the [pre-deployment profiling documentation](../../docs/pages/components/profiler/profiler-guide.md) for detailed instructions.
+1. Run pre-deployment profiling for your specific setup. See the [pre-deployment profiling documentation](../../docs/components/profiler/profiler-guide.md) for detailed instructions.
 ## Interpolator Testing
@@ -165,8 +165,8 @@ Test complete scaling behavior including Kubernetes deployment and load generati
 **Prerequisites:**
-- **[kube-prometheus-stack](../../docs/pages/kubernetes/observability/metrics.md) installed and running.** The SLA planner requires Prometheus to observe metrics and make scaling decisions.
-- Ensure the Dynamo operator was installed with the Prometheus endpoint configured (see [SLA Planner Quickstart Guide](../../docs/pages/components/planner/planner-guide.md#prerequisites) for details).
+- **[kube-prometheus-stack](../../docs/kubernetes/observability/metrics.md) installed and running.** The SLA planner requires Prometheus to observe metrics and make scaling decisions.
+- Ensure the Dynamo operator was installed with the Prometheus endpoint configured (see [SLA Planner Quickstart Guide](../../docs/components/planner/planner-guide.md#prerequisites) for details).
 **Prepare the test deployment manifest:**
@@ -209,7 +209,7 @@ Remove `volumes` and `volumeMounts`:
 - name: planner-profile-data
 configMap:
 # Must be pre-created before deployment by the profiler
-# See docs/pages/components/planner/planner-guide.md for more details
+# See docs/components/planner/planner-guide.md for more details
 name: planner-profile-data
 ```
......
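Every hunk shown here makes the same substitution: a `docs/pages/` path segment becomes `docs/`. A minimal Python sketch of that rewrite follows, for illustration only. The helper name `rewrite_doc_links` and the regular expression are assumptions; nothing in this commit says the change was produced by such a script.

```python
import re

# Hypothetical pattern: matches the old `docs/pages/` segment when `docs`
# starts a path component (so a name like `mydocs/pages/` is left alone).
_OLD_PREFIX = re.compile(r"\bdocs/pages/")


def rewrite_doc_links(text: str) -> str:
    """Replace every `docs/pages/` path segment in `text` with `docs/`."""
    return _OLD_PREFIX.sub("docs/", text)


if __name__ == "__main__":
    old = "See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)"
    print(rewrite_doc_links(old))
```

Applied to a link that already uses the new layout, the function returns the text unchanged, so running it twice is harmless.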