Commit ece08dc9 authored by Neal Vaidya, committed by GitHub

docs: restructure docs directory and move fern config to fern/ (#6700)


Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
parent 1412e44b
......@@ -101,4 +101,4 @@ request arrives.
4. The KV router routes the speculative request to the same worker, warming its cache.
5. When the real next-turn request arrives, the KV router sees high cache overlap on that worker and routes there, yielding a much lower TTFT.
-See also: [Agent Hints documentation](../../../../docs/pages/components/router/agent-hints.md)
+See also: [Agent Hints documentation](../../../../docs/components/router/agent-hints.md)
......@@ -35,7 +35,7 @@ The Dynamo KVBM is a distributed KV-cache block management system designed for s
pip install kvbm
```
-See the [support matrix](../../../docs/pages/reference/support-matrix.md) for version compatibility questions.
+See the [support matrix](../../../docs/reference/support-matrix.md) for version compatibility questions.
## Build from Source
......@@ -115,7 +115,7 @@ DYN_KVBM_CPU_CACHE_GB=100 vllm serve \
Qwen/Qwen3-8B
```
-For more detailed integration with dynamo, disaggregated serving support and benchmarking, please check [vllm-setup](../../../docs/pages/components/kvbm/kvbm-guide.md#run-kvbm-in-dynamo-with-vllm)
+For more detailed integration with dynamo, disaggregated serving support and benchmarking, please check [vllm-setup](../../../docs/components/kvbm/kvbm-guide.md#run-kvbm-in-dynamo-with-vllm)
### TensorRT-LLM
......@@ -137,11 +137,11 @@ DYN_KVBM_CPU_CACHE_GB=100 trtllm-serve Qwen/Qwen3-8B \
--extra_llm_api_options /tmp/kvbm_llm_api_config.yaml
```
-For more detailed integration with dynamo and benchmarking, please check [trtllm-setup](../../../docs/pages/components/kvbm/kvbm-guide.md#run-kvbm-in-dynamo-with-tensorrt-llm)
+For more detailed integration with dynamo and benchmarking, please check [trtllm-setup](../../../docs/components/kvbm/kvbm-guide.md#run-kvbm-in-dynamo-with-tensorrt-llm)
## 📚 Docs
-- [Architecture](../../../docs/pages/components/kvbm/README.md#architecture)
-- [Design Deepdive](../../../docs/pages/design-docs/kvbm-design.md)
+- [Architecture](../../../docs/components/kvbm/README.md#architecture)
+- [Design Deepdive](../../../docs/design-docs/kvbm-design.md)
- [NIXL Overview](https://github.com/ai-dynamo/nixl/blob/main/docs/nixl.md)
......@@ -50,7 +50,7 @@ maturin develop --uv
### Prerequisite
-See [README.md](../../../docs/pages/development/runtime-guide.md#prerequisites).
+See [README.md](../../../docs/development/runtime-guide.md#prerequisites).
### Hello World Example
......
......@@ -36,7 +36,7 @@
//! (`dynamo_component_inflight_requests`, `dynamo_component_requests_total`, etc.)
//! via the system status server when `DYN_SYSTEM_PORT` is explicitly set.
//!
-//! See also: `docs/pages/observability/metrics.md` (Router Metrics section).
+//! See also: `docs/observability/metrics.md` (Router Metrics section).
use std::sync::{Arc, LazyLock, OnceLock};
use std::time::Duration;
......
......@@ -3,7 +3,7 @@
Production-tested Kubernetes deployment recipes for LLM inference using NVIDIA Dynamo.
> **Prerequisites:** This guide assumes you have already installed the Dynamo Kubernetes Platform.
-> If not, follow the **[Kubernetes Deployment Guide](../docs/pages/kubernetes/README.md)** first.
+> If not, follow the **[Kubernetes Deployment Guide](../docs/kubernetes/README.md)** first.
## Available Recipes
......@@ -67,8 +67,8 @@ Each complete recipe follows this standard structure:
The recipes require the Dynamo Kubernetes Platform to be installed. Follow the installation guide:
-- **[Kubernetes Deployment Guide](../docs/pages/kubernetes/README.md)** - Quickstart (~10 minutes)
-- **[Detailed Installation Guide](../docs/pages/kubernetes/installation-guide.md)** - Advanced options
+- **[Kubernetes Deployment Guide](../docs/kubernetes/README.md)** - Quickstart (~10 minutes)
+- **[Detailed Installation Guide](../docs/kubernetes/installation-guide.md)** - Advanced options
**2. GPU Cluster Requirements**
......@@ -289,18 +289,18 @@ image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:x.y.z
- Review pod logs: `kubectl logs <pod-name> -n ${NAMESPACE}`
**For more troubleshooting:**
-- [Kubernetes Deployment Guide](../docs/pages/kubernetes/README.md#troubleshooting)
-- [Observability Documentation](../docs/pages/kubernetes/observability/)
+- [Kubernetes Deployment Guide](../docs/kubernetes/README.md#troubleshooting)
+- [Observability Documentation](../docs/kubernetes/observability/)
## Related Documentation
-- **[Kubernetes Deployment Guide](../docs/pages/kubernetes/README.md)** - Platform installation and concepts
-- **[API Reference](../docs/pages/kubernetes/api-reference.md)** - DynamoGraphDeployment CRD specification
-- **[vLLM Backend Guide](../docs/pages/backends/vllm/README.md)** - vLLM-specific features
-- **[SGLang Backend Guide](../docs/pages/backends/sglang/README.md)** - SGLang-specific features
-- **[TensorRT-LLM Backend Guide](../docs/pages/backends/trtllm/README.md)** - TensorRT-LLM features
-- **[Observability](../docs/pages/kubernetes/observability/)** - Monitoring and logging
-- **[Benchmarking Guide](../docs/pages/benchmarks/benchmarking.md)** - Performance testing
+- **[Kubernetes Deployment Guide](../docs/kubernetes/README.md)** - Platform installation and concepts
+- **[API Reference](../docs/kubernetes/api-reference.md)** - DynamoGraphDeployment CRD specification
+- **[vLLM Backend Guide](../docs/backends/vllm/README.md)** - vLLM-specific features
+- **[SGLang Backend Guide](../docs/backends/sglang/README.md)** - SGLang-specific features
+- **[TensorRT-LLM Backend Guide](../docs/backends/trtllm/README.md)** - TensorRT-LLM features
+- **[Observability](../docs/kubernetes/observability/)** - Monitoring and logging
+- **[Benchmarking Guide](../docs/benchmarks/benchmarking.md)** - Performance testing
## Contributing
......
......@@ -13,7 +13,7 @@ Production-ready deployments for **DeepSeek-R1** (671B MoE) across multiple back
## Prerequisites
-1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
2. **GPU cluster** with H200 or GB200 GPUs matching the configuration requirements
3. **HuggingFace token** with access to DeepSeek models
4. **High-bandwidth networking** — InfiniBand or RoCE recommended for multi-node deployments
......
......@@ -13,7 +13,7 @@ This recipe deploys DeepSeek-R1 using vLLM in a disaggregated prefill/decode set
### 0) Prerequisites: Install the platform
Follow the Kubernetes deployment guide to install the Dynamo platform and prerequisites (CRDs/operator, etc.):
-- `docs/pages/kubernetes/README.md`
+- `docs/kubernetes/README.md`
Ensure you have a GPU-enabled cluster with sufficient capacity (32x H100/H200 "Hopper" across 4 nodes), and that the NVIDIA GPU Operator is healthy.
......
......@@ -12,7 +12,7 @@ Production-ready deployments for **Llama-3.3-70B-Instruct** using vLLM with FP8
## Prerequisites
-1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
2. **GPU cluster** with H100 or H200 GPUs matching the configuration requirements
3. **HuggingFace token** with access to Llama models
......
......@@ -11,7 +11,7 @@ Production-ready deployments for **Qwen3-235B-A22B** (MoE model with 22B active
## Prerequisites
-1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
2. **GPU cluster** with H100/H200 GPUs (high memory recommended)
3. **HuggingFace token** with access to Qwen models
......
......@@ -11,7 +11,7 @@ Production-ready deployments for **Qwen3-32B** with FP8 quantization using Tenso
## Prerequisites
-1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
2. **GPU cluster** with H100/H200/A100 GPUs
3. **HuggingFace token** with access to Qwen models
......
......@@ -40,7 +40,7 @@ This workload is ideal for KV-aware routing—with 36.64% cache efficiency, requ
## Prerequisites
-1. **Dynamo Platform installed** - See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** - See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
2. **16x H200 GPUs** across 2 nodes
3. **HuggingFace token** configured:
```bash
......
......@@ -655,7 +655,7 @@ graph LR
### Install Dynamo Platform
-Follow the [instructions](../../../docs/pages/kubernetes/installation-guide.md) to install `Dynamo` in your Kubernetes cluster.
+Follow the [instructions](../../../docs/kubernetes/installation-guide.md) to install `Dynamo` in your Kubernetes cluster.
### Mount Workspace and Kube Config
......
......@@ -23,7 +23,7 @@ Use the pre-configured test deployment with sample profiling data, we provide th
### Option B: Use Your Own Profiling Results
-1. Run pre-deployment profiling for your specific setup. See the [pre-deployment profiling documentation](../../docs/pages/components/profiler/profiler-guide.md) for detailed instructions.
+1. Run pre-deployment profiling for your specific setup. See the [pre-deployment profiling documentation](../../docs/components/profiler/profiler-guide.md) for detailed instructions.
## Interpolator Testing
......@@ -165,8 +165,8 @@ Test complete scaling behavior including Kubernetes deployment and load generati
**Prerequisites:**
-- **[kube-prometheus-stack](../../docs/pages/kubernetes/observability/metrics.md) installed and running.** The SLA planner requires Prometheus to observe metrics and make scaling decisions.
-- Ensure the Dynamo operator was installed with the Prometheus endpoint configured (see [SLA Planner Quickstart Guide](../../docs/pages/components/planner/planner-guide.md#prerequisites) for details).
+- **[kube-prometheus-stack](../../docs/kubernetes/observability/metrics.md) installed and running.** The SLA planner requires Prometheus to observe metrics and make scaling decisions.
+- Ensure the Dynamo operator was installed with the Prometheus endpoint configured (see [SLA Planner Quickstart Guide](../../docs/components/planner/planner-guide.md#prerequisites) for details).
**Prepare the test deployment manifest:**
......@@ -209,7 +209,7 @@ Remove `volumes` and `volumeMounts`:
- name: planner-profile-data
configMap:
# Must be pre-created before deployment by the profiler
-# See docs/pages/components/planner/planner-guide.md for more details
+# See docs/components/planner/planner-guide.md for more details
name: planner-profile-data
```
......
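Every hunk above applies the same change: the `docs/pages/` path segment becomes `docs/`. A minimal sketch of that rewrite follows. It is not part of the commit; the function name `rewrite_doc_links` and the choice of a regular expression are assumptions made for illustration.

```python
import re

# Match "docs/pages/" only when "docs" starts a path segment, so that
# unrelated names such as "mydocs/pages/" are left untouched.
_OLD_PREFIX = re.compile(r"(?<![\w-])docs/pages/")


def rewrite_doc_links(text: str) -> str:
    """Return text with every 'docs/pages/' path prefix shortened to 'docs/'.

    Hypothetical helper: it mirrors the substitution visible in the diff,
    not the tooling the author actually used.
    """
    return _OLD_PREFIX.sub("docs/", text)


if __name__ == "__main__":
    sample = "See [guide](../../docs/pages/kubernetes/README.md)"
    print(rewrite_doc_links(sample))
```

Applied line by line to a Markdown or Rust source file, this would reproduce the old-to-new pairs shown in the hunks, including relative links (`../../docs/pages/...`) and backticked paths.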