docs: restructure docs directory and move fern config to fern/ (#6700)

Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Commit ece08dc9 authored Mar 01, 2026 by Neal Vaidya, committed by GitHub Mar 01, 2026
13 changed files
--- a/lib/bench/src/bin/README.md
+++ b/lib/bench/src/bin/README.md
@@ -101,4 +101,4 @@ request arrives.
 4. The KV router routes the speculative request to the same worker, warming its cache.
 5. When the real next-turn request arrives, the KV router sees high cache overlap on that worker and routes there, yielding a much lower TTFT.

-See also: [Agent Hints documentation](../../../../docs/pages/components/router/agent-hints.md)
+See also: [Agent Hints documentation](../../../../docs/components/router/agent-hints.md)
--- a/lib/bindings/kvbm/README.md
+++ b/lib/bindings/kvbm/README.md
@@ -35,7 +35,7 @@ The Dynamo KVBM is a distributed KV-cache block management system designed for s
 pip install kvbm
 ```

-See the [support matrix](../../../docs/pages/reference/support-matrix.md) for version compatibility questions.
+See the [support matrix](../../../docs/reference/support-matrix.md) for version compatibility questions.

 ## Build from Source

@@ -115,7 +115,7 @@ DYN_KVBM_CPU_CACHE_GB=100 vllm serve \
  Qwen/Qwen3-8B
 ```

-For more detailed integration with dynamo, disaggregated serving support and benchmarking, please check [vllm-setup](../../../docs/pages/components/kvbm/kvbm-guide.md#run-kvbm-in-dynamo-with-vllm)
+For more detailed integration with dynamo, disaggregated serving support and benchmarking, please check [vllm-setup](../../../docs/components/kvbm/kvbm-guide.md#run-kvbm-in-dynamo-with-vllm)

 ### TensorRT-LLM

@@ -137,11 +137,11 @@ DYN_KVBM_CPU_CACHE_GB=100 trtllm-serve Qwen/Qwen3-8B \
  --extra_llm_api_options /tmp/kvbm_llm_api_config.yaml
 ```

-For more detailed integration with dynamo and benchmarking, please check [trtllm-setup](../../../docs/pages/components/kvbm/kvbm-guide.md#run-kvbm-in-dynamo-with-tensorrt-llm)
+For more detailed integration with dynamo and benchmarking, please check [trtllm-setup](../../../docs/components/kvbm/kvbm-guide.md#run-kvbm-in-dynamo-with-tensorrt-llm)


 ## 📚 Docs

-- [Architecture](../../../docs/pages/components/kvbm/README.md#architecture)
-- [Design Deepdive](../../../docs/pages/design-docs/kvbm-design.md)
+- [Architecture](../../../docs/components/kvbm/README.md#architecture)
+- [Design Deepdive](../../../docs/design-docs/kvbm-design.md)
 - [NIXL Overview](https://github.com/ai-dynamo/nixl/blob/main/docs/nixl.md)
--- a/lib/bindings/python/README.md
+++ b/lib/bindings/python/README.md
@@ -50,7 +50,7 @@ maturin develop --uv

 ### Prerequisite

-See [README.md](../../../docs/pages/development/runtime-guide.md#prerequisites).
+See [README.md](../../../docs/development/runtime-guide.md#prerequisites).

 ### Hello World Example


--- a/lib/llm/src/kv_router/metrics.rs
+++ b/lib/llm/src/kv_router/metrics.rs
@@ -36,7 +36,7 @@
 //! (`dynamo_component_inflight_requests`, `dynamo_component_requests_total`, etc.)
 //! via the system status server when `DYN_SYSTEM_PORT` is explicitly set.
 //!
-//! See also: `docs/pages/observability/metrics.md` (Router Metrics section).
+//! See also: `docs/observability/metrics.md` (Router Metrics section).

 use std::sync::{Arc, LazyLock, OnceLock};
 use std::time::Duration;

--- a/recipes/README.md
+++ b/recipes/README.md
@@ -3,7 +3,7 @@
 Production-tested Kubernetes deployment recipes for LLM inference using NVIDIA Dynamo.

 > **Prerequisites:** This guide assumes you have already installed the Dynamo Kubernetes Platform.
-> If not, follow the **[Kubernetes Deployment Guide](../docs/pages/kubernetes/README.md)** first.
+> If not, follow the **[Kubernetes Deployment Guide](../docs/kubernetes/README.md)** first.

 ## Available Recipes

@@ -67,8 +67,8 @@ Each complete recipe follows this standard structure:

 The recipes require the Dynamo Kubernetes Platform to be installed. Follow the installation guide:

-- **[Kubernetes Deployment Guide](../docs/pages/kubernetes/README.md)** - Quickstart (~10 minutes)
-- **[Detailed Installation Guide](../docs/pages/kubernetes/installation-guide.md)** - Advanced options
+- **[Kubernetes Deployment Guide](../docs/kubernetes/README.md)** - Quickstart (~10 minutes)
+- **[Detailed Installation Guide](../docs/kubernetes/installation-guide.md)** - Advanced options

 **2. GPU Cluster Requirements**

@@ -289,18 +289,18 @@ image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:x.y.z
 - Review pod logs: `kubectl logs <pod-name> -n ${NAMESPACE}`

 **For more troubleshooting:**
-- [Kubernetes Deployment Guide](../docs/pages/kubernetes/README.md#troubleshooting)
-- [Observability Documentation](../docs/pages/kubernetes/observability/)
+- [Kubernetes Deployment Guide](../docs/kubernetes/README.md#troubleshooting)
+- [Observability Documentation](../docs/kubernetes/observability/)

 ## Related Documentation

-- **[Kubernetes Deployment Guide](../docs/pages/kubernetes/README.md)** - Platform installation and concepts
-- **[API Reference](../docs/pages/kubernetes/api-reference.md)** - DynamoGraphDeployment CRD specification
-- **[vLLM Backend Guide](../docs/pages/backends/vllm/README.md)** - vLLM-specific features
-- **[SGLang Backend Guide](../docs/pages/backends/sglang/README.md)** - SGLang-specific features
-- **[TensorRT-LLM Backend Guide](../docs/pages/backends/trtllm/README.md)** - TensorRT-LLM features
-- **[Observability](../docs/pages/kubernetes/observability/)** - Monitoring and logging
-- **[Benchmarking Guide](../docs/pages/benchmarks/benchmarking.md)** - Performance testing
+- **[Kubernetes Deployment Guide](../docs/kubernetes/README.md)** - Platform installation and concepts
+- **[API Reference](../docs/kubernetes/api-reference.md)** - DynamoGraphDeployment CRD specification
+- **[vLLM Backend Guide](../docs/backends/vllm/README.md)** - vLLM-specific features
+- **[SGLang Backend Guide](../docs/backends/sglang/README.md)** - SGLang-specific features
+- **[TensorRT-LLM Backend Guide](../docs/backends/trtllm/README.md)** - TensorRT-LLM features
+- **[Observability](../docs/kubernetes/observability/)** - Monitoring and logging
+- **[Benchmarking Guide](../docs/benchmarks/benchmarking.md)** - Performance testing

 ## Contributing


--- a/recipes/deepseek-r1/README.md
+++ b/recipes/deepseek-r1/README.md
@@ -13,7 +13,7 @@ Production-ready deployments for **DeepSeek-R1** (671B MoE) across multiple back

 ## Prerequisites

-1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
 2. **GPU cluster** with H200 or GB200 GPUs matching the configuration requirements
 3. **HuggingFace token** with access to DeepSeek models
 4. **High-bandwidth networking** — InfiniBand or RoCE recommended for multi-node deployments

--- a/recipes/deepseek-r1/vllm/disagg/README.md
+++ b/recipes/deepseek-r1/vllm/disagg/README.md
@@ -13,7 +13,7 @@ This recipe deploys DeepSeek-R1 using vLLM in a disaggregated prefill/decode set
 ### 0) Prerequisites: Install the platform

 Follow the Kubernetes deployment guide to install the Dynamo platform and prerequisites (CRDs/operator, etc.):
-- `docs/pages/kubernetes/README.md`
+- `docs/kubernetes/README.md`

 Ensure you have a GPU-enabled cluster with sufficient capacity (32x H100/H200 "Hopper" across 4 nodes), and that the NVIDIA GPU Operator is healthy.


--- a/recipes/llama-3-70b/README.md
+++ b/recipes/llama-3-70b/README.md
@@ -12,7 +12,7 @@ Production-ready deployments for **Llama-3.3-70B-Instruct** using vLLM with FP8

 ## Prerequisites

-1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
 2. **GPU cluster** with H100 or H200 GPUs matching the configuration requirements
 3. **HuggingFace token** with access to Llama models


--- a/recipes/qwen3-235b-a22b-fp8/README.md
+++ b/recipes/qwen3-235b-a22b-fp8/README.md
@@ -11,7 +11,7 @@ Production-ready deployments for **Qwen3-235B-A22B** (MoE model with 22B active

 ## Prerequisites

-1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
 2. **GPU cluster** with H100/H200 GPUs (high memory recommended)
 3. **HuggingFace token** with access to Qwen models


--- a/recipes/qwen3-32b-fp8/README.md
+++ b/recipes/qwen3-32b-fp8/README.md
@@ -11,7 +11,7 @@ Production-ready deployments for **Qwen3-32B** with FP8 quantization using Tenso

 ## Prerequisites

-1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** — See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
 2. **GPU cluster** with H100/H200/A100 GPUs
 3. **HuggingFace token** with access to Qwen models


--- a/recipes/qwen3-32b/README.md
+++ b/recipes/qwen3-32b/README.md
@@ -40,7 +40,7 @@ This workload is ideal for KV-aware routing—with 36.64% cache efficiency, requ

 ## Prerequisites

-1. **Dynamo Platform installed** - See [Kubernetes Deployment Guide](../../docs/pages/kubernetes/README.md)
+1. **Dynamo Platform installed** - See [Kubernetes Deployment Guide](../../docs/kubernetes/README.md)
 2. **16x H200 GPUs** across 2 nodes
 3. **HuggingFace token** configured:
   ```bash

--- a/tests/fault_tolerance/deploy/README.md
+++ b/tests/fault_tolerance/deploy/README.md
@@ -655,7 +655,7 @@ graph LR

 ### Install Dynamo Platform

-Follow the [instructions](../../../docs/pages/kubernetes/installation-guide.md) to install `Dynamo` in your Kubernetes cluster.
+Follow the [instructions](../../../docs/kubernetes/installation-guide.md) to install `Dynamo` in your Kubernetes cluster.

 ### Mount Workspace and Kube Config


--- a/tests/planner/README.md
+++ b/tests/planner/README.md
@@ -23,7 +23,7 @@ Use the pre-configured test deployment with sample profiling data, we provide th

 ### Option B: Use Your Own Profiling Results

-1. Run pre-deployment profiling for your specific setup. See the [pre-deployment profiling documentation](../../docs/pages/components/profiler/profiler-guide.md) for detailed instructions.
+1. Run pre-deployment profiling for your specific setup. See the [pre-deployment profiling documentation](../../docs/components/profiler/profiler-guide.md) for detailed instructions.

 ## Interpolator Testing

@@ -165,8 +165,8 @@ Test complete scaling behavior including Kubernetes deployment and load generati

 **Prerequisites:**

-- **[kube-prometheus-stack](../../docs/pages/kubernetes/observability/metrics.md) installed and running.** The SLA planner requires Prometheus to observe metrics and make scaling decisions.
-- Ensure the Dynamo operator was installed with the Prometheus endpoint configured (see [SLA Planner Quickstart Guide](../../docs/pages/components/planner/planner-guide.md#prerequisites) for details).
+- **[kube-prometheus-stack](../../docs/kubernetes/observability/metrics.md) installed and running.** The SLA planner requires Prometheus to observe metrics and make scaling decisions.
+- Ensure the Dynamo operator was installed with the Prometheus endpoint configured (see [SLA Planner Quickstart Guide](../../docs/components/planner/planner-guide.md#prerequisites) for details).

 **Prepare the test deployment manifest:**

@@ -209,7 +209,7 @@ Remove `volumes` and `volumeMounts`:
          - name: planner-profile-data
            configMap:
              # Must be pre-created before deployment by the profiler
-              # See docs/pages/components/planner/planner-guide.md for more details
+              # See docs/components/planner/planner-guide.md for more details
              name: planner-profile-data
 ```