docs: clean up toctree navigation and add disaggregated serving guide (#6024)

Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Unverified Commit 07db5895 authored Feb 06, 2026 by dagil-nvidia Committed by GitHub Feb 06, 2026
10 changed files
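Reviewer note: because this commit rewires many toctree entries, it can help to confirm that every `Title <target>` entry still resolves to a file on disk. The following is a minimal sketch and not part of the commit. The helper names `toctree_targets` and `missing_targets` are hypothetical, the line pattern is a heuristic, and the check assumes that targets resolve relative to the containing document (or to the docs root when they start with `/`), with an optional `.md` or `.rst` suffix.

```python
import re
from pathlib import Path

# Heuristic: a whole line of the form "Title <target>", e.g. "Grove <../grove>".
ENTRY = re.compile(r"^\s*[^<>\s][^<>]*<([^<>\s]+)>\s*$")


def toctree_targets(text):
    """Return the targets of all 'Title <target>' lines found in text."""
    targets = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            targets.append(match.group(1))
    return targets


def missing_targets(docs_root, source_file):
    """List the targets in source_file that do not resolve to an existing file."""
    docs_root = Path(docs_root)
    source = docs_root / source_file
    missing = []
    for target in toctree_targets(source.read_text(encoding="utf-8")):
        # Assumption: a leading "/" means relative to the docs root;
        # anything else is relative to the directory of the source file.
        base = docs_root if target.startswith("/") else source.parent
        candidate = base / target.lstrip("/")
        # Accept the target as written, or with a .md / .rst suffix appended.
        found = any(
            Path(str(candidate) + suffix).is_file() for suffix in ("", ".md", ".rst")
        )
        if not found:
            missing.append(target)
    return missing
```

For example, `missing_targets("docs", "index.rst")` would return the entries of `docs/index.rst` that have no matching file. An empty list means every entry resolved under these assumptions. A real docs build remains the authoritative check.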
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -50,26 +50,24 @@ Quickstart
   :hidden:
   :caption: Kubernetes Deployment
-   Deployment Guide <_sections/k8s_deployment>
+   Deployment Guide <kubernetes/README>
-   Observability (K8s) <_sections/k8s_observability>
+   Observability (K8s) <kubernetes/observability/metrics>
-   Multinode <_sections/k8s_multinode>
+   Multinode <kubernetes/deployment/multinode-deployment>
 .. toctree::
   :hidden:
   :caption: User Guides
+   KV Cache Aware Routing <components/router/router_guide.md>
+   Disaggregated Serving Guide <features/disaggregated_serving/README.md>
   KV Cache Offloading <components/kvbm/kvbm_guide.md>
-   KV Aware Routing <components/router/router_guide.md>
+   Benchmarking <benchmarks/benchmarking.md>
-   Tool Calling <agents/tool-calling.md>
   Multimodality Support <features/multimodal/README.md>
+   Tool Calling <agents/tool-calling.md>
   LoRA Adapters <features/lora/README.md>
-   Finding Best Initial Configs <performance/aiconfigurator.md>
+   Observability (Local) <observability/README>
-   Benchmarking <benchmarks/benchmarking.md>
+   Fault Tolerance <fault_tolerance/README>
-   Tuning Disaggregated Performance <performance/tuning.md>
   Writing Python Workers in Dynamo <development/backend-guide.md>
-   Observability (Local) <_sections/observability>
-   Fault Tolerance <_sections/fault_tolerance>
-   Glossary <reference/glossary.md>
 .. toctree::
   :hidden:
@@ -82,6 +80,15 @@ Quickstart
   Profiler <components/profiler/README>
   KVBM <components/kvbm/README>
+.. toctree::
+   :hidden:
+   :caption: Integrations
+   LMCache <integrations/lmcache_integration.md>
+   SGLang HiCache <integrations/sglang_hicache.md>
+   FlexKV <integrations/flexkv_integration.md>
+   KV Events for Custom Engines <integrations/kv_events_custom_engines.md>
 .. toctree::
   :hidden:
   :caption: Design Docs
@@ -90,7 +97,8 @@ Quickstart
   Architecture Flow <design_docs/dynamo_flow.md>
   Disaggregated Serving <design_docs/disagg_serving.md>
   Distributed Runtime <design_docs/distributed_runtime.md>
-   Router Design <design_docs/router_design.md>
   Request Plane <design_docs/request_plane.md>
   Event Plane <design_docs/event_plane.md>
+   Router Design <design_docs/router_design.md>
+   KVBM Design <design_docs/kvbm_design.md>
   Planner Design <design_docs/planner_design.md>
--- a/docs/kubernetes/README.md
+++ b/docs/kubernetes/README.md
@@ -251,3 +251,16 @@ Key customization points include:
 - **[Grove](/docs/kubernetes/grove.md)** - For grove details and custom installation
 - **[Monitoring](/docs/kubernetes/observability/metrics.md)** - For monitoring setup
 - **[Model Caching with Fluid](/docs/kubernetes/model_caching_with_fluid.md)** - For model caching with Fluid
+```{toctree}
+:hidden:
+Detailed Installation Guide <installation_guide>
+Dynamo Operator <dynamo_operator>
+Service Discovery <service_discovery>
+Webhooks <webhooks>
+Minikube Setup <deployment/minikube>
+Managing Models with DynamoModel <deployment/dynamomodel-guide>
+Autoscaling <autoscaling>
+Checkpointing <chrek/README>
+```
--- a/docs/kubernetes/deployment/multinode-deployment.md
+++ b/docs/kubernetes/deployment/multinode-deployment.md
@@ -306,3 +306,9 @@ For additional support and examples, see the working multinode configurations in
 - **vLLM**: [examples/backends/vllm/deploy/](../../../examples/backends/vllm/deploy/)
 These examples demonstrate proper usage of the `multinode` section with corresponding `gpu` limits and correct `tp-size` configuration.
+```{toctree}
+:hidden:
+Grove <../grove>
+```
--- a/docs/kubernetes/observability/metrics.md
+++ b/docs/kubernetes/observability/metrics.md
@@ -178,3 +178,10 @@ Once logged in, find the Dynamo dashboard under General.
 > **Note:** The metrics described above are for Dynamo **applications** (frontends, workers). The Dynamo **Operator** itself also exposes metrics for monitoring controller reconciliation, webhook validation, and resource inventory.
 >
 > See the **[Operator Metrics Guide](operator-metrics.md)** for details on operator-specific metrics and the operator dashboard.
+```{toctree}
+:hidden:
+Logging <logging>
+Operator Metrics <operator-metrics>
+```
--- a/docs/observability/README.md
+++ b/docs/observability/README.md
@@ -97,3 +97,14 @@ The following configuration files are located in the `deploy/observability/` dir
 - [grafana_dashboards/dcgm-metrics.json](../../deploy/observability/grafana_dashboards/dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
 - [grafana_dashboards/kvbm.json](../../deploy/observability/grafana_dashboards/kvbm.json): Contains Grafana dashboard configuration for KVBM metrics
+```{toctree}
+:hidden:
+Prometheus + Grafana Setup <prometheus-grafana>
+Metrics <metrics>
+Metrics Developer Guide <metrics-developer-guide>
+Health Checks <health-checks>
+Tracing <tracing>
+Logging <logging>
+```
--- a/docs/performance/aiconfigurator.md
+++ b/docs/performance/aiconfigurator.md
-<!--
-SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-SPDX-License-Identifier: Apache-2.0
-->
-# Finding Best Initial Configs using AIConfigurator
-[AIConfigurator](https://github.com/ai-dynamo/aiconfigurator/tree/main) is a performance optimization tool that helps you find the optimal configuration for deploying LLMs with Dynamo. It automatically determines the best number of prefill and decode workers, parallelism settings, and deployment parameters to meet your SLA targets while maximizing throughput.
-## Why Use AIConfigurator?
-When deploying LLMs with Dynamo, you need to make several critical decisions:
- **Aggregated vs Disaggregated**: Which architecture gives better performance for your workload?
- **Worker Configuration**: How many prefill and decode workers to deploy?
- **Parallelism Settings**: What tensor/pipeline parallel configuration to use?
- **SLA Compliance**: How to meet your TTFT and TPOT targets?
-AIConfigurator answers these questions in seconds, providing:
- Optimal configurations that meet your SLA requirements
- Ready-to-deploy Dynamo configuration files
- Performance comparisons between different deployment strategies
- Up to 1.7x better throughput compared to manual configuration
-## Quick Start
-```bash
-# Install
-pip3 install aiconfigurator
-# Find optimal configuration
-aiconfigurator cli default \
-  --model QWEN3_32B \        # Model name (QWEN3_32B, LLAMA3.1_70B, etc.)
-  --total_gpus 32 \          # Number of available GPUs
-  --system h200_sxm \        # GPU type (h100_sxm, h200_sxm, a100_sxm)
-  --isl 4000 \               # Input sequence length (tokens)
-  --osl 500 \                # Output sequence length (tokens)
-  --ttft 300 \               # Target Time To First Token (ms)
-  --tpot 10 \                # Target Time Per Output Token (ms)
-  --save_dir ./dynamo-configs
-# Deploy
-kubectl apply -f ./dynamo-configs/disagg/top1/disagg/k8s_deploy.yaml
-```
-## Example Output
-```text
-********************************************************************************
-*                     Dynamo aiconfigurator Final Results                      *
-********************************************************************************
-  ----------------------------------------------------------------------------
-  Input Configuration & SLA Target:
-    Model: QWEN3_32B (is_moe: False)
-    Total GPUs: 32
-    Best Experiment Chosen: disagg at 812.92 tokens/s/gpu (1.70x better)
-  ----------------------------------------------------------------------------
-  Overall Best Configuration:
-    - Best Throughput: 812.92 tokens/s/gpu
-    - User Throughput: 120.23 tokens/s/user
-    - TTFT: 276.76ms
-    - TPOT: 8.32ms
-  ----------------------------------------------------------------------------
-  Pareto Frontier:
-               QWEN3_32B Pareto Frontier: tokens/s/gpu vs tokens/s/user
-      ┌────────────────────────────────────────────────────────────────────────┐
-1600.0┤ •• disagg                                                              │
-      │ ff agg                                                                 │
-      │ xx disagg best                                                         │
-      │                                                                        │
-1333.3┤   f                                                                    │
-      │   ff                                                                   │
-      │     ff    •                                                            │
-      │       f   ••••••••                                                     │
-1066.7┤        f         ••                                                    │
-      │         fff       ••••••••                                             │
-      │            f              ••                                           │
-      │            f                ••••                                       │
- 800.0┤            fffff                •••x                                   │
-      │                 fff                 ••                                 │
-      │                   fff                •                                 │
-      │                     fffff             ••                               │
- 533.3┤                         ffff            ••                             │
-      │                             ffff          ••                           │
-      │                                 fffffff     •••••                      │
-      │                                        ffffff    ••                    │
- 266.7┤                                              fffff •••••••••           │
-      │                                                   ffffffffff           │
-      │                                                             f          │
-      │                                                                        │
-   0.0┤                                                                        │
-      └┬─────────────────┬─────────────────┬────────────────┬─────────────────┬┘
-       0                60                120              180              240
-tokens/s/gpu                         tokens/s/user
-1. **Performance Comparison**: Shows disaggregated vs aggregated serving performance
-2. **Optimal Configuration**: The best configuration that meets your SLA targets
-3. **Deployment Files**: Ready-to-use Dynamo configuration files
-## Key Features
-### Fast Profiling Integration
-```bash
-# Use with Dynamo's SLA planner (20-30 seconds vs hours)
-python3 -m benchmarks.profiler.profile_sla \
-   --config ./examples/backends/trtllm/deploy/disagg.yaml \
-   --backend trtllm \
-   --use-ai-configurator \
-   --aic-system h200_sxm \
-   --aic-model-name QWEN3_32B
-```
-### Custom Configuration
-```bash
-# For advanced users: define custom search space
-aiconfigurator cli exp --yaml_path custom_config.yaml
-```
-## Common Use Cases
-```bash
-# Strict SLAs (low latency)
-aiconfigurator cli default --model QWEN2.5_7B --total_gpus 8 --system h200_sxm --ttft 100 --tpot 5
-# High throughput (relaxed latency)
-aiconfigurator cli default --model QWEN3_32B --total_gpus 32 --system h200_sxm --ttft 1000 --tpot 50
-```
-## Supported Configurations
-**Models**: GPT, LLAMA2/3, QWEN2.5/3, Mixtral, DEEPSEEK_V3
-**GPUs**: H100, H200, A100, B200 (preview), GB200 (preview)
-**Backend**: TensorRT-LLM (vLLM and SGLang coming soon)
-## Additional Options
-```bash
-# Web interface
-aiconfigurator webapp  # Visit http://127.0.0.1:7860
-# Docker
-docker run -it --rm nvcr.io/nvidia/aiconfigurator:latest \
-  aiconfigurator cli default --model LLAMA3.1_70B --total_gpus 16 --system h100_sxm
-```
-## Troubleshooting
-**Model name mismatch**: Use exact model name that matches your deployment
-**GPU allocation**: Verify available GPUs match `--total_gpus`
-**Performance variance**: Results are estimates - benchmark actual deployment
-## Learn More
- [Dynamo Installation Guide](/docs/kubernetes/installation_guide.md)
- [SLA Planner Guide](/docs/components/planner/planner_guide.md)
- [Benchmarking Guide](/docs/benchmarks/benchmarking.md)
\ No newline at end of file
--- a/docs/templates/EXAMPLE_SKILL.md
+++ b/docs/templates/EXAMPLE_SKILL.md
---
-orphan: true
---
-# Documentation Migration Skill
-This file is a ready-to-use Cursor skill for AI-assisted documentation migration.
---
-## How to Use This Skill
-### Option 1: Cursor IDE
-1. Create the skill directory:
-   ```bash
-   mkdir -p .cursor/skills/docs-migration
-   ```
-2. Copy this file:
-   ```bash
-   cp docs/templates/EXAMPLE_SKILL.md .cursor/skills/docs-migration/SKILL.md
-   ```
-3. Remove the `orphan: true` header and this "How to Use" section
-4. The skill will be available when working on documentation migration
-### Option 2: Claude (or any AI)
-1. Copy everything below the separator line (`---`)
-2. Paste into your conversation as context
-3. Ask the AI: "Help me migrate the [component] documentation to the new structure"
---
-## Skill Content
-Copy everything below this line for use as an AI prompt:
---
-name: docs-migration
-description: Migrate Dynamo documentation to the 9-category hierarchy. Use when migrating components, backends, features, or other docs to the new structure.
---
-# Documentation Migration
-This skill guides you through migrating Dynamo documentation to the new 9-category hierarchy.
-## Inputs
-| Input | Required | Description |
-|-------|----------|-------------|
-| Component/Topic | Yes | What to migrate (e.g., "planner", "kubernetes", "multimodal") |
-| Source Path | Yes | Current location (e.g., `docs/planner/`) |
-| Target Category | Yes | One of: components, backends, features, deploy, performance, infrastructure, integrations |
-## Directory Hierarchy
-```
-docs/
-├── components/          # Router, Planner, KVBM, Frontend, Profiler
-├── backends/            # vLLM, SGLang, TRT-LLM
-├── features/            # Multimodal, LoRA, Speculative Decoding
-├── deploy/              # Kubernetes, Helm, Operator
-├── performance/         # Tuning, Benchmarks
-├── infrastructure/      # Observability, Fault Tolerance, Development
-├── integrations/        # LMCache, HiCache, NIXL
-├── reference/           # CLI, Glossary, Support Matrix
-└── design_docs/         # Tier 3 design documents
-```
---
-## Phase 1: Analyze Existing Docs
-### Step 1.1: Inventory Current Files
-```bash
-# List existing documentation
-ls -la docs/<source_path>/
-# Count lines in each file
-wc -l docs/<source_path>/*.md
-```
-### Step 1.2: Categorize Content
-For each file, identify content type:
-| Category | Target File | Description |
-|----------|-------------|-------------|
-| Overview | README.md | Component description, feature matrix |
-| Quick Start | README.md | Minimal steps to get running |
-| Deployment | `<name>_guide.md` | Setup, prerequisites, container images |
-| Configuration | `<name>_guide.md` | CLI args, env vars, config files |
-| Integration | `<name>_guide.md` | Connecting to other components |
-| Troubleshooting | `<name>_guide.md` | Common issues and fixes |
-| Examples | `<name>_examples.md` | Code samples, YAML configs |
-| Architecture | `<name>_design.md` | Design decisions, algorithms |
-**>>> STOP: Share your analysis. Ask if there are content priorities or known issues.**
---
-## Phase 2: Create Migration Mapping
-### Step 2.1: Document Current → Target Mapping
-Create a mapping showing where each section will move:
-```markdown
-## Content Migration Mapping
-### README.md (Tier 2)
-| New Section | Source | Est. Lines |
-|-------------|--------|------------|
-| Overview | source_file.md → Section | X |
-| Feature Matrix | source_file.md → Section | X |
-| Quick Start | source_file.md → Section | X |
-| Next Steps | New | 10 |
-### <name>_guide.md (Tier 2)
-| New Section | Source | Est. Lines |
-|-------------|--------|------------|
-| Deployment | source_file.md → Section | X |
-| Configuration | source_file.md → Section | X |
-| Integration | source_file.md → Section | X |
-| Troubleshooting | source_file.md → Section | X |
-```
-**>>> STOP: Share mapping. Ask if any content should be prioritized or excluded.**
---
-## Phase 3: Create File Structure
-### Step 3.1: Create Target Directory
-```bash
-# For components
-mkdir -p docs/components/<name>
-# For other categories
-mkdir -p docs/<category>/<name>
-```
-### Step 3.2: Create Files
-```bash
-touch docs/<category>/<name>/README.md
-touch docs/<category>/<name>/<name>_guide.md
-touch docs/<category>/<name>/<name>_examples.md
-touch docs/design_docs/<name>_design.md
-```
-### Step 3.3: Create Tier 1 Stub (Components Only)
-For components, create redirect stub:
-```markdown
-# Dynamo <Component>
-<One-sentence description.>
-See `docs/components/<name>/` for documentation.
-```
---
-## Phase 4: Migrate Content
-### Step 4.1: Use Templates
-Reference templates in `docs/templates/`:
- `component_readme.md` - Tier 2 README
- `component_guide.md` - Tier 2 Guide
- `component_examples.md` - Tier 2 Examples
- `component_design.md` - Tier 3 Design
-### Step 4.2: Preserve All Content
- Copy content exactly unless errors exist
- Preserve code examples
- Preserve diagrams (Mermaid, images)
- Update internal links to new paths
-**>>> STOP: Share migrated documents. Ask if content is complete.**
---
-## Phase 5: Update Links
-### Step 5.1: Find Files Linking to Old Path
-```bash
-# Find all files with links to old path
-rg -l "docs/<old_path>" docs/ fern/
-# Find RST cross-references
-rg ":doc:\`.*<old_path>" docs/
-# Find relative markdown links
-rg "\]\(.*<old_path>" docs/
-```
-### Step 5.2: Update Sphinx Navigation
-1. **index.rst** - Update toctree entries:
-   ```rst
-   .. toctree::
-      Page Title <../new/path/file>
-   ```
-2. **_sections/*.rst** - Update section toctrees
-3. **conf.py** - Add redirect for moved files:
-   ```python
-   redirects = {
-       "old/path/file": "../new/path/file.html",
-   }
-   ```
-### Step 5.3: Update Fern Navigation
-1. **versions/next.yml** - Update page paths:
-   ```yaml
-   - page: Page Title
-     path: ../pages/new/path.md
-   ```
-2. **Move files** in `fern/pages/` to match new structure
-### Step 5.4: Update Cross-References in Other Docs
-For each file found in Step 5.1:
- Update relative paths to new locations
- Verify links work
-**>>> STOP: Share link update summary. List files modified.**
---
-## Phase 6: Edit for Style
-Review migrated documents for FLOW, STYLE, and CONSISTENCY.
-**Do NOT change content meaning - only improve presentation.**
-### Step 6.1: Review Checklist
-For each document:
-**FLOW:**
- [ ] First paragraph states what the doc covers
- [ ] Sections ordered: Overview → Setup → Usage → Troubleshooting
- [ ] No orphaned paragraphs (single sentences between sections)
-**STYLE:**
- [ ] Instructions use active voice ("Run", "Create", "Add")
- [ ] No redundant phrases ("To" not "In order to")
- [ ] Sentences ≤25 words
-**CONSISTENCY:**
- [ ] Component names: vLLM, SGLang, TensorRT-LLM
- [ ] Status indicators: ✅ 🚧 ❌
- [ ] Heading hierarchy: # → ## → ### (no skips)
- [ ] Code blocks specify language
-### Step 6.2: Generate Suggested Edits
-Present suggestions using FLAG format:
-```markdown
---
-### FLAG: [FLOW|STYLE|CONSISTENCY] - [Brief Description]
-**File:** `path/to/file.md`
-**Line(s):** X-Y
-**Current:**
-> [Original text]
-**Suggested:**
-> [Improved text]
-**Reasoning:** [Why this improves flow/style/consistency]
---
-```
-### Step 6.3: Apply Approved Edits
-After user reviews:
- Apply approved edits only
- Skip rejected suggestions
- Document patterns for future reference
-**>>> STOP: Share suggested edits. Ask which to apply.**
---
-## Phase 7: Validate and Cleanup
-### Step 7.1: Validation Checklist
-```
-Validation for: [COMPONENT_NAME]
- [ ] All content from original files preserved
- [ ] No broken links (test with docs build)
- [ ] Feature matrix matches current capabilities
- [ ] Code examples are correct
- [ ] Mermaid diagrams render
- [ ] Navigation links work between files
- [ ] Sphinx toctree updated
- [ ] Fern navigation updated
- [ ] conf.py redirects added
-```
-### Step 7.2: Test Docs Build
-```bash
-# Build Sphinx docs
-cd docs && make html
-# Check for warnings about missing references
-```
-### Step 7.3: Cleanup Old Files
-After validation and approval:
-1. Delete original files
-2. Keep deprecated files with deprecation notice if needed
-3. Commit changes
-**>>> STOP: Share validation results. Ask for approval before deleting originals.**
---
-## Category-Specific Notes
-### Components (Router, Planner, KVBM, Frontend, Profiler)
- Target: `docs/components/<name>/`
- Requires Tier 1 stub in `components/src/dynamo/<name>/README.md`
- Tier 3 design doc in `docs/design_docs/<name>_design.md`
-### Backends (vLLM, SGLang, TRT-LLM)
- Target: `docs/backends/<name>/`
- Tier 3 is external (upstream project docs)
- Create `docs/backends/README.md` for backend comparison
-### Deploy (Kubernetes)
- Target: `docs/deploy/`
- Flat structure (no subdirectories per topic)
- Examples go in `docs/deploy/examples/`
-### Performance
- Target: `docs/performance/`
- Includes tuning and benchmarks (merged)
- Flat structure
-### Infrastructure (Observability, Fault Tolerance, Development)
- Target: `docs/infrastructure/<topic>/`
- Subdirectory per topic
- Development guides for contributors
--- a/docs/templates/MIGRATION_GUIDE.md
+++ b/docs/templates/MIGRATION_GUIDE.md
---
-orphan: true
---
-# Documentation Migration Guide
-This guide covers migrating Dynamo documentation to the new 9-category hierarchy.
---
-## Directory Hierarchy
-Documentation is organized into 9 top-level categories:
-```
-docs/
-├── components/          # Router, Planner, KVBM, Frontend, Profiler
-├── backends/            # vLLM, SGLang, TRT-LLM
-├── features/            # Multimodal, LoRA, Speculative Decoding
-├── deploy/              # Kubernetes, Helm, Operator
-├── performance/         # Tuning, Benchmarks
-├── infrastructure/      # Observability, Fault Tolerance, Development
-├── integrations/        # LMCache, HiCache, NIXL
-├── reference/           # CLI, Glossary, Support Matrix
-└── design_docs/         # Tier 3 design documents
-```
---
-## Category Reference
-| Category | Location | Content Type |
-|----------|----------|--------------|
-| **Components** | `docs/components/<name>/` | Standalone deployable services (Router, Planner, KVBM, Frontend, Profiler) |
-| **Backends** | `docs/backends/<name>/` | LLM inference engine integrations (vLLM, SGLang, TRT-LLM) |
-| **Features** | `docs/features/<name>/` | Cross-cutting capabilities (Multimodal, LoRA, Speculative Decoding) |
-| **Deploy** | `docs/deploy/` | Kubernetes deployment, operator, Helm charts |
-| **Performance** | `docs/performance/` | Performance tuning, benchmarking, profiling |
-| **Infrastructure** | `docs/infrastructure/<topic>/` | Observability, fault tolerance, development guides |
-| **Integrations** | `docs/integrations/<name>/` | External tool integrations (LMCache, HiCache, NIXL) |
-| **Reference** | `docs/reference/` | CLI reference, glossary, support matrix |
-| **Design** | `docs/design_docs/` | Architecture and algorithm documentation (Tier 3) |
---
-## Three-Tier Pattern
-Components and backends follow a three-tier documentation pattern:
-| Tier | Location | Purpose | Audience |
-|------|----------|---------|----------|
-| **Tier 1** | `components/src/dynamo/<name>/README.md` | Redirect stub (5 lines) | Developers browsing code |
-| **Tier 2** | `docs/<category>/<name>/` | User documentation | Users, operators |
-| **Tier 3** | `docs/design_docs/<name>_design.md` | Design documentation | Contributors |
---
-## Link Update Checklist
-When moving documentation, update links in these locations:
-### 1. Internal Markdown Links
-Find files linking to the old path:
-```bash
-# Find all files with links to old path
-rg -l "docs/old_path" docs/ fern/
-# Find relative markdown links
-rg "\]\(.*old_path" docs/
-```
-### 2. Sphinx Configuration
-**Files to update:**
-| File | Content |
-|------|---------|
-| `docs/index.rst` | Main toctree with section references |
-| `docs/_sections/*.rst` | Section toctrees (8 files) |
-| `docs/hidden_toctree.rst` | Orphaned pages not in main nav |
-| `docs/conf.py` | Redirects mapping (lines 40-98) |
-**Toctree syntax:**
-```rst
-.. toctree::
-   :hidden:
-   Page Title <../new/path/file>
-```
-**Add redirect in conf.py:**
-```python
-redirects = {
-    "old/path/file": "../new/path/file.html",
-}
-```
-### 3. Fern Configuration
-**Files to update:**
-| File | Content |
-|------|---------|
-| `fern/docs.yml` | Site config and version reference |
-| `fern/versions/next.yml` | Full navigation structure |
-**Navigation syntax:**
-```yaml
- page: Page Title
-  path: ../pages/new/path.md
-```
-**Move files in `fern/pages/` to match new structure.**
-### 4. RST Cross-References
-Find and update:
-```rst
-:doc:`../old/path/file`
-```
-### 5. Include Directives
-Check `docs/_includes/` for includes:
-```rst
-.. include:: ../old/path/file.rst
-```
---
-## Pre-Migration Link Validation
-Before migrating, validate source docs to avoid carrying over broken links.
-### Pre-flight Broken Link Check
-```bash
-# Install lychee (if not available)
-cargo install lychee   # or: brew install lychee
-# Check source files (example: migrating kvbm docs)
-lychee docs/kvbm/ --offline --exclude-path docs/_build
-# Or use the full check with external URLs
-lychee docs/kvbm/ --exclude-path docs/_build
-```
-If lychee is unavailable, use ripgrep to find potentially broken links:
-```bash
-# Find all internal markdown links and spot-check targets
-rg -n '\]\([^http][^)]*\.md' docs/kvbm/
-```
-### Golden Rule
-**Only link to files that exist.** Before adding any link:
-1. Verify the target file exists at the expected path
-2. Test the relative path calculation (count `../` correctly)
-3. For cross-section links, consider using the cross-reference path table
-### Post-Migration Validation
-After moving files, run link check again to catch broken references:
-```bash
-# Check all docs after migration
-lychee docs/ --offline --exclude-path docs/_build
-# Check specific migrated directory (example: after moving to components/kvbm)
-lychee docs/components/kvbm/ --offline
-```
---
-## Style Editing Guidelines
-After migrating content, review for FLOW, STYLE, and CONSISTENCY.
-**Do NOT change content meaning - only improve presentation.**
-### FLOW Rules
-| Rule | Description |
-|------|-------------|
-| Lead with the point (BLUF) | First paragraph states what the doc covers |
-| Logical section order | Overview → Setup → Usage → Troubleshooting |
-| One idea per paragraph | Split paragraphs with multiple topics |
-| No orphaned sentences | Avoid single sentences between sections |
-### STYLE Rules
-| Rule | Example |
-|------|---------|
-| Active voice for instructions | "Run the command" not "The command should be run" |
-| Consistent tense | All steps in present tense |
-| No redundant phrases | "To" not "In order to" |
-| Short sentences | Target ≤25 words |
-### CONSISTENCY Rules
-| Rule | Standard |
-|------|----------|
-| Component names | vLLM, SGLang, TensorRT-LLM (or TRT-LLM) |
-| Status indicators | ✅ Supported, 🚧 Experimental, ❌ Not Supported |
-| Heading hierarchy | # → ## → ### (no skips) |
-| Code block languages | Always specify (```python, ```bash, ```yaml) |
---
-## Related Files
- [SOURCE_TARGET_MAPPING.md](SOURCE_TARGET_MAPPING.md) - Comprehensive file-level source → target mapping
- [README.md](README.md) - Template overview and selection guide
- [EXAMPLE_SKILL.md](EXAMPLE_SKILL.md) - Cursor skill for AI-assisted migration
- Individual templates: `component_readme.md`, `component_guide.md`, etc.
--- a/docs/templates/README.md
+++ b/docs/templates/README.md
@@ -176,24 +176,3 @@ After adding new documentation:
 2. **Fern (future):** Update `fern/docs.yml` with your new pages
 See [docs/README.md](../README.md) for documentation build instructions.
-## Migrating Existing Docs
-For migrating existing documentation to the new structure:
-### Quick Reference
- [MIGRATION_GUIDE.md](MIGRATION_GUIDE.md) - Comprehensive migration guide with link update checklist
- [EXAMPLE_SKILL.md](EXAMPLE_SKILL.md) - AI-assisted migration skill (works with Cursor and Claude)
-### Using the AI Skill
-1. **Cursor IDE:** Copy `EXAMPLE_SKILL.md` to `.cursor/skills/docs-migration/SKILL.md`
-2. **Claude/Other AI:** Copy the skill content into your conversation as context
-3. Follow the phased approach with STOP points for review
-### Manual Migration
-1. Read [MIGRATION_GUIDE.md](MIGRATION_GUIDE.md) for the full process
-2. Use the link update checklist to update Sphinx and Fern navigation
-3. Apply style guidelines for FLOW, STYLE, and CONSISTENCY
--- a/docs/templates/SOURCE_TARGET_MAPPING.md
+++ b/docs/templates/SOURCE_TARGET_MAPPING.md
---
-orphan: true
---
-# Source-to-Target File Mapping
-This document provides a comprehensive file-level mapping from current documentation locations to the new hierarchy for Components, Backends, Features, and Integrations.
---
-## How to Use This Mapping
-1. Find the source file you want to migrate in the tables below
-2. Note the **Target** path and **Action** type
-3. Follow the [MIGRATION_GUIDE.md](MIGRATION_GUIDE.md) for link updates
-4. Use [EXAMPLE_SKILL.md](EXAMPLE_SKILL.md) for AI-assisted migration
-### Legend
-| Symbol | Content Type |
-|--------|--------------|
-| **O** | Overview - entry point, introduction |
-| **G** | Guide - step-by-step instructions |
-| **E** | Examples - code samples, templates |
-| **D** | Design - architecture, algorithms |
-| **R** | Reference - API specs, CLI docs |
-| Action | Description |
-|--------|-------------|
-| **Move** | Relocate file to new path |
-| **Merge** | Combine multiple files into one |
-| **Split** | Separate one file into multiple |
-| **Convert** | Transform RST to Markdown |
-| **Create** | New content needed |
-| **Extract** | Pull content from another file |
---
-## 1. Components
-### Router (1,334 lines)
-| Source | Lines | Type | Target | Action | Notes |
-|--------|-------|------|--------|--------|-------|
-| `docs/router/README.md` | 316 | O | `docs/components/router/README.md` | Move | Quick start, configuration |
-| `docs/router/kv_cache_routing.md` | 733 | G | `docs/components/router/router_guide.md` | Move | Deep technical guide |
-| `docs/router/kv_events.md` | 285 | G | `docs/components/router/router_guide.md` | Merge | Append to guide |
-**Tier 1 (In-Code):**
-| Source | Target | Action |
-|--------|--------|--------|
-| `components/src/dynamo/router/README.md` | Keep | Update link to `docs/components/router/` |
-### Planner (863 lines)
-| Source | Lines | Type | Target | Action | Notes |
-|--------|-------|------|--------|--------|-------|
-| `docs/planner/planner_intro.rst` | 82 | O | `docs/components/planner/README.md` | Convert | RST→MD, merge overview |
-| `docs/planner/sla_planner_quickstart.md` | 521 | G | `docs/components/planner/planner_guide.md` | Split | Guide + examples |
-| `docs/planner/sla_planner.md` | 203 | D | `docs/design_docs/planner_design.md` | Move | Architecture content |
-| `docs/planner/load_planner.md` | 57 | G | `docs/components/planner/load_planner.md` | Move | Keep as deprecated |
-**Tier 1 (In-Code):**
-| Source | Target | Action |
-|--------|--------|--------|
-| `components/src/dynamo/planner/README.md` | Keep | Update link to `docs/components/planner/` |
-### KVBM (972 lines)
-| Source | Lines | Type | Target | Action | Notes |
-|--------|-------|------|--------|--------|-------|
-| `docs/kvbm/kvbm_intro.rst` | 69 | O | `docs/components/kvbm/README.md` | Convert | RST→MD |
-| `docs/kvbm/kvbm_architecture.md` | 40 | O | `docs/components/kvbm/README.md` | Merge | Combine overviews |
-| `docs/kvbm/kvbm_components.md` | 71 | O | `docs/components/kvbm/README.md` | Merge | Combine overviews |
-| `docs/kvbm/kvbm_motivation.md` | 44 | O | `docs/components/kvbm/README.md` | Merge | Combine overviews |
-| `docs/kvbm/kvbm_integrations.md` | 45 | G | `docs/components/kvbm/kvbm_guide.md` | Move | Integration instructions |
-| `docs/kvbm/vllm-setup.md` | 195 | E | `docs/components/kvbm/kvbm_examples.md` | Merge | Combine examples |
-| `docs/kvbm/trtllm-setup.md` | 223 | E | `docs/components/kvbm/kvbm_examples.md` | Merge | Combine examples |
-| `docs/kvbm/kvbm_design_deepdive.md` | 262 | D | `docs/design_docs/kvbm_design.md` | Move | Design content |
-| `docs/kvbm/kvbm_reading.md` | 23 | R | `docs/components/kvbm/kvbm_guide.md` | Merge | Append references |
-| `docs/kvbm/kvbm_metrics_grafana.png` | — | — | `docs/components/kvbm/images/` | Move | Image asset |
-**Tier 1 (In-Code):**
-| Source | Target | Action |
-|--------|--------|--------|
-| `lib/bindings/kvbm/README.md` | Keep | Update link to `docs/components/kvbm/` |
-### Frontend (2,991 lines)
-| Source | Lines | Type | Target | Action | Notes |
-|--------|-------|------|--------|--------|-------|
-| `docs/frontends/kserve.md` | 99 | G | `docs/components/frontend/frontend_guide.md` | Move | KServe integration |
-| `docs/frontends/openapi.json` | 2,892 | R | `docs/reference/api/openapi.json` | Move | API spec to reference |
-| — | — | O | `docs/components/frontend/README.md` | Create | New overview needed |
-**Tier 1 (In-Code):**
-| Source | Target | Action |
-|--------|--------|--------|
-| `components/src/dynamo/http/README.md` | Create | Stub linking to `docs/components/frontend/` |
-### Profiler (New)
-| Source | Lines | Type | Target | Action | Notes |
-|--------|-------|------|--------|--------|-------|
-| — | — | O | `docs/components/profiler/README.md` | Create | New overview |
-| — | — | G | `docs/components/profiler/profiler_guide.md` | Create | New guide |
---
-## 2. Backends
-Backends remain at `docs/backends/<backend>/` with minimal structural changes.
-### vLLM
-| Source | Lines | Type | Target | Action | Notes |
-|--------|-------|------|--------|--------|-------|
-| `docs/backends/vllm/README.md` | — | O | Keep | — | No change |
-| `docs/backends/vllm/disagg.md` | — | G | Keep | — | No change |
-| `docs/backends/vllm/aggregated.md` | — | G | Keep | — | No change |
-| `docs/backends/vllm/speculative_decoding.md` | — | G | `docs/features/speculative_decoding/` | Extract | Move to features |
-| `docs/backends/vllm/LMCache_Integration.md` | — | G | `docs/integrations/lmcache/` | Extract | Move to integrations |
-### SGLang
-| Source | Lines | Type | Target | Action | Notes |
-|--------|-------|------|--------|--------|-------|
-| `docs/backends/sglang/README.md` | — | O | Keep | — | No change |
-| `docs/backends/sglang/disagg.md` | — | G | Keep | — | No change |
-| `docs/backends/sglang/aggregated.md` | — | G | Keep | — | No change |
-| `docs/backends/sglang/sgl-hicache-example.md` | — | G | `docs/integrations/hicache/` | Extract | Move to integrations |
-### TRT-LLM
-| Source | Lines | Type | Target | Action | Notes |
-|--------|-------|------|--------|--------|-------|
-| `docs/backends/trtllm/README.md` | — | O | Keep | — | No change |
-| `docs/backends/trtllm/disagg.md` | — | G | Keep | — | No change |
-| `docs/backends/trtllm/aggregated.md` | — | G | Keep | — | No change |
---
-## 3. Features
-### Multimodal (1,644 lines)
-| Source | Lines | Type | Target | Action | Notes |
-|--------|-------|------|--------|--------|-------|
-| `docs/multimodal/index.md` | 213 | O | `docs/features/multimodal/README.md` | Move | Rename to README |
-| `docs/multimodal/vllm.md` | 522 | G | `docs/features/multimodal/multimodal_vllm.md` | Move | |
-| `docs/multimodal/sglang.md` | 433 | G | `docs/features/multimodal/multimodal_sglang.md` | Move | |
-| `docs/multimodal/trtllm.md` | 476 | G | `docs/features/multimodal/multimodal_trtllm.md` | Move | |
-### Speculative Decoding (New)
-| Source | Lines | Type | Target | Action | Notes |
-|--------|-------|------|--------|--------|-------|
-| `docs/backends/vllm/speculative_decoding.md` | — | G | `docs/features/speculative_decoding/README.md` | Extract | From vLLM backend |
-### Agents (183 lines)
-| Source | Lines | Type | Target | Action | Notes |
-|--------|-------|------|--------|--------|-------|
-| `docs/agents/tool-calling.md` | 183 | G | `docs/features/agents/README.md` | Move | Agent/tool calling |
---
-## 4. Integrations
-### Extracted from Backends
-| Source | Lines | Type | Target | Action | Notes |
-|--------|-------|------|--------|--------|-------|
-| `docs/backends/vllm/LMCache_Integration.md` | — | G | `docs/integrations/lmcache/README.md` | Extract | From vLLM |
-| `docs/backends/sglang/sgl-hicache-example.md` | — | G | `docs/integrations/hicache/README.md` | Extract | From SGLang |
-### NIXL
-| Source | Lines | Type | Target | Action | Notes |
-|--------|-------|------|--------|--------|-------|
-| `docs/api/nixl_connect/*` | — | G | `docs/integrations/nixl/` | Move | Entire folder |
---
-## Summary Statistics
-### By Category
-| Category | Files | Lines | Actions |
-|----------|-------|-------|---------|
-| Components | 18 | ~3,200 | Move, Merge, Convert, Create |
-| Backends | 12 | varies | Extract only |
-| Features | 6 | ~1,800 | Move, Extract |
-| Integrations | 3+ | varies | Extract, Move |
-### By Action Type
-| Action | Count | Description |
-|--------|-------|-------------|
-| **Move** | ~15 | Simple relocation |
-| **Merge** | ~8 | Combine multiple files |
-| **Convert** | 2 | RST to Markdown |
-| **Extract** | 4 | Pull from other files |
-| **Create** | 4 | New content needed |
---
-## Related Files
- [MIGRATION_GUIDE.md](MIGRATION_GUIDE.md) - Link update checklist and style guidelines
- [EXAMPLE_SKILL.md](EXAMPLE_SKILL.md) - Cursor skill for AI-assisted migration
- [README.md](README.md) - Template overview and selection guide