docs: reorganizing documentation to make things clearer (#3658)

Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Co-authored-by: Claude <noreply@anthropic.com>

Commit 598cbbb7 (unverified) · 34fc9693
Authored Oct 16, 2025 by Anish; committed by GitHub Oct 16, 2025
20 changed files
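Most hunks below repoint Markdown links from an old docs path to its new home. The following is a hedged sketch, not part of this commit, showing how that old-to-new mapping could be applied mechanically to inline link targets. The `MOVES` table is inferred from the link and toctree updates visible in this diff and is deliberately incomplete. `rewrite_target` and `rewrite_links` are hypothetical helper names.

```python
import re

# Old -> new paths under docs/, inferred from the link and toctree updates in
# this commit. Illustrative and incomplete; not an authoritative move list.
MOVES = {
    "architecture/sla_planner.md": "planner/sla_planner.md",
    "architecture/load_planner.md": "planner/load_planner.md",
    "architecture/kvbm_architecture.md": "kvbm/kvbm_architecture.md",
    "kubernetes/sla_planner_quickstart.md": "planner/sla_planner_quickstart.md",
    "guides/run_kvbm_in_trtllm.md": "kvbm/trtllm-setup.md",
    "guides/run_kvbm_in_vllm.md": "kvbm/vllm-setup.md",
    "guides/metrics.md": "observability/metrics.md",
    "guides/dynamo_run.md": "reference/cli.md",
    "guides/tool_calling.md": "guides/tool-calling.md",
}

# Captures the target of an inline Markdown link such as [text](target#anchor).
# Targets containing spaces or a title string are deliberately not matched.
LINK_RE = re.compile(r"\]\(([^)\s]+)\)")


def rewrite_target(target: str) -> str:
    """Swap a moved docs path at the end of a link target, keeping any
    leading relative prefix (../../, /docs/) and any #anchor."""
    path, sep, anchor = target.partition("#")
    for old, new in MOVES.items():
        if path == old or path.endswith("/" + old):
            return path[: len(path) - len(old)] + new + sep + anchor
    return target


def rewrite_links(markdown: str) -> str:
    """Rewrite every inline link target in a Markdown string; link text is untouched."""
    return LINK_RE.sub(lambda m: "](" + rewrite_target(m.group(1)) + ")", markdown)


if __name__ == "__main__":
    before = "| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) |"
    print(rewrite_links(before))
```

Because only the path under `docs/` changes, the relative prefix (for example `../../`) stays valid and is kept as-is. Link text such as `docs/guides/metrics.md` in `prometheus.md` is not rewritten. Same-directory links like `(sla_planner_quickstart.md)` in `installation_guide.md` and toctree entries in `.rst` files are also out of scope, so those still need manual edits, as this commit makes.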
--- a/docs/backends/trtllm/README.md
+++ b/docs/backends/trtllm/README.md
@@ -55,9 +55,9 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 | [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ |  |
 | [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
 | [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ |  |
-| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | ✅ |  |
-| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | Planned |
-| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | 🚧 | Planned |
+| [**SLA-Based Planner**](../../../docs/planner/sla_planner.md) | ✅ |  |
+| [**Load Based Planner**](../../../docs/planner/load_planner.md) | 🚧 | Planned |
+| [**KVBM**](../../../docs/kvbm/kvbm_architecture.md) | 🚧 | Planned |

 ### Large Scale P/D and WideEP Features

@@ -308,4 +308,4 @@ For detailed instructions on running comprehensive performance sweeps across bot

 Dynamo with TensorRT-LLM currently supports integration with the Dynamo KV Block Manager. This integration can significantly reduce time-to-first-token (TTFT) latency, particularly in usage patterns such as multi-turn conversations and repeated long-context requests.

-Here is the instruction: [Running KVBM in TensorRT-LLM](./../../../docs/guides/run_kvbm_in_trtllm.md) .
+For instructions, see [Running KVBM in TensorRT-LLM](../../../docs/kvbm/trtllm-setup.md).
--- a/docs/backends/vllm/README.md
+++ b/docs/backends/vllm/README.md
@@ -38,9 +38,9 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 | [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ |  |
 | [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP |
 | [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ |  |
-| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | ✅ |  |
-| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | WIP |
-| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | ✅ |  |
+| [**SLA-Based Planner**](../../../docs/planner/sla_planner.md) | ✅ |  |
+| [**Load Based Planner**](../../../docs/planner/load_planner.md) | 🚧 | WIP |
+| [**KVBM**](../../../docs/kvbm/kvbm_architecture.md) | ✅ |  |
 | [**LMCache**](./LMCache_Integration.md) | ✅ |  |

 ### Large Scale P/D and WideEP Features

--- a/docs/backends/vllm/prometheus.md
+++ b/docs/backends/vllm/prometheus.md
@@ -10,7 +10,7 @@ When running vLLM through Dynamo, vLLM engine metrics are automatically passed t

 For the complete and authoritative list of all vLLM metrics, always refer to the official documentation linked above.

-Dynamo runtime metrics are documented in [docs/guides/metrics.md](../../guides/metrics.md).
+Dynamo runtime metrics are documented in [docs/observability/metrics.md](../../observability/metrics.md).

 ## Metric Reference

@@ -96,7 +96,7 @@ vllm:time_to_first_token_seconds_sum{model_name="meta-llama/Llama-3.1-8B"} 89.38
 - [vLLM GitHub - Metrics Implementation](https://github.com/vllm-project/vllm/tree/main/vllm/engine/metrics)

 ### Dynamo Metrics
-- **Dynamo Metrics Guide**: See `docs/guides/metrics.md` for complete documentation on Dynamo runtime metrics
+- **Dynamo Metrics Guide**: See [docs/observability/metrics.md](../../observability/metrics.md) for complete documentation on Dynamo runtime metrics
 - **Dynamo Runtime Metrics**: Metrics prefixed with `dynamo_*` for runtime, components, endpoints, and namespaces
  - Implementation: `lib/runtime/src/metrics.rs` (Rust runtime metrics)
  - Metric names: `lib/runtime/src/metrics/prometheus_names.rs` (metric name constants)

--- a/docs/benchmarks/pre_deployment_profiling.md
+++ b/docs/benchmarks/pre_deployment_profiling.md
 # Pre-Deployment Profiling

 > [!TIP]
-> **New to SLA Planner?** For a complete workflow including profiling and deployment, see the [SLA Planner Quick Start Guide](/docs/kubernetes/sla_planner_quickstart.md).
+> **New to SLA Planner?** For a complete workflow including profiling and deployment, see the [SLA Planner Quick Start Guide](/docs/planner/sla_planner_quickstart.md).

 ## Profiling Script

@@ -99,7 +99,7 @@ SLA planner can work with any interpolation data that follows the above format.
 ## Detailed Kubernetes Profiling Instructions

 > [!TIP]
-> For a complete step-by-step workflow, see the [SLA Planner Quick Start Guide](/docs/kubernetes/sla_planner_quickstart.md).
+> For a complete step-by-step workflow, see the [SLA Planner Quick Start Guide](/docs/planner/sla_planner_quickstart.md).

 This section provides detailed technical information for advanced users who need to customize the profiling process.


--- a/docs/deploy/metrics/docker-compose.yml
+++ b/docs/deploy/metrics/docker-compose.yml
-../../../deploy/metrics/docker-compose.yml
\ No newline at end of file
--- a/docs/guides/backend.md
+++ b/docs/guides/backend.md
--- a/docs/runtime/README.md
+++ b/docs/runtime/README.md
--- a/docs/guides/tool_calling.md
+++ b/docs/guides/tool_calling.md
--- a/docs/hidden_toctree.rst
+++ b/docs/hidden_toctree.rst
@@ -11,18 +11,18 @@
   :maxdepth: 2
   :hidden:

-   runtime/README.md
-   API/nixl_connect/connector.md
-   API/nixl_connect/descriptor.md
-   API/nixl_connect/device.md
-   API/nixl_connect/device_kind.md
-   API/nixl_connect/operation_status.md
-   API/nixl_connect/rdma_metadata.md
-   API/nixl_connect/readable_operation.md
-   API/nixl_connect/writable_operation.md
-   API/nixl_connect/read_operation.md
-   API/nixl_connect/write_operation.md
-   API/nixl_connect/README.md
+   development/runtime-guide.md
+   api/nixl_connect/connector.md
+   api/nixl_connect/descriptor.md
+   api/nixl_connect/device.md
+   api/nixl_connect/device_kind.md
+   api/nixl_connect/operation_status.md
+   api/nixl_connect/rdma_metadata.md
+   api/nixl_connect/readable_operation.md
+   api/nixl_connect/writable_operation.md
+   api/nixl_connect/read_operation.md
+   api/nixl_connect/write_operation.md
+   api/nixl_connect/README.md

   kubernetes/api_reference.md
   kubernetes/create_deployment.md
@@ -32,14 +32,14 @@
   kubernetes/grove.md
   kubernetes/model_caching_with_fluid.md
   kubernetes/README.md
-   guides/dynamo_run.md
-   guides/metrics.md
-   guides/run_kvbm_in_vllm.md
-   guides/run_kvbm_in_trtllm.md
-   guides/tool_calling.md
+   reference/cli.md
+   observability/metrics.md
+   kvbm/vllm-setup.md
+   kvbm/trtllm-setup.md
+   guides/tool-calling.md

   architecture/kv_cache_routing.md
-   architecture/load_planner.md
+   planner/load_planner.md
   architecture/request_migration.md
   architecture/request_cancellation.md


--- a/docs/index.rst
+++ b/docs/index.rst
@@ -42,7 +42,7 @@ Quickstart

   Quickstart <self>
   Installation <_sections/installation>
-   Support Matrix <support_matrix.md>
+   Support Matrix <reference/support-matrix.md>
   Architecture <_sections/architecture>
   Examples <_sections/examples>

@@ -63,18 +63,18 @@ Quickstart
   :caption: Components

   Backends <_sections/backends>
-   Router <components/router/README>
-   Planner <architecture/planner_intro>
-   KVBM <architecture/kvbm_intro>
+   Router <router/README>
+   Planner <planner/planner_intro>
+   KVBM <kvbm/kvbm_intro>

 .. toctree::
   :hidden:
   :caption: Developer Guide

   Benchmarking Guide <benchmarks/benchmarking.md>
-   SLA Planner (Autoscaling) Quickstart <kubernetes/sla_planner_quickstart>
-   Logging <guides/logging.md>
-   Health Checks <guides/health_check.md>
-   Tuning Disaggregated Serving Performance <guides/disagg_perf_tuning.md>
-   Writing Python Workers in Dynamo <guides/backend.md>
-   Glossary <dynamo_glossary.md>
+   SLA Planner (Autoscaling) Quickstart <planner/sla_planner_quickstart>
+   Logging <observability/logging.md>
+   Health Checks <observability/health-checks.md>
+   Tuning Disaggregated Serving Performance <performance/tuning.md>
+   Writing Python Workers in Dynamo <development/backend-guide.md>
+   Glossary <reference/glossary.md>
--- a/docs/kubernetes/create_deployment.md
+++ b/docs/kubernetes/create_deployment.md
@@ -90,7 +90,7 @@ Consult the corresponding sh file. Each of the python commands to launch a compo

 The front end is launched with "python3 -m dynamo.frontend [--http-port 8000] [--router-mode kv]"
 Each worker will launch `python -m dynamo.YOUR_INFERENCE_BACKEND --model YOUR_MODEL --your-flags `command.
-If you are a Dynamo contributor the [dynamo run guide](/docs/guides/dynamo_run.md) for details on how to run this command.
+If you are a Dynamo contributor, see the [dynamo run guide](/docs/reference/cli.md) for details on how to run this command.


 ## Step 3: Key Customization Points

--- a/docs/kubernetes/installation_guide.md
+++ b/docs/kubernetes/installation_guide.md
@@ -196,7 +196,7 @@ kubectl get pods -n ${NAMESPACE}

 3. **Optional:**
   - [Set up Prometheus & Grafana](metrics.md)
-   - [SLA Planner Quickstart Guide](sla_planner_quickstart.md) (for SLA-aware scheduling and autoscaling)
+   - [SLA Planner Quickstart Guide](../planner/sla_planner_quickstart.md) (for SLA-aware scheduling and autoscaling)

 ## Troubleshooting


--- a/docs/kubernetes/metrics.md
+++ b/docs/kubernetes/metrics.md
@@ -65,7 +65,7 @@ This will create two components:

 Both components expose a `/metrics` endpoint following the OpenMetrics format, but with different metrics appropriate to their roles. For details about:
 - Deployment configuration: See the [vLLM README](/docs/backends/vllm/README.md)
-- Available metrics: See the [metrics guide](/docs/guides/metrics.md)
+- Available metrics: See the [metrics guide](/docs/observability/metrics.md)

 ### Validate the Deployment


--- a/docs/architecture/kvbm_architecture.md
+++ b/docs/architecture/kvbm_architecture.md
--- a/docs/architecture/kvbm_components.md
+++ b/docs/architecture/kvbm_components.md
--- a/docs/architecture/kvbm_intro.rst
+++ b/docs/architecture/kvbm_intro.rst
--- a/docs/architecture/kvbm_motivation.md
+++ b/docs/architecture/kvbm_motivation.md
--- a/docs/architecture/kvbm_reading.md
+++ b/docs/architecture/kvbm_reading.md
--- a/docs/guides/run_kvbm_in_trtllm.md
+++ b/docs/guides/run_kvbm_in_trtllm.md
@@ -19,7 +19,7 @@ limitations under the License.

 This guide explains how to leverage KVBM (KV Block Manager) to mange KV cache and do KV offloading in TensorRT-LLM (trtllm).

-To learn what KVBM is, please check [here](https://docs.nvidia.com/dynamo/latest/architecture/kvbm_intro.html)
+To learn what KVBM is, see the [KVBM introduction](https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html).

 > [!Note]
 > - Ensure that `etcd` and `nats` are running before starting.

--- a/docs/guides/run_kvbm_in_vllm.md
+++ b/docs/guides/run_kvbm_in_vllm.md
@@ -19,7 +19,7 @@ limitations under the License.

 This guide explains how to leverage KVBM (KV Block Manager) to mange KV cache and do KV offloading in vLLM.

-To learn what KVBM is, please check [here](https://docs.nvidia.com/dynamo/latest/architecture/kvbm_intro.html)
+To learn what KVBM is, see the [KVBM introduction](https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html).

 ## Quick Start