docs: address Harry/VDR feedback + fixing broken links across repository (#3802)

Signed-off-by: Harry Kim <harry_kim@live.com>
Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Signed-off-by: akshatha-k <33278067+akshatha-k@users.noreply.github.com>
Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Co-authored-by: Harry Kim <harry_kim@live.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: akshatha-k <33278067+akshatha-k@users.noreply.github.com>
Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>

Unverified commit c6b59045, authored Oct 22, 2025 by Anish; committed by GitHub Oct 22, 2025
20 changed files
--- a/README.md
+++ b/README.md
@@ -22,7 +22,7 @@ limitations under the License.
 [![Discord](https://dcbadge.limes.pink/api/server/D92uqZRjCZ?style=flat)](https://discord.gg/D92uqZRjCZ)
 [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ai-dynamo/dynamo)
-| **[Roadmap](https://github.com/ai-dynamo/dynamo/issues/762)** | **[Support matrix](https://github.com/ai-dynamo/dynamo/blob/main/docs/reference/support-matrix.md)** | **[Documentation](https://docs.nvidia.com/dynamo/latest/index.html)** | **[Examples](https://github.com/ai-dynamo/dynamo/tree/main/examples)** | **[Prebuilt containers](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo)** | **[Design Proposals](https://github.com/ai-dynamo/enhancements)** | **[Blogs](https://developer.nvidia.com/blog/tag/nvidia-dynamo)**
+| **[Roadmap](https://github.com/ai-dynamo/dynamo/issues/2486)** | **[Support matrix](https://github.com/ai-dynamo/dynamo/blob/main/docs/reference/support-matrix.md)** | **[Documentation](https://docs.nvidia.com/dynamo/latest/index.html)** | **[Examples](https://github.com/ai-dynamo/dynamo/tree/main/examples)** | **[Prebuilt containers](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo)** | **[Design Proposals](https://github.com/ai-dynamo/enhancements)** | **[Blogs](https://developer.nvidia.com/blog/tag/nvidia-dynamo)**
 # NVIDIA Dynamo
@@ -56,9 +56,9 @@ Dynamo is designed to be inference engine agnostic (supports TRT-LLM, vLLM, SGLa
 | Feature                                                                                           | vLLM | SGLang | TensorRT-LLM |
 | ------------------------------------------------------------------------------------------------- | ---- | ------ | ------------ |
-| [**Disaggregated Serving**](/docs/architecture/disagg_serving.md)                                 | ✅   | ✅     | ✅           |
+| [**Disaggregated Serving**](/docs/design_docs/disagg_serving.md)                                 | ✅   | ✅     | ✅           |
-| [**Conditional Disaggregation**](/docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧   | 🚧     | 🚧           |
+| [**Conditional Disaggregation**](/docs/design_docs/disagg_serving.md#conditional-disaggregation) | 🚧   | 🚧     | 🚧           |
-| [**KV-Aware Routing**](/docs/architecture/kv_cache_routing.md)                                    | ✅   | ✅     | ✅           |
+| [**KV-Aware Routing**](/docs/router/kv_cache_routing.md)                                    | ✅   | ✅     | ✅           |
 | [**Load Based Planner**](docs/planner/load_planner.md)                                      | 🚧   | 🚧     | 🚧           |
 | [**SLA-Based Planner**](docs/planner/sla_planner.md)                                        | ✅   | ✅     | ✅           |
 | [**KVBM**](docs/kvbm/kvbm_architecture.md)                                               | ✅   | 🚧     | ✅           |

--- a/benchmarks/router/README.md
+++ b/benchmarks/router/README.md
@@ -116,7 +116,7 @@ To see all available router arguments, run:
 python -m dynamo.frontend --help
 ```
-For detailed explanations of router arguments (especially KV cache routing parameters), see the [KV Cache Routing documentation](../../docs/architecture/kv_cache_routing.md).
+For detailed explanations of router arguments (especially KV cache routing parameters), see the [KV Cache Routing documentation](../../docs/router/kv_cache_routing.md).
 #### Disaggregated Serving with Automatic Prefill Routing
@@ -125,7 +125,7 @@ When you launch prefill workers using `run_engines.sh --prefill`, the frontend a
 - Uses KV-aware routing regardless of the frontend's `--router-mode` setting
 - Seamlessly integrates with your decode workers for token generation
-No additional configuration is needed - simply launch both decode and prefill workers, and the system handles the rest. See the [KV Cache Routing documentation](../../docs/architecture/kv_cache_routing.md#disaggregated-serving-prefill-and-decode) for more details.
+No additional configuration is needed - simply launch both decode and prefill workers, and the system handles the rest. See the [KV Cache Routing documentation](../../docs/router/kv_cache_routing.md#disaggregated-serving-prefill-and-decode) for more details.
 **Note**: If you're unsure whether your backend engines correctly emit KV events for certain models (e.g., hybrid models like gpt-oss or nemotron nano 2), use the `--no-kv-events` flag to disable KV event tracking and use approximate KV indexing instead:

--- a/components/README.md
+++ b/components/README.md
@@ -31,7 +31,7 @@ Each engine provides launch scripts for different deployment patterns in their r
 ## Core Components
-### [Backends](src/dynamo/)
+### [Backends](backends/)
 The backends directory contains inference engine integrations and implementations, with a key focus on:

--- a/components/backends/sglang/deploy/README.md
+++ b/components/backends/sglang/deploy/README.md
@@ -144,7 +144,7 @@ All templates use **DeepSeek-R1-Distill-Llama-8B** as the default model. But you
 ## Further Reading
-- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/create_deployment.md)
+- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/deployment/create_deployment.md)
 - **Quickstart**: [Deployment Quickstart](../../../../docs/kubernetes/README.md)
 - **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/kubernetes/installation_guide.md)
 - **Examples**: [Deployment Examples](../../../../docs/examples/README.md)

--- a/components/backends/trtllm/deploy/README.md
+++ b/components/backends/trtllm/deploy/README.md
@@ -153,7 +153,7 @@ args:
 ### 3. Deploy
-See the [Create Deployment Guide](../../../../docs/kubernetes/create_deployment.md) to learn how to deploy the deployment file.
+See the [Create Deployment Guide](../../../../docs/kubernetes/deployment/create_deployment.md) to learn how to deploy the deployment file.
 First, create a secret for the HuggingFace token.
 ```bash
@@ -258,7 +258,7 @@ For detailed configuration instructions, see the [KV cache transfer guide](../..
 ## Request Migration
-You can enable [request migration](../../../../docs/architecture/request_migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations:
+You can enable [request migration](../../../../docs/fault_tolerance/request_migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations:
 ```yaml
 args:
@@ -277,11 +277,11 @@ Configure the `model` name and `host` based on your deployment.
 ## Further Reading
-- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/create_deployment.md)
+- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/deployment/create_deployment.md)
 - **Quickstart**: [Deployment Quickstart](../../../../docs/kubernetes/README.md)
 - **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/kubernetes/installation_guide.md)
 - **Examples**: [Deployment Examples](../../../../docs/examples/README.md)
-- **Architecture Docs**: [Disaggregated Serving](../../../../docs/architecture/disagg_serving.md), [KV-Aware Routing](../../../../docs/architecture/kv_cache_routing.md)
+- **Architecture Docs**: [Disaggregated Serving](../../../../docs/design_docs/disagg_serving.md), [KV-Aware Routing](../../../../docs/router/kv_cache_routing.md)
 - **Multinode Deployment**: [Multinode Examples](../../../../docs/backends/trtllm/multinode/multinode-examples.md)
 - **Speculative Decoding**: [Llama 4 + Eagle Guide](../../../../docs/backends/trtllm/llama4_plus_eagle.md)
 - **Kubernetes CRDs**: [Custom Resources Documentation](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)

--- a/components/backends/vllm/deploy/README.md
+++ b/components/backends/vllm/deploy/README.md
@@ -224,7 +224,7 @@ All templates use **Qwen/Qwen3-0.6B** as the default model, but you can use any
 ## Request Migration
-You can enable [request migration](../../../../docs/architecture/request_migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations:
+You can enable [request migration](../../../../docs/fault_tolerance/request_migration.md) to handle worker failures gracefully by adding the migration limit argument to worker configurations:
 ```yaml
 args:
@@ -234,12 +234,12 @@ args:
 ## Further Reading
-- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/create_deployment.md)
+- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/deployment/create_deployment.md)
 - **Quickstart**: [Deployment Quickstart](../../../../docs/kubernetes/README.md)
 - **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/kubernetes/installation_guide.md)
 - **SLA Planner**: [SLA Planner Quickstart Guide](../../../../docs/planner/sla_planner_quickstart.md)
 - **Examples**: [Deployment Examples](../../../../docs/examples/README.md)
-- **Architecture Docs**: [Disaggregated Serving](../../../../docs/architecture/disagg_serving.md), [KV-Aware Routing](../../../../docs/architecture/kv_cache_routing.md)
+- **Architecture Docs**: [Disaggregated Serving](../../../../docs/design_docs/disagg_serving.md), [KV-Aware Routing](../../../../docs/router/kv_cache_routing.md)
 ## Troubleshooting

--- a/components/src/dynamo/router/README.md
+++ b/components/src/dynamo/router/README.md
@@ -3,7 +3,7 @@
 # Standalone Router
-A backend-agnostic standalone KV-aware router service for Dynamo deployments. For details on how KV-aware routing works, see the [KV Cache Routing documentation](/docs/architecture/kv_cache_routing.md).
+A backend-agnostic standalone KV-aware router service for Dynamo deployments. For details on how KV-aware routing works, see the [KV Cache Routing documentation](/docs/router/kv_cache_routing.md).
 ## Overview
@@ -29,7 +29,7 @@ python -m dynamo.router \
 - `--endpoint`: Full endpoint path for workers in the format `namespace.component.endpoint` (e.g., `dynamo.prefill.generate`)
 **Router Configuration:**
-For detailed descriptions of all KV router configuration options including `--block-size`, `--kv-overlap-score-weight`, `--router-temperature`, `--no-kv-events`, `--router-replica-sync`, `--router-snapshot-threshold`, `--router-reset-states`, and `--no-track-active-blocks`, see the [KV Cache Routing documentation](/docs/architecture/kv_cache_routing.md).
+For detailed descriptions of all KV router configuration options including `--block-size`, `--kv-overlap-score-weight`, `--router-temperature`, `--no-kv-events`, `--router-replica-sync`, `--router-snapshot-threshold`, `--router-reset-states`, and `--no-track-active-blocks`, see the [KV Cache Routing documentation](/docs/router/kv_cache_routing.md).
 ## Architecture
@@ -43,7 +43,7 @@ Clients query the `find_best_worker` endpoint to determine which worker should p
 ## Example: Manual Disaggregated Serving (Alternative Setup)
 > [!Note]
-> **This is an alternative advanced setup.** The recommended approach for disaggregated serving is to use the frontend's automatic prefill routing, which activates when you register workers with `ModelType.Prefill`. See the [KV Cache Routing documentation](/docs/architecture/kv_cache_routing.md#disaggregated-serving-prefill-and-decode) for the default setup.
+> **This is an alternative advanced setup.** The recommended approach for disaggregated serving is to use the frontend's automatic prefill routing, which activates when you register workers with `ModelType.Prefill`. See the [KV Cache Routing documentation](../../../../docs/router/kv_cache_routing.md#disaggregated-serving-prefill-and-decode) for the default setup.
 >
 > Use this manual setup if you need explicit control over prefill routing configuration or want to manage prefill and decode routers separately.
@@ -103,6 +103,6 @@ See [`components/src/dynamo/vllm/handlers.py`](../vllm/handlers.py) for a refere
 ## See Also
-- [KV Cache Routing Architecture](/docs/architecture/kv_cache_routing.md) - Detailed explanation of KV-aware routing
+- [KV Cache Routing Architecture](/docs/router/kv_cache_routing.md) - Detailed explanation of KV-aware routing
 - [Frontend Router](../frontend/README.md) - Main HTTP frontend with integrated routing
 - [Router Benchmarking](/benchmarks/router/README.md) - Performance testing and tuning
--- a/deploy/cloud/pre-deployment/README.md
+++ b/deploy/cloud/pre-deployment/README.md
@@ -21,7 +21,7 @@ This directory contains a pre-deployment check script that verifies your Kuberne
 - For NCCL tests, please refer to the [NCCL tests](https://docs.nebius.com/kubernetes/gpu/nccl-test#run-tests) for more details.
-- For NIXL benchmark, please refer to the [NIXL benchmark pre-deployment checks](/deploy/cloud/pre-deployment/nixl/README.md) for more details.
+For the latest pre-deployment check instructions, see the [main branch version of this README](https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/pre-deployment/README.md).
 ## Usage

--- a/deploy/inference-gateway/README.md
+++ b/deploy/inference-gateway/README.md
@@ -16,7 +16,7 @@ Currently, these setups are only supported with the kGateway based Inference Gat
 - [Prerequisites](#prerequisites)
 - [Installation Steps](#installation-steps)
-- [Usage](#usage)
+- [Usage](#6-usage)
 ## Prerequisites
@@ -160,7 +160,7 @@ You can configure the plugin by setting environment vars in your [values-dynamo-
  - Set `DYNAMO_OVERLAP_SCORE_WEIGHT` to weigh how heavily the score uses token overlap (predicted KV cache hits) versus other factors (load, historical hit rate). Higher weight biases toward reusing workers with similar cached prefixes.
  - Set `DYNAMO_ROUTER_TEMPERATURE` to soften or sharpen the selection curve when combining scores. Low temperature makes the router pick the top candidate deterministically; higher temperature lets lower-scoring workers through more often (exploration).
  - Set `DYNAMO_USE_KV_EVENTS=false` if you want to disable KV event tracking while using kv-routing
-  - See the [KV cache routing design](../../docs/architecture/kv_cache_routing.md) for details.
+  - See the [KV cache routing design](../../docs/router/kv_cache_routing.md) for details.

--- a/deploy/logging/README.md
+++ b/deploy/logging/README.md
 # Dynamo Logging on Kubernetes
-For detailed documentation on collecting and visualizing logs on Kubernetes, see [docs/kubernetes/logging.md](../../docs/kubernetes/logging.md).
+For detailed documentation on collecting and visualizing logs on Kubernetes, see [docs/kubernetes/observability/logging.md](../../docs/kubernetes/observability/logging.md).
--- a/deploy/metrics/k8s/README.md
+++ b/deploy/metrics/k8s/README.md
 # Dynamo Metrics Collection on Kubernetes
-For detailed documentation on collecting and visualizing metrics on Kubernetes, see [docs/kubernetes/metrics.md](../../../docs/kubernetes/metrics.md).
+For detailed documentation on collecting and visualizing metrics on Kubernetes, see [docs/kubernetes/observability/metrics.md](../../../docs/kubernetes/observability/metrics.md).
--- a/docs/_sections/architecture.rst
+++ b/docs/_sections/architecture.rst
-Overview
-============
-.. include:: ../architecture/architecture.md
-   :parser: myst_parser.sphinx_
-.. toctree::
-   :hidden:
-   Overview <self>
-   Disaggregated Serving <../architecture/disagg_serving>
--- a/docs/_sections/backends.rst
+++ b/docs/_sections/backends.rst
-..
-    SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-    SPDX-License-Identifier: Apache-2.0
-    Licensed under the Apache License, Version 2.0 (the "License");
-    you may not use this file except in compliance with the License.
-    You may obtain a copy of the License at
-    http://www.apache.org/licenses/LICENSE-2.0
-    Unless required by applicable law or agreed to in writing, software
-    distributed under the License is distributed on an "AS IS" BASIS,
-    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-    See the License for the specific language governing permissions and
-    limitations under the License.
 Backends
 ========
-NVIDIA Dynamo supports multiple inference backends to provide flexibility and performance optimization for different use cases and model architectures. Backends are the underlying engines that execute AI model inference, each optimized for specific scenarios, hardware configurations, and performance requirements.
-Overview
---------
-Dynamo's multi-backend architecture allows you to:
-* **Choose the optimal engine** for your specific workload and hardware
-* **Switch between backends** without changing your application code
-* **Leverage specialized optimizations** from each backend
-* **Scale flexibly** across different deployment scenarios
-Supported Backends
-------------------
-Dynamo currently supports the following high-performance inference backends:
 .. toctree::
   :maxdepth: 1

--- a/docs/_sections/k8s_deployment.rst
+++ b/docs/_sections/k8s_deployment.rst
+Deployment Guide
+================
+.. toctree::
+   :hidden:
+   Kubernetes Quickstart <../kubernetes/README>
+   Detailed Installation Guide <../kubernetes/installation_guide>
+   Dynamo Operator <../kubernetes/dynamo_operator>
+   Minikube Setup <../kubernetes/deployment/minikube>
--- a/docs/_sections/k8s_multinode.rst
+++ b/docs/_sections/k8s_multinode.rst
+Multinode
+=========
+.. toctree::
+   :hidden:
+   Multinode Deployments <../kubernetes/deployment/multinode-deployment>
+   Grove <../kubernetes/grove>
--- a/docs/_sections/k8s_observability.rst
+++ b/docs/_sections/k8s_observability.rst
+Observability
+=============
+.. toctree::
+   :hidden:
+   Metrics <../kubernetes/observability/metrics>
+   Logging <../kubernetes/observability/logging>
--- a/docs/_sections/observability.rst
+++ b/docs/_sections/observability.rst
+Observability
+=============
+.. toctree::
+   :hidden:
+   Metrics <../observability/metrics>
+   Logging <../observability/logging>
+   Health Checks <../observability/health-checks>
\ No newline at end of file
--- a/docs/guides/tool-calling.md
+++ b/docs/guides/tool-calling.md
--- a/docs/backends/sglang/README.md
+++ b/docs/backends/sglang/README.md
@@ -34,9 +34,9 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 | Feature | SGLang | Notes |
 |---------|--------|-------|
-| [**Disaggregated Serving**](../../architecture/disagg_serving.md) | ✅ |  |
+| [**Disaggregated Serving**](../../design_docs/disagg_serving.md) | ✅ |  |
-| [**Conditional Disaggregation**](../../architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
+| [**Conditional Disaggregation**](../../design_docs/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
-| [**KV-Aware Routing**](../../architecture/kv_cache_routing.md) | ✅ |  |
+| [**KV-Aware Routing**](../../router/kv_cache_routing.md) | ✅ |  |
 | [**SLA-Based Planner**](../../planner/sla_planner.md) | ✅ |  |
 | [**Multimodal EPD Disaggregation**](multimodal_epd.md) | ✅ |  |
 | [**KVBM**](../../kvbm/kvbm_architecture.md) | ❌ | Planned |
@@ -55,7 +55,7 @@ Dynamo SGLang uses SGLang's native argument parser, so **most SGLang engine argu
 | Argument | Description | Default | SGLang Equivalent |
 |----------|-------------|---------|-------------------|
 | `--endpoint` | Dynamo endpoint in `dyn://namespace.component.endpoint` format | Auto-generated based on mode | N/A |
-| `--migration-limit` | Max times a request can migrate between workers for fault tolerance. See [Request Migration Architecture](../../../docs/architecture/request_migration.md). | `0` (disabled) | N/A |
+| `--migration-limit` | Max times a request can migrate between workers for fault tolerance. See [Request Migration Architecture](../../fault_tolerance/request_migration.md). | `0` (disabled) | N/A |
 | `--dyn-tool-call-parser` | Tool call parser for structured outputs (takes precedence over `--tool-call-parser`) | `None` | `--tool-call-parser` |
 | `--dyn-reasoning-parser` | Reasoning parser for CoT models (takes precedence over `--reasoning-parser`) | `None` | `--reasoning-parser` |
 | `--use-sglang-tokenizer` | Use SGLang's tokenizer instead of Dynamo's | `False` | N/A |
@@ -83,7 +83,7 @@ When a user cancels a request (e.g., by disconnecting from the frontend), the re
 > [!WARNING]
 > ⚠️ SGLang backend currently does not support cancellation during remote prefill phase in disaggregated mode.
-For more details, see the [Request Cancellation Architecture](../../architecture/request_cancellation.md) documentation.
+For more details, see the [Request Cancellation Architecture](../../fault_tolerance/request_cancellation.md) documentation.
 ## Installation

--- a/docs/backends/sglang/multimodal_epd.md
+++ b/docs/backends/sglang/multimodal_epd.md
@@ -23,7 +23,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 ### Components
-- workers: For aggregated serving, we have two workers, [MultimodalEncodeWorker](src/dynamo/sglang/request_handlers/multimodal_encode_worker_handler.py) for encoding and [MultimodalWorker](src/dynamo/sglang/request_handlers/multimodal_worker_handler.py) for prefilling and decoding.
+- workers: For aggregated serving, we have two workers, [MultimodalEncodeWorkerHandler](../../../components/src/dynamo/sglang/request_handlers/multimodal/encode_worker_handler.py) for encoding and [MultimodalWorkerHandler](../../../components/src/dynamo/sglang/request_handlers/multimodal/worker_handler.py) for prefilling and decoding.
 - processor: Tokenizes the prompt and passes it to the MultimodalEncodeWorker.
 ### Workflow
@@ -109,7 +109,7 @@ You should see a response similar to this:
 ### Components
-- workers: For disaggregated serving, we have three workers, [MultimodalEncodeWorker](src/dynamo/sglang/request_handlers/multimodal_encode_worker_handler.py) for encoding, [MultimodalWorker](src/dynamo/sglang/request_handlers/multimodal_worker_handler.py) for decoding, and [MultimodalPrefillWorker](src/dynamo/sglang/request_handlers/multimodal_worker_handler.py) for prefilling.
+- workers: For disaggregated serving, we have three workers, [MultimodalEncodeWorkerHandler](../../../components/src/dynamo/sglang/request_handlers/multimodal/encode_worker_handler.py) for encoding, [MultimodalWorkerHandler](../../../components/src/dynamo/sglang/request_handlers/multimodal/worker_handler.py) for decoding, and [MultimodalPrefillWorkerHandler](../../../components/src/dynamo/sglang/request_handlers/multimodal/worker_handler.py) for prefilling.
 - processor: Tokenizes the prompt and passes it to the MultimodalEncodeWorker.
 ### Workflow