Commit 598cbbb7 authored by Anish, committed by GitHub

docs: reorganizing documentation to make things clearer (#3658)


Signed-off-by: athreesh <anish.maddipoti@utexas.edu>
Co-authored-by: Claude <noreply@anthropic.com>
parent 34fc9693
......@@ -59,9 +59,9 @@ Dynamo is designed to be inference engine agnostic (supports TRT-LLM, vLLM, SGLa
| [**Disaggregated Serving**](/docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ |
| [**Conditional Disaggregation**](/docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |
| [**KV-Aware Routing**](/docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ |
| [**Load Based Planner**](/docs/architecture/load_planner.md) | 🚧 | 🚧 | 🚧 |
| [**SLA-Based Planner**](/docs/architecture/sla_planner.md) | ✅ | ✅ | ✅ |
| [**KVBM**](/docs/architecture/kvbm_architecture.md) | ✅ | 🚧 | ✅ |
| [**Load Based Planner**](docs/planner/load_planner.md) | 🚧 | 🚧 | 🚧 |
| [**SLA-Based Planner**](docs/planner/sla_planner.md) | ✅ | ✅ | ✅ |
| [**KVBM**](docs/kvbm/kvbm_architecture.md) | ✅ | 🚧 | ✅ |
To learn more about each framework and their capabilities, check out each framework's README!
......@@ -74,7 +74,7 @@ Built in Rust for performance and in Python for extensibility, Dynamo is fully o
# Installation
The following examples require a few system level packages.
Recommended to use Ubuntu 24.04 with a x86_64 CPU. See [docs/support_matrix.md](docs/support_matrix.md)
Recommended to use Ubuntu 24.04 with a x86_64 CPU. See [docs/reference/support-matrix.md](docs/reference/support-matrix.md)
## 1. Initial setup
......
......@@ -237,7 +237,7 @@ args:
- **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/create_deployment.md)
- **Quickstart**: [Deployment Quickstart](../../../../docs/kubernetes/README.md)
- **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/kubernetes/installation_guide.md)
- **SLA Planner**: [SLA Planner Quickstart Guide](../../../../docs/kubernetes/sla_planner_quickstart.md)
- **SLA Planner**: [SLA Planner Quickstart Guide](../../../../docs/planner/sla_planner_quickstart.md)
- **Examples**: [Deployment Examples](../../../../docs/examples/README.md)
- **Architecture Docs**: [Disaggregated Serving](../../../../docs/architecture/disagg_serving.md), [KV-Aware Routing](../../../../docs/architecture/kv_cache_routing.md)
......
......@@ -15,4 +15,4 @@ See the License for the specific language governing permissions and
limitations under the License.
-->
Please refer to [planner docs](../../docs/architecture/planner_intro.rst) for planner documentation.
Please refer to [planner docs](../../../../docs/planner/planner_intro.rst) for planner documentation.
......@@ -3,7 +3,7 @@
This directory contains configuration for visualizing metrics from the metrics aggregation service using Prometheus and Grafana.
> [!NOTE]
> For detailed information about Dynamo's metrics system, including hierarchical metrics, automatic labeling, and usage examples, see the [Metrics Guide](../../docs/guides/metrics.md).
> For detailed information about Dynamo's metrics system, including hierarchical metrics, automatic labeling, and usage examples, see the [Metrics Guide](../../docs/observability/metrics.md).
## Overview
......
......@@ -19,7 +19,7 @@ Dynamo supports OpenTelemetry-based distributed tracing, allowing you to visuali
## Environment Variables
Dynamo's tracing is configured via environment variables. For complete logging documentation, see [docs/guides/logging.md](../../docs/guides/logging.md).
Dynamo's tracing is configured via environment variables. For complete logging documentation, see [docs/observability/logging.md](../../docs/observability/logging.md).
### Required Environment Variables
......
......@@ -54,8 +54,8 @@ The following diagram outlines Dynamo's high-level architecture. To enable large
- [Dynamo Disaggregated Serving](disagg_serving.md)
- [Dynamo Smart Router](kv_cache_routing.md)
- [Dynamo KV Cache Block Manager](kvbm_intro.rst)
- [Planner](planner_intro.rst)
- [Dynamo KV Cache Block Manager](../kvbm/kvbm_intro.rst)
- [Planner](../planner/planner_intro.rst)
- [NVIDIA Inference Transfer Library (NIXL)](https://github.com/ai-dynamo/nixl/blob/main/docs/nixl.md)
Every component in the Dynamo architecture is independently scalable and portable. The API server can adapt to task-specific deployment. A smart router processes user requests to route them to the optimal worker for performance. Specifically, for Large Language Models (LLMs), Dynamo employs KV cache-aware routing, which directs requests to the worker with the highest cache hit rate while maintaining load balance, expediting decoding. This routing strategy leverages a KV cache manager that maintains a global radix tree registry for hit rate calculation. The KV cache manager also oversees a multi-tiered memory system, enabling rapid KV cache storage and eviction. This design results in substantial TTFT reductions, increased throughput, and the ability to process extensive context lengths.
......
......@@ -156,7 +156,7 @@ For improved fault tolerance, you can launch multiple frontend + router replicas
### Router State Management
The KV Router tracks two types of state (see [KV Router Architecture](../components/router/README.md) for details):
The KV Router tracks two types of state (see [KV Router Architecture](../router/README.md) for details):
1. **Prefix blocks (cached KV blocks)**: Maintained in a radix tree, tracking which blocks are cached on each worker. This state is **persistent** - backed by NATS JetStream events and object store snapshots. New router replicas automatically sync this state on startup, ensuring consistent cache awareness across restarts.
......@@ -515,4 +515,4 @@ This approach gives you complete control over routing decisions, allowing you to
- **Maximize cache reuse**: Use `best_worker_id()` which considers both prefill and decode loads
- **Balance load**: Consider both `potential_prefill_tokens` and `potential_decode_blocks` together
See [KV Router Architecture](../components/router/README.md) for performance tuning details.
See [KV Router Architecture](../router/README.md) for performance tuning details.
......@@ -37,9 +37,9 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
| [**Disaggregated Serving**](../../architecture/disagg_serving.md) | ✅ | |
| [**Conditional Disaggregation**](../../architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
| [**KV-Aware Routing**](../../architecture/kv_cache_routing.md) | ✅ | |
| [**SLA-Based Planner**](../../architecture/sla_planner.md) | ✅ | |
| [**SLA-Based Planner**](../../planner/sla_planner.md) | ✅ | |
| [**Multimodal EPD Disaggregation**](multimodal_epd.md) | ✅ | |
| [**KVBM**](../../architecture/kvbm_architecture.md) | ❌ | Planned |
| [**KVBM**](../../kvbm/kvbm_architecture.md) | ❌ | Planned |
## Dynamo SGLang Integration
......
......@@ -10,7 +10,7 @@ When running SGLang through Dynamo, SGLang engine metrics are automatically pass
For the complete and authoritative list of all SGLang metrics, always refer to the official documentation linked above.
Dynamo runtime metrics are documented in [docs/guides/metrics.md](../../guides/metrics.md).
Dynamo runtime metrics are documented in [docs/observability/metrics.md](../../observability/metrics.md).
## Metric Reference
......@@ -91,7 +91,7 @@ sglang:cache_hit_rate{model_name="meta-llama/Llama-3.1-8B-Instruct"} 0.0075
- [SGLang GitHub - Metrics Collector](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/metrics/collector.py)
### Dynamo Metrics
- **Dynamo Metrics Guide**: See `docs/guides/metrics.md` for complete documentation on Dynamo runtime metrics
- **Dynamo Metrics Guide**: See [docs/observability/metrics.md](../../observability/metrics.md) for complete documentation on Dynamo runtime metrics
- **Dynamo Runtime Metrics**: Metrics prefixed with `dynamo_*` for runtime, components, endpoints, and namespaces
- Implementation: `lib/runtime/src/metrics.rs` (Rust runtime metrics)
- Metric names: `lib/runtime/src/metrics/prometheus_names.rs` (metric name constants)
......