Commit 4ef8b8e6 authored by dagil-nvidia, committed by GitHub

docs: fix broken links, typo, blog nav, and CUDA version (#7380)


Signed-off-by: Dan Gil <dagil@nvidia.com>
parent 5c48130d
...@@ -24,11 +24,11 @@ Dynamo supports multiple KV cache offloading backends for vLLM, allowing you to ...@@ -24,11 +24,11 @@ Dynamo supports multiple KV cache offloading backends for vLLM, allowing you to
| Deployment | Launch Script | | Deployment | Launch Script |
| -------------------------- | --------------------------------------------------------------------------------------- | | -------------------------- | --------------------------------------------------------------------------------------- |
| Aggregated | [`agg_kvbm.sh`](../../../examples/backends/vllm/launch/agg_kvbm.sh) | | Aggregated | [`agg_kvbm.sh`](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/vllm/launch/agg_kvbm.sh) |
| Aggregated + KV routing | [`agg_kvbm_router.sh`](../../../examples/backends/vllm/launch/agg_kvbm_router.sh) | | Aggregated + KV routing | [`agg_kvbm_router.sh`](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/vllm/launch/agg_kvbm_router.sh) |
| Disaggregated (1P1D) | [`disagg_kvbm.sh`](../../../examples/backends/vllm/launch/disagg_kvbm.sh) | | Disaggregated (1P1D) | [`disagg_kvbm.sh`](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/vllm/launch/disagg_kvbm.sh) |
| Disaggregated (2P2D) | [`disagg_kvbm_2p2d.sh`](../../../examples/backends/vllm/launch/disagg_kvbm_2p2d.sh) | | Disaggregated (2P2D) | [`disagg_kvbm_2p2d.sh`](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/vllm/launch/disagg_kvbm_2p2d.sh) |
| Disaggregated + KV routing | [`disagg_kvbm_router.sh`](../../../examples/backends/vllm/launch/disagg_kvbm_router.sh) | | Disaggregated + KV routing | [`disagg_kvbm_router.sh`](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/vllm/launch/disagg_kvbm_router.sh) |
For configuration details, see the [KVBM Guide](../../components/kvbm/kvbm-guide.md). For configuration details, see the [KVBM Guide](../../components/kvbm/kvbm-guide.md).
...@@ -40,9 +40,9 @@ For configuration details, see the [KVBM Guide](../../components/kvbm/kvbm-guide ...@@ -40,9 +40,9 @@ For configuration details, see the [KVBM Guide](../../components/kvbm/kvbm-guide
| Deployment | Launch Script | | Deployment | Launch Script |
| --------------------------------- | --------------------------------------------------------------------------------------------- | | --------------------------------- | --------------------------------------------------------------------------------------------- |
| Aggregated | [`agg_lmcache.sh`](../../../examples/backends/vllm/launch/agg_lmcache.sh) | | Aggregated | [`agg_lmcache.sh`](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/vllm/launch/agg_lmcache.sh) |
| Aggregated (multiprocess metrics) | [`agg_lmcache_multiproc.sh`](../../../examples/backends/vllm/launch/agg_lmcache_multiproc.sh) | | Aggregated (multiprocess metrics) | [`agg_lmcache_multiproc.sh`](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/vllm/launch/agg_lmcache_multiproc.sh) |
| Disaggregated | [`disagg_lmcache.sh`](../../../examples/backends/vllm/launch/disagg_lmcache.sh) | | Disaggregated | [`disagg_lmcache.sh`](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/vllm/launch/disagg_lmcache.sh) |
For configuration details, see the [LMCache Integration Guide](../../integrations/lmcache-integration.md). For configuration details, see the [LMCache Integration Guide](../../integrations/lmcache-integration.md).
...@@ -54,9 +54,9 @@ For configuration details, see the [LMCache Integration Guide](../../integration ...@@ -54,9 +54,9 @@ For configuration details, see the [LMCache Integration Guide](../../integration
| Deployment | Launch Script | | Deployment | Launch Script |
| ----------------------- | ------------------------------------------------------------------------------------- | | ----------------------- | ------------------------------------------------------------------------------------- |
| Aggregated | [`agg_flexkv.sh`](../../../examples/backends/vllm/launch/agg_flexkv.sh) | | Aggregated | [`agg_flexkv.sh`](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/vllm/launch/agg_flexkv.sh) |
| Aggregated + KV routing | [`agg_flexkv_router.sh`](../../../examples/backends/vllm/launch/agg_flexkv_router.sh) | | Aggregated + KV routing | [`agg_flexkv_router.sh`](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/vllm/launch/agg_flexkv_router.sh) |
| Disaggregated | [`disagg_flexkv.sh`](../../../examples/backends/vllm/launch/disagg_flexkv.sh) | | Disaggregated | [`disagg_flexkv.sh`](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/vllm/launch/disagg_flexkv.sh) |
For configuration details, see the [FlexKV Integration Guide](../../integrations/flexkv-integration.md). For configuration details, see the [FlexKV Integration Guide](../../integrations/flexkv-integration.md).
......
...@@ -189,7 +189,7 @@ Deploy one control DGD that contains: ...@@ -189,7 +189,7 @@ Deploy one control DGD that contains:
- `GlobalRouter`: chooses which pool receives each request. - `GlobalRouter`: chooses which pool receives each request.
- `GlobalPlanner`: receives scale requests from pool planners and applies replica changes. - `GlobalPlanner`: receives scale requests from pool planners and applies replica changes.
The vLLM example topology is in [examples/global_planner/global-planner-vllm-test.yaml](../../../examples/global_planner/global-planner-vllm-test.yaml). The vLLM example topology is in [examples/global_planner/global-planner-vllm-test.yaml](https://github.com/ai-dynamo/dynamo/blob/main/examples/global_planner/global-planner-vllm-test.yaml).
The `GlobalPlanner` section is minimal: The `GlobalPlanner` section is minimal:
...@@ -215,7 +215,7 @@ The values passed to `--managed-namespaces` are the pool planners' **Dynamo name ...@@ -215,7 +215,7 @@ The values passed to `--managed-namespaces` are the pool planners' **Dynamo name
**Management modes**: When `--managed-namespaces` is set (explicit mode), only the listed Dynamo namespaces are authorized to send scale requests, and only their corresponding DGDs count toward the GPU budget. DGD names are derived from the Dynamo namespace using the operator convention `DYN_NAMESPACE = {k8s_namespace}-{dgd_name}`. When omitted (implicit mode), any caller is accepted and all DGDs in the Kubernetes namespace count toward the GPU budget. **Management modes**: When `--managed-namespaces` is set (explicit mode), only the listed Dynamo namespaces are authorized to send scale requests, and only their corresponding DGDs count toward the GPU budget. DGD names are derived from the Dynamo namespace using the operator convention `DYN_NAMESPACE = {k8s_namespace}-{dgd_name}`. When omitted (implicit mode), any caller is accepted and all DGDs in the Kubernetes namespace count toward the GPU budget.
If you want the central executor to reject scale requests that exceed a total GPU budget, add `--max-total-gpus`. See [examples/global_planner/global-planner-gpu-budget.yaml](../../../examples/global_planner/global-planner-gpu-budget.yaml). If you want the central executor to reject scale requests that exceed a total GPU budget, add `--max-total-gpus`. See [examples/global_planner/global-planner-gpu-budget.yaml](https://github.com/ai-dynamo/dynamo/blob/main/examples/global_planner/global-planner-gpu-budget.yaml).
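The management modes and the naming convention `DYN_NAMESPACE = {k8s_namespace}-{dgd_name}` described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the GlobalPlanner implementation: the function names, the `default` Kubernetes namespace, and the `gp-prefill-0` pool name are assumptions made for the example.

```python
from typing import Optional, Set


def dgd_name_from_dynamo_namespace(dyn_namespace: str, k8s_namespace: str) -> str:
    """Invert the convention DYN_NAMESPACE = {k8s_namespace}-{dgd_name}."""
    prefix = f"{k8s_namespace}-"
    if not dyn_namespace.startswith(prefix):
        raise ValueError(f"{dyn_namespace!r} does not start with {prefix!r}")
    return dyn_namespace[len(prefix):]


def is_authorized(caller_namespace: str, managed_namespaces: Optional[Set[str]]) -> bool:
    """Explicit mode: only listed namespaces may send scale requests.

    Implicit mode (managed_namespaces is None): any caller is accepted.
    """
    if managed_namespaces is None:
        return True
    return caller_namespace in managed_namespaces


# Hypothetical values: a "default" Kubernetes namespace with two managed pools.
managed = {"default-gp-prefill-0", "default-gp-decode-0"}

print(dgd_name_from_dynamo_namespace("default-gp-decode-0", "default"))
print(is_authorized("default-gp-prefill-1", managed))  # explicit mode
print(is_authorized("default-gp-prefill-1", None))     # implicit mode
```

In explicit mode the same set of namespaces would also determine which DGDs count toward the GPU budget; the sketch leaves the budget check out.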
## Step 3: Create One DGD Per Pool ## Step 3: Create One DGD Per Pool
...@@ -258,7 +258,7 @@ In the reference vLLM example: ...@@ -258,7 +258,7 @@ In the reference vLLM example:
- `gp-prefill-1` uses a 2-GPU TP2 prefill worker - `gp-prefill-1` uses a 2-GPU TP2 prefill worker
- `gp-decode-0` uses a 1-GPU TP1 decode worker - `gp-decode-0` uses a 1-GPU TP1 decode worker
See [global-planner-vllm-test.yaml](../../../examples/global_planner/global-planner-vllm-test.yaml). See [global-planner-vllm-test.yaml](https://github.com/ai-dynamo/dynamo/blob/main/examples/global_planner/global-planner-vllm-test.yaml).
## Step 4: Configure GlobalRouter To Select Pools ## Step 4: Configure GlobalRouter To Select Pools
...@@ -317,7 +317,7 @@ Clients can pass request targets through `extra_args`: ...@@ -317,7 +317,7 @@ Clients can pass request targets through `extra_args`:
} }
``` ```
For more details, see [Global Router README](../../../components/src/dynamo/global_router/README.md). For more details, see [Global Router README](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/global_router/README.md).
## Step 5: Deploy In Order ## Step 5: Deploy In Order
...@@ -391,6 +391,6 @@ This keeps profiling and pool selection simple while still giving you one public ...@@ -391,6 +391,6 @@ This keeps profiling and pool selection simple while still giving you one public
- [Planner Guide](planner-guide.md) — Planner configuration reference - [Planner Guide](planner-guide.md) — Planner configuration reference
- [Planner Examples](planner-examples.md) — DGDR examples for generating per-pool configs - [Planner Examples](planner-examples.md) — DGDR examples for generating per-pool configs
- [Profiler Guide](../profiler/profiler-guide.md) — Pre-deployment profiling workflow - [Profiler Guide](../profiler/profiler-guide.md) — Pre-deployment profiling workflow
- [Global Planner README](../../../components/src/dynamo/global_planner/README.md) — Centralized scale execution - [Global Planner README](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/global_planner/README.md) — Centralized scale execution
- [Global Router README](../../../components/src/dynamo/global_router/README.md) — Cross-pool request routing - [Global Router README](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/global_router/README.md) — Cross-pool request routing
- [vLLM global planner example](../../../examples/global_planner/global-planner-vllm-test.yaml) — End-to-end reference manifest - [vLLM global planner example](https://github.com/ai-dynamo/dynamo/blob/main/examples/global_planner/global-planner-vllm-test.yaml) — End-to-end reference manifest
...@@ -26,7 +26,7 @@ For Kubernetes, set `DYN_ROUTER_MODE=kv` on the Frontend service. Workers automa ...@@ -26,7 +26,7 @@ For Kubernetes, set `DYN_ROUTER_MODE=kv` on the Frontend service. Workers automa
### Standalone Router ### Standalone Router
You can also run the KV router as a standalone service (without the Dynamo frontend). See the [Standalone Router component](../../../components/src/dynamo/router/) for more details. You can also run the KV router as a standalone service (without the Dynamo frontend). See the [Standalone Router component](https://github.com/ai-dynamo/dynamo/tree/main/components/src/dynamo/router/) for more details.
For all CLI arguments, environment variables, K8s deployment examples, and tuning guidelines, see the [Router Guide](router-guide.md). For A/B benchmarking, see the [KV Router A/B Benchmarking Guide](../../benchmarks/kv-router-ab-testing.md). For all CLI arguments, environment variables, K8s deployment examples, and tuning guidelines, see the [Router Guide](router-guide.md). For A/B benchmarking, see the [KV Router A/B Benchmarking Guide](../../benchmarks/kv-router-ab-testing.md).
......
...@@ -105,6 +105,6 @@ For deployments using Dynamo's KV-aware routing, the local indexer is used autom ...@@ -105,6 +105,6 @@ For deployments using Dynamo's KV-aware routing, the local indexer is used autom
## See Also ## See Also
- **[KV Router Index Data Structures](../../../lib/kv-router/src/indexer/README.md)**: `RadixTree`, `ConcurrentRadixTree`, and `PositionalIndexer` internals - **[KV Router Index Data Structures](https://github.com/ai-dynamo/dynamo/blob/main/lib/kv-router/src/indexer/README.md)**: `RadixTree`, `ConcurrentRadixTree`, and `PositionalIndexer` internals
- **[Router Guide](router-guide.md)**: Configuration, deployment, and tuning for KV-aware routing - **[Router Guide](router-guide.md)**: Configuration, deployment, and tuning for KV-aware routing
- **[Router Design](../../design-docs/router-design.md)**: Architecture details and event transport modes - **[Router Design](../../design-docs/router-design.md)**: Architecture details and event transport modes
...@@ -59,7 +59,7 @@ Disaggregated mode is activated automatically when prefill workers register alon ...@@ -59,7 +59,7 @@ Disaggregated mode is activated automatically when prefill workers register alon
| **Frontend-embedded** | `python -m dynamo.frontend --router-mode kv` | Frontend HTTP port (default 8000) | Standard deployment; router runs inside the frontend process | | **Frontend-embedded** | `python -m dynamo.frontend --router-mode kv` | Frontend HTTP port (default 8000) | Standard deployment; router runs inside the frontend process |
| **Standalone** | `python -m dynamo.router` | `DYN_SYSTEM_PORT` (if set) | Multi-tier architectures, SGLang disagg prefill routing, custom pipelines | | **Standalone** | `python -m dynamo.router` | `DYN_SYSTEM_PORT` (if set) | Multi-tier architectures, SGLang disagg prefill routing, custom pipelines |
The standalone router does not include the HTTP frontend (no `/v1/chat/completions` endpoint). It exposes only the `RouterRequestMetrics` via the system status server. See the [Standalone Router README](../../../components/src/dynamo/router/README.md). The standalone router does not include the HTTP frontend (no `/v1/chat/completions` endpoint). It exposes only the `RouterRequestMetrics` via the system status server. See the [Standalone Router README](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/router/README.md).
## Quick Start ## Quick Start
...@@ -137,7 +137,7 @@ For A/B testing and advanced K8s setup, see the [KV Router A/B Benchmarking Guid ...@@ -137,7 +137,7 @@ For A/B testing and advanced K8s setup, see the [KV Router A/B Benchmarking Guid
### Standalone Router ### Standalone Router
You can also run the KV router as a standalone service (without the Dynamo frontend) for disaggregated serving (e.g., routing to prefill workers), multi-tier architectures, or any scenario requiring intelligent KV cache-aware routing decisions. See the [Standalone Router component](../../../components/src/dynamo/router/) for more details. You can also run the KV router as a standalone service (without the Dynamo frontend) for disaggregated serving (e.g., routing to prefill workers), multi-tier architectures, or any scenario requiring intelligent KV cache-aware routing decisions. See the [Standalone Router component](https://github.com/ai-dynamo/dynamo/tree/main/components/src/dynamo/router/) for more details.
## KV Cache Routing ## KV Cache Routing
...@@ -265,7 +265,7 @@ The main KV-aware routing arguments (frontend uses the same `--router-*` flag na ...@@ -265,7 +265,7 @@ The main KV-aware routing arguments (frontend uses the same `--router-*` flag na
- `--router-prune-target-ratio`: Target size ratio to prune down to when `--router-max-tree-size` is exceeded. For example, with a value of 0.8 (default) and max tree size of 1048576, the router will prune down to approximately 838860 blocks when the threshold is exceeded. Defaults to 0.8 when `--no-router-kv-events` is used. This creates headroom before the next pruning cycle. - `--router-prune-target-ratio`: Target size ratio to prune down to when `--router-max-tree-size` is exceeded. For example, with a value of 0.8 (default) and max tree size of 1048576, the router will prune down to approximately 838860 blocks when the threshold is exceeded. Defaults to 0.8 when `--no-router-kv-events` is used. This creates headroom before the next pruning cycle.
- `--router-event-threads`: Number of event processing threads for the KV indexer (default: 4). When set to 1, the router uses a single-threaded radix tree with channel-based event processing. When set to a value greater than 1 (the default), the router uses a concurrent radix tree with a thread pool of the specified size for higher event throughput. This setting only applies when KV events are enabled (the default). When `--no-router-kv-events` is set (approximate mode), the router always uses a single-threaded indexer with TTL-based expiration and pruning regardless of this setting. Can be set via `DYN_ROUTER_EVENT_THREADS` env var. For details on the underlying index data structures (`RadixTree`, `ConcurrentRadixTree`, `PositionalIndexer`) and their concurrency model (inline reads, sticky-routed writes via thread pool), see the [KV Router Index documentation](../../../lib/kv-router/src/indexer/README.md). - `--router-event-threads`: Number of event processing threads for the KV indexer (default: 4). When set to 1, the router uses a single-threaded radix tree with channel-based event processing. When set to a value greater than 1 (the default), the router uses a concurrent radix tree with a thread pool of the specified size for higher event throughput. This setting only applies when KV events are enabled (the default). When `--no-router-kv-events` is set (approximate mode), the router always uses a single-threaded indexer with TTL-based expiration and pruning regardless of this setting. This can be set via the `DYN_ROUTER_EVENT_THREADS` environment variable. For details on the underlying index data structures (`RadixTree`, `ConcurrentRadixTree`, `PositionalIndexer`) and their concurrency model (inline reads, sticky-routed writes via thread pool), see the [KV Router Index documentation](https://github.com/ai-dynamo/dynamo/blob/main/lib/kv-router/src/indexer/README.md).
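The pruning arithmetic given for `--router-prune-target-ratio` can be checked with a short sketch. The helper name `prune_target` is hypothetical, and truncating the product to an integer is an assumption; the text above only says the router prunes down to "approximately" that many blocks.

```python
def prune_target(max_tree_size: int, target_ratio: float = 0.8) -> int:
    """Approximate block count to prune down to once max_tree_size is exceeded.

    Truncation toward zero is an assumption made for this sketch.
    """
    if not 0.0 < target_ratio <= 1.0:
        raise ValueError("target_ratio must be in (0, 1]")
    return int(max_tree_size * target_ratio)


max_tree_size = 1_048_576  # the max tree size used in the example above
target = prune_target(max_tree_size)
headroom = max_tree_size - target  # blocks that can be added before the next prune

print(target, headroom)
```

With the default ratio of 0.8 this gives the 838860 blocks quoted above, which leaves roughly a fifth of the tree as headroom before the next pruning cycle.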
To implement KV event publishing for custom inference engines, enabling them to participate in Dynamo's KV cache-aware routing, see [KV Event Publishing for Custom Engines](../../integrations/kv-events-custom-engines.md). To implement KV event publishing for custom inference engines, enabling them to participate in Dynamo's KV cache-aware routing, see [KV Event Publishing for Custom Engines](../../integrations/kv-events-custom-engines.md).
...@@ -341,7 +341,7 @@ await register_model( ...@@ -341,7 +341,7 @@ await register_model(
await prefill_endpoint.serve_endpoint(prefill_handler.generate) await prefill_endpoint.serve_endpoint(prefill_handler.generate)
``` ```
<Note>The unified frontend with automatic prefill routing is currently enabled for vLLM and TensorRT-LLM backends. For SGLang (work in progress), you need to launch a separate standalone router as the prefill router targeting the prefill endpoints. The standalone router (`python -m dynamo.router`) uses `--router-*`-prefixed flags (e.g., `--router-block-size`, `--router-kv-events`). See the [Standalone Router README](../../../components/src/dynamo/router/README.md) and example script: [`examples/backends/sglang/launch/disagg_router.sh`](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/sglang/launch/disagg_router.sh).</Note> <Note>The unified frontend with automatic prefill routing is currently enabled for vLLM and TensorRT-LLM backends. For SGLang (work in progress), you need to launch a separate standalone router as the prefill router targeting the prefill endpoints. The standalone router (`python -m dynamo.router`) uses `--router-*`-prefixed flags (e.g., `--router-block-size`, `--router-kv-events`). See the [Standalone Router README](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/router/README.md) and example script: [`examples/backends/sglang/launch/disagg_router.sh`](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/sglang/launch/disagg_router.sh).</Note>
### Request Flow ### Request Flow
...@@ -374,7 +374,7 @@ graph TD ...@@ -374,7 +374,7 @@ graph TD
## Serving Multiple Router Replicas ## Serving Multiple Router Replicas
For improved fault tolerance, you can launch multiple frontend + router replicas. If multiple `dynamo.frontend` processes share the same host or network namespace, give each instance a different HTTP port. In Kubernetes or on separate hosts, replicas can usually reuse the same container port. Alternatively, you can deploy the router separately as the standalone `python -m dynamo.router` service; see the [Standalone Router README](../../../components/src/dynamo/router/README.md). For improved fault tolerance, you can launch multiple frontend + router replicas. If multiple `dynamo.frontend` processes share the same host or network namespace, give each instance a different HTTP port. In Kubernetes or on separate hosts, replicas can usually reuse the same container port. Alternatively, you can deploy the router separately as the standalone `python -m dynamo.router` service; see the [Standalone Router README](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/router/README.md).
### Router State Management ### Router State Management
...@@ -455,7 +455,7 @@ The cli args `--router-ttl-secs`, `--router-max-tree-size`, and `--router-prune- ...@@ -455,7 +455,7 @@ The cli args `--router-ttl-secs`, `--router-max-tree-size`, and `--router-prune-
- **[Router README](README.md)**: Quick start guide for the KV Router - **[Router README](README.md)**: Quick start guide for the KV Router
- **[Router Examples](router-examples.md)**: Python API usage, K8s examples, and custom routing patterns - **[Router Examples](router-examples.md)**: Python API usage, K8s examples, and custom routing patterns
- **[KV Router Index Data Structures](../../../lib/kv-router/src/indexer/README.md)**: `RadixTree`, `ConcurrentRadixTree`, and `PositionalIndexer` internals and concurrency model - **[KV Router Index Data Structures](https://github.com/ai-dynamo/dynamo/blob/main/lib/kv-router/src/indexer/README.md)**: `RadixTree`, `ConcurrentRadixTree`, and `PositionalIndexer` internals and concurrency model
- **[KV Event Replay — Dynamo vs vLLM](kv-event-replay-comparison.md)**: How Dynamo's local indexer compares to vLLM's replay buffer for gap detection and recovery - **[KV Event Replay — Dynamo vs vLLM](kv-event-replay-comparison.md)**: How Dynamo's local indexer compares to vLLM's replay buffer for gap detection and recovery
- **[Router Design](../../design-docs/router-design.md)**: Architecture details and event transport modes - **[Router Design](../../design-docs/router-design.md)**: Architecture details and event transport modes
- **[KV Event Publishing for Custom Engines](../../integrations/kv-events-custom-engines.md)**: Integrate custom inference engines with KV-aware routing - **[KV Event Publishing for Custom Engines](../../integrations/kv-events-custom-engines.md)**: Integrate custom inference engines with KV-aware routing
......
...@@ -12,7 +12,7 @@ The standalone KV indexer (`dynamo-kv-indexer`) is a lightweight binary that mai ...@@ -12,7 +12,7 @@ The standalone KV indexer (`dynamo-kv-indexer`) is a lightweight binary that mai
- **Standalone mode** (default): Subscribes to ZMQ KV event streams directly from workers. No Dynamo runtime dependencies required. - **Standalone mode** (default): Subscribes to ZMQ KV event streams directly from workers. No Dynamo runtime dependencies required.
- **Dynamo runtime mode** (`--dynamo-runtime`): Integrates with the Dynamo runtime for automatic worker discovery via MDC, KV event ingestion via the event plane (NATS or ZMQ), and serves indexer queries over the request plane for remote frontends. - **Dynamo runtime mode** (`--dynamo-runtime`): Integrates with the Dynamo runtime for automatic worker discovery via MDC, KV event ingestion via the event plane (NATS or ZMQ), and serves indexer queries over the request plane for remote frontends.
This is distinct from the [Standalone Router](../../../components/src/dynamo/router/README.md), which is a full routing service. The standalone indexer provides only the indexing and query layer without routing logic. This is distinct from the [Standalone Router](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/router/README.md), which is a full routing service. The standalone indexer provides only the indexing and query layer without routing logic.
The HTTP API follows the [Mooncake KV Indexer RFC](https://github.com/kvcache-ai/Mooncake/issues/1403) conventions. The HTTP API follows the [Mooncake KV Indexer RFC](https://github.com/kvcache-ai/Mooncake/issues/1403) conventions.
...@@ -511,4 +511,4 @@ sequenceDiagram ...@@ -511,4 +511,4 @@ sequenceDiagram
- **[Mooncake KV Indexer RFC](https://github.com/kvcache-ai/Mooncake/issues/1403)**: Community API standardization for KV cache indexers - **[Mooncake KV Indexer RFC](https://github.com/kvcache-ai/Mooncake/issues/1403)**: Community API standardization for KV cache indexers
- **[Router Guide](router-guide.md)**: Full KV router configuration and tuning - **[Router Guide](router-guide.md)**: Full KV router configuration and tuning
- **[Router Design](../../design-docs/router-design.md)**: Architecture and event transport modes - **[Router Design](../../design-docs/router-design.md)**: Architecture and event transport modes
- **[Standalone Router](../../../components/src/dynamo/router/README.md)**: Full routing service (routes requests to workers) - **[Standalone Router](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/router/README.md)**: Full routing service (routes requests to workers)
...@@ -271,7 +271,7 @@ navigation: ...@@ -271,7 +271,7 @@ navigation:
# ==================== Blog ==================== # ==================== Blog ====================
- section: Blog - section: Blog
hidden: true collapsed: true
path: blogs/index.mdx path: blogs/index.mdx
slug: blog slug: blog
contents: contents:
......
...@@ -32,7 +32,7 @@ docker compose -f deploy/docker-observability.yml up -d ...@@ -32,7 +32,7 @@ docker compose -f deploy/docker-observability.yml up -d
For detailed setup instructions and configuration, see [Prometheus + Grafana Setup](prometheus-grafana.md). For detailed setup instructions and configuration, see [Prometheus + Grafana Setup](prometheus-grafana.md).
## Observability Documentations ## Observability Documentation
| Guide | Description | Environment Variables to Control | | Guide | Description | Environment Variables to Control |
|-------|-------------|----------------------------------| |-------|-------------|----------------------------------|
...@@ -100,9 +100,9 @@ The following configuration files are located in the `deploy/observability/` dir ...@@ -100,9 +100,9 @@ The following configuration files are located in the `deploy/observability/` dir
- [docker-observability.yml](https://github.com/ai-dynamo/dynamo/tree/main/deploy/docker-observability.yml): Defines Prometheus, Grafana, Tempo, and exporters - [docker-observability.yml](https://github.com/ai-dynamo/dynamo/tree/main/deploy/docker-observability.yml): Defines Prometheus, Grafana, Tempo, and exporters
- [prometheus.yml](https://github.com/ai-dynamo/dynamo/tree/main/deploy/observability/prometheus.yml): Contains Prometheus scraping configuration - [prometheus.yml](https://github.com/ai-dynamo/dynamo/tree/main/deploy/observability/prometheus.yml): Contains Prometheus scraping configuration
- [grafana-datasources.yml](https://github.com/ai-dynamo/dynamo/tree/main/deploy/observability/grafana-datasources.yml): Contains Grafana datasource configuration - [grafana-datasources.yml](https://github.com/ai-dynamo/dynamo/tree/main/deploy/observability/grafana-datasources.yml): Contains Grafana datasource configuration
- [otel-collector.yaml](https://github.com/ai-dynamo/dynamo/tree/main/deploy/observability/otel-collector.yaml): OpenTelemetry Collector configuration (routes traces to Tempo, logs to Loki) - [otel-collector.yaml](https://github.com/ai-dynamo/dynamo/blob/main/deploy/observability/otel-collector.yaml): OpenTelemetry Collector configuration (routes traces to Tempo, logs to Loki)
- [loki.yaml](https://github.com/ai-dynamo/dynamo/tree/main/deploy/observability/loki.yaml): Loki log aggregation configuration - [loki.yaml](https://github.com/ai-dynamo/dynamo/blob/main/deploy/observability/loki.yaml): Loki log aggregation configuration
- [loki-datasource.yml](https://github.com/ai-dynamo/dynamo/tree/main/deploy/observability/loki-datasource.yml): Grafana Loki datasource with trace ID linking to Tempo - [loki-datasource.yml](https://github.com/ai-dynamo/dynamo/blob/main/deploy/observability/loki-datasource.yml): Grafana Loki datasource with trace ID linking to Tempo
- [grafana_dashboards/dashboard-providers.yml](https://github.com/ai-dynamo/dynamo/tree/main/deploy/observability/grafana_dashboards/dashboard-providers.yml): Contains Grafana dashboard provider configuration - [grafana_dashboards/dashboard-providers.yml](https://github.com/ai-dynamo/dynamo/tree/main/deploy/observability/grafana_dashboards/dashboard-providers.yml): Contains Grafana dashboard provider configuration
- [grafana_dashboards/dynamo.json](https://github.com/ai-dynamo/dynamo/tree/main/deploy/observability/grafana_dashboards/dynamo.json): A general Dynamo Dashboard for both SW and HW metrics - [grafana_dashboards/dynamo.json](https://github.com/ai-dynamo/dynamo/tree/main/deploy/observability/grafana_dashboards/dynamo.json): A general Dynamo Dashboard for both SW and HW metrics
- [grafana_dashboards/dcgm-metrics.json](https://github.com/ai-dynamo/dynamo/tree/main/deploy/observability/grafana_dashboards/dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics - [grafana_dashboards/dcgm-metrics.json](https://github.com/ai-dynamo/dynamo/tree/main/deploy/observability/grafana_dashboards/dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
......
...@@ -122,10 +122,10 @@ TensorRT-LLM delivers maximum inference performance and optimization, with full ...@@ -122,10 +122,10 @@ TensorRT-LLM delivers maximum inference performance and optimization, with full
[tools]: ../user-guides/tool-calling [tools]: ../user-guides/tool-calling
{/* Multimodal */} {/* Multimodal */}
[mm]: ../user-guides/multimodality-support [mm]: ../user-guides/multimodal
[mm-vllm]: ../user-guides/multimodality-support/v-llm-multimodal [mm-vllm]: ../user-guides/multimodal/multimodal-vllm
[mm-trtllm]: ../user-guides/multimodality-support/tensor-rt-llm-multimodal [mm-trtllm]: ../user-guides/multimodal/multimodal-trtllm
[mm-sglang]: ../user-guides/multimodality-support/sg-lang-multimodal [mm-sglang]: ../user-guides/multimodal/multimodal-sglang
{/* Feature-specific */} {/* Feature-specific */}
[lora]: ../kubernetes-deployment/deployment-guide/managing-models-with-dynamo-model [lora]: ../kubernetes-deployment/deployment-guide/managing-models-with-dynamo-model
...@@ -133,4 +133,4 @@ TensorRT-LLM delivers maximum inference performance and optimization, with full ...@@ -133,4 +133,4 @@ TensorRT-LLM delivers maximum inference performance and optimization, with full
[trtllm-eagle]: ../additional-resources/tensor-rt-llm-details/llama-4-eagle [trtllm-eagle]: ../additional-resources/tensor-rt-llm-details/llama-4-eagle
{/* Dynamo Snapshot */} {/* Dynamo Snapshot */}
[snapshot]: ../kubernetes/snapshot/README.md [snapshot]: ../kubernetes-deployment/deployment-guide/snapshot
...@@ -17,7 +17,7 @@ subtitle: Hardware, software, and build compatibility for Dynamo ...@@ -17,7 +17,7 @@ subtitle: Hardware, software, and build compatibility for Dynamo
| **OS** | Ubuntu 22.04, Ubuntu 24.04, CentOS Stream 9 (experimental) | | **OS** | Ubuntu 22.04, Ubuntu 24.04, CentOS Stream 9 (experimental) |
| **Arch** | x86_64, ARM64 (ARM64 requires Ubuntu 24.04) | | **Arch** | x86_64, ARM64 (ARM64 requires Ubuntu 24.04) |
| **CUDA 12** | Container images for SGLang and vLLM (CUDA 12.9) | | **CUDA 12** | Container images for SGLang and vLLM (CUDA 12.9) |
| **CUDA 13** | Container images for TensorRT-LLM (CUDA 13.0); experimental for SGLang and vLLM in v0.8.x | | **CUDA 13** | Container images for TensorRT-LLM (CUDA 13.1), SGLang and vLLM (CUDA 13.0) |
**On this page:** [Backend Dependencies](#backend-dependencies) | [CUDA and Drivers](#cuda-and-driver-requirements) | [Hardware](#hardware-compatibility) | [Platform](#platform-architecture-compatibility) | [Cloud](#cloud-service-provider-compatibility) | [Build Support](#build-support) **On this page:** [Backend Dependencies](#backend-dependencies) | [CUDA and Drivers](#cuda-and-driver-requirements) | [Hardware](#hardware-compatibility) | [Platform](#platform-architecture-compatibility) | [Cloud](#cloud-service-provider-compatibility) | [Build Support](#build-support)
......
...@@ -34,14 +34,33 @@ redirects: ...@@ -34,14 +34,33 @@ redirects:
destination: "/dynamo/" destination: "/dynamo/"
- source: "/dynamo/latest/index.html" - source: "/dynamo/latest/index.html"
destination: "/dynamo/" destination: "/dynamo/"
- source: "/dynamo/getting-started/support-matrix" # Version-scoped getting-started → resources redirects
destination: "/dynamo/resources/support-matrix" # Only for versions where these pages moved (dev, v1.0.0, latest).
- source: "/dynamo/getting-started/feature-matrix" # Older versions (v0.9.x and below) still have pages under getting-started.
destination: "/dynamo/resources/feature-matrix" - source: "/dynamo/dev/getting-started/support-matrix"
- source: "/dynamo/getting-started/release-artifacts" destination: "/dynamo/dev/resources/support-matrix"
destination: "/dynamo/resources/release-artifacts" - source: "/dynamo/dev/getting-started/feature-matrix"
- source: "/dynamo/getting-started/examples" destination: "/dynamo/dev/resources/feature-matrix"
destination: "/dynamo/resources/examples" - source: "/dynamo/dev/getting-started/release-artifacts"
destination: "/dynamo/dev/resources/release-artifacts"
- source: "/dynamo/dev/getting-started/examples"
destination: "/dynamo/dev/resources/examples"
- source: "/dynamo/v1.0.0/getting-started/support-matrix"
destination: "/dynamo/v1.0.0/resources/support-matrix"
- source: "/dynamo/v1.0.0/getting-started/feature-matrix"
destination: "/dynamo/v1.0.0/resources/feature-matrix"
- source: "/dynamo/v1.0.0/getting-started/release-artifacts"
destination: "/dynamo/v1.0.0/resources/release-artifacts"
- source: "/dynamo/v1.0.0/getting-started/examples"
destination: "/dynamo/v1.0.0/resources/examples"
- source: "/dynamo/latest/getting-started/support-matrix"
destination: "/dynamo/latest/resources/support-matrix"
- source: "/dynamo/latest/getting-started/feature-matrix"
destination: "/dynamo/latest/resources/feature-matrix"
- source: "/dynamo/latest/getting-started/release-artifacts"
destination: "/dynamo/latest/resources/release-artifacts"
- source: "/dynamo/latest/getting-started/examples"
destination: "/dynamo/latest/resources/examples"
- source: "/dynamo/dev/user-guides/multimodal-model-serving/diffusion-experimental/:slug*" - source: "/dynamo/dev/user-guides/multimodal-model-serving/diffusion-experimental/:slug*"
destination: "/dynamo/dev/user-guides/diffusion/:slug*" destination: "/dynamo/dev/user-guides/diffusion/:slug*"
- source: "/dynamo/dev/user-guides/multimodal-model-serving/diffusion-experimental" - source: "/dynamo/dev/user-guides/multimodal-model-serving/diffusion-experimental"
......
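The version-scoped `getting-started` → `resources` redirects added above follow a regular pattern: one entry per version and page. A minimal Python sketch that generates the same twelve entries is shown here for illustration. The helper `build_redirects` is hypothetical and not part of the repository; the version and page lists are copied from the diff.

```python
# Versions where the pages moved, and the pages that moved (both taken from the diff above).
VERSIONS = ["dev", "v1.0.0", "latest"]
PAGES = ["support-matrix", "feature-matrix", "release-artifacts", "examples"]


def build_redirects(versions, pages):
    """Return one source/destination pair per (version, page) combination."""
    return [
        {
            "source": f"/dynamo/{version}/getting-started/{page}",
            "destination": f"/dynamo/{version}/resources/{page}",
        }
        for version in versions
        for page in pages
    ]


# Print the entries in the same YAML shape used by the redirects list.
for entry in build_redirects(VERSIONS, PAGES):
    print(f'- source: "{entry["source"]}"')
    print(f'  destination: "{entry["destination"]}"')
```

Generating the entries this way keeps older versions (v0.9.x and below) out of the list, since only the versions named in `VERSIONS` receive redirects.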