Commit d14d6ff4 authored by dagil-nvidia, committed by GitHub

docs: improve Fern sidebar titles, add guide subtitles, and replace ASCII diagrams with D2 (#6069)


Signed-off-by: Dan Gil <dagil@nvidia.com>
Signed-off-by: dagil-nvidia <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Jonathan Tong <jt572@cornell.edu>
parent d48de155
direction: down
coord: "Coordination Layer" {
  style.font-size: 32
  direction: right
  sd: "Service Discovery" {
    style.font-size: 28
    k8s: "K8s: CRDs + API" {
      style.font-size: 22
      shape: rectangle
    }
    bm: "Bare metal: etcd" {
      style.font-size: 22
      shape: rectangle
    }
  }
  nats: "NATS (Optional)" {
    style.font-size: 28
    kv: "KV Cache Events" {
      style.font-size: 22
      shape: rectangle
    }
    rs: "Router Replica Sync" {
      style.font-size: 22
      shape: rectangle
    }
    js: "JetStream Persistence" {
      style.font-size: 22
      shape: rectangle
    }
  }
}
frontend: Frontend {
  style.font-size: 28
  shape: rectangle
}
plan: Planner {
  style.font-size: 28
  shape: rectangle
}
worker: Worker {
  style.font-size: 28
  shape: rectangle
}
coord -> frontend
coord -> plan
coord -> worker
direction: right
dr: "DistributedRuntime" {
  style.font-size: 28
  ns: "• Namespace" {
    style.font-size: 22
    shape: text
  }
  comp: "• Components" {
    style.font-size: 22
    shape: text
  }
  ep: "• Endpoints" {
    style.font-size: 22
    shape: text
  }
}
lease: "Primary Lease\nTTL: 10s" {
  style.font-size: 24
  shape: rectangle
  style.bold: true
}
etcd: etcd {
  style.font-size: 28
  shape: cylinder
}
dr -> lease
lease -> etcd: "Keep-Alive\nHeartbeat" {
  style.font-size: 22
}
direction: down
planner: "Planner Component" {
  style.font-size: 32
  inputs: {
    direction: right
    style.border-radius: 8
    mc: "Metric Collector\n(Prometheus)" {
      style.font-size: 24
      shape: rectangle
    }
    lp: "Load Predictor\n(ARIMA / Kalman / Prophet)" {
      style.font-size: 24
      shape: rectangle
    }
    pi: "Performance Interpolator\n(NPZ profiling data)" {
      style.font-size: 24
      shape: rectangle
    }
  }
  sa: "Scaling Algorithm" {
    style.font-size: 28
    shape: rectangle
    style.bold: true
  }
  connector: "Connector Layer" {
    style.font-size: 28
    direction: right
    kc: "KubernetesConnector\n(PATCH DGD)" {
      style.font-size: 24
      shape: rectangle
    }
    vc: "VirtualConnector\n(Runtime bridge)" {
      style.font-size: 24
      shape: rectangle
    }
  }
  inputs.mc -> sa
  inputs.lp -> sa
  inputs.pi -> sa
  sa -> connector
}
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
+subtitle: Connect Dynamo to external tools and services using function calling
---
You can connect Dynamo to external tools and services using function calling (also known as tool calling). By providing a list of available functions, Dynamo can choose
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
+subtitle: Benchmark and compare performance across Dynamo deployment configurations
---
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
+subtitle: Enable KV offloading using KV Block Manager (KVBM) for Dynamo deployments
---
# KVBM Guide
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
+subtitle: Enable KV-aware routing using Router for Dynamo deployments
---
# Router Guide
......
@@ -18,26 +18,7 @@ Dynamo's coordination layer adapts to the deployment environment:
> **Note:** The runtime always defaults to `kv_store` (etcd) for service discovery. Kubernetes deployments must explicitly set `DYN_DISCOVERY_BACKEND=kubernetes` - the Dynamo operator handles this automatically.
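The default described in the note can be illustrated with a minimal sketch. This is not Dynamo's implementation: `resolve_discovery_backend` is a hypothetical helper, and rejecting values other than `kv_store` and `kubernetes` is an assumption made for the example. Only the variable name and the two backend names come from the note.

```python
import os

# The two backend names mentioned in the note; treating them as the full set is an assumption.
DEFAULT_BACKEND = "kv_store"  # etcd-backed default
KNOWN_BACKENDS = {"kv_store", "kubernetes"}


def resolve_discovery_backend(env=None):
    """Hypothetical helper: pick the discovery backend from DYN_DISCOVERY_BACKEND.

    Falls back to kv_store when the variable is unset or empty.
    """
    env = os.environ if env is None else env
    value = env.get("DYN_DISCOVERY_BACKEND", "").strip()
    if not value:
        return DEFAULT_BACKEND
    if value not in KNOWN_BACKENDS:
        # Failing loudly on unknown values is a choice made for this sketch.
        raise ValueError(f"unknown discovery backend: {value!r}")
    return value
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the real environment.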
-```
-[ASCII diagram: Coordination Layer box containing Service Discovery (K8s: CRDs + API; Bare metal: etcd) and NATS (Optional) (KV Cache Events; Router Replica Sync; JetStream Persistence), with arrows down to Frontend, Planner, and Worker]
-```
+![Coordination Layer showing Service Discovery and NATS connecting to Frontend, Planner, and Worker](/assets/img/event-plane-coordination.svg)
## Kubernetes-Native Service Discovery
@@ -102,19 +83,7 @@ export ETCD_ENDPOINTS=http://etcd-0:2379,http://etcd-1:2379,http://etcd-2:2379
Each `DistributedRuntime` maintains a primary lease with etcd:
-```
-[ASCII diagram: DistributedRuntime (Namespace, Components, Endpoints) with an arrow pointing to it from Primary Lease (TTL: 10s); Primary Lease points down to etcd, labeled Keep-Alive Heartbeat]
-```
+![DistributedRuntime connected to Primary Lease with Keep-Alive Heartbeat to etcd](/assets/img/event-plane-lease.svg)
**Lease Lifecycle:**
......
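The lease diagram above (a primary lease with a 10s TTL, kept alive by heartbeats) can be pictured with a small in-memory model. This is a sketch, not an etcd client and not Dynamo's code: the `Lease` class, its method names, and the injectable clock are all hypothetical. Only the TTL value and the keep-alive idea come from the diagram.

```python
import time


class Lease:
    """In-memory stand-in for a TTL lease (hypothetical; does not talk to etcd)."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be checked without sleeping
        self._expires_at = clock() + ttl_seconds

    def is_alive(self):
        """True while the lease has not passed its expiry time."""
        return self._clock() < self._expires_at

    def keep_alive(self):
        """Heartbeat: extend expiry by one TTL; returns False if already expired."""
        if not self.is_alive():
            return False
        self._expires_at = self._clock() + self.ttl
        return True
```

In this model a heartbeat that arrives within the TTL extends the lease, and one that arrives after expiry is rejected. What the real runtime does on expiry is covered by the elided lifecycle list and is not modeled here.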
@@ -13,30 +13,7 @@ The Planner is Dynamo's autoscaling controller. It observes system metrics, pred
## Architecture
-```text
-[ASCII diagram: Planner Component box containing Metric Collector (Prometheus), Load Predictor (ARIMA/etc.), and Performance Interpolator (JSON data), each with an arrow down to Scaling Algorithm, which points down to Connector Layer containing KubernetesConn. (PATCH DGD) and VirtualConn. (Runtime bridge)]
-```
+![Planner architecture showing Metric Collector, Load Predictor, and Performance Interpolator feeding into the Scaling Algorithm and Connector Layer](../assets/img/planner-architecture.svg)
## Scaling Algorithm
......
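The body of the Scaling Algorithm section is elided in this diff, so the following is only a generic illustration of how the three inputs in the architecture diagram could feed a replica decision. It is not the Planner's actual algorithm. `target_replicas`, its parameters, and the clamp bounds are hypothetical: `predicted_load` stands in for a Load Predictor output and `per_replica_capacity` for a Performance Interpolator output, both in the same unit (for example requests per second).

```python
import math


def target_replicas(predicted_load, per_replica_capacity,
                    min_replicas=1, max_replicas=8):
    """Hypothetical sizing rule: enough replicas to cover predicted load, clamped.

    Not Dynamo's algorithm; a plain capacity-division sketch.
    """
    if per_replica_capacity <= 0:
        raise ValueError("per_replica_capacity must be positive")
    needed = math.ceil(predicted_load / per_replica_capacity)
    # Clamp to the allowed range so the result is always a valid replica count.
    return max(min_replicas, min(max_replicas, needed))
```

In the diagram's terms, a value like this would then be handed to a connector (Kubernetes or virtual) to apply. That step is left out here because the connector interfaces are not shown in the diff.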
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
+subtitle: Create custom Python workers and engines for Dynamo
---
# Writing Python Workers in Dynamo
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
+subtitle: Handle failures gracefully with request migration, cancellation, and graceful shutdown
---
# Fault Tolerance
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
+subtitle: Find optimal prefill/decode configuration for disaggregated serving deployments
---
# Disaggregated Serving Guide
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
+subtitle: Serve fine-tuned LoRA adapters with dynamic loading and routing in Dynamo
---
# LoRA Adapters
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
+subtitle: Deploy multimodal models with image, video, and audio support in Dynamo
---
# Multimodal Inference in Dynamo
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
+subtitle: Monitor Dynamo deployments with metrics, logging, and tracing
---
# Dynamo Observability
......
@@ -78,11 +78,11 @@ navigation:
   contents:
     - page: KV Cache Aware Routing
       path: ../pages/components/router/router-guide.md
-    - page: Disaggregated Serving Guide
+    - page: Disaggregated Serving
       path: ../pages/features/disaggregated-serving/README.md
     - page: KV Cache Offloading
       path: ../pages/components/kvbm/kvbm-guide.md
-    - page: Dynamo Benchmarking Guide
+    - page: Dynamo Benchmarking
       path: ../pages/benchmarks/benchmarking.md
     - section: Multimodality Support
       path: ../pages/features/multimodal/README.md
......