..
    SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    SPDX-License-Identifier: Apache-2.0

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

..
   Main Page
..

Welcome to NVIDIA Dynamo
========================

The NVIDIA Dynamo Platform is a high-performance, low-latency inference framework designed to serve all AI models—across any framework, architecture, or deployment scale.

.. admonition:: 💎 Discover the latest developments!
   :class: seealso

   This guide is a snapshot from a specific point in time. For the latest information, examples, and release assets, see the `Dynamo GitHub repository <https://github.com/ai-dynamo/dynamo/releases/latest>`_.

Quickstart
==========

.. include:: _includes/quick_start_local.rst

..
   Sidebar
..

.. toctree::
   :hidden:
   :caption: Getting Started

   Quickstart <self>
   Support Matrix <reference/support-matrix.md>
   Feature Matrix <reference/feature-matrix.md>
   Release Artifacts <reference/release-artifacts.md>
   Examples <_sections/examples>

.. toctree::
   :hidden:
   :caption: Kubernetes Deployment

   Deployment Guide <_sections/k8s_deployment>
   Observability (K8s) <_sections/k8s_observability>
   Multinode <_sections/k8s_multinode>

.. toctree::
   :hidden:
   :caption: User Guides

   KV Cache Offloading <kvbm/kvbm_guide.md>
   Tool Calling <agents/tool-calling.md>
   Multimodality Support <features/multimodal/README.md>
   LoRA Adapters <features/lora/README.md>
   Finding Best Initial Configs <performance/aiconfigurator.md>
   Benchmarking <benchmarks/benchmarking.md>
   Tuning Disaggregated Performance <performance/tuning.md>
   Writing Python Workers in Dynamo <development/backend-guide.md>
   Observability (Local) <_sections/observability>
   Fault Tolerance <_sections/fault_tolerance>
   Glossary <reference/glossary.md>

.. toctree::
   :hidden:
   :caption: Components

   Backends <_sections/backends>
   Frontends <_sections/frontends>
   Router <router/README>
   Planner <planner/planner_intro>
   Profiler <components/profiler/README>
   KVBM <kvbm/kvbm_intro>

.. toctree::
   :hidden:
   :caption: Design Docs

   Overall Architecture <design_docs/architecture.md>
   Architecture Flow <design_docs/dynamo_flow.md>
   Disaggregated Serving <design_docs/disagg_serving.md>
   Distributed Runtime <design_docs/distributed_runtime.md>
   Request Plane <design_docs/request_plane.md>
   Event Plane <design_docs/event_plane.md>
   Planner Design <design_docs/planner_design.md>