..
    SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    SPDX-License-Identifier: Apache-2.0

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

Welcome to NVIDIA Dynamo
========================

The NVIDIA Dynamo Platform is a high-performance, low-latency inference framework designed to serve all AI models across any framework, architecture, or deployment scale.

.. admonition:: 💎 Discover the latest developments!
   :class: seealso

   This guide is a snapshot of the `Dynamo GitHub Repository <https://github.com/ai-dynamo/dynamo>`_ at a specific point in time. For the latest information and examples, see:

   - `Dynamo README <https://github.com/ai-dynamo/dynamo/blob/main/README.md>`_
   - `Architecture and features doc <https://github.com/ai-dynamo/dynamo/blob/main/docs/architecture/>`_
   - `Usage guides <https://github.com/ai-dynamo/dynamo/tree/main/docs/guides>`_
   - `Dynamo examples repo <https://github.com/ai-dynamo/examples>`_


Quick Start
-----------
Follow the :doc:`Quick Guide to install Dynamo Platform <guides/dynamo_deploy/quickstart>`.


Dive in: Examples
-----------------

The examples below assume you build the latest image yourself from source. If you use a prebuilt image, follow the examples from the corresponding branch.

.. grid:: 1 2 2 2
    :gutter: 3
    :margin: 0
    :padding: 3 4 0 0

    .. grid-item-card:: :doc:`Hello World </examples/hello_world>`
        :link: /examples/hello_world
        :link-type: doc

        Demonstrates the basic concepts of Dynamo by creating a simple multi-service pipeline.

    .. grid-item-card:: :doc:`LLM Deployment </examples/llm_deployment>`
        :link: /examples/llm_deployment
        :link-type: doc

        Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations.

    .. grid-item-card:: :doc:`Multinode </examples/multinode>`
        :link: /examples/multinode
        :link-type: doc

        Demonstrates deployment for disaggregated serving on 3 nodes using ``nvidia/Llama-3.1-405B-Instruct-FP8``.

    .. grid-item-card:: :doc:`TensorRT-LLM </examples/trtllm>`
        :link: /examples/trtllm
        :link-type: doc

        Presents TensorRT-LLM examples and reference implementations for deploying Large Language Models (LLMs) in various configurations.


.. toctree::
   :hidden:

   Welcome to Dynamo <self>
   Support Matrix <support_matrix.md>
   Getting Started <get_started.md>

.. toctree::
   :hidden:
   :caption: Architecture & Features

   High Level Architecture <architecture/architecture.md>
   Distributed Runtime <architecture/distributed_runtime.md>
   Disaggregated Serving <architecture/disagg_serving.md>
   KV Block Manager <architecture/kvbm_intro.rst>
   KV Cache Routing <architecture/kv_cache_routing.md>
   Planner <architecture/planner_intro.rst>
   Dynamo Architecture Flow <architecture/dynamo_flow.md>

.. toctree::
   :hidden:
   :caption: Dynamo Command Line Interface

   CLI Overview <guides/cli_overview.md>
   Running Dynamo (dynamo run) <guides/dynamo_run.md>
   Serving Inference Graphs (dynamo serve) <guides/dynamo_serve.md>
   Building Dynamo (dynamo build) <guides/dynamo_build.md>
   Deploying Inference Graphs (dynamo deploy) <guides/dynamo_deploy/README.md>

.. toctree::
   :hidden:
   :caption: Usage Guides

   Writing Python Workers in Dynamo <guides/backend.md>
   Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md>
   KV Cache Router Performance Tuning <guides/kv_router_perf_tuning.md>
   Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md>

.. toctree::
   :hidden:
   :caption: Deployment Guides

   Dynamo Deploy Quickstart <guides/dynamo_deploy/quickstart.md>
   Dynamo Cloud Kubernetes Platform <guides/dynamo_deploy/dynamo_cloud.md>
   Deploying Dynamo Inference Graphs to Kubernetes using the Dynamo Cloud Platform <guides/dynamo_deploy/operator_deployment.md>
   Manual Helm Deployment <guides/dynamo_deploy/manual_helm_deployment.md>
   GKE Setup Guide <guides/dynamo_deploy/gke_setup.md>
   Minikube Setup Guide <guides/dynamo_deploy/minikube.md>
   Model Caching with Fluid <guides/dynamo_deploy/model_caching_with_fluid.md>

.. toctree::
   :hidden:
   :caption: Benchmarking

   Planner Benchmark Example <guides/planner_benchmark/README.md>


.. toctree::
   :hidden:
   :caption: API

   SDK Reference <API/sdk.md>
   Python API <API/python_bindings.md>

.. toctree::
   :hidden:
   :caption: Examples

   Hello World Example: Basic <examples/hello_world.md>
   Hello World Example: Aggregated and Disaggregated Deployment <examples/disagg_skeleton.md>
   LLM Deployment Examples <examples/llm_deployment.md>
   Multinode Examples <examples/multinode.md>
   LLM Deployment Examples using TensorRT-LLM <examples/trtllm.md>

.. toctree::
   :hidden:
   :caption: Reference

   Glossary <dynamo_glossary.md>
   KVBM Reading <architecture/kvbm_reading.md>