..
    SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    SPDX-License-Identifier: Apache-2.0

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

Welcome to NVIDIA Dynamo
========================

The NVIDIA Dynamo Platform is a high-performance, low-latency inference framework designed to serve all AI models across any framework, architecture, or deployment scale.

.. admonition:: 💎 Discover the latest developments!
   :class: seealso

   This guide is a snapshot of the `Dynamo GitHub Repository <https://github.com/ai-dynamo/dynamo>`_ at a specific point in time. For the latest information and examples, see:

   - `Dynamo README <https://github.com/ai-dynamo/dynamo/blob/main/README.md>`_
   - `Architecture and features doc <https://github.com/ai-dynamo/dynamo/blob/main/docs/architecture/>`_
   - `Usage guides <https://github.com/ai-dynamo/dynamo/tree/main/docs/guides>`_
   - `Dynamo examples repo <https://github.com/ai-dynamo/examples>`_


Quick Start
-----------------
Follow the :doc:`Quick Guide to install Dynamo Platform <guides/dynamo_deploy/quickstart>`.


Dive in: Examples
-----------------

The examples below assume that you build the latest image yourself from source. If you use a prebuilt image, follow the examples from the corresponding branch instead.

.. grid:: 1 2 2 2
    :gutter: 3
    :margin: 0
    :padding: 3 4 0 0

    .. grid-item-card:: :doc:`Hello World <examples/runtime/hello_world/README>`
        :link: examples/runtime/hello_world/README
        :link-type: doc

        Demonstrates the basic concepts of Dynamo by creating a simple GPU-unaware graph.

    .. grid-item-card:: :doc:`LLM Serving with vLLM <components/backends/vllm/README>`
        :link: components/backends/vllm/README
        :link-type: doc

        Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with vLLM.

    .. grid-item-card:: :doc:`Multinode with SGLang <components/backends/sglang/docs/multinode-examples>`
        :link: components/backends/sglang/docs/multinode-examples
        :link-type: doc

        Demonstrates disaggregated serving on several nodes.

    .. grid-item-card:: :doc:`TensorRT-LLM <components/backends/trtllm/README>`
        :link: components/backends/trtllm/README
        :link-type: doc

        Presents TensorRT-LLM examples and reference implementations for deploying Large Language Models (LLMs) in various configurations.


.. toctree::
   :hidden:

   Welcome to Dynamo <self>
   Support Matrix <support_matrix.md>

.. toctree::
   :hidden:
   :caption: Architecture & Features

   High Level Architecture <architecture/architecture.md>
   Distributed Runtime <architecture/distributed_runtime.md>
   Disaggregated Serving <architecture/disagg_serving.md>
   KV Block Manager <architecture/kvbm_intro.rst>
   KV Cache Routing <architecture/kv_cache_routing.md>
   Planner <architecture/planner_intro.rst>
   Dynamo Architecture Flow <architecture/dynamo_flow.md>

.. toctree::
   :hidden:
   :caption: Using Dynamo

   Running Inference Graphs Locally (dynamo-run) <guides/dynamo_run.md>
   Deploying Inference Graphs <guides/dynamo_deploy/README.md>

.. toctree::
   :hidden:
   :caption: Usage Guides

   Writing Python Workers in Dynamo <guides/backend.md>
   Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md>
   KV Cache Router Performance Tuning <guides/kv_router_perf_tuning.md>
   Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md>

.. toctree::
   :hidden:
   :caption: Deployment Guides

   Dynamo Deploy Quickstart <guides/dynamo_deploy/quickstart.md>
   Dynamo Cloud Kubernetes Platform <guides/dynamo_deploy/dynamo_cloud.md>
   Manual Helm Deployment <deploy/helm/README.md>
   GKE Setup Guide <guides/dynamo_deploy/gke_setup.md>
   Minikube Setup Guide <guides/dynamo_deploy/minikube.md>
   Model Caching with Fluid <guides/dynamo_deploy/model_caching_with_fluid.md>

.. toctree::
   :hidden:
   :caption: Benchmarking

   Planner Benchmark Example <guides/planner_benchmark/README.md>


.. toctree::
   :hidden:
   :caption: API

   NIXL Connect API <API/nixl_connect/README.md>

.. toctree::
   :hidden:
   :caption: Examples

   Hello World <examples/runtime/hello_world/README.md>
   LLM Deployment Examples using vLLM <components/backends/vllm/README.md>
   Multinode Examples using SGLang <components/backends/sglang/docs/multinode-examples.md>
   LLM Deployment Examples using TensorRT-LLM <components/backends/trtllm/README.md>

.. toctree::
   :hidden:
   :caption: Reference

   Glossary <dynamo_glossary.md>
   KVBM Reading <architecture/kvbm_reading.md>