..
    SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    SPDX-License-Identifier: Apache-2.0

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

Welcome to NVIDIA Dynamo
========================

The NVIDIA Dynamo Platform is a high-performance, low-latency inference framework designed to serve all AI models across any framework, architecture, or deployment scale.

.. admonition:: 💎 Discover the latest developments!
   :class: seealso

   This guide is a snapshot of the `Dynamo GitHub Repository <https://github.com/ai-dynamo/dynamo>`_ at a specific point in time. For the latest information and examples, see:

   - `Dynamo README <https://github.com/ai-dynamo/dynamo/blob/main/README.md>`_
   - `Architecture and features doc <https://github.com/ai-dynamo/dynamo/blob/main/docs/architecture/>`_
   - `Usage guides <https://github.com/ai-dynamo/dynamo/tree/main/docs/guides>`_
   - `Dynamo examples repo <https://github.com/ai-dynamo/dynamo/tree/main/examples>`_


Quick Start
-----------------

Local Deployment
~~~~~~~~~~~~~~~~

Get started with Dynamo locally in just a few commands:

**1. Install Dynamo**

.. code-block:: bash

   # Install uv (recommended Python package manager)
   curl -LsSf https://astral.sh/uv/install.sh | sh

   # Create virtual environment and install Dynamo
   uv venv venv
   source venv/bin/activate
   uv pip install "ai-dynamo[sglang]"  # or [vllm], [trtllm]

**2. Start etcd/NATS**

.. code-block:: bash

   # Start etcd and NATS using Docker Compose
   docker compose -f deploy/docker-compose.yml up -d
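
To confirm that both services accept connections before starting Dynamo, a small check like the one below may help. This is a minimal sketch, assuming the Compose file publishes etcd and NATS on their default client ports (2379 and 4222) on ``localhost``; ``port_open`` is an illustrative helper and not part of Dynamo.

.. code-block:: python

   import socket


   def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
       """Return True if a TCP connection to host:port succeeds within timeout."""
       try:
           with socket.create_connection((host, port), timeout=timeout):
               return True
       except OSError:
           # Covers refused connections, timeouts, and name resolution errors.
           return False


   if __name__ == "__main__":
       # Assumed defaults: etcd client port 2379, NATS client port 4222.
       for name, port in (("etcd", 2379), ("NATS", 4222)):
           status = "reachable" if port_open("localhost", port) else "not reachable"
           print(f"{name} on localhost:{port}: {status}")

If either service is reported as not reachable, check the container status with ``docker compose ps`` before moving on.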

**3. Run Dynamo**

.. code-block:: bash

   # Start the OpenAI compatible frontend
   python -m dynamo.frontend

   # In another terminal, start an SGLang worker
   python -m dynamo.sglang.worker deepseek-ai/DeepSeek-R1-Distill-Llama-8B

**4. Test your deployment**

.. code-block:: bash

   curl localhost:8080/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
          "messages": [{"role": "user", "content": "Hello!"}],
          "max_tokens": 50}'
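
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the frontend is listening on ``localhost:8080``, as in the ``curl`` command above. ``build_chat_request`` is an illustrative helper name, not part of Dynamo, and the response parsing assumes the usual OpenAI-style response layout.

.. code-block:: python

   import json
   import urllib.request


   def build_chat_request(base_url: str, model: str, prompt: str,
                          max_tokens: int = 50) -> urllib.request.Request:
       """Build a POST request for the OpenAI-compatible chat completions route."""
       payload = {
           "model": model,
           "messages": [{"role": "user", "content": prompt}],
           "max_tokens": max_tokens,
       }
       return urllib.request.Request(
           url=f"{base_url}/v1/chat/completions",
           data=json.dumps(payload).encode("utf-8"),
           headers={"Content-Type": "application/json"},
           method="POST",
       )


   if __name__ == "__main__":
       request = build_chat_request(
           "http://localhost:8080",
           "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
           "Hello!",
       )
       with urllib.request.urlopen(request, timeout=60) as response:
           body = json.load(response)
       # Assumption: OpenAI-style responses carry the generated text under
       # choices[0].message.content.
       print(body["choices"][0]["message"]["content"])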

Kubernetes Deployment
~~~~~~~~~~~~~~~~~~~~~

For deployments on Kubernetes, follow the :doc:`Dynamo Platform Quickstart Guide <guides/dynamo_deploy/quickstart>`.


Dive in: Examples
-----------------

The examples below assume that you build the latest image yourself from source. If you use a prebuilt image, follow the examples from the branch that corresponds to that image.

.. grid:: 1 2 2 2
    :gutter: 3
    :margin: 0
    :padding: 3 4 0 0

    .. grid-item-card:: :doc:`Hello World <examples/runtime/hello_world/README>`
        :link: examples/runtime/hello_world/README
        :link-type: doc

        Demonstrates the basic concepts of Dynamo by creating a simple GPU-unaware graph.

    .. grid-item-card:: :doc:`LLM Serving with vLLM <components/backends/vllm/README>`
        :link: components/backends/vllm/README
        :link-type: doc

        Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with vLLM.

    .. grid-item-card:: :doc:`Multinode with SGLang <components/backends/sglang/docs/multinode-examples>`
        :link: components/backends/sglang/docs/multinode-examples
        :link-type: doc

        Demonstrates disaggregated serving on several nodes.

    .. grid-item-card:: :doc:`TensorRT-LLM <components/backends/trtllm/README>`
        :link: components/backends/trtllm/README
        :link-type: doc

        Presents TensorRT-LLM examples and reference implementations for deploying Large Language Models (LLMs) in various configurations.


.. toctree::
   :hidden:

   Welcome to Dynamo <self>
   Support Matrix <support_matrix.md>

.. toctree::
   :hidden:
   :caption: Architecture & Features

   High Level Architecture <architecture/architecture.md>
   Distributed Runtime <architecture/distributed_runtime.md>
   Disaggregated Serving <architecture/disagg_serving.md>
   KV Block Manager <architecture/kvbm_intro.rst>
   KV Cache Routing <architecture/kv_cache_routing.md>
   Planner <architecture/planner_intro.rst>
   Dynamo Architecture Flow <architecture/dynamo_flow.md>

.. toctree::
   :hidden:
   :caption: Using Dynamo

   Writing Python Workers in Dynamo <guides/backend.md>
   Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md>
   Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md>

.. toctree::
   :hidden:
   :caption: Deployment Guides

   Dynamo Deploy Quickstart <guides/dynamo_deploy/quickstart.md>
   Dynamo Cloud Kubernetes Platform <guides/dynamo_deploy/dynamo_cloud.md>
   Manual Helm Deployment <guides/dynamo_deploy/helm_install.md>
   Minikube Setup Guide <guides/dynamo_deploy/minikube.md>
   Model Caching with Fluid <guides/dynamo_deploy/model_caching_with_fluid.md>

.. toctree::
   :hidden:
   :caption: Examples

   Hello World <examples/runtime/hello_world/README.md>
   LLM Deployment Examples using vLLM <components/backends/vllm/README.md>
   LLM Deployment Examples using SGLang <components/backends/sglang/README.md>
   Multinode Examples using SGLang <components/backends/sglang/docs/multinode-examples.md>
   Planner Benchmark Example <guides/planner_benchmark/README.md>
   LLM Deployment Examples using TensorRT-LLM <components/backends/trtllm/README.md>

.. toctree::
   :hidden:
   :caption: Reference

   Glossary <dynamo_glossary.md>
   NIXL Connect API <API/nixl_connect/README.md>
   KVBM Reading <architecture/kvbm_reading.md>