<!--
SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Dynamo

<h4> A Datacenter Scale Distributed Inference Serving Framework </h4>

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![GitHub Release](https://img.shields.io/github/v/release/ai-dynamo/dynamo)](https://github.com/ai-dynamo/dynamo/releases/latest)

Dynamo is a flexible, component-based, data-center-scale
inference serving framework designed to leverage the strengths of the
standalone Dynamo Inference Server while expanding its capabilities
to meet the demands of complex use cases, including those of Generative
AI. It is designed to enable developers to implement and customize
routing, load balancing, scaling, and workflow definitions at data
center scale without sacrificing performance or ease of use.

> [!NOTE]
> This project is currently in the alpha / experimental /
> rapid-prototyping stage and we are actively looking for feedback and
> collaborators.

## Building Dynamo

### Requirements
Dynamo development and examples are container based.

* [Docker](https://docs.docker.com/get-started/get-docker/)
* [buildx](https://github.com/docker/buildx)
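
Before building, it can help to confirm that both tools are installed. The following is a minimal sketch, not part of the repository's scripts: the helper names `have` and `check_requirements` are made up for illustration, and the only external command relied on is `docker buildx version`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command is on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: reports the first missing requirement, if any.
check_requirements() {
    if ! have docker; then
        echo "missing: docker"
        return 1
    fi
    # buildx ships as a Docker CLI plugin; this fails if it is not installed.
    if ! docker buildx version >/dev/null 2>&1; then
        echo "missing: docker buildx plugin"
        return 1
    fi
    echo "docker and buildx are available"
}
```

After sourcing the snippet, run `check_requirements` and install whatever it reports as missing.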

### Development

You can build the Dynamo container using the build scripts
in `container/` (or directly with `docker build`).

We provide four types of builds:

1. `STANDARD`, which includes our default set of backends (onnx, openvino...)
2. `TENSORRTLLM`, which includes our TRT-LLM backend
3. `VLLM`, which includes our vLLM backend using the NCCL communication library
4. `VLLM_NIXL`, which includes our vLLM backend using the new NIXL communication library

For example, to build a container for the `STANDARD` backends, run:

`./container/build.sh`

Please see the instructions in the corresponding example for specific build instructions.

## Running Dynamo for Local Testing and Development

You can run the Dynamo container using the run scripts in
`container/` (or directly with `docker run`).

The run script offers a few common workflows:

1. Running a command in a container and exiting.

```
./container/run.sh -- python3 -c "import dynamo.runtime; help(dynamo.runtime)"
```

2. Starting an interactive shell.
```
./container/run.sh -it
```

3. Mounting the local workspace and starting an interactive shell.

```
./container/run.sh -it --mount-workspace
```

The last command also passes common environment variables (`-e
HF_TOKEN`) and mounts common directories such as `/tmp:/tmp` and
`/mnt:/mnt`.
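
As a rough illustration of the `docker run` alternative mentioned above, the sketch below prints a command that forwards `HF_TOKEN` and mounts `/tmp` and `/mnt`. It is an approximation under stated assumptions, not what `run.sh` actually executes: the image tag `dynamo:latest` and the function name `print_run_command` are placeholders, and the workspace mount is omitted because its container path is not stated here.

```shell
#!/bin/sh
# Assumption: placeholder image tag; substitute the tag produced by your build.
IMAGE="${IMAGE:-dynamo:latest}"

# Hypothetical helper: prints the command so it can be inspected before running.
# -e HF_TOKEN forwards the variable from the host environment;
# each -v flag mounts a host directory at the same path in the container.
print_run_command() {
    echo "docker run --rm -it" \
        "-e HF_TOKEN" \
        "-v /tmp:/tmp" \
        "-v /mnt:/mnt" \
        "$IMAGE"
}

print_run_command
```

Printing rather than executing keeps the sketch safe to try on a machine without the image; copy the printed line into a terminal once the tag is correct.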

Please see the instructions in the corresponding example for specific
deployment instructions.

## Rust Based Runtime

Dynamo has a new Rust-based distributed runtime with
implementation under development. The Rust-based runtime enables
serving arbitrary Python code as well as native Rust. Please note that the
APIs are subject to change.

### Hello World

[Hello World](./lib/bindings/python/examples/hello_world)

A basic example demonstrating the Rust-based runtime and Python
bindings.

### LLM

[VLLM](./examples/python_rs/llm/vllm)

An intermediate example expanding further on the concepts introduced
in the Hello World example. In this example, we demonstrate
[Disaggregated Serving](https://arxiv.org/abs/2401.09670) as an
application of the components defined in Dynamo.

# Disclaimers

> [!NOTE]
> This project is currently in the alpha / experimental /
> rapid-prototyping stage and we will be adding new features incrementally.

1. The `TENSORRTLLM` and `VLLM` containers are a work in progress and are
   not expected to work out of the box.

2. Testing has primarily been on single-node systems with processes
   launched within a single container.