<!--
SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Dynamo

<h4> A Datacenter Scale Distributed Inference Serving Framework </h4>

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![GitHub Release](https://img.shields.io/github/v/release/ai-dynamo/dynamo)](https://github.com/ai-dynamo/dynamo/releases/latest)

Dynamo is a flexible, component-based, data-center-scale inference
serving framework designed to meet the demands of complex use cases,
including those of Generative AI. It enables developers to implement
and customize routing, load balancing, scaling, and workflow
definitions at data center scale without sacrificing performance or
ease of use.

> [!NOTE]
> This project is currently in the alpha / experimental /
> rapid-prototyping stage and we are actively looking for feedback and
> collaborators.

## Building Dynamo

### Requirements
Dynamo development and examples are container based.

* [Docker](https://docs.docker.com/get-started/get-docker/)
* [buildx](https://github.com/docker/buildx)
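
Before building, it can help to confirm that both requirements are installed. The following is a minimal sketch, not part of the Dynamo scripts: the helper names `have_cmd` and `check_requirements` are hypothetical, and the check relies only on the standard `command -v` builtin and `docker buildx version`.

```bash
#!/usr/bin/env bash
# Sketch: verify that the tools listed above are available.
# The function names here are illustrative, not part of Dynamo.

# Returns 0 if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Prints one status line per requirement; returns non-zero if any is missing.
check_requirements() {
  local missing=0
  if have_cmd docker; then
    echo "docker: found"
    # buildx is a Docker CLI plugin; `docker buildx version` fails if it is absent.
    if docker buildx version >/dev/null 2>&1; then
      echo "buildx: found"
    else
      echo "buildx: missing"
      missing=1
    fi
  else
    echo "docker: missing"
    echo "buildx: missing"
    missing=1
  fi
  return "$missing"
}

check_requirements || echo "Install the missing tools before building."
```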

### Development

You can build the Dynamo container using the build scripts
in `container/` (or directly with `docker build`).

We provide four types of builds:

1. `STANDARD`, which includes our default set of backends (onnx, openvino...)
2. `TENSORRTLLM`, which includes our TRT-LLM backend
3. `VLLM`, which includes our VLLM backend using the NCCL communication library
4. `VLLM_NIXL`, which includes our VLLM backend using the new NIXL communication library

For example, to build a container for the `STANDARD` backends, you can run:

<!--pytest.mark.skip-->
```bash
./container/build.sh
```

Please see the corresponding example for specific build instructions.

## Running Dynamo for Local Testing and Development

You can run the Dynamo container using the run scripts in
`container/` (or directly with `docker run`).

The run script offers a few common workflows:

1. Running a command in a container and exiting.

<!--pytest.mark.skip-->
```bash
./container/run.sh -- python3 -c "import dynamo.runtime; help(dynamo.runtime)"
```
<!--

# This tests the line above, but from within the container,
# using pytest-codeblocks

```bash
python3 -c "import dynamo.runtime; help(dynamo.runtime)"
```
-->

2. Starting an interactive shell.

<!--pytest.mark.skip-->
```bash
./container/run.sh -it
```

3. Mounting the local workspace and starting an interactive shell.

<!--pytest.mark.skip-->
```bash
./container/run.sh -it --mount-workspace
```

The last command also passes common environment variables
(`-e HF_TOKEN`) and mounts common directories such as `/tmp:/tmp`
and `/mnt:/mnt`.
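
For readers who prefer `docker run` directly, the sketch below prints a command that is roughly equivalent to the workspace-mounting workflow. It is an illustration under stated assumptions, not what `run.sh` actually does: the image tag `dynamo:latest`, the `/workspace` mount point, and the helper name `build_run_command` are all hypothetical, and the real script may add further options (for example, GPU access). Only the `-e HF_TOKEN`, `/tmp:/tmp`, and `/mnt:/mnt` settings come from the description above.

```bash
#!/usr/bin/env bash
# Sketch: print (do not execute) a `docker run` command that approximates
# `./container/run.sh -it --mount-workspace`.

# Hypothetical values; check the run script for the real ones.
IMAGE="${IMAGE:-dynamo:latest}"                     # assumed image tag
WORKSPACE_TARGET="${WORKSPACE_TARGET:-/workspace}"  # assumed mount point

# Prints the command for the given workspace directory (default: $PWD).
build_run_command() {
  local workspace="${1:-$PWD}"
  # `-e HF_TOKEN` forwards the host's HF_TOKEN value into the container.
  printf '%s ' docker run --rm -it \
    -e HF_TOKEN \
    -v /tmp:/tmp \
    -v /mnt:/mnt \
    -v "${workspace}:${WORKSPACE_TARGET}" \
    "${IMAGE}"
  printf '\n'
}

build_run_command "$PWD"
```

Printing the command rather than running it makes it easy to inspect or adjust the mounts before starting a container.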

Please see the corresponding example for specific deployment
instructions.

## Rust Based Runtime

Dynamo has a new Rust-based distributed runtime whose implementation
is under development. The Rust-based runtime enables serving
arbitrary Python code as well as native Rust. Please note that the
APIs are subject to change.

### Hello World

[Hello World](./lib/bindings/python/examples/hello_world)

A basic example demonstrating the Rust-based runtime and Python
bindings.

### LLM

[VLLM](./examples/python_rs/llm/vllm)

An intermediate example expanding further on the concepts introduced
in the Hello World example. In this example, we demonstrate
[Disaggregated Serving](https://arxiv.org/abs/2401.09670) as an
application of the components defined in Dynamo.

# Disclaimers

> [!NOTE]
> This project is currently in the alpha / experimental /
> rapid-prototyping stage and we will be adding new features incrementally.

1. The `TENSORRTLLM` and `VLLM` containers are WIP and not expected to
   work out of the box.

2. Testing has primarily been on single node systems with processes
   launched within a single container.