<!--
SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Dynamo

<h4> A Datacenter Scale Distributed Inference Serving Framework </h4>

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![GitHub Release](https://img.shields.io/github/v/release/ai-dynamo/dynamo)](https://github.com/ai-dynamo/dynamo/releases/latest)

Dynamo is a flexible, component-based, data-center-scale inference
serving framework designed to meet the demands of complex use cases,
including those of generative AI. It enables developers to implement
and customize routing, load balancing, scaling, and workflow
definitions at data center scale without sacrificing performance or
ease of use.

> [!NOTE]
> This project is currently in the alpha / experimental /
> rapid-prototyping stage and we are actively looking for feedback and
> collaborators.

## Building Dynamo

### Requirements
Dynamo development and examples are container based.

* [Docker](https://docs.docker.com/get-started/get-docker/)
* [buildx](https://github.com/docker/buildx)
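
Before building, it can help to confirm that both requirements are installed. The following is a minimal sketch, not part of the Dynamo scripts: `check_tool` is a hypothetical helper introduced here for illustration, while `docker --version` and `docker buildx version` are standard Docker CLI commands.

<!--pytest.mark.skip-->
```bash
# Hypothetical helper (not part of Dynamo): report whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

if check_tool docker; then
    docker --version
    # buildx is a Docker CLI plugin, so it is invoked through `docker`.
    docker buildx version || echo "missing: docker buildx plugin" >&2
fi
```

If either check reports a missing tool, install it from the links above before running the build scripts.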

### Development

You can build the Dynamo container using the build scripts
in `container/` (or directly with `docker build`).

We provide two types of builds:

1. `VLLM`, which includes our vLLM backend using the new NIXL communication library.
2. `TENSORRTLLM`, which includes our TRT-LLM backend.

For example, to build a container for the `VLLM` backend, run:

<!--pytest.mark.skip-->
```bash
./container/build.sh
```

See the corresponding example for backend-specific build instructions.

## Running Dynamo for Local Testing and Development

You can run the Dynamo container using the run scripts in
`container/` (or directly with `docker run`).

The run script offers a few common workflows:

1. Running a command in a container and exiting.

<!--pytest.mark.skip-->
```bash
./container/run.sh -- python3 -c "import dynamo.runtime; help(dynamo.runtime)"
```
<!--

# This tests the line above, but from within the container
# using pytest-codeblocks

```bash
python3 -c "import dynamo.runtime; help(dynamo.runtime)"
```
-->

2. Starting an interactive shell.

<!--pytest.mark.skip-->
```bash
./container/run.sh -it
```

3. Mounting the local workspace and starting an interactive shell.

<!--pytest.mark.skip-->
```bash
./container/run.sh -it --mount-workspace
```

The last command also passes common environment variables
(`-e HF_TOKEN`) and mounts common directories such as `/tmp:/tmp`
and `/mnt:/mnt`.
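
For readers who prefer to call `docker run` directly, the sketch below prints a roughly equivalent command without executing it. It is an illustration under stated assumptions, not a copy of what `run.sh` does: the image tag `dynamo:latest`, the `/workspace` mount point, and the `dry_run_cmd` helper are all hypothetical, while `-e`, `-v`, `-w`, `--rm`, and `-it` are standard `docker run` options.

<!--pytest.mark.skip-->
```bash
# Hypothetical image tag; substitute the tag produced by your build.
IMAGE="${IMAGE:-dynamo:latest}"

# Print (do not execute) a docker run command that forwards HF_TOKEN from
# the host environment and mounts /tmp, /mnt, and the current directory.
# The /workspace mount point is an assumption made for this sketch.
dry_run_cmd() {
    echo "docker run --rm -it" \
        "-e HF_TOKEN" \
        "-v /tmp:/tmp -v /mnt:/mnt" \
        "-v $PWD:/workspace -w /workspace" \
        "$IMAGE"
}

dry_run_cmd
```

Passing `-e HF_TOKEN` without a value forwards the variable from the host environment, so the token itself never appears on the command line.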

See the corresponding example for specific deployment instructions.

## Rust Based Runtime

Dynamo has a new Rust-based distributed runtime, with the
implementation still under development. The Rust-based runtime enables
serving arbitrary Python code as well as native Rust. Please note that
the APIs are subject to change.

### Hello World

[Hello World](./lib/bindings/python/examples/hello_world)

A basic example demonstrating the Rust-based runtime and Python
bindings.

### LLM

[VLLM](./examples/python_rs/llm/vllm)

An intermediate example that builds on the concepts introduced
in the Hello World example. It demonstrates
[Disaggregated Serving](https://arxiv.org/abs/2401.09670) as an
application of the components defined in Dynamo.

## Disclaimers

> [!NOTE]
> This project is currently in the alpha / experimental /
> rapid-prototyping stage and we will be adding new features incrementally.

1. The `TENSORRTLLM` and `VLLM` containers are WIP and not expected to
   work out of the box.

2. Testing has primarily been on single-node systems with processes
   launched within a single container.