# Triton Server Backend for Dynamo

> **⚠️ Work in Progress / Proof of Concept**
>
> This example demonstrates integrating NVIDIA Triton Inference Server as a backend for Dynamo.
> It is currently a proof-of-concept and may require additional work for production use.

## Overview

This example shows how to run Triton Server models through Dynamo's distributed runtime, exposing them via the KServe gRPC protocol. The integration allows Triton models to benefit from Dynamo's service discovery, routing, and infrastructure.

**Architecture:**

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────────────────┐
│  Triton Client  │────▶│  Dynamo Frontend│────▶│       Dynamo Worker         │
│  (KServe gRPC)  │     │  (port 8787)    │     │  ┌───────────────────────┐  │
└─────────────────┘     └─────────────────┘     │  │    Triton Server      │  │
                              │                 │  │  (Python bindings)    │  │
                              ▼                 │  └───────────────────────┘  │
                    ┌─────────────────┐         └─────────────────────────────┘
                    │    KV Store     │
                    └─────────────────┘
```

## Prerequisites

- NVIDIA GPU with CUDA support
- For local development: Python 3.10+ with Dynamo installed
- For container deployment: Docker with NVIDIA Container Toolkit

## Quick Start

### Option 1: Container Deployment

#### Step 1: Build Container Images

From the Dynamo repository root:

```bash
# Build the base Dynamo image
python container/render.py --framework=dynamo --target=runtime --short-output
docker build -f container/rendered.Dockerfile .

# Build the Triton worker image
cd examples/backends/tritonserver
docker build -t dynamo-triton:latest .
```

#### Step 2: Run the Container

```bash
docker run --rm -it --gpus all --network host \
  dynamo-triton:latest \
  ./examples/backends/tritonserver/launch/identity.sh
```

#### Step 3: Test the Deployment

In another terminal:

```bash
# Install client dependencies
pip install "tritonclient[grpc]"  # quoted so shells such as zsh do not glob the brackets

# Test with the client
cd examples/backends/tritonserver
python src/client.py --port 8000
```
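
Running the client before the frontend is listening produces a connection error that is easy to mistake for a misconfiguration. The following is a minimal sketch, not part of this example's code, of a readiness check that uses only the Python standard library. `wait_for_port` is a hypothetical helper name, and the port to pass is whichever one the frontend listens on (Step 3 above uses `8000`).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect only proves that something is listening;
            # it does not prove that the model has finished loading.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_port("localhost", 8000)` could be called from a test script before invoking `src/client.py`.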

### Option 2: Local Development

This requires Dynamo to be installed locally.

```bash
# From the dynamo repo root
cd examples/backends/tritonserver

# Build Triton Server (first time only, ~30 minutes)
make all

# Install Python dependencies
pip install wheelhouse/tritonserver-*.whl
pip install "tritonclient[grpc]"  # quoted so shells such as zsh do not glob the brackets

# Launch the server
./launch/identity.sh

# In another terminal, test with the client
python src/client.py
```

## Directory Structure

```
tritonserver/
├── launch/
│   └── identity.sh      # Launch script (frontend + worker)
├── src/
│   ├── tritonworker.py  # Main Dynamo worker implementation
│   └── client.py        # Test client (KServe gRPC)
├── model_repo/
│   └── identity/        # Sample identity model
│       ├── config.pbtxt
│       └── 1/
├── backends/            # Triton backends (built by `make all`)
├── lib/                 # Triton libraries (built by `make all`)
├── wheelhouse/          # Python wheels (built by `make all`)
├── Dockerfile           # Triton worker container
└── Makefile             # Build Triton from source
```

## Configuration

### Launch Script Options

```bash
./launch/identity.sh --help

Options:
  --model-name <name>         Model name to load (default: identity)
  --model-repository <path>   Path to model repository
  --backend-directory <path>  Path to Triton backends
  --log-verbose <level>       Triton log verbosity 0-6 (default: 1)
  --store-kv <backend>        KV store backend: file, etcd, mem (default: file)
```

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `DYN_STORE_KV` | KV store backend: `file`, `etcd`, or `mem` | `file` |
| `DYN_LOG` | Log level: `debug`, `info`, `warn`, or `error` | `info` |
| `DYN_HTTP_PORT` | Frontend HTTP port | `8000` |
| `ETCD_ENDPOINTS` | etcd connection URL (only when `--store-kv etcd`) | `http://localhost:2379` |
| `NATS_SERVER` | NATS connection URL (only for distributed mode) | `nats://localhost:4222` |
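
When debugging a launch, it can help to print which values will take effect once the defaults are applied. The sketch below mirrors the table above in a small helper for your own scripts. It does not show how Dynamo parses these variables internally, and `effective_settings` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the environment variable table in this README.
DEFAULTS = {
    "DYN_STORE_KV": "file",
    "DYN_LOG": "info",
    "DYN_HTTP_PORT": "8000",
    "ETCD_ENDPOINTS": "http://localhost:2379",
    "NATS_SERVER": "nats://localhost:4222",
}
VALID_STORES = {"file", "etcd", "mem"}


def effective_settings(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Merge the given environment (os.environ by default) with the documented defaults."""
    env = os.environ if environ is None else environ
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    if settings["DYN_STORE_KV"] not in VALID_STORES:
        raise ValueError(f"DYN_STORE_KV must be one of {sorted(VALID_STORES)}")
    # Fail early on a non-numeric port instead of at bind time.
    settings["DYN_HTTP_PORT"] = int(settings["DYN_HTTP_PORT"])
    return settings
```

Calling `effective_settings()` with no arguments reads the current process environment.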

## Adding Your Own Models

1. Create a model directory in `model_repo/`:

   ```text
   model_repo/
   └── my_model/
       ├── config.pbtxt
       └── 1/
           └── model.plan  # or other model file
   ```

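2. Define the model config (`config.pbtxt`). The `backend` named here must be available in `backends/`; by default only the identity backend is built (see Known Limitations):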

   ```protobuf
   name: "my_model"
   backend: "tensorrt"  # or onnxruntime, python, etc.
   max_batch_size: 8

   input [
     {
       name: "input"
       data_type: TYPE_FP32
       dims: [3, 224, 224]
     }
   ]
   output [
     {
       name: "output"
       data_type: TYPE_FP32
       dims: [1000]
     }
   ]
   ```

3. Launch with your model:

   ```bash
   ./launch/identity.sh --model-name my_model
   ```
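
The directory layout from step 1 can also be created programmatically. The following is a sketch under stated assumptions: `scaffold_model` is a hypothetical helper, and the generated `config.pbtxt` contains only the `name`, `backend`, and `max_batch_size` fields, so the `input` and `output` sections from step 2 still need to be added by hand, along with the model file itself.

```python
from pathlib import Path

# Only the three scalar fields from step 2; inputs and outputs are model specific.
CONFIG_TEMPLATE = """\
name: "{name}"
backend: "{backend}"
max_batch_size: {max_batch_size}
"""


def scaffold_model(repo: Path, name: str, backend: str,
                   max_batch_size: int = 8, version: int = 1) -> Path:
    """Create <repo>/<name>/<version>/ and a starter config.pbtxt; return the model directory."""
    model_dir = repo / name
    (model_dir / str(version)).mkdir(parents=True, exist_ok=True)
    config = model_dir / "config.pbtxt"
    # Never overwrite a config that has already been edited.
    if not config.exists():
        config.write_text(
            CONFIG_TEMPLATE.format(name=name, backend=backend, max_batch_size=max_batch_size)
        )
    return model_dir
```

For example, `scaffold_model(Path("model_repo"), "my_model", backend="tensorrt")` creates `model_repo/my_model/1/` and a starter config.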

## Known Limitations

- **Single model**: Currently loads one model at a time
- **Identity backend only**: The Makefile builds the identity backend by default; other backends require modifying the build configuration

## Building Triton from Source

Building from source is required for local development (Option 2). The Makefile builds Triton Server and the identity backend.

```bash
cd examples/backends/tritonserver

# Build Triton Server (~30 minutes, clones and builds from source)
make all

# Check build status
make status

# This produces:
#   lib/libtritonserver.so     - Core library
#   bin/tritonserver           - Server binary
#   backends/identity/         - Identity backend
#   wheelhouse/*.whl           - Python bindings

# Clean up build artifacts
make clean      # Remove installed artifacts
make distclean  # Remove everything including build cache
```

To add other backends (TensorRT, ONNX, Python, etc.), edit the Makefile's `build.py` invocation to include additional `--backend=<name>` flags.

## Troubleshooting

### "Model not found" error

- Verify the model exists in `model_repo/<model_name>/`
- Check that `config.pbtxt` is valid
- Ensure the backend is available in `backends/`
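
The three checks above can be scripted. This is a rough sketch, not a tool shipped with the example: `check_model` is a hypothetical helper, it looks up the `backend` field with a regular expression rather than parsing the protobuf text, and it assumes each backend lives in `backends/<backend name>/`, as the identity backend does. It does not verify that `config.pbtxt` is otherwise valid.

```python
import re
from pathlib import Path
from typing import List, Optional


def check_model(repo: Path, model_name: str, backends_dir: Optional[Path] = None) -> List[str]:
    """Return a list of problems; an empty list means the layout looks sane."""
    model_dir = repo / model_name
    if not model_dir.is_dir():
        return [f"missing model directory: {model_dir}"]

    problems = []
    config = model_dir / "config.pbtxt"
    if not config.is_file():
        problems.append(f"missing {config}")
    elif backends_dir is not None:
        # Textual lookup of the backend field only, not a protobuf parse.
        match = re.search(r'^\s*backend\s*:\s*"([^"]+)"', config.read_text(), re.MULTILINE)
        if match and not (backends_dir / match.group(1)).is_dir():
            problems.append(f"backend '{match.group(1)}' not found in {backends_dir}")

    # The sample layout uses a numerically named version directory such as 1/.
    if not any(p.is_dir() and p.name.isdigit() for p in model_dir.iterdir()):
        problems.append(f"no numeric version directory under {model_dir}")
    return problems
```

For example, `check_model(Path("model_repo"), "identity", Path("backends"))` returns an empty list when all three checks pass.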

### Worker fails to start

- Check that `LD_LIBRARY_PATH` includes the Triton libraries (`lib/`, built by `make all`)
- Verify GPU is available: `nvidia-smi`
- Increase log verbosity: `--log-verbose 6`
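
The library path check can be automated with a short script. This sketch assumes the core library is `lib/libtritonserver.so`, as listed under "Building Triton from Source". `diagnose_triton_libs` is a hypothetical helper, and it only inspects the filesystem and the environment; it does not try to load the library.

```python
import os
from pathlib import Path
from typing import List, Optional


def diagnose_triton_libs(lib_dir: Path, ld_library_path: Optional[str] = None) -> List[str]:
    """Return a list of problems with the Triton library setup; empty means none were found."""
    raw = os.environ.get("LD_LIBRARY_PATH", "") if ld_library_path is None else ld_library_path
    # Resolve entries so that relative paths and symlinks compare equal.
    entries = [Path(entry).resolve() for entry in raw.split(os.pathsep) if entry]

    problems = []
    if not (lib_dir / "libtritonserver.so").is_file():
        problems.append(f"libtritonserver.so not found in {lib_dir} (has `make all` been run?)")
    if lib_dir.resolve() not in entries:
        problems.append(f"{lib_dir.resolve()} is not on LD_LIBRARY_PATH")
    return problems
```

For example, `diagnose_triton_libs(Path("lib"))`, run from `examples/backends/tritonserver`, checks the current environment.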

## Related Documentation

- [Dynamo Backend Guide](../../../docs/development/backend-guide.md)
- [Triton Inference Server](https://github.com/triton-inference-server/server)
- [KServe Protocol](https://kserve.github.io/website/latest/modelserving/data_plane/v2_protocol/)