# S3-compatible Storage Backend LoRA Integration Guide

This guide explains how to set up and use LoRA (Low-Rank Adaptation) adapters with Dynamo using an S3-compatible storage backend (e.g. MinIO, AWS S3, or GCS).

## Overview

This example demonstrates how to:
1. Set up MinIO as a local S3-compatible storage backend
2. Download LoRA adapters from Hugging Face Hub
3. Upload LoRA adapters to MinIO
4. Load and use LoRA adapters with Dynamo
5. Run inference with LoRA-adapted models
6. Manage (load/unload) LoRA adapters

## Prerequisites

### Required Software
- Docker (for running MinIO)
- Python 3.10+
- AWS CLI: `pip install awscli`
- Hugging Face CLI: `pip install "huggingface-hub[cli]"` (the quotes keep shells such as zsh from expanding the brackets)
- jq (optional, for pretty JSON output): `sudo apt install jq`

### Python Dependencies
Make sure you have Dynamo installed with your chosen backend. See the
[Dynamo quickstart guide](https://docs.nvidia.com/dynamo/getting-started/quickstart)
for setup instructions.

## Quick Start

### Step 1: Set Up MinIO and Upload LoRA

Run the setup script to start MinIO, download a LoRA adapter from Hugging Face, and upload it to MinIO:

```bash
./setup_minio.sh
```

This script will:
- Start MinIO in a Docker container
- Download a LoRA adapter from Hugging Face Hub (default: `codelion/Qwen3-0.6B-accuracy-recovery-lora`)
- Upload the LoRA to MinIO at `s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora`

#### Script Options

The setup script supports different modes:

```bash
# Full setup (default) - start MinIO, download & upload LoRA
./setup_minio.sh

# Start MinIO only (without downloading/uploading)
./setup_minio.sh --start

# Stop MinIO
./setup_minio.sh --stop

# Show help
./setup_minio.sh --help
```

#### Customize the LoRA to Download

You can specify a different LoRA repository and name:

```bash
HF_LORA_REPO="username/lora-repo" \
LORA_NAME="my-lora" \
  ./setup_minio.sh
```

### Step 2: Launch Dynamo with LoRA Support

Start the Dynamo frontend and worker with LoRA support enabled:

```bash
./agg_lora.sh
```

This will:
- Set up AWS credentials for MinIO
- Start the Dynamo frontend on port 8000
- Start the Dynamo worker on port 8081 with LoRA support

Wait for the services to start (check the logs for "Application startup complete").
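
Rather than watching the logs by hand, a script can poll until the frontend responds. The following is a minimal sketch, not part of Dynamo: `wait_until` and `frontend_ready` are hypothetical helper names, and treating a successful `GET /v1/models` as a readiness signal is an assumption rather than a documented health check.

```python
import time
import urllib.request
from typing import Callable


def wait_until(probe: Callable[[], bool], timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Call probe() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def frontend_ready(url: str = "http://localhost:8000/v1/models") -> bool:
    """Return True if the frontend answers the models request with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # URLError and connection errors are OSError subclasses:
        # treat them as "not ready yet" and let the caller retry.
        return False
```

Calling `wait_until(frontend_ready)` blocks for up to two minutes and returns `False` if the frontend never answers.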

## Working with LoRAs

### 1. Check Available Models

List all available models (base model only at first):

```bash
curl http://localhost:8000/v1/models | jq .
```

### 2. Load a LoRA Adapter

Load a LoRA from the S3-compatible storage backend (MinIO in this example):

```bash
curl -X POST http://localhost:8081/v1/loras \
  -H "Content-Type: application/json" \
  -d '{
    "lora_name": "codelion/Qwen3-0.6B-accuracy-recovery-lora",
    "source": {
      "uri": "s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora"
    }
  }' | jq .
```

Expected response:
```json
{
  "status": "success",
  "message": "LoRA adapter 'codelion/Qwen3-0.6B-accuracy-recovery-lora' loaded successfully",
  "lora_name": "codelion/Qwen3-0.6B-accuracy-recovery-lora",
  "lora_id": 1207343256
}
```
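
When scripting the load step, it helps to check the response instead of reading it by eye. The sketch below parses a response shaped like the one above and returns the `lora_id`. The function name is hypothetical, and treating any `status` other than `"success"` as a failure is an assumption, since the error format is not shown in this guide.

```python
import json


def lora_id_from_response(body: str) -> int:
    """Parse a load response and return its lora_id, raising if the load failed."""
    data = json.loads(body)
    # Assumption: anything other than "success" means the adapter was not loaded.
    if data.get("status") != "success":
        raise RuntimeError(f"LoRA load failed: {data.get('message', body)}")
    return int(data["lora_id"])
```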

### 3. List Loaded LoRAs

Check which LoRAs are currently loaded:

```bash
curl http://localhost:8081/v1/loras | jq .
```

### 4. Verify LoRA in Models List

After loading, the LoRA should appear in the models list:

```bash
curl http://localhost:8000/v1/models | jq .
```

You should see both the base model and the LoRA adapter listed.
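
To check this in a script, the model names can be pulled out of the response. This is a small sketch that assumes the standard OpenAI list layout, where each entry under `data` carries an `id`; `model_ids` is a hypothetical helper name.

```python
import json


def model_ids(models_response: str) -> list[str]:
    """Return the model ids from an OpenAI-style /v1/models response body."""
    data = json.loads(models_response)
    return [entry["id"] for entry in data.get("data", [])]
```

A check such as `"codelion/Qwen3-0.6B-accuracy-recovery-lora" in model_ids(body)` then confirms that the adapter is being served.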

### 5. Run Inference with LoRA

#### Using the LoRA-adapted model:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codelion/Qwen3-0.6B-accuracy-recovery-lora",
    "messages": [{
      "role": "user",
      "content": "What is a good low-risk investment strategy?"
    }],
    "max_tokens": 300,
    "temperature": 0.1
  }' | jq .
```

#### For comparison, using the base model:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{
      "role": "user",
      "content": "What is a good low-risk investment strategy?"
    }],
    "max_tokens": 300
  }' | jq .
```
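
For a like-for-like comparison, both requests should use the same sampling settings, so that any difference in the answers comes from the adapter. The sketch below builds identical request bodies for any model name and extracts the reply text. `chat_payload` and `reply_text` are hypothetical helper names, and the response layout assumed is the standard OpenAI chat completion format.

```python
import json


def chat_payload(model: str, prompt: str, max_tokens: int = 300, temperature: float = 0.1) -> bytes:
    """Encode a chat completion request body for the given model name."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return json.dumps(body).encode("utf-8")


def reply_text(completion_response: str) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    data = json.loads(completion_response)
    return data["choices"][0]["message"]["content"]
```

Sending `chat_payload("Qwen/Qwen3-0.6B", prompt)` and `chat_payload("codelion/Qwen3-0.6B-accuracy-recovery-lora", prompt)` to `/v1/chat/completions` and comparing the two `reply_text` results gives the side-by-side view.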

### 6. Unload a LoRA

When you no longer need a LoRA, unload it to free up resources:

```bash
curl -X DELETE http://localhost:8081/v1/loras/codelion/Qwen3-0.6B-accuracy-recovery-lora | jq .
```

Expected response:
```json
{
  "status": "success",
  "message": "LoRA unloaded successfully"
}
```

After unloading, the LoRA no longer appears in the output of either the `/v1/loras` or the `/v1/models` endpoint.
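
The same unload call can be issued from Python's standard library. This sketch only builds the request; `unload_request` is a hypothetical helper name, and the worker URL default is taken from the examples in this guide.

```python
import urllib.request


def unload_request(lora_name: str, worker_url: str = "http://localhost:8081") -> urllib.request.Request:
    """Build (but do not send) the DELETE request that unloads lora_name."""
    # The name is appended as-is, slashes included, matching the curl call above.
    return urllib.request.Request(f"{worker_url}/v1/loras/{lora_name}", method="DELETE")
```

Passing the result to `urllib.request.urlopen` sends it to the worker.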

## Configuration

### Environment Variables

The following environment variables can be configured:

```bash
# S3-compatible storage backend Configuration
export AWS_ENDPOINT=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_REGION=us-east-1

# Dynamo LoRA Configuration
export DYN_LORA_ENABLED=true
export DYN_LORA_PATH=/tmp/dynamo_loras_minio
```
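
A missing variable is an easy mistake to make before launching the worker. The following sketch reports which of the variables above are unset or empty. It is a script-side sanity check, not how Dynamo itself reads its configuration, and whether every variable is strictly required is an assumption.

```python
import os
from typing import Mapping

# Variables listed in this guide. Treating all of them as required is an assumption.
EXPECTED_VARS = (
    "AWS_ENDPOINT",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "DYN_LORA_ENABLED",
    "DYN_LORA_PATH",
)


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of expected variables that are unset or empty in env."""
    return [name for name in EXPECTED_VARS if not env.get(name)]
```

An empty list from `missing_vars()` means all six variables are present in the current environment.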

### MinIO Console

Access the MinIO web console at `http://localhost:9001`
- Username: `minioadmin`
- Password: `minioadmin`

## Troubleshooting

### MinIO won't start
- Check if ports 9000 and 9001 are already in use
- Ensure Docker is running
- Check Docker logs: `docker logs dynamo-minio`
- Try stopping any existing MinIO containers: `./setup_minio.sh --stop`
- Restart MinIO: `./setup_minio.sh --start`

### LoRA fails to load
- Verify the LoRA is uploaded to MinIO: `aws --endpoint-url=http://localhost:9000 s3 ls s3://my-loras/`
- Check AWS credentials are set correctly
- Ensure the LoRA files are compatible with the base model
- Check worker logs for detailed error messages
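
When a load fails, it is worth confirming that the URI in the request points at the bucket and prefix you expect. The sketch below splits an `s3://` URI and builds the matching `aws s3 ls` command as an argument list. Both function names are hypothetical, and the endpoint default is the local MinIO address used in this guide.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split s3://bucket/key/prefix into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def ls_command(uri: str, endpoint: str = "http://localhost:9000") -> list[str]:
    """Build the `aws s3 ls` argument list that shows the objects under uri."""
    bucket, prefix = split_s3_uri(uri)
    target = f"s3://{bucket}/"
    if prefix:
        # A trailing slash lists the contents of the prefix rather than the prefix itself.
        target += prefix.rstrip("/") + "/"
    return ["aws", f"--endpoint-url={endpoint}", "s3", "ls", target]
```

The returned list can be handed to `subprocess.run`. An empty listing suggests the adapter was uploaded under a different key.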

### Inference fails
- Verify the model name matches exactly (case-sensitive)
- Check if the LoRA is loaded: `curl http://localhost:8081/v1/loras`
- Ensure the base model supports the LoRA rank
- Check that `max_lora_rank` in the worker config is >= the LoRA rank
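
The adapter's rank can usually be read from its configuration file. The sketch below assumes the adapter follows the Hugging Face PEFT layout, where `adapter_config.json` stores the rank under the key `r`. The function names are hypothetical, and the `max_lora_rank` value to compare against has to come from your own worker configuration.

```python
import json


def lora_rank(adapter_config_text: str) -> int:
    """Read the rank `r` from the contents of a PEFT-style adapter_config.json."""
    return int(json.loads(adapter_config_text)["r"])


def rank_fits(adapter_config_text: str, max_lora_rank: int) -> bool:
    """Return True if the adapter's rank does not exceed the worker's limit."""
    return lora_rank(adapter_config_text) <= max_lora_rank
```

For example, reading the downloaded adapter's `adapter_config.json` with `pathlib.Path(...).read_text()` and passing the text to `rank_fits` shows whether the worker's limit needs to be raised.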

### Cache issues
- Check the cache directory: `ls -la /tmp/dynamo_loras_minio/`
- Clear the cache if needed: `rm -rf /tmp/dynamo_loras_minio/*`
- Ensure the cache directory is writable

## Advanced Usage

### Loading Multiple LoRAs

You can load multiple LoRA adapters simultaneously:

```bash
# Load first LoRA
curl -X POST http://localhost:8081/v1/loras \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "lora1", "source": {"uri": "s3://my-loras/lora1"}}'

# Load second LoRA
curl -X POST http://localhost:8081/v1/loras \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "lora2", "source": {"uri": "s3://my-loras/lora2"}}'
```
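
With more than a couple of adapters, generating the request bodies is less error-prone than copying curl commands. This sketch assumes each adapter is stored under a key equal to its name, as in the examples above; `load_payloads` is a hypothetical helper name.

```python
import json


def load_payloads(names: list[str], bucket: str = "my-loras") -> list[bytes]:
    """Build one POST /v1/loras request body per adapter name."""
    return [
        json.dumps({"lora_name": name, "source": {"uri": f"s3://{bucket}/{name}"}}).encode("utf-8")
        for name in names
    ]
```

Each body can then be posted to the worker's `/v1/loras` endpoint in turn.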

### Using Different Base Models

To use a different base model, set the `MODEL` environment variable when launching:

```bash
MODEL=meta-llama/Llama-2-7b-hf ./agg_lora.sh
```

Ensure your LoRAs are compatible with the chosen base model.

## Cleanup

### Stop Services

Press `Ctrl+C` in the terminal running `agg_lora.sh` to stop Dynamo services.

### Stop MinIO

```bash
# Using the setup script (recommended)
./setup_minio.sh --stop

# Or manually with Docker
docker stop dynamo-minio
docker rm dynamo-minio
```

### Clean Up Data

```bash
# Remove MinIO data
rm -rf ~/dynamo_minio_data

# Remove LoRA cache
rm -rf /tmp/dynamo_loras_minio
```

## API Reference

### Load LoRA
- **Endpoint**: `POST /v1/loras`
- **Body**: `{"lora_name": "string", "source": {"uri": "string"}}`
- **Response**: `{"status": "success", "message": "string", "lora_name": "string", "lora_id": int}`

### List LoRAs
- **Endpoint**: `GET /v1/loras`
- **Response**: Array of loaded LoRAs

### Unload LoRA
- **Endpoint**: `DELETE /v1/loras/{lora_name}`
- **Response**: `{"status": "success", "message": "string"}`

### List Models
- **Endpoint**: `GET /v1/models`
- **Response**: OpenAI-compatible models list

### Chat Completions
- **Endpoint**: `POST /v1/chat/completions`
- **Body**: OpenAI-compatible chat completion request
- **Response**: OpenAI-compatible chat completion response