README.md 736 Bytes
Newer Older
1
# LoRA with vLLM Backend
2

3
For the full LoRA integration guide (setup, usage, API reference, troubleshooting), see [the shared LoRA guide](../../../../common/lora.md).
4
5
6
7

## Quick Start

```bash
8
9
./setup_minio.sh    # Start MinIO, download & upload LoRA
./agg_lora.sh       # Launch vLLM frontend + worker with LoRA
10
11
```

12
## vLLM-Specific Notes
13

14
15
- Default `--max-lora-rank 64` (same as SGLang)
- Override with environment variables: `MODEL`, `LORA_NAME`, `MAX_MODEL_LEN`, `MAX_CONCURRENT_SEQS`
16

17
### KV-Aware Routing (2 GPUs)
18
19

```bash
20
./agg_lora_router.sh
21
22
```

23
Launches two vLLM workers behind a KV-aware router. Load the LoRA to both workers (ports 8081 and 8082), then requests are routed with KV cache affinity for better cache hit rates.