Multimodal workers support dynamic loading and unloading of LoRA adapters at runtime via the management API. This enables serving fine-tuned multimodal models alongside the base model.
### Loading a LoRA Adapter
Load an adapter on a running multimodal worker via the `load_lora` endpoint:
```bash
# For components workers (URI-based, requires DYN_LORA_ENABLED=true)
curl -X POST http://<worker-host>:<port>/load_lora \
Requests without a LoRA name (or with the base model name) will use the base model.
### Unloading a LoRA Adapter
```bash
curl -X POST http://<worker-host>:<port>/unload_lora \
-H"Content-Type: application/json"\
-d'{"lora_name": "my-vlm-adapter"}'
```
### Listing Loaded Adapters
```bash
curl -X POST http://<worker-host>:<port>/list_loras
```
### Disaggregated Mode
In disaggregated (prefill/decode) deployments, the **same LoRA adapter must be loaded on both the prefill and decode workers**. The LoRA identity (`model` field) is automatically propagated from the prefill worker to the decode worker in the forwarded request.
# Multimodal LoRA Deployment with MinIO on Kubernetes
This guide explains how to deploy multimodal (vision-language) LoRA-enabled vLLM inference with S3-compatible storage backend on Kubernetes.
## Overview
This deployment pattern enables dynamic LoRA adapter loading from S3-compatible storage (MinIO) for vision-language models in a Kubernetes environment. It uses the aggregated single-worker architecture where the Rust OpenAIPreprocessor in the Frontend handles image URLs directly.
## Prerequisites
- Kubernetes cluster with GPU support
- Helm 3.x installed
-`kubectl` configured to access your cluster
- Dynamo Kubernetes Platform installed ([Installation Guide](../../../../docs/pages/kubernetes/installation-guide.md))
- HuggingFace token for downloading base and LoRA adapters
## Files in This Directory
| File | Description |
|------|-------------|
| `agg_qwen_lora.yaml` | DynamoGraphDeployment for multimodal vLLM with LoRA support |
| `minio-secret.yaml` | Kubernetes secret for MinIO credentials |
| `sync-lora-job.yaml` | Job to download LoRA from HuggingFace and upload to MinIO |
| `lora-model.yaml` | DynamoModel CRD for registering LoRA adapters |
---
## Step 1: Set Up Environment Variables
```bash
export NAMESPACE=dynamo # Your Dynamo namespace
export HF_TOKEN=your_hf_token # Your HuggingFace token
```
---
## Step 2: Create Secrets
### Create HuggingFace Token Secret
```bash
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN}\
-n${NAMESPACE}
```
### Create MinIO Credentials Secret
In this example, we are using the default credentials for MinIO.
You can change the credentials to point to your own S3-compatible storage.
4.**Verify adapter compatibility**: Ensure the LoRA adapter was trained for the same base model architecture (Qwen3-VL-2B) and that `max-lora-rank` (default 64) is >= the adapter's rank.