OpenDAS / dynamo
Commit be67f67b
Authored Dec 03, 2025 by Biswa Panda. Committed by GitHub Dec 04, 2025.
feat: add lora k8s deployment example (#4714)
Parent: 845a06b3
Changes: 5 changed files with 427 additions and 0 deletions
examples/backends/vllm/deploy/lora/README.md +297 -0
examples/backends/vllm/deploy/lora/agg_lora.yaml +70 -0
examples/backends/vllm/deploy/lora/lora-model.yaml +12 -0
examples/backends/vllm/deploy/lora/minio-secret.yaml +10 -0
examples/backends/vllm/deploy/lora/sync-lora-job.yaml +38 -0
examples/backends/vllm/deploy/lora/README.md (new file: 0 → 100644)
# LoRA Deployment with MinIO on Kubernetes
This guide explains how to deploy LoRA-enabled vLLM inference with an S3-compatible storage backend on Kubernetes.
## Overview
This deployment pattern enables dynamic LoRA adapter loading from S3-compatible storage (MinIO) in a Kubernetes environment.
## Prerequisites
- Kubernetes cluster with GPU support
- Helm 3.x installed
- `kubectl` configured to access your cluster
- Dynamo Cloud Platform installed ([Installation Guide](../../../../../docs/kubernetes/installation_guide.md))
- HuggingFace token for downloading the base model and LoRA adapters
## Files in This Directory
| File | Description |
|------|-------------|
| `agg_lora.yaml` | DynamoGraphDeployment for vLLM with LoRA support |
| `minio-secret.yaml` | Kubernetes secret for MinIO credentials |
| `sync-lora-job.yaml` | Job to download LoRA from HuggingFace and upload to MinIO |
| `lora-model.yaml` | DynamoModel CRD for registering LoRA adapters |
---
## Step 1: Set Up Environment Variables
```bash
export NAMESPACE=dynamo        # Your Dynamo namespace
export HF_TOKEN=your_hf_token  # Your HuggingFace token
```
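Every later command depends on these two variables, so it can help to fail early when one is missing. The following is a minimal sketch, not part of the example files: `require_env` is a hypothetical helper name, and it relies on the variables being exported as shown above.

```bash
# Hypothetical helper: report any required variable that is unset or empty.
# Runs in a subshell (parenthesized body) so its variables do not leak.
require_env() (
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which matches the exports above.
    if [ -z "$(printenv "${name}")" ]; then
      echo "missing required variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
)

require_env NAMESPACE HF_TOKEN || echo "Set the variables above before continuing." >&2
```

The function returns a non-zero status when anything is missing, so it can also gate a script with `require_env NAMESPACE HF_TOKEN || exit 1`.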
---
## Step 2: Create Secrets
### Create HuggingFace Token Secret
```bash
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN=${HF_TOKEN} \
  -n ${NAMESPACE}
```
### Create MinIO Credentials Secret
This example uses the default MinIO credentials.
You can change the credentials to point to your own S3-compatible storage.
```bash
kubectl apply -f minio-secret.yaml -n ${NAMESPACE}
```
---
## Step 3: Install MinIO
### Add MinIO Helm Repository
```bash
helm repo add minio https://charts.min.io/
helm repo update
```
### Deploy MinIO
```bash
helm install minio minio/minio \
  --namespace ${NAMESPACE} \
  --set rootUser=minioadmin \
  --set rootPassword=minioadmin \
  --set mode=standalone \
  --set replicas=1 \
  --set persistence.enabled=true \
  --set persistence.size=10Gi \
  --set resources.requests.memory=512Mi \
  --set service.type=ClusterIP \
  --set consoleService.type=ClusterIP
```
### Verify MinIO Installation
```bash
kubectl get pods -n ${NAMESPACE} | grep minio
kubectl get svc -n ${NAMESPACE} | grep minio
```
Expected output:
```
minio-xxxx-xxxx 1/1 Running 0 1m
```
### (Optional) Access MinIO Console
```bash
kubectl port-forward svc/minio-console -n ${NAMESPACE} 9001:9001 9000:9000
```
Open http://localhost:9001 in your browser:
- Username: `minioadmin`
- Password: `minioadmin`
---
## Step 4: Upload LoRA Adapters to MinIO
Use the provided Kubernetes Job to download a LoRA adapter from HuggingFace and upload it to MinIO:
```bash
kubectl apply -f sync-lora-job.yaml -n ${NAMESPACE}
```
### Monitor the Job
```bash
# Watch job progress
kubectl get jobs -n ${NAMESPACE} -w
# Check job logs
kubectl logs job/sync-hf-lora-to-minio -n ${NAMESPACE} -f
```
Wait for the job to complete successfully.
### Verify Upload (Optional)
```bash
# Port-forward MinIO API
kubectl port-forward svc/minio -n ${NAMESPACE} 9000:9000 &
# Check uploaded files
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_ENDPOINT_URL=http://localhost:9000
aws s3 ls s3://my-loras/ --recursive
```
### Customizing the LoRA Adapter
To upload a different LoRA adapter, edit `sync-lora-job.yaml` and change the `MODEL_NAME` environment variable:
```yaml
env:
  - name: MODEL_NAME
    value: your-org/your-lora-adapter
```
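The sync job uploads to `s3://$LORA_ROOT_PATH/$MODEL_NAME`, and `lora-model.yaml` points its `source.uri` at that same location. When `MODEL_NAME` changes, the URI (and most likely `modelName`) in `lora-model.yaml` needs to change with it. A minimal sketch of how the path is composed, where `lora_s3_uri` is a hypothetical helper and not part of the example files:

```bash
# Hypothetical helper: compose the S3 URI that sync-lora-job.yaml uploads to,
# i.e. s3://$LORA_ROOT_PATH/$MODEL_NAME.
lora_s3_uri() (
  bucket="$1"
  model="$2"
  printf 's3://%s/%s\n' "${bucket}" "${model}"
)

# Bucket name taken from the job's LORA_ROOT_PATH default; adapter name is a placeholder.
lora_s3_uri my-loras your-org/your-lora-adapter
```

With the defaults in this directory, the same composition yields the URI already present in `lora-model.yaml`.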
---
## Step 5: Deploy vLLM with LoRA Support
### Update the Image (if needed)
Edit `agg_lora.yaml` to use your container image:
```bash
# Using yq to update the image
export FRAMEWORK_RUNTIME_IMAGE=your-registry/your-image:tag
yq '.spec.services.[].extraPodSpec.mainContainer.image = env(FRAMEWORK_RUNTIME_IMAGE)' agg_lora.yaml > agg_lora_updated.yaml
```
### Deploy the LoRA-enabled vLLM Graph
```bash
# If you generated agg_lora_updated.yaml in the previous step, apply that file instead
kubectl apply -f agg_lora.yaml -n ${NAMESPACE}
```
### Verify Deployment
```bash
# Check pods
kubectl get pods -n ${NAMESPACE}
# Watch worker logs
kubectl logs -f deployment/vllm-agg-lora-vllmdecode-worker -n ${NAMESPACE}
```
Wait for the worker to show "Application startup complete".
## Step 6: Using DynamoModel CRD
The `lora-model.yaml` file demonstrates how to register a LoRA adapter using the DynamoModel Custom Resource:
```bash
kubectl apply -f lora-model.yaml -n ${NAMESPACE}
```
This creates a declarative way to manage LoRA adapters in your cluster.
---
## Configuration Reference
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `AWS_ENDPOINT` | MinIO/S3 endpoint URL | `http://minio:9000` |
| `AWS_ACCESS_KEY_ID` | MinIO access key | From secret |
| `AWS_SECRET_ACCESS_KEY` | MinIO secret key | From secret |
| `AWS_REGION` | AWS region (required for S3 SDK) | `us-east-1` |
| `AWS_ALLOW_HTTP` | Allow HTTP connections | `true` |
| `DYN_LORA_ENABLED` | Enable LoRA support | `true` |
| `DYN_LORA_PATH` | Local cache path for LoRA files | `/tmp/dynamo_loras_minio` |
| `BUCKET_NAME` | MinIO bucket name | `my-loras` |
### vLLM LoRA Arguments
| Argument | Description |
|----------|-------------|
| `--enable-lora` | Enable LoRA adapter support |
| `--max-lora-rank` | Maximum LoRA rank (must be >= your LoRA's rank) |
| `--max-loras` | Maximum number of LoRAs to load simultaneously |
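To check the `--max-lora-rank` requirement before deploying, the adapter's rank can be read from its configuration file. The sketch below assumes a PEFT-style adapter that ships an `adapter_config.json` with the rank under the `"r"` key; `adapter_rank` and `rank_fits` are hypothetical helper names, not part of the example files.

```bash
# Hypothetical helper: print the LoRA rank from a PEFT-style adapter_config.json.
# Assumes the rank is stored as an integer under the "r" key.
adapter_rank() {
  sed -n 's/.*"r"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Succeeds when the adapter's rank is no larger than the given --max-lora-rank value.
rank_fits() (
  rank="$(adapter_rank "$1")"
  [ -n "${rank}" ] && [ "${rank}" -le "$2" ]
)
```

For example, after downloading an adapter to `/tmp/lora` as the sync job does, `rank_fits /tmp/lora/adapter_config.json 64` checks it against the value used in `agg_lora.yaml`.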
---
## Cleanup
### Remove vLLM Deployment
```bash
kubectl delete -f agg_lora.yaml -n ${NAMESPACE}
```
### Remove Sync Job
```bash
kubectl delete -f sync-lora-job.yaml -n ${NAMESPACE}
```
### Remove MinIO
```bash
helm uninstall minio -n ${NAMESPACE}
```
### Remove Secrets
```bash
kubectl delete -f minio-secret.yaml -n ${NAMESPACE}
kubectl delete secret hf-token-secret -n ${NAMESPACE}
```
---
## Troubleshooting
### LoRA Fails to Load
1. **Check MinIO connectivity from worker**:
   ```bash
   kubectl exec -it deployment/vllm-agg-lora-vllmdecode-worker -n ${NAMESPACE} -- \
     curl http://minio:9000/minio/health/live
   ```
2. **Verify LoRA exists in MinIO**:
   ```bash
   kubectl port-forward svc/minio -n ${NAMESPACE} 9000:9000 &
   aws --endpoint-url=http://localhost:9000 s3 ls s3://my-loras/ --recursive
   ```
3. **Check worker logs**:
   ```bash
   kubectl logs deployment/vllm-agg-lora-vllmdecode-worker -n ${NAMESPACE}
   ```
### Sync Job Fails
1. **Check job logs**:
   ```bash
   kubectl logs job/sync-hf-lora-to-minio -n ${NAMESPACE}
   ```
2. **Verify HuggingFace token**:
   ```bash
   kubectl get secret hf-token-secret -n ${NAMESPACE} -o yaml
   ```
3. **Check MinIO is accessible**:
   ```bash
   kubectl get svc minio -n ${NAMESPACE}
   ```
### MinIO Connection Refused
- Ensure MinIO pods are running: `kubectl get pods -n ${NAMESPACE} | grep minio`
- Check MinIO service: `kubectl get svc minio -n ${NAMESPACE}`
- Verify the `AWS_ENDPOINT` URL matches the service name
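
The short name in `http://minio:9000` only resolves from pods in the same namespace as the MinIO Service. If the worker runs elsewhere, the endpoint needs the namespace-qualified name. A minimal sketch of building it, assuming the default cluster domain `cluster.local` (some clusters are configured differently); `service_url` is a hypothetical helper:

```bash
# Hypothetical helper: build a fully qualified in-cluster URL for a Service.
# Assumes the default cluster domain "cluster.local".
service_url() (
  name="$1"
  namespace="$2"
  port="$3"
  printf 'http://%s.%s.svc.cluster.local:%s\n' "${name}" "${namespace}" "${port}"
)

# Falls back to the namespace used in this guide when NAMESPACE is unset.
service_url minio "${NAMESPACE:-dynamo}" 9000
```

The printed value can be used for `AWS_ENDPOINT` in `agg_lora.yaml` in place of the short form.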
## Further Reading
- [vLLM Deployment Guide](../README.md) - Other deployment patterns
- [Dynamo Kubernetes Guide](../../../../../docs/kubernetes/README.md) - Platform setup
- [Installation Guide](../../../../../docs/kubernetes/installation_guide.md) - Platform installation
examples/backends/vllm/deploy/lora/agg_lora.yaml (new file: 0 → 100644)
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vllm-agg-lora
spec:
  services:
    Frontend:
      dynamoNamespace: vllm-agg-lora
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidian/dynamo-dev/biswa:7e499b5c460f1883a9945d221123e0760051210f-39500608-vllm-amd64
    VllmDecodeWorker:
      envFromSecret: hf-token-secret
      dynamoNamespace: vllm-agg-lora
      componentType: worker
      subComponentType: decode
      replicas: 1
      resources:
        limits:
          gpu: "1"
      modelRef:
        name: Qwen/Qwen3-0.6B
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidian/dynamo-dev/biswa:7e499b5c460f1883a9945d221123e0760051210f-39500608-vllm-amd64
          workingDir: /workspace/examples/backends/vllm
          env:
            - name: DYN_LORA_ENABLED
              value: "true"
            - name: DYN_LORA_PATH
              value: "/tmp/dynamo_loras_minio"
            - name: DYN_SYSTEM_ENABLED
              value: "true"
            - name: DYN_SYSTEM_PORT
              value: "9090"
            - name: AWS_ENDPOINT
              value: "http://minio:9000"
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: minio-secret
                  key: AWS_ACCESS_KEY_ID
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: minio-secret
                  key: AWS_SECRET_ACCESS_KEY
            - name: AWS_REGION
              value: "us-east-1"
            - name: AWS_ALLOW_HTTP
              value: "true"
            - name: BUCKET_NAME
              value: "my-loras"
          command:
            - python3
            - -m
            - dynamo.vllm
          args:
            - --model
            - Qwen/Qwen3-0.6B
            - --connector
            - none
            - --enable-lora
            - --max-lora-rank
            - "64"
            - --enforce-eager
examples/backends/vllm/deploy/lora/lora-model.yaml (new file: 0 → 100644)
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: codelion-recovery-lora
spec:
  modelName: codelion/Qwen3-0.6B-accuracy-recovery-lora
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora
\ No newline at end of file
examples/backends/vllm/deploy/lora/minio-secret.yaml (new file: 0 → 100644)
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: minio-secret
stringData:
  AWS_ACCESS_KEY_ID: minioadmin
  AWS_SECRET_ACCESS_KEY: minioadmin
examples/backends/vllm/deploy/lora/sync-lora-job.yaml (new file: 0 → 100644)
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: batch/v1
kind: Job
metadata:
  name: sync-hf-lora-to-minio
spec:
  template:
    spec:
      containers:
        - name: uploader
          image: python:3.10-slim
          command:
            - /bin/sh
            - -c
            - |
              set -eux
              pip install --no-cache-dir huggingface-hub awscli
              hf download $MODEL_NAME --local-dir /tmp/lora
              rm -rf /tmp/lora/.cache
              aws --endpoint-url=http://minio:9000 s3 mb s3://$LORA_ROOT_PATH || true
              aws --endpoint-url=http://minio:9000 s3 sync /tmp/lora s3://$LORA_ROOT_PATH/$MODEL_NAME
          envFrom:
            - secretRef:
                name: hf-token-secret
            - secretRef:
                name: minio-secret
          env:
            - name: AWS_REGION # set this to your aws region
              value: us-east-1
            - name: AWS_ALLOW_HTTP # remove/disable this if you are using a S3 endpoint or secure MinIO
              value: "true"
            - name: LORA_ROOT_PATH
              value: "my-loras"
            - name: MODEL_NAME
              value: codelion/Qwen3-0.6B-accuracy-recovery-lora
      restartPolicy: Never
  backoffLimit: 3
\ No newline at end of file