# S3-compatible Storage Backend LoRA Integration Guide

This guide explains how to set up and use LoRA (Low-Rank Adaptation) adapters with Dynamo using an S3-compatible storage backend (e.g. MinIO, AWS S3, GCS).

## Overview

This example demonstrates how to:

1. Set up MinIO as local S3-compatible storage
2. Download LoRA adapters from Hugging Face Hub
3. Upload LoRA adapters to MinIO
4. Load and use LoRA adapters with Dynamo
5. Run inference with LoRA-adapted models
6. Manage (load/unload) LoRA adapters

## Prerequisites

### Required Software

- Docker (for running MinIO)
- Python 3.10+
- AWS CLI: `pip install awscli`
- Hugging Face CLI: `pip install "huggingface-hub[cli]"`
- jq (optional, for pretty JSON output): `sudo apt install jq`

### Python Dependencies

Make sure you have Dynamo installed with your chosen backend. See the [Dynamo quickstart guide](https://docs.nvidia.com/dynamo/getting-started/quickstart) for setup instructions.

## Quick Start

### Step 1: Set Up MinIO and Upload LoRA

Run the setup script to start MinIO, download a LoRA adapter from Hugging Face, and upload it to MinIO:

```bash
./setup_minio.sh
```

This script will:

- Start MinIO in a Docker container
- Download a LoRA adapter from Hugging Face Hub (default: `codelion/Qwen3-0.6B-accuracy-recovery-lora`)
- Upload the LoRA to MinIO at `s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora`

#### Script Options

The setup script supports different modes:

```bash
# Full setup (default) - start MinIO, download & upload LoRA
./setup_minio.sh

# Start MinIO only (without downloading/uploading)
./setup_minio.sh --start

# Stop MinIO
./setup_minio.sh --stop

# Show help
./setup_minio.sh --help
```

#### Customize the LoRA to Download

You can specify a different LoRA repository and name:

```bash
HF_LORA_REPO="username/lora-repo" \
LORA_NAME="my-lora" \
./setup_minio.sh
```

### Step 2: Launch Dynamo with LoRA Support

Start the Dynamo frontend and worker with LoRA support enabled:

```bash
./agg_lora.sh
```

This will:

- Set
up AWS credentials for MinIO
- Start the Dynamo frontend on port 8000
- Start the Dynamo worker on port 8081 with LoRA support

Wait for the services to start (check the logs for "Application startup complete").

## Working with LoRAs

### 1. Check Available Models

List all available models (only the base model appears at first):

```bash
curl http://localhost:8000/v1/models | jq .
```

### 2. Load a LoRA Adapter

Load a LoRA from the S3-compatible storage backend (e.g. MinIO):

```bash
curl -X POST http://localhost:8081/v1/loras \
  -H "Content-Type: application/json" \
  -d '{
    "lora_name": "codelion/Qwen3-0.6B-accuracy-recovery-lora",
    "source": {
      "uri": "s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora"
    }
  }' | jq .
```

Expected response:

```json
{
  "status": "success",
  "message": "LoRA adapter 'codelion/Qwen3-0.6B-accuracy-recovery-lora' loaded successfully",
  "lora_name": "codelion/Qwen3-0.6B-accuracy-recovery-lora",
  "lora_id": 1207343256
}
```

### 3. List Loaded LoRAs

Check which LoRAs are currently loaded:

```bash
curl http://localhost:8081/v1/loras | jq .
```

### 4. Verify LoRA in Models List

After loading, the LoRA should appear in the models list:

```bash
curl http://localhost:8000/v1/models | jq .
```

You should see both the base model and the LoRA adapter listed.

### 5. Run Inference with LoRA

#### Using the LoRA-adapted model:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codelion/Qwen3-0.6B-accuracy-recovery-lora",
    "messages": [{
      "role": "user",
      "content": "What is a good low-risk investment strategy?"
    }],
    "max_tokens": 300,
    "temperature": 0.1
  }' | jq .
```

#### For comparison, using the base model:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{
      "role": "user",
      "content": "What is a good low-risk investment strategy?"
    }],
    "max_tokens": 300
  }' | jq .
```

### 6. Unload a LoRA

When you no longer need a LoRA, unload it to free up resources:

```bash
curl -X DELETE http://localhost:8081/v1/loras/codelion/Qwen3-0.6B-accuracy-recovery-lora | jq .
```

Expected response:

```json
{
  "status": "success",
  "message": "LoRA unloaded successfully"
}
```

After unloading, the LoRA is removed from both the `/v1/loras` and `/v1/models` endpoints.

## Configuration

### Environment Variables

The following environment variables can be configured:

```bash
# S3-compatible storage backend configuration
export AWS_ENDPOINT=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_REGION=us-east-1

# Dynamo LoRA configuration
export DYN_LORA_ENABLED=true
export DYN_LORA_PATH=/tmp/dynamo_loras_minio
```

### MinIO Console

Access the MinIO web console at `http://localhost:9001`:

- Username: `minioadmin`
- Password: `minioadmin`

## Troubleshooting

### MinIO won't start

- Check if ports 9000 and 9001 are already in use
- Ensure Docker is running
- Check Docker logs: `docker logs dynamo-minio`
- Try stopping any existing MinIO containers: `./setup_minio.sh --stop`
- Restart MinIO: `./setup_minio.sh --start`

### LoRA fails to load

- Verify the LoRA is uploaded to MinIO: `aws --endpoint-url=http://localhost:9000 s3 ls s3://my-loras/`
- Check AWS credentials are set correctly
- Ensure the LoRA files are compatible with the base model
- Check worker logs for detailed error messages

### Inference fails

- Verify the model name matches exactly (case-sensitive)
- Check if the LoRA is loaded: `curl http://localhost:8081/v1/loras`
- Ensure the base model supports the LoRA rank
- Check that `max_lora_rank` in the worker config is >= the LoRA rank

### Cache issues

- Check the cache directory: `ls -la /tmp/dynamo_loras_minio/`
- Clear the cache if needed: `rm -rf /tmp/dynamo_loras_minio/*`
- Ensure the cache directory is writable

## Advanced Usage

### Loading Multiple LoRAs

You can load multiple LoRA adapters
simultaneously:

```bash
# Load first LoRA
curl -X POST http://localhost:8081/v1/loras \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "lora1", "source": {"uri": "s3://my-loras/lora1"}}'

# Load second LoRA
curl -X POST http://localhost:8081/v1/loras \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "lora2", "source": {"uri": "s3://my-loras/lora2"}}'
```

### Using Different Base Models

To use a different base model, modify the `MODEL` environment variable:

```bash
MODEL=meta-llama/Llama-2-7b-hf ./agg_lora.sh
```

Ensure your LoRAs are compatible with the chosen base model.

## Cleanup

### Stop Services

Press `Ctrl+C` in the terminal running `agg_lora.sh` to stop Dynamo services.

### Stop MinIO

```bash
# Using the setup script (recommended)
./setup_minio.sh --stop

# Or manually with Docker
docker stop dynamo-minio
docker rm dynamo-minio
```

### Clean Up Data

```bash
# Remove MinIO data
rm -rf ~/dynamo_minio_data

# Remove LoRA cache
rm -rf /tmp/dynamo_loras_minio
```

## API Reference

### Load LoRA

- **Endpoint**: `POST /v1/loras`
- **Body**: `{"lora_name": "string", "source": {"uri": "string"}}`
- **Response**: `{"status": "success", "lora_id": int}`

### List LoRAs

- **Endpoint**: `GET /v1/loras`
- **Response**: Array of loaded LoRAs

### Unload LoRA

- **Endpoint**: `DELETE /v1/loras/{lora_name}`
- **Response**: `{"status": "success", "message": "string"}`

### List Models

- **Endpoint**: `GET /v1/models`
- **Response**: OpenAI-compatible models list

### Chat Completions

- **Endpoint**: `POST /v1/chat/completions`
- **Body**: OpenAI-compatible chat completion request
- **Response**: OpenAI-compatible chat completion response
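
### Example: Scripting the Endpoints

The endpoints listed in this reference can also be driven from a script instead of `curl`. The following is a minimal sketch and not part of Dynamo: the helper names (`build_request`, `load_lora_request`, `list_loras_request`, `unload_lora_request`, `chat_request`, `send`) are hypothetical, and the URLs assume the default ports used in this guide (worker on 8081, frontend on 8000). It uses only the Python standard library and keeps request construction separate from sending, so payloads can be inspected before anything touches the network.

```python
import json
import urllib.request

# Assumed defaults, taken from the ports used in this guide.
WORKER_URL = "http://localhost:8081"
FRONTEND_URL = "http://localhost:8000"


def build_request(method, url, body=None):
    """Build (but do not send) an HTTP request, JSON-encoding the body if given."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def load_lora_request(name, uri, worker=WORKER_URL):
    """POST /v1/loras with the body shape shown under Load LoRA."""
    body = {"lora_name": name, "source": {"uri": uri}}
    return build_request("POST", f"{worker}/v1/loras", body)


def list_loras_request(worker=WORKER_URL):
    """GET /v1/loras."""
    return build_request("GET", f"{worker}/v1/loras")


def unload_lora_request(name, worker=WORKER_URL):
    """DELETE /v1/loras/{lora_name}.

    The name is appended as-is (slashes included), mirroring the curl
    example in this guide.
    """
    return build_request("DELETE", f"{worker}/v1/loras/{name}")


def chat_request(model, prompt, frontend=FRONTEND_URL, max_tokens=300):
    """POST /v1/chat/completions with a single user message."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return build_request("POST", f"{frontend}/v1/chat/completions", body)


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires the services from this guide to be running):
#   name = "codelion/Qwen3-0.6B-accuracy-recovery-lora"
#   send(load_lora_request(name, f"s3://my-loras/{name}"))
#   send(chat_request(name, "Hello"))
#   send(unload_lora_request(name))
```

Because `send` raises `urllib.error.HTTPError` on non-2xx responses, a failed load or unload surfaces as an exception rather than a silently ignored error body.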