<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# LoRA

<Tip warning={true}>

This is experimental and the API may change in the future.

</Tip>

[LoRA (Low-Rank Adaptation of Large Language Models)](https://hf.co/papers/2106.09685) is a popular and lightweight training technique that significantly reduces the number of trainable parameters. It works by inserting a smaller number of new weights into the model and training only those. This makes training with LoRA much faster and more memory-efficient, and it produces smaller model weights (a few hundred MBs) that are easier to store and share. LoRA can also be combined with other training techniques like DreamBooth to speed up training.

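To get a sense of why this reduces the trainable parameter count, consider a single linear layer: LoRA freezes the original weight matrix and learns a pair of much smaller low-rank matrices whose product is added to the layer's output. The snippet below is only a conceptual sketch of this idea (it is not how the training script implements LoRA; the script relies on the PEFT library, as shown later in this guide):

```py
import torch
import torch.nn as nn

hidden_dim, rank = 768, 4

# frozen pretrained layer (~590K parameters)
base = nn.Linear(hidden_dim, hidden_dim)
base.requires_grad_(False)

# trainable low-rank update (~6K parameters)
lora_A = nn.Linear(hidden_dim, rank, bias=False)
lora_B = nn.Linear(rank, hidden_dim, bias=False)
nn.init.zeros_(lora_B.weight)  # start with no change to the base model's behavior

x = torch.randn(1, hidden_dim)
output = base(x) + lora_B(lora_A(x))
```
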
<Tip>

LoRA is very versatile and supported for [DreamBooth](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py), [Kandinsky 2.2](https://github.com/huggingface/diffusers/blob/main/examples/kandinsky2_2/text_to_image/train_text_to_image_lora_decoder.py), [Stable Diffusion XL](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora_sdxl.py), [text-to-image](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py), and [Wuerstchen](https://github.com/huggingface/diffusers/blob/main/examples/wuerstchen/text_to_image/train_text_to_image_lora_prior.py).

</Tip>

This guide will explore the [train_text_to_image_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py) script to help you become more familiar with it, and how you can adapt it for your own use-case.

Before running the script, make sure you install the library from source:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
```

Navigate to the example folder with the training script and install the required dependencies for the script you're using:

<hfoptions id="installation">
<hfoption id="PyTorch">

```bash
cd examples/text_to_image
pip install -r requirements.txt
```

</hfoption>
<hfoption id="Flax">

```bash
cd examples/text_to_image
pip install -r requirements_flax.txt
```

</hfoption>
</hfoptions>

<Tip>

🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed-precision. It'll automatically configure your training setup based on your hardware and environment. Take a look at the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more.

</Tip>

Initialize an 🤗 Accelerate environment:

```bash
accelerate config
```

To set up a default 🤗 Accelerate environment without choosing any configurations:

```bash
accelerate config default
```

Or if your environment doesn't support an interactive shell, like a notebook, you can use:

```py
from accelerate.utils import write_basic_config

write_basic_config()
```

Lastly, if you want to train a model on your own dataset, take a look at the [Create a dataset for training](create_dataset) guide to learn how to create a dataset that works with the training script.
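
If your dataset lives in a local folder instead of on the Hub, the script can typically be pointed at it with the `--train_data_dir` argument instead of `--dataset_name` (a sketch with a placeholder path; see the linked guide for the expected folder layout):

```bash
accelerate launch train_text_to_image_lora.py \
  --train_data_dir="path/to/your/dataset" \
```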

<Tip>

The following sections highlight parts of the training script that are important for understanding how to modify it, but they don't cover every aspect of the script in detail. If you're interested in learning more, feel free to read through the [script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py) and let us know if you have any questions or concerns.

</Tip>

## Script parameters

The training script has many parameters to help you customize your training run. All of the parameters and their descriptions are found in the [`parse_args()`](https://github.com/huggingface/diffusers/blob/dd9a5caf61f04d11c0fa9f3947b69ab0010c9a0f/examples/text_to_image/train_text_to_image_lora.py#L85) function. Default values that work pretty well are provided for most parameters, but you can also set your own values in the training command if you'd like.

For example, to increase the number of epochs to train:

```bash
accelerate launch train_text_to_image_lora.py \
  --num_train_epochs=150 \
```

Many of the basic and important parameters are described in the [Text-to-image](text2image#script-parameters) training guide, so this guide just focuses on the LoRA-relevant parameters:

- `--rank`: the inner dimension of the low-rank matrices to train; a higher rank means more trainable parameters
- `--learning_rate`: the default learning rate is 1e-4, but with LoRA, you can use a higher learning rate
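
For example, to train with a larger rank (the value below is only an illustrative starting point):

```bash
accelerate launch train_text_to_image_lora.py \
  --rank=16 \
```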

## Training script

The dataset preprocessing code and training loop are found in the [`main()`](https://github.com/huggingface/diffusers/blob/dd9a5caf61f04d11c0fa9f3947b69ab0010c9a0f/examples/text_to_image/train_text_to_image_lora.py#L371) function, and if you need to adapt the training script, this is where you'll make your changes.

As with the script parameters, a general walkthrough of the training script is provided in the [Text-to-image](text2image#training-script) training guide, so this guide focuses on the LoRA-relevant parts of the script.

<hfoptions id="lora">
<hfoption id="UNet">

Diffusers uses [`~peft.LoraConfig`] from the [PEFT](https://hf.co/docs/peft) library to set up the parameters of the LoRA adapter such as the rank, alpha, and which modules to insert the LoRA weights into. The adapter is added to the UNet, and only the LoRA layers are filtered for optimization in `lora_layers`.

```py
unet_lora_config = LoraConfig(
    r=args.rank,
    lora_alpha=args.rank,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
)

unet.add_adapter(unet_lora_config)
lora_layers = filter(lambda p: p.requires_grad, unet.parameters())
```
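
As a quick sanity check (this isn't part of the script), you can confirm that only a small fraction of the UNet's parameters are trainable after the adapter is added; the snippet below assumes the `unet` from the code above:

```py
trainable_params = sum(p.numel() for p in unet.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in unet.parameters())
print(f"trainable: {trainable_params:,} / {total_params:,} ({100 * trainable_params / total_params:.2f}%)")
```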

</hfoption>
<hfoption id="text encoder">

Diffusers also supports finetuning the text encoder with LoRA from the [PEFT](https://hf.co/docs/peft) library when necessary such as finetuning Stable Diffusion XL (SDXL). The [`~peft.LoraConfig`] is used to configure the parameters of the LoRA adapter which are then added to the text encoder, and only the LoRA layers are filtered for training.

```py
text_lora_config = LoraConfig(
    r=args.rank,
    lora_alpha=args.rank,
    init_lora_weights="gaussian",
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
)

text_encoder_one.add_adapter(text_lora_config)
text_encoder_two.add_adapter(text_lora_config)
text_lora_parameters_one = list(filter(lambda p: p.requires_grad, text_encoder_one.parameters()))
text_lora_parameters_two = list(filter(lambda p: p.requires_grad, text_encoder_two.parameters()))
```

</hfoption>
</hfoptions>

The [optimizer](https://github.com/huggingface/diffusers/blob/e4b8f173b97731686e290b2eb98e7f5df2b1b322/examples/text_to_image/train_text_to_image_lora.py#L529) is initialized with the `lora_layers` because these are the only weights that'll be optimized:

```py
optimizer = optimizer_cls(
    lora_layers,
    lr=args.learning_rate,
    betas=(args.adam_beta1, args.adam_beta2),
    weight_decay=args.adam_weight_decay,
    eps=args.adam_epsilon,
)
```

Aside from setting up the LoRA layers, the training script is more or less the same as train_text_to_image.py!

## Launch the script

Once you've made all your changes or you're okay with the default configuration, you're ready to launch the training script! 🚀

Let's train on the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset to generate your own Naruto characters. Set the environment variables `MODEL_NAME` and `DATASET_NAME` to the model and dataset respectively. You should also specify where to save the model in `OUTPUT_DIR`, and the name of the model to save to on the Hub with `HUB_MODEL_ID`. The script creates and saves the following files to your repository:

- saved model checkpoints
- `pytorch_lora_weights.safetensors` (the trained LoRA weights)

If you're training on more than one GPU, add the `--multi_gpu` parameter to the `accelerate launch` command.
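
For example, a two-GPU run could be launched like this (a sketch; set `--num_processes` to match your hardware):

```bash
accelerate launch --multi_gpu --num_processes=2 train_text_to_image_lora.py \
```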

<Tip warning={true}>

A full training run takes ~5 hours on a 2080 Ti GPU with 11GB of VRAM.

</Tip>

```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="/sddata/finetune/lora/naruto"
export HUB_MODEL_ID="naruto-lora"
export DATASET_NAME="lambdalabs/naruto-blip-captions"

accelerate launch --mixed_precision="fp16"  train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --dataloader_num_workers=8 \
  --resolution=512 \
  --center_crop \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=15000 \
  --learning_rate=1e-04 \
  --max_grad_norm=1 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=0 \
  --output_dir=${OUTPUT_DIR} \
  --push_to_hub \
  --hub_model_id=${HUB_MODEL_ID} \
  --report_to=wandb \
  --checkpointing_steps=500 \
  --validation_prompt="A naruto with blue eyes." \
  --seed=1337
```

Once training has been completed, you can use your model for inference:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipeline.load_lora_weights("path/to/lora/model", weight_name="pytorch_lora_weights.safetensors")
image = pipeline("A naruto with blue eyes").images[0]
```
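
If you want to control how strongly the LoRA weights influence the image, you can also pass a LoRA scale at inference, where `0` corresponds to the base model and `1` to the fully trained LoRA; this assumes the pipeline loaded in the previous example:

```py
image = pipeline(
    "A naruto with blue eyes", cross_attention_kwargs={"scale": 0.7}
).images[0]
```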

## Next steps

Congratulations on training a new model with LoRA! To learn more about how to use your new model, the following guides may be helpful:

- Learn how to [load different LoRA formats](../using-diffusers/loading_adapters#LoRA) trained using community trainers like Kohya and TheLastBen.
- Learn how to use and [combine multiple LoRAs](../tutorials/using_peft_for_inference) with PEFT for inference.