## Textual Inversion fine-tuning example

[Textual inversion](https://arxiv.org/abs/2208.01618) is a method for personalizing text-to-image models such as Stable Diffusion on your own images using just 3-5 examples.
The `textual_inversion.py` script shows how to implement the training procedure and adapt it for Stable Diffusion.
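
As a rough illustration of the idea, the toy sketch below updates only the embedding row of a newly added placeholder token and leaves every other row untouched, which mirrors how textual inversion learns a new word embedding while the rest of the model stays frozen. The token ids, the 2-dimensional vectors, the gradients, and the `training_step` helper are all made up for illustration; this is not the code in `textual_inversion.py`.

```python
# Toy illustration only: a 3-token "vocabulary" with 2-dimensional embeddings.
PLACEHOLDER_ID = 2  # hypothetical id of the newly added placeholder token


def training_step(embeddings, grads, lr, trainable_id=PLACEHOLDER_ID):
    """Apply one SGD step to the placeholder row; copy every other row unchanged."""
    updated = [row[:] for row in embeddings]
    updated[trainable_id] = [
        w - lr * g for w, g in zip(embeddings[trainable_id], grads[trainable_id])
    ]
    return updated


# Row 2 starts as a copy of row 1, standing in for an initializer token.
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.3, 0.4]]
# Hypothetical gradients; rows 0 and 1 receive gradients too but are never updated.
grads = [[1.0, 1.0], [1.0, 1.0], [1.0, -1.0]]

new_embeddings = training_step(embeddings, grads, lr=0.1)
print(new_embeddings)
```

Only the placeholder row changes between `embeddings` and `new_embeddings`; the rows of the pre-existing tokens are identical before and after the step.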

## Running on Colab

Colab for training
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)

Colab for inference
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb)

## Running locally with PyTorch
### Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

**Important**

To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
```

Then `cd` into the example folder and run:
```bash
pip install -r requirements.txt
```

And initialize an [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

### Cat toy example

First, let's log in so that we can upload the checkpoint to the Hub during training:

```bash
huggingface-cli login
```

Now let's get our dataset. For this example we will use [some cat images](https://huggingface.co/datasets/diffusers/cat_toy_example).

Let's first download it locally:

```py
from huggingface_hub import snapshot_download

local_dir = "./cat"
snapshot_download("diffusers/cat_toy_example", local_dir=local_dir, repo_type="dataset", ignore_patterns=".gitattributes")
```
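Before training, it can help to confirm what actually landed in the folder. The sketch below is a hypothetical helper, not part of the example scripts: it splits the entries of `local_dir` into image files and everything else, assuming that the training images sit directly in the folder and use one of the listed extensions.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def inspect_data_dir(data_dir):
    """Split the entries directly inside data_dir into images and other entries."""
    images, others = [], []
    for entry in sorted(Path(data_dir).iterdir()):
        if entry.is_file() and entry.suffix.lower() in IMAGE_SUFFIXES:
            images.append(entry.name)
        else:
            others.append(entry.name)
    return images, others


# Only inspect the folder if the download above has been run.
if Path("./cat").is_dir():
    images, others = inspect_data_dir("./cat")
    print(f"{len(images)} images, {len(others)} other entries: {others}")
```

If `others` is not empty, it may be worth checking whether those entries belong in the training folder.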

This will be our training data.
Now we can launch the training using:

**___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**

**___Note: Please follow the [README_sdxl.md](./README_sdxl.md) if you are using the [stable-diffusion-xl](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).___**

```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATA_DIR="./cat"

accelerate launch textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$DATA_DIR \
  --learnable_property="object" \
  --placeholder_token="<cat-toy>" \
  --initializer_token="toy" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=3000 \
  --learning_rate=5.0e-04 \
  --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --push_to_hub \
  --output_dir="textual_inversion_cat"
```

A full training run takes ~1 hour on one V100 GPU.
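
The command above combines `--train_batch_size`, `--gradient_accumulation_steps`, and `--scale_lr`. The sketch below works through the arithmetic under the assumption that `--scale_lr` multiplies the base learning rate by the batch size, the number of accumulation steps, and the number of processes. The helper names are hypothetical, and a single process is assumed.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Number of samples that contribute to one optimizer update."""
    return train_batch_size * gradient_accumulation_steps * num_processes


def scaled_learning_rate(base_lr, train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Learning rate after scaling (assumed behaviour of --scale_lr)."""
    return base_lr * effective_batch_size(
        train_batch_size, gradient_accumulation_steps, num_processes
    )


# Values taken from the training command; num_processes=1 is an assumption.
batch = effective_batch_size(train_batch_size=1, gradient_accumulation_steps=4)
lr = scaled_learning_rate(5.0e-04, train_batch_size=1, gradient_accumulation_steps=4)
print(f"effective batch size: {batch}, scaled learning rate: {lr:.1e}")
```

With the values from the command and one process, this gives an effective batch size of 4 and, under the stated assumption, a learning rate of 2e-3.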

**Note**: As described in [the official paper](https://arxiv.org/abs/2208.01618),
only one embedding vector is used for the placeholder token, *e.g.* `"<cat-toy>"`.
However, one can also add multiple embedding vectors for the placeholder token
to increase the number of fine-tuneable parameters. This can help the model learn
more complex details. To use multiple embedding vectors, set `--num_vectors`
to a number larger than one, *e.g.*:
```bash
--num_vectors 5
```

The saved textual inversion vectors will then be larger in size compared to the default case.
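
To see why the saved file grows, the sketch below counts the learned values per placeholder. It assumes one extra token per additional vector, named with a numeric suffix (a naming scheme used here for illustration), and a text-encoder embedding width of 768, as in the Stable Diffusion v1 text encoder. Other models use other widths, and the helper names are hypothetical.

```python
def expand_placeholder(placeholder_token, num_vectors):
    """Return one token string per embedding vector (assumed naming scheme)."""
    if num_vectors < 1:
        raise ValueError("num_vectors must be at least 1")
    extra = [f"{placeholder_token}_{i}" for i in range(1, num_vectors)]
    return [placeholder_token] + extra


def learned_value_count(num_vectors, hidden_size=768):
    """Number of learned floats: one vector of hidden_size values per token."""
    return num_vectors * hidden_size


tokens = expand_placeholder("<cat-toy>", 5)
print(tokens)
print(learned_value_count(1), "values by default vs", learned_value_count(5), "with 5 vectors")
```

Under these assumptions, five vectors store five times as many values as the default single vector.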

### Inference

Once you have trained a model using the above command, you can run inference with the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.

```python
from diffusers import StableDiffusionPipeline
import torch

model_id = "path-to-your-trained-model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A <cat-toy> backpack"

image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("cat-backpack.png")
```


## Training with Flax/JAX

For faster training on TPUs and GPUs, you can use the Flax training example. Follow the instructions above to get the model and dataset before running the script.

Before running the scripts, make sure to install the library's training dependencies:

```bash
pip install -U -r requirements_flax.txt
```

```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export DATA_DIR="path-to-dir-containing-images"

python textual_inversion_flax.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$DATA_DIR \
  --learnable_property="object" \
  --placeholder_token="<cat-toy>" \
  --initializer_token="toy" \
  --resolution=512 \
  --train_batch_size=1 \
  --max_train_steps=3000 \
  --learning_rate=5.0e-04 \
  --scale_lr \
  --output_dir="textual_inversion_cat"
```
It should be at least 70% faster than the PyTorch script with the same configuration.

### Training with xformers:
You can enable memory-efficient attention by [installing xFormers](https://github.com/facebookresearch/xformers#installing-xformers) and passing the `--enable_xformers_memory_efficient_attention` argument to the script. This is not available with the Flax/JAX implementation.