Unverified Commit d0d3e24e authored by Suraj Patil, committed by GitHub

Textual inversion (#266)

* add textual inversion script

* make the loop work

* make coarse_loss optional

* save pipeline after training

* add arg pretrained_model_name_or_path

* fix saving

* fix gradient_accumulation_steps

* style

* fix progress bar steps

* scale lr

* add argument to accept style

* remove unused args

* scale lr using num gpus

* load tokenizer using args

* add checks when converting init token to id

* improve comments and style

* document args

* more cleanup

* fix default adamw args

* TextualInversionWrapper -> CLIPTextualInversionWrapper

* fix tokenizer loading

* Use the CLIPTextModel instead of wrapper

* clean dataset

* remove commented code

* fix accessing grads for multi-gpu

* more cleanup

* fix saving on multi-GPU

* init_placeholder_token_embeds

* add seed

* fix flip

* fix multi-gpu

* add utility methods in wrapper

* remove ipynb

* don't use wrapper

* don't pass vae and unet to accelerate prepare

* bring back accelerator.accumulate

* scale latents

* use only one progress bar for steps

* push_to_hub at the end of training

* remove unused args

* log some important stats

* store args in tensorboard

* pretty comments

* save the trained embeddings

* move the script up

* add requirements file

* more cleanup

* fix typo

* begin readme

* style -> learnable_property

* keep vae and unet in eval mode

* address review comments

* address more comments

* removed unused args

* add train command in readme

* update readme
parent 5164c9fa
## Textual Inversion fine-tuning example
[Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples.
The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion.
### Installing the dependencies
Before running the scripts, make sure to install the library's training dependencies:
```bash
pip install diffusers[training] accelerate transformers
```
And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:
```bash
accelerate config
```
### Cat toy example
You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.
You have to be a registered user on the 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
Run the following command to authenticate your token:
```bash
huggingface-cli login
```
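If you are working in a notebook instead of a shell, the same login can be done from Python via `huggingface_hub` (a small, optional alternative to the CLI):
```python
# Optional alternative to `huggingface-cli login`, handy in notebook environments.
from huggingface_hub import notebook_login

notebook_login()
```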
If you have already cloned the repo, then you won't need to go through these steps. You can simply remove the `--use_auth_token` arg from the following command.
<br>
Now let's get our dataset. Download 3-4 images from [here](https://drive.google.com/drive/folders/1fmJMs25nxS_rSNqS5hTcRdLem_YQXbq5) and save them in a directory. This will be our training data; an optional sanity check is sketched below.
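Before launching training, you can optionally verify that every file in the directory opens as an image. This is just a quick sketch; the directory path is a placeholder for wherever you saved your own files:
```python
# Optional: check that all training images load correctly with Pillow.
from pathlib import Path

from PIL import Image

data_dir = Path("path-to-dir-containing-images")  # same directory as DATA_DIR below
for path in sorted(data_dir.iterdir()):
    with Image.open(path) as img:
        print(path.name, img.size, img.mode)
```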
And launch the training using
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATA_DIR="path-to-dir-containing-images"
accelerate launch textual_inversion.py \
--pretrained_model_name_or_path=$MODEL_NAME --use_auth_token \
--train_data_dir=$DATA_DIR \
--learnable_property="object" \
--placeholder_token="<cat-toy>" --initializer_token="toy" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=2 \
--max_train_steps=3000 \
--learning_rate=5.0e-04 --scale_lr \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--output_dir="textual_inversion_cat"
```
A full training run takes ~1 hour on one V100 GPU.
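Under the hood, the script adds the `placeholder_token` to the CLIP tokenizer and initializes its embedding from the `initializer_token`; only that embedding is optimized while the VAE and UNet stay frozen in eval mode. The following is a minimal sketch of that setup (not the full training loop), assuming the `v1-4` repository layout with `tokenizer` and `text_encoder` subfolders:
```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

model_name = "CompVis/stable-diffusion-v1-4"
tokenizer = CLIPTokenizer.from_pretrained(model_name, subfolder="tokenizer", use_auth_token=True)
text_encoder = CLIPTextModel.from_pretrained(model_name, subfolder="text_encoder", use_auth_token=True)

# Add the placeholder token and grow the embedding matrix to make room for it.
num_added = tokenizer.add_tokens("<cat-toy>")
assert num_added == 1, "placeholder token must not already exist in the tokenizer"
text_encoder.resize_token_embeddings(len(tokenizer))

# The initializer token ("toy") must map to exactly one token id.
initializer_ids = tokenizer.encode("toy", add_special_tokens=False)
assert len(initializer_ids) == 1, "initializer_token must be a single token"
placeholder_id = tokenizer.convert_tokens_to_ids("<cat-toy>")

# Copy the initializer embedding into the new placeholder slot;
# training then updates only this row of the embedding matrix.
with torch.no_grad():
    embeds = text_encoder.get_input_embeddings().weight
    embeds[placeholder_id] = embeds[initializer_ids[0]]
```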
### Inference
Once you have trained a model using the above command, inference can be done simply with the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.
```python
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

model_id = "path-to-your-trained-model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A <cat-toy> backpack"

with autocast("cuda"):
    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5)["sample"][0]

image.save("cat-backpack.png")
```
Requirements file added by this commit:
accelerate
torchvision
transformers