Unverified Commit d0d3e24e authored by Suraj Patil, committed by GitHub

Textual inversion (#266)

* add textual inversion script

* make the loop work

* make coarse_loss optional

* save pipeline after training

* add arg pretrained_model_name_or_path

* fix saving

* fix gradient_accumulation_steps

* style

* fix progress bar steps

* scale lr

* add argument to accept style

* remove unused args

* scale lr using num gpus

* load tokenizer using args

* add checks when converting init token to id

* improve comments and style

* document args

* more cleanup

* fix default adamw args

* TextualInversionWrapper -> CLIPTextualInversionWrapper

* fix tokenizer loading

* Use the CLIPTextModel instead of wrapper

* clean dataset

* remove commented code

* fix accessing grads for multi-gpu

* more cleanup

* fix saving on multi-GPU

* init_placeholder_token_embeds

* add seed

* fix flip

* fix multi-gpu

* add utility methods in wrapper

* remove ipynb

* don't use wrapper

* don't pass vae and unet to accelerate prepare

* bring back accelerator.accumulate

* scale latents

* use only one progress bar for steps

* push_to_hub at the end of training

* remove unused args

* log some important stats

* store args in tensorboard

* pretty comments

* save the trained embeddings

* move the script up

* add requirements file

* more cleanup

* fix typo

* begin readme

* style -> learnable_property

* keep vae and unet in eval mode

* address review comments

* address more comments

* removed unused args

* add train command in readme

* update readme
parent 5164c9fa
## Textual Inversion fine-tuning example
[Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples.
The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion.
### Installing the dependencies
Before running the scripts, make sure to install the library's training dependencies:
```bash
pip install diffusers[training] accelerate transformers
```
And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:
```bash
accelerate config
```
### Cat toy example
You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.
You have to be a registered user on the 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
Run the following command to authenticate your token:
```bash
huggingface-cli login
```
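If you are working in a notebook instead of a shell, the same login can be done from Python via `huggingface_hub` (a small, optional alternative to the CLI):
```python
# Optional alternative to `huggingface-cli login`, handy in notebook environments.
from huggingface_hub import notebook_login

notebook_login()
```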
If you have already cloned the repo, then you won't need to go through these steps. You can simply remove the `--use_auth_token` arg from the following command.
<br>
Now let's get our dataset. Download 3-4 images from [here](https://drive.google.com/drive/folders/1fmJMs25nxS_rSNqS5hTcRdLem_YQXbq5) and save them in a directory. This will be our training data; an optional sanity check is sketched below.
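Before launching training, you can optionally verify that every file in the directory opens as an image. This is just a quick sketch; the directory path is a placeholder for wherever you saved your own files:
```python
# Optional: check that all training images load correctly with Pillow.
from pathlib import Path

from PIL import Image

data_dir = Path("path-to-dir-containing-images")  # same directory as DATA_DIR below
for path in sorted(data_dir.iterdir()):
    with Image.open(path) as img:
        print(path.name, img.size, img.mode)
```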
And launch the training using
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATA_DIR="path-to-dir-containing-images"
accelerate launch textual_inversion.py \
--pretrained_model_name_or_path=$MODEL_NAME --use_auth_token \
--train_data_dir=$DATA_DIR \
--learnable_property="object" \
--placeholder_token="<cat-toy>" --initializer_token="toy" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=2 \
--max_train_steps=3000 \
--learning_rate=5.0e-04 --scale_lr \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--output_dir="textual_inversion_cat"
```
A full training run takes ~1 hour on one V100 GPU.
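Under the hood, the script adds the `placeholder_token` to the CLIP tokenizer and initializes its embedding from the `initializer_token`; only that embedding is optimized while the VAE and UNet stay frozen in eval mode. The following is a minimal sketch of that setup (not the full training loop), assuming the `v1-4` repository layout with `tokenizer` and `text_encoder` subfolders:
```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

model_name = "CompVis/stable-diffusion-v1-4"
tokenizer = CLIPTokenizer.from_pretrained(model_name, subfolder="tokenizer", use_auth_token=True)
text_encoder = CLIPTextModel.from_pretrained(model_name, subfolder="text_encoder", use_auth_token=True)

# Add the placeholder token and grow the embedding matrix to make room for it.
num_added = tokenizer.add_tokens("<cat-toy>")
assert num_added == 1, "placeholder token must not already exist in the tokenizer"
text_encoder.resize_token_embeddings(len(tokenizer))

# The initializer token ("toy") must map to exactly one token id.
initializer_ids = tokenizer.encode("toy", add_special_tokens=False)
assert len(initializer_ids) == 1, "initializer_token must be a single token"
placeholder_id = tokenizer.convert_tokens_to_ids("<cat-toy>")

# Copy the initializer embedding into the new placeholder slot;
# training then updates only this row of the embedding matrix.
with torch.no_grad():
    embeds = text_encoder.get_input_embeddings().weight
    embeds[placeholder_id] = embeds[initializer_ids[0]]
```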
### Inference
Once you have trained a model using the above command, inference can be done simply with the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.
```python
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

model_id = "path-to-your-trained-model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A <cat-toy> backpack"

with autocast("cuda"):
    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5)["sample"][0]

image.save("cat-backpack.png")
```
Requirements file added by this commit:
accelerate
torchvision
transformers