Unverified Commit 0a401b95 authored by M. Tolga Cangöz, committed by GitHub

[`Docs`] Fix typos (#6122)

Fix typos and trim trailing whitespace
parent 664e931b
@@ -77,7 +77,7 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi
## Quickstart
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 16000+ checkpoints):
```python
from diffusers import DiffusionPipeline
```
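For example, a minimal text-to-image sketch (the checkpoint id here is just one of the many on the Hub; any `diffusers`-compatible model works):

```python
import torch
from diffusers import DiffusionPipeline

# Load a pretrained text-to-image pipeline; "runwayml/stable-diffusion-v1-5" is one example checkpoint
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.to("cuda")  # move to GPU for reasonable generation speed

image = pipe("An image of a squirrel in Picasso style").images[0]
image.save("squirrel.png")
```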
@@ -219,7 +219,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
- https://github.com/deep-floyd/IF
- https://github.com/bentoml/BentoML
- https://github.com/bmaltais/kohya_ss
- +7000 other amazing GitHub repositories 💪
Thank you for using us ❤️.
@@ -18,8 +18,7 @@ limitations under the License.
Diffusers examples are a collection of scripts to demonstrate how to effectively use the `diffusers` library
for a variety of use cases involving training or fine-tuning.
**Note**: If you are looking for **official** examples on how to use `diffusers` for inference, please have a look at [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines).
Our examples aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**.
More specifically, this means:
@@ -27,11 +26,10 @@ More specifically, this means:
- **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/requirements.txt), and execute the example script.
- **Easy-to-tweak**: While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out of the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data and the training loop to allow you to tweak and edit them as required.
- **Beginner-friendly**: We do not aim to provide state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand diffusion models and how to use them with the `diffusers` library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners.
- **One-purpose-only**: Examples should show one task and one task only. Even if tasks are very similar from a modeling point of view, *e.g.* image super-resolution and image modification tend to use the same model and training method, we want examples to showcase only one task to keep them as readable and easy-to-understand as possible.
We provide **official** examples that cover the most popular tasks of diffusion models.
*Official* examples are **actively** maintained by the `diffusers` maintainers and we try to rigorously follow our example philosophy as defined above.
If you feel like another important example should exist, we are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you!
Training examples show how to pretrain or fine-tune diffusion models for a variety of tasks. Currently we support:
@@ -39,7 +37,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie
| Task | 🤗 Accelerate | 🤗 Datasets | Colab
|---|---|:---:|:---:|
| [**Unconditional Image Generation**](./unconditional_image_generation) | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb)
| [**Text-to-Image fine-tuning**](./text_to_image) | ✅ | ✅ |
| [**Textual Inversion**](./textual_inversion) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)
| [**Dreambooth**](./dreambooth) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb)
| [**ControlNet**](./controlnet) | ✅ | ✅ | -
@@ -8,13 +8,13 @@ If a community doesn't work as expected, please open an issue and ping the autho
| Example | Description | Code Example | Colab | Author |
|:---|:---|:---|:---|---:|
| LLM-grounded Diffusion (LMD+) | LMD greatly improves the prompt-following ability of text-to-image generation models by introducing an LLM as a front-end prompt parser and layout planner. [Project page.](https://llm-grounded-diffusion.github.io/) [See our full codebase (also with diffusers).](https://github.com/TonyLianLong/LLM-groundedDiffusion) | [LLM-grounded Diffusion (LMD+)](#llm-grounded-diffusion) | [Huggingface Demo](https://huggingface.co/spaces/longlian/llm-grounded-diffusion) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1SXzMSeAB-LJYISb2yrUOdypLz4OYWUKj) | [Long (Tony) Lian](https://tonylian.com/) |
| CLIP Guided Stable Diffusion | Doing CLIP guidance for text-to-image generation with Stable Diffusion | [CLIP Guided Stable Diffusion](#clip-guided-stable-diffusion) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/CLIP_Guided_Stable_diffusion_with_diffusers.ipynb) | [Suraj Patil](https://github.com/patil-suraj/) |
| One Step U-Net (Dummy) | Example showcasing how to use Community Pipelines (see https://github.com/huggingface/diffusers/issues/841) | [One Step U-Net](#one-step-unet) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) |
| Stable Diffusion Interpolation | Interpolate the latent space of Stable Diffusion between different prompts/seeds | [Stable Diffusion Interpolation](#stable-diffusion-interpolation) | - | [Nate Raw](https://github.com/nateraw/) |
| Stable Diffusion Mega | **One** Stable Diffusion Pipeline with all functionalities of [Text2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py), [Image2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py) and [Inpainting](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py) | [Stable Diffusion Mega](#stable-diffusion-mega) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) |
| Long Prompt Weighting Stable Diffusion | **One** Stable Diffusion Pipeline without a token-length limit, with support for parsing weighting in the prompt | [Long Prompt Weighting Stable Diffusion](#long-prompt-weighting-stable-diffusion) | - | [SkyTNT](https://github.com/SkyTNT) |
| Speech to Image | Using automatic speech recognition to transcribe text and Stable Diffusion to generate images | [Speech to Image](#speech-to-image) | - | [Mikail Duzenli](https://github.com/MikailINTech) |
| Wild Card Stable Diffusion | Stable Diffusion Pipeline that supports prompts containing wildcard terms (indicated by surrounding double underscores), with values instantiated randomly from a corresponding txt file or a dictionary of possible values | [Wildcard Stable Diffusion](#wildcard-stable-diffusion) | - | [Shyam Sudhakaran](https://github.com/shyamsn97) |
| [Composable Stable Diffusion](https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/) | Stable Diffusion Pipeline that supports prompts containing "&#124;" (as an AND condition) and weights (separated by "&#124;" as well) to positively/negatively weight prompts | [Composable Stable Diffusion](#composable-stable-diffusion) | - | [Mark Rich](https://github.com/MarkRich) |
| Seed Resizing Stable Diffusion | Stable Diffusion Pipeline that supports resizing an image while retaining the concepts of the 512 by 512 generation | [Seed Resizing](#seed-resizing) | - | [Mark Rich](https://github.com/MarkRich) |
@@ -24,27 +24,27 @@ If a community doesn't work as expected, please open an issue and ping the autho
| Text Based Inpainting Stable Diffusion | Stable Diffusion Inpainting Pipeline that enables passing a text prompt to generate the mask for inpainting | [Text Based Inpainting Stable Diffusion](#image-to-image-inpainting-stable-diffusion) | - | [Dhruv Karan](https://github.com/unography) |
| Bit Diffusion | Diffusion on discrete data | [Bit Diffusion](#bit-diffusion) | - | [Stuti R.](https://github.com/kingstut) |
| K-Diffusion Stable Diffusion | Run Stable Diffusion with any of [K-Diffusion's samplers](https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/sampling.py) | [Stable Diffusion with K Diffusion](#stable-diffusion-with-k-diffusion) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) |
| Checkpoint Merger Pipeline | Diffusion Pipeline that enables merging of saved model checkpoints | [Checkpoint Merger Pipeline](#checkpoint-merger-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) |
| Stable Diffusion v1.1-1.4 Comparison | Run all 4 model checkpoints for Stable Diffusion and compare their results together | [Stable Diffusion Comparison](#stable-diffusion-comparisons) | - | [Suvaditya Mukherjee](https://github.com/suvadityamuk) |
| MagicMix | Diffusion Pipeline for semantic mixing of an image and a text prompt | [MagicMix](#magic-mix) | - | [Partho Das](https://github.com/daspartho) |
| Stable UnCLIP | Diffusion Pipeline combining a prior model (generate a CLIP image embedding from text, UnCLIPPipeline `"kakaobrain/karlo-v1-alpha"`) and a decoder pipeline (decode a CLIP image embedding to an image, StableDiffusionImageVariationPipeline `"lambdalabs/sd-image-variations-diffusers"`) | [Stable UnCLIP](#stable-unclip) | - | [Ray Wang](https://wrong.wang) |
| UnCLIP Text Interpolation Pipeline | Diffusion Pipeline that allows passing two prompts and produces images while interpolating between the text-embeddings of the two prompts | [UnCLIP Text Interpolation Pipeline](#unclip-text-interpolation-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) |
| UnCLIP Image Interpolation Pipeline | Diffusion Pipeline that allows passing two images/image_embeddings and produces images while interpolating between their image-embeddings | [UnCLIP Image Interpolation Pipeline](#unclip-image-interpolation-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) |
| DDIM Noise Comparative Analysis Pipeline | Investigating how the diffusion models learn visual concepts from each noise level (which is a contribution of [P2 weighting (CVPR 2022)](https://arxiv.org/abs/2204.00227)) | [DDIM Noise Comparative Analysis Pipeline](#ddim-noise-comparative-analysis-pipeline) | - | [Aengus (Duc-Anh)](https://github.com/aengusng8) |
| CLIP Guided Img2Img Stable Diffusion Pipeline | Doing CLIP guidance for image-to-image generation with Stable Diffusion | [CLIP Guided Img2Img Stable Diffusion](#clip-guided-img2img-stable-diffusion) | - | [Nipun Jindal](https://github.com/nipunjindal/) |
| TensorRT Stable Diffusion Text to Image Pipeline | Accelerates the Stable Diffusion Text2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Text to Image Pipeline](#tensorrt-text2image-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) |
| EDICT Image Editing Pipeline | Diffusion pipeline for text-guided image editing | [EDICT Image Editing Pipeline](#edict-image-editing-pipeline) | - | [Joqsan Azocar](https://github.com/Joqsan) |
| Stable Diffusion RePaint | Stable Diffusion pipeline using [RePaint](https://arxiv.org/abs/2201.09865) for inpainting | [Stable Diffusion RePaint](#stable-diffusion-repaint) | - | [Markus Pobitzer](https://github.com/Markus-Pobitzer) |
| TensorRT Stable Diffusion Image to Image Pipeline | Accelerates the Stable Diffusion Image2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Image to Image Pipeline](#tensorrt-image2image-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) |
| Stable Diffusion IPEX Pipeline | Accelerate the Stable Diffusion inference pipeline with BF16/FP32 precision on Intel Xeon CPUs with [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [Stable Diffusion on IPEX](#stable-diffusion-on-ipex) | - | [Yingjie Han](https://github.com/yingjie-han/) |
| CLIP Guided Images Mixing Stable Diffusion Pipeline | Combine images using standard diffusion models | [CLIP Guided Images Mixing Using Stable Diffusion](#clip-guided-images-mixing-with-stable-diffusion) | - | [Karachev Denis](https://github.com/TheDenk) |
| TensorRT Stable Diffusion Inpainting Pipeline | Accelerates the Stable Diffusion Inpainting Pipeline using TensorRT | [TensorRT Stable Diffusion Inpainting Pipeline](#tensorrt-inpainting-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) |
| IADB Pipeline | Implementation of [Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model](https://arxiv.org/abs/2305.03486) | [IADB Pipeline](#iadb-pipeline) | - | [Thomas Chambon](https://github.com/tchambon) |
| Zero1to3 Pipeline | Implementation of [Zero-1-to-3: Zero-shot One Image to 3D Object](https://arxiv.org/abs/2303.11328) | [Zero1to3 Pipeline](#Zero1to3-pipeline) | - | [Xin Kong](https://github.com/kxhit) |
| Stable Diffusion XL Long Weighted Prompt Pipeline | A pipeline that supports prompts and negative prompts of unlimited length, using A1111-style prompt weighting | [Stable Diffusion XL Long Weighted Prompt Pipeline](#stable-diffusion-xl-long-weighted-prompt-pipeline) | - | [Andrew Zhu](https://xhinker.medium.com/) |
| FABRIC - Stable Diffusion with Feedback Pipeline | Pipeline that supports feedback from liked and disliked images | [Stable Diffusion Fabric Pipeline](#stable-diffusion-fabric-pipeline) | - | [Shauray Singh](https://shauray8.github.io/about_shauray/) |
| Sketch Inpaint - Inpainting with non-inpaint Stable Diffusion | Sketch inpainting, much like in Automatic1111 | [Masked Im2Im Stable Diffusion Pipeline](#stable-diffusion-masked-im2im) | - | [Anatoly Belikov](https://github.com/noskill) |
| Prompt-to-Prompt | Change parts of a prompt and retain image structure (see [paper page](https://prompt-to-prompt.github.io/)) | [Prompt2Prompt Pipeline](#prompt2prompt-pipeline) | - | [Umer H. Adil](https://twitter.com/UmerHAdil) |
| Latent Consistency Pipeline | Implementation of [Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference](https://arxiv.org/abs/2310.04378) | [Latent Consistency Pipeline](#latent-consistency-pipeline) | - | [Simian Luo](https://github.com/luosiallen) |
| Latent Consistency Img2img Pipeline | Img2img pipeline for Latent Consistency Models | [Latent Consistency Img2Img Pipeline](#latent-consistency-img2img-pipeline) | - | [Logan Zoellner](https://github.com/nagolinc) |
| Latent Consistency Interpolation Pipeline | Interpolate the latent space of Latent Consistency Models with multiple prompts | [Latent Consistency Interpolation Pipeline](#latent-consistency-interpolation-pipeline) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1pK3NrLWJSiJsBynLns1K1-IDTW9zbPvl?usp=sharing) | [Aryan V S](https://github.com/a-r-r-o-w) |
@@ -77,7 +77,7 @@ import torch
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained(
    "longlian/lmd_plus",
    custom_pipeline="llm_grounded_diffusion",
    custom_revision="main",
    variant="fp16", torch_dtype=torch.float16
@@ -112,7 +112,7 @@ import torch
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained(
    "longlian/lmd_plus",
    custom_pipeline="llm_grounded_diffusion",
    variant="fp16", torch_dtype=torch.float16
)
@@ -139,7 +139,7 @@ images[0].save("./lmd_plus_generation.jpg")
### CLIP Guided Stable Diffusion
CLIP guided stable diffusion can help to generate more realistic images
by guiding stable diffusion at every denoising step with an additional CLIP model.
The following code requires roughly 12GB of GPU RAM.
@@ -159,7 +159,7 @@ guided_pipeline = DiffusionPipeline.from_pretrained(
custom_pipeline="clip_guided_stable_diffusion", custom_pipeline="clip_guided_stable_diffusion",
clip_model=clip_model, clip_model=clip_model,
feature_extractor=feature_extractor, feature_extractor=feature_extractor,
torch_dtype=torch.float16, torch_dtype=torch.float16,
) )
guided_pipeline.enable_attention_slicing() guided_pipeline.enable_attention_slicing()
@@ -180,7 +180,7 @@ for i in range(4):
        generator=generator,
    ).images[0]
    images.append(image)
# save images locally
for i, img in enumerate(images):
    img.save(f"./clip_guided_sd/image_{i}.png")
@@ -234,7 +234,7 @@ frame_filepaths = pipe.walk(
)
```
The `walk(...)` function returns a list of image filepaths saved under the folder defined in `output_dir`. You can use these images to create videos of stable diffusion.
> **Please have a look at https://github.com/nateraw/stable-diffusion-videos for more detailed information on how to create videos using stable diffusion, as well as for more feature-complete functionality.**
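Once the frames exist on disk, stitching them into a video is straightforward. A minimal sketch using `imageio` (an assumption of this example; any frame-to-video tool works, and writing MP4 requires the `imageio-ffmpeg` backend):

```python
import imageio.v2 as imageio

# `frame_filepaths` is the list returned by pipe.walk(...) above
frames = [imageio.imread(path) for path in frame_filepaths]
imageio.mimsave("sd_interpolation.mp4", frames, fps=10)
```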
@@ -310,7 +310,7 @@ import torch
pipe = DiffusionPipeline.from_pretrained(
    'hakurei/waifu-diffusion',
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
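A hedged usage sketch: the pipeline parses attention weights written in parentheses, e.g. `(token:1.3)`, and `max_embeddings_multiples` raises the 77-token ceiling (treat the exact method and parameter names as assumptions of this example):

```python
prompt = "a photo of a (red:1.4) vintage car on a coastal road, (masterpiece:1.2), best quality"
neg_prompt = "lowres, bad anatomy, (worst quality:1.4)"

# text2img is the text-to-image entry point exposed by lpw_stable_diffusion
image = pipe.text2img(prompt, negative_prompt=neg_prompt, max_embeddings_multiples=3).images[0]
image.save("lpw_output.png")
```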
@@ -377,7 +377,7 @@ diffuser_pipeline = DiffusionPipeline.from_pretrained(
custom_pipeline="speech_to_image_diffusion", custom_pipeline="speech_to_image_diffusion",
speech_model=model, speech_model=model,
speech_processor=processor, speech_processor=processor,
torch_dtype=torch.float16, torch_dtype=torch.float16,
) )
@@ -435,7 +435,7 @@ import torch
pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="wildcard_stable_diffusion",
    torch_dtype=torch.float16,
)
prompt = "__animal__ sitting on a __object__ wearing a __clothing__"
@@ -449,7 +449,7 @@ out = pipe(
)
```
### Composable Stable Diffusion
[Composable Stable Diffusion](https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/) proposes conjunction and negation (negative prompts) operators for compositional generation with conditional diffusion models.
@@ -499,7 +499,7 @@ tvu.save_image(grid, f'{prompt}_{args.weights}' + '.png')
```
### Imagic Stable Diffusion
Allows you to edit an image using stable diffusion.
```python
import requests
@@ -539,7 +539,7 @@ image = res.images[0]
image.save('./imagic/imagic_image_alpha_2.png')
```
### Seed Resizing
Test seed resizing. First generate an image at 512 by 512, then generate an image with the same seed at 512 by 592 using seed resizing. Finally, generate a 512 by 592 image using the original stable diffusion pipeline.
```python
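# A hedged sketch, assuming the "seed_resize_stable_diffusion" community pipeline id
# and standard StableDiffusionPipeline-style arguments.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="seed_resize_stable_diffusion",
    torch_dtype=torch.float16,
).to("cuda")

# Same seed, two sizes: 512x512 first, then 512x592 via seed resizing
generator = torch.Generator("cuda").manual_seed(0)
image_512 = pipe(prompt="a sunset over the mountains", height=512, width=512, generator=generator).images[0]
generator = torch.Generator("cuda").manual_seed(0)
image_592 = pipe(prompt="a sunset over the mountains", height=512, width=592, generator=generator).images[0]
```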
@@ -667,14 +667,14 @@ diffuser_pipeline = DiffusionPipeline.from_pretrained(
    detection_pipeline=language_detection_pipeline,
    translation_model=trans_model,
    translation_tokenizer=trans_tokenizer,
    torch_dtype=torch.float16,
)
diffuser_pipeline.enable_attention_slicing()
diffuser_pipeline = diffuser_pipeline.to(device)
prompt = ["a photograph of an astronaut riding a horse",
          "Una casa en la playa",
          "Ein Hund, der Orange isst",
          "Un restaurant parisien"]
@@ -715,7 +715,7 @@ mask_image = PIL.Image.open(mask_path).convert("RGB").resize((512, 512))
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    custom_pipeline="img2img_inpainting",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
@@ -758,8 +758,8 @@ prompt = "a cup" # the masked out region will be replaced with this
image = pipe(image=image, text=text, prompt=prompt).images[0]
```
### Bit Diffusion
Based on https://arxiv.org/abs/2208.04202, this is used for diffusion on discrete data - e.g., discrete image data, DNA sequence data. An unconditional discrete image can be generated like this:
```python
from diffusers import DiffusionPipeline
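# A hedged completion: run the "bit_diffusion" community pipeline on a small
# unconditional DDPM checkpoint (the checkpoint id is an assumption of this sketch).
pipe = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", custom_pipeline="bit_diffusion")
image = pipe().images[0]
```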
@@ -837,8 +837,8 @@ Usage:-
```python
from diffusers import DiffusionPipeline
# Return a CheckpointMergerPipeline class that allows you to merge checkpoints.
# The checkpoint passed here is ignored, but still pass one of the checkpoints you plan to
# merge, for convenience.
pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", custom_pipeline="checkpoint_merger")
@@ -861,16 +861,16 @@ image = merged_pipe(prompt).images[0]
```
Some examples along with the merge details:
1. "CompVis/stable-diffusion-v1-4" + "hakurei/waifu-diffusion"; Sigmoid interpolation; alpha = 0.8
![Stable plus Waifu Sigmoid 0.8](https://huggingface.co/datasets/NagaSaiAbhinay/CheckpointMergerSamples/resolve/main/stability_v1_4_waifu_sig_0.8.png)
2. "hakurei/waifu-diffusion" + "prompthero/openjourney"; Inverse Sigmoid interpolation; alpha = 0.8
![Waifu plus openjourney Inverse Sigmoid 0.8](https://huggingface.co/datasets/NagaSaiAbhinay/CheckpointMergerSamples/resolve/main/waifu_openjourney_inv_sig_0.8.png)
3. "CompVis/stable-diffusion-v1-4" + "hakurei/waifu-diffusion" + "prompthero/openjourney"; Add Difference interpolation; alpha = 0.5
![Stable plus Waifu plus openjourney add_diff 0.5](https://huggingface.co/datasets/NagaSaiAbhinay/CheckpointMergerSamples/resolve/main/stable_waifu_openjourney_add_diff_0.5.png)
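As a hedged sketch, the call behind example 1 above might look like this (assuming the `merge` method and its `interp`/`alpha` arguments as exposed by this community pipeline):

```python
# Merge two checkpoints with sigmoid interpolation at alpha = 0.8
merged_pipe = pipe.merge(
    ["CompVis/stable-diffusion-v1-4", "hakurei/waifu-diffusion"],
    interp="sigmoid",
    alpha=0.8,
)
image = merged_pipe("a serene mountain lake at dawn").images[0]
```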
@@ -937,8 +937,8 @@ pipe = DiffusionPipeline.from_pretrained(
img = Image.open('phone.jpg')
mix_img = pipe(
    img,
    prompt='bed',
    kmin=0.3,
    kmax=0.5,
    mix_factor=0.5,
@@ -1049,7 +1049,7 @@ print(pipeline.prior_scheduler)
### UnCLIP Text Interpolation Pipeline
This Diffusion Pipeline takes two prompts and interpolates between them using spherical linear interpolation (slerp). The input prompts are converted to text embeddings by the pipeline's text_encoder, and the interpolation is done on the resulting text_embeddings over the number of steps specified. Defaults to 5 steps.
```python
import torch
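# A hedged sketch: load the Karlo UnCLIP weights with this community pipeline
# (the checkpoint id and argument names are assumptions of this example).
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "kakaobrain/karlo-v1-alpha",
    custom_pipeline="unclip_text_interpolation",
    torch_dtype=torch.float16,
).to("cuda")

generator = torch.Generator("cuda").manual_seed(42)
images = pipe("a photo of a cat", "a photo of a dog", steps=5, generator=generator).images
```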
@@ -1086,7 +1086,7 @@ The resulting images in order:-
### UnCLIP Image Interpolation Pipeline
This Diffusion Pipeline takes two images or an image_embeddings tensor of size 2 and interpolates between their embeddings using spherical linear interpolation (slerp). The input images/image_embeddings are converted to image embeddings by the pipeline's image_encoder, and the interpolation is done on the resulting image_embeddings over the number of steps specified. Defaults to 5 steps.
```python
import torch
@@ -1127,8 +1127,8 @@ The resulting images in order:-
![result5](https://huggingface.co/datasets/NagaSaiAbhinay/UnCLIPImageInterpolationSamples/resolve/main/starry_to_flowers_5.png)
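Both interpolation pipelines reduce to spherical linear interpolation (slerp) between embeddings. A minimal illustrative implementation of the operation (not the pipelines' exact code):

```python
import torch

def slerp(v0: torch.Tensor, v1: torch.Tensor, t: float, eps: float = 1e-7) -> torch.Tensor:
    """Spherically interpolate between two embedding tensors at fraction t in [0, 1]."""
    v0_unit = v0 / v0.norm()
    v1_unit = v1 / v1.norm()
    # Angle between the two embeddings, clamped for numerical stability
    omega = torch.acos((v0_unit * v1_unit).sum().clamp(-1 + eps, 1 - eps))
    sin_omega = torch.sin(omega)
    return (torch.sin((1.0 - t) * omega) / sin_omega) * v0 + (torch.sin(t * omega) / sin_omega) * v1

# Five evenly spaced interpolation steps between two embeddings
emb_a, emb_b = torch.randn(768), torch.randn(768)
steps = [slerp(emb_a, emb_b, t) for t in torch.linspace(0, 1, 5).tolist()]
```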
### DDIM Noise Comparative Analysis Pipeline
#### **Research question: What visual concepts do the diffusion models learn from each noise level during training?**
The [P2 weighting (CVPR 2022)](https://arxiv.org/abs/2204.00227) paper proposed an approach to answer the above question, which is their second contribution.
The approach consists of the following steps:
1. The input is an image x0.
@@ -1170,7 +1170,7 @@ Here is the result of this pipeline (which is DDIM) on CelebA-HQ dataset.
### CLIP Guided Img2Img Stable Diffusion
CLIP guided Img2Img stable diffusion can help to generate more realistic images with an initial image
by guiding stable diffusion at every denoising step with an additional CLIP model.
The following code requires roughly 12GB of GPU RAM.
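A hedged loading sketch (the community pipeline id `clip_guided_stable_diffusion_img2img` and the CLIP checkpoint are assumptions of this example):

```python
import torch
from diffusers import DiffusionPipeline
from transformers import CLIPImageProcessor, CLIPModel

clip_id = "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
feature_extractor = CLIPImageProcessor.from_pretrained(clip_id)
clip_model = CLIPModel.from_pretrained(clip_id, torch_dtype=torch.float16)

guided_pipeline = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="clip_guided_stable_diffusion_img2img",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
    torch_dtype=torch.float16,
).to("cuda")
```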
@@ -1322,8 +1322,8 @@ target_prompt = "A golden retriever"
# run the pipeline
result_image = pipeline(
    base_prompt=base_prompt,
    target_prompt=target_prompt,
    image=cropped_image,
)
@@ -1537,7 +1537,7 @@ python -m pip install intel_extension_for_pytorch==<version_name> -f https://dev
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", custom_pipeline="stable_diffusion_ipex") pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", custom_pipeline="stable_diffusion_ipex")
# For Float32 # For Float32
pipe.prepare_for_ipex(prompt, dtype=torch.float32, height=512, width=512) #value of image height/width should be consistent with the pipeline inference pipe.prepare_for_ipex(prompt, dtype=torch.float32, height=512, width=512) #value of image height/width should be consistent with the pipeline inference
# For BFloat16 # For BFloat16
pipe.prepare_for_ipex(prompt, dtype=torch.bfloat16, height=512, width=512) #value of image height/width should be consistent with the pipeline inference pipe.prepare_for_ipex(prompt, dtype=torch.bfloat16, height=512, width=512) #value of image height/width should be consistent with the pipeline inference
``` ```
@@ -1545,7 +1545,7 @@ Then you can use the ipex pipeline in a similar way to the default stable diffus
```python
# For Float32
image = pipe(prompt, num_inference_steps=20, height=512, width=512).images[0]  # value of image height/width should be consistent with 'prepare_for_ipex()'
# For BFloat16
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    image = pipe(prompt, num_inference_steps=20, height=512, width=512).images[0]  # value of image height/width should be consistent with 'prepare_for_ipex()'
```
@@ -1604,21 +1604,21 @@ latency = elapsed_time(pipe4)
print("Latency of StableDiffusionPipeline--fp32",latency) print("Latency of StableDiffusionPipeline--fp32",latency)
``` ```
### CLIP Guided Images Mixing With Stable Diffusion
![clip_guided_images_mixing_examples](https://huggingface.co/datasets/TheDenk/images_mixing/resolve/main/main.png)
CLIP guided stable diffusion images mixing pipeline allows combining two images using standard diffusion models.
This approach uses an (optional) CoCa model to avoid writing an image description.
[More code examples](https://github.com/TheDenk/images_mixing)
### Stable Diffusion XL Long Weighted Prompt Pipeline
This SDXL pipeline supports unlimited-length prompts and negative prompts, compatible with the A1111 prompt-weighting style.
You can provide both `prompt` and `prompt_2`. If only one prompt is provided, `prompt_2` will be a copy of the provided `prompt`. Here is sample code using this pipeline.
```python
from diffusers import DiffusionPipeline
@@ -1639,8 +1639,8 @@ neg_prompt = "blur, low quality, carton, animate"
pipe.to("cuda") pipe.to("cuda")
images = pipe( images = pipe(
prompt = prompt prompt = prompt
, negative_prompt = neg_prompt , negative_prompt = neg_prompt
).images[0] ).images[0]
pipe.to("cpu") pipe.to("cpu")
@@ -1648,7 +1648,7 @@ torch.cuda.empty_cache()
images
```
In the above code, `prompt_2` is appended to `prompt`, so the combined prompt runs past 77 tokens; "birds" still show up in the result.
![Stable Diffusion XL Long Weighted Prompt Pipeline sample](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_long_weighted_prompt.png)
## Example Images Mixing (with CoCa)
@@ -1700,7 +1700,7 @@ mixing_pipeline.enable_attention_slicing()
mixing_pipeline = mixing_pipeline.to("cuda")
# Pipeline running
generator = torch.Generator(device="cuda").manual_seed(17)
def download_image(url):
    response = requests.get(url)
@@ -1729,7 +1729,7 @@ pipe_images = mixing_pipeline(
### Stable Diffusion Mixture Tiling
This pipeline uses the Mixture of Diffusers approach. Refer to the [Mixture of Diffusers](https://arxiv.org/abs/2302.02412) paper for more details.
```python
from diffusers import LMSDiscreteScheduler, DiffusionPipeline
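# A hedged sketch of a tiled generation; the "mixture_tiling" pipeline id and the
# tile_* argument names are assumptions of this example.
scheduler = LMSDiscreteScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000
)
pipeline = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", scheduler=scheduler, custom_pipeline="mixture_tiling"
).to("cuda")

image = pipeline(
    prompt=[["a forest", "a river between trees", "a snowy mountain"]],  # one row of three tiles
    tile_height=640,
    tile_width=640,
    tile_row_overlap=0,
    tile_col_overlap=256,
    guidance_scale=8,
    num_inference_steps=50,
)["images"][0]
```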
@@ -1802,7 +1802,7 @@ image.save('tensorrt_inpaint_mecha_robot.png')
### Stable Diffusion Mixture Canvas
This pipeline uses the Mixture of Diffusers approach. Refer to the [Mixture of Diffusers](https://arxiv.org/abs/2302.02412) paper for more details.
```python
from PIL import Image
from diffusers import LMSDiscreteScheduler, DiffusionPipeline
Reference Image

![reference_image](https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png)

Output Image

`prompt: 1 girl`

Reference Image

![reference_image](https://github.com/huggingface/diffusers/assets/34944964/449bdab6-e744-4fb2-9620-d4068d9a741b)

Output Image

`prompt: A dog`

Let's have a look at the images (*512x512*):

| Without Feedback | With Feedback (1st image) |
|---------------------|---------------------|
| ![Image 1](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/fabric_wo_feedback.jpg) | ![Feedback Image 1](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/fabric_w_feedback.png) |

### Masked Im2Im Stable Diffusion Pipeline
```python
pipe.to(torch_device="cuda", torch_dtype=torch.float32)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# Can be set to 1~50 steps. LCM supports fast inference even at <= 4 steps. Recommended: 1~8 steps.
num_inference_steps = 4

images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images
```
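The hunk above omits the pipeline construction. A minimal sketch, assuming the `latent_consistency_txt2img` community pipeline and the `SimianLuo/LCM_Dreamshaper_v7` checkpoint:

```python
import torch
from diffusers import DiffusionPipeline

# Assumed ids: the LCM checkpoint plus the community text-to-image LCM pipeline.
pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",
    custom_pipeline="latent_consistency_txt2img",
)
pipe.to(torch_device="cuda", torch_dtype=torch.float32)
```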
```python
input_image = Image.open("myimg.png")

# strength=0: no change; strength=1: completely overwrite the input image.
strength = 0.5

# Can be set to 1~50 steps. LCM supports fast inference even at <= 4 steps. Recommended: 1~8 steps.
num_inference_steps = 4

images = pipe(prompt=prompt, image=input_image, strength=strength, num_inference_steps=num_inference_steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images
```
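The same caveat applies here; a sketch of the loading step, assuming the `latent_consistency_img2img` community pipeline id:

```python
import torch
from diffusers import DiffusionPipeline

# Assumed ids: the LCM checkpoint plus the community image-to-image LCM pipeline.
pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",
    custom_pipeline="latent_consistency_img2img",
)
pipe.to(torch_device="cuda", torch_dtype=torch.float32)
```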
```python
assert len(images) == (len(prompts) - 1) * num_interpolation_steps
```
### StableDiffusionUpscaleLDM3D Pipeline

[LDM3D-VR](https://arxiv.org/pdf/2311.03226.pdf) is an extended version of LDM3D.
The abstract from the paper is:
*Latent diffusion models have proven to be state-of-the-art in the creation and manipulation of visual outputs. However, as far as we know, the generation of depth maps jointly with RGB is still limited. We introduce LDM3D-VR, a suite of diffusion models targeting virtual reality development that includes LDM3D-pano and LDM3D-SR. These models enable the generation of panoramic RGBD based on textual prompts and the upscaling of low-resolution inputs to high-resolution RGBD, respectively. Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions. Both models are evaluated in comparison to existing related methods.*

```python
upscaled_depth.save("upscaled_lemons_depth.png")
```
### ControlNet + T2I Adapter Pipeline

This pipeline combines both ControlNet and T2IAdapter into a single pipeline, where the forward pass is executed once.
It receives `control_image` and `adapter_image`, as well as `controlnet_conditioning_scale` and `adapter_conditioning_scale`, for the ControlNet and Adapter modules, respectively. Whenever `adapter_conditioning_scale = 0` or `controlnet_conditioning_scale = 0`, the pipeline acts as a full ControlNet module or as a full T2IAdapter module, respectively.
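Before the example below, here is a hedged sketch of the combined call; `pipe`, `control_image`, and `adapter_image` are assumed to come from the elided setup:

```py
# Both branches active at half strength; setting either scale to 0 degenerates
# to a pure T2IAdapter or pure ControlNet pipeline, respectively.
image = pipe(
    prompt="a photo of a living room",
    control_image=control_image,
    adapter_image=adapter_image,
    controlnet_conditioning_scale=0.5,
    adapter_conditioning_scale=0.5,
).images[0]
```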
```py
import cv2
# ...
```
```py
# From the Regional Prompting Pipeline example:
pipe = RegionalPromptingStableDiffusionPipeline.from_single_file(model_path, vae=vae)

rp_args = {
    "mode": "rows",
    "div": "1;1;1",
}

prompt = """
green hair twintail BREAK
red blouse BREAK
blue skirt
"""

images = pipe(prompt=prompt, rp_args=rp_args).images
for image in images:
    ...
```
### Cols, Rows mode

In the Cols, Rows mode, you can split the screen vertically and horizontally and assign prompts to each region. The split ratio can be specified with `div`, e.g. `3;3;2` or `0.1;0.5`. Furthermore, as described later, you can also subdivide the split Cols, Rows to specify more complex regions.

In this image, the image is divided into three parts, and a separate prompt is applied to each. The prompts are divided by `BREAK`, and each is applied to the respective region.

![sample](https://github.com/hako-mikan/sd-webui-regional-prompter/blob/imgs/rp_pipeline2.png)
```
green hair twintail BREAK
red blouse BREAK
blue skirt
```
```py
prompt = """
a girl in street with shirt, tie, skirt BREAK
red, shirt BREAK
green, tie BREAK
blue, skirt
"""
```
![sample](https://github.com/hako-mikan/sd-webui-regional-prompter/blob/imgs/rp_pipeline3.png)
If only one input is given for multiple regions, they are all assumed to be the same.
The difference is that in Prompt, duplicate regions are added, whereas in Prompt-EX, duplicate regions are overwritten sequentially. Since they are processed in order, specifying a TARGET with large regions first makes it easier for the effects of small regions to be preserved.
### Accuracy

In the case of a 512 x 512 image, Attention mode reduces the region to about 8 x 8 pixels deep inside the U-Net, so small regions get mixed up; Latent mode computes regions at the 64 x 64 latent resolution, so they are exact. (The VAE downsamples 512 x 512 to a 64 x 64 latent, and the deepest U-Net blocks downsample that by a further factor of 8, to 8 x 8.)
```
girl hair twintail frills,ribbons, dress, face BREAK
girl, ,face
```
Negative prompts are equally effective across all regions, but it is also possible to set them individually per region.
To activate the Regional Prompter, it is necessary to enter settings in `rp_args`, a dictionary. The items that can be set are as follows.

### Input Parameters

Parameters are specified through the `rp_args` dictionary:
```py
rp_args = {
    "mode": "rows",
    "div": "1;1;1",
}

pipe(prompt=prompt, rp_args=rp_args)
```
The pipeline supports `compel` syntax; input prompts using the `compel` structure.
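A minimal sketch of the `compel` weighting structure, assuming an already-constructed `pipe`:

```py
from compel import Compel

compel = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)

# "++" up-weights a term, "--" down-weights it.
prompt_embeds = compel("a forest landscape, highly detailed++, overcast sky--")
image = pipe(prompt_embeds=prompt_embeds).images[0]
```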
```py
def get_kernel(self):
    return self.k

# ...
self.kernel_size = kernel_size
self.conv = Blurkernel(
    blur_type='gaussian',
    kernel_size=kernel_size,
    # ...
)
```
* ![sample](https://github.com/tongdaxu/Images/assets/22267548/0ceb5575-d42e-4f0b-99c0-50e69c982209)
* The reconstruction is perceptually similar to the source image but differs in details.
* In `dps_pipeline.py`, we also provide a super-resolution example, which should produce:
* Downsampled image:
* ![dps_mea](https://github.com/tongdaxu/Images/assets/22267548/ff6a33d6-26f0-42aa-88ce-f8a76ba45a13)
* Reconstructed image:
* ![dps_generated_image](https://github.com/tongdaxu/Images/assets/22267548/b74f084d-93f4-4845-83d8-44c0fa758a5f)
The original repo can be found [here](https://github.com/PRIS-CV/DemoFusion).

- `show_image` (`bool`, defaults to `False`):
  Determines whether to show intermediate results during generation.
```py
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    ...
)

prompt = "Envision a portrait of an elderly woman, her face a canvas of time"  # full prompt elided
negative_prompt = "blurry, ugly, duplicate, poorly drawn, deformed, mosaic"

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    height=3072,
    width=3072,
    view_batch_size=16,
    stride=64,
    num_inference_steps=50,
    guidance_scale=7.5,
    cosine_scale_1=3,
    cosine_scale_2=1,
    cosine_scale_3=1,
    sigma=0.8,
    multi_decoder=True,
    show_image=True,
)
```
You can display and save the generated images as:
```py
def image_grid(imgs, save_path=None):
    w = 0
    # ...
    for i, img in enumerate(imgs):
        # ...
        if save_path is not None:
            img.save(save_path + "/img_{}.jpg".format((i + 1) * 1024))
        w += w_
    return grid

image_grid(images, save_path="./outputs/")
```
# Research projects

This folder contains various research projects using 🧨 Diffusers.
They are not really maintained by the core maintainers of this library and often require a specific version of Diffusers that is indicated in the requirements file of each folder.
Updating them to the most recent version of the library will require some work.
To use any of them, just run the commands given in the corresponding folder.
## [Deprecated] Multi Token Textual Inversion

**IMPORTANT: This research project is deprecated. Multi Token Textual Inversion is now supported natively in [the official textual inversion example](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion#running-locally-with-pytorch).**

The author of this project is [Isamu Isozaki](https://github.com/isamu-isozaki) - please make sure to tag the author for issues and PRs, as well as @patrickvonplaten.
Feel free to add these options to your training!
[Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples.
The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion.

## Running on Colab

Colab for training
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)

Colab for inference
```bash
accelerate config
```
### Cat toy example

You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-5`, so you'll need to visit [its card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license and tick the checkbox if you agree.

You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
Run the following command to authenticate your token:

```bash
huggingface-cli login
```
If you have already cloned the repo, then you won't need to go through these steps.

<br>
**This research project is not actively maintained by the diffusers team. For any questions or comments, please contact Prathik Rao (prathikr), Sunghoon Choi (hanbitmyths), Ashwini Khade (askhade), or Peng Wang (pengwa) on GitHub.**

This project aims to provide diffusers examples with ONNXRuntime optimizations for training/fine-tuning unconditional image generation, text-to-image, and textual inversion. Please see the individual directories for more details on how to run each task using ONNXRuntime.
```bash
accelerate config
```
### Pokemon example

You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.

You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
```bash
accelerate launch --mixed_precision="fp16" train_text_to_image.py \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model"
```
Please contact Prathik Rao (prathikr), Sunghoon Choi (hanbitmyths), Ashwini Khade (askhade), or Peng Wang (pengwa) on GitHub with any questions.
[Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples.
The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion.

## Running on Colab

Colab for training
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)

Colab for inference
```bash
accelerate config
```
### Cat toy example

You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-5`, so you'll need to visit [its card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license and tick the checkbox if you agree.

You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
Run the following command to authenticate your token:

```bash
huggingface-cli login
```
If you have already cloned the repo, then you won't need to go through these steps.

<br>
```python
from accelerate.utils import write_basic_config

write_basic_config()
```
When running `accelerate config`, setting torch compile mode to True can give dramatic speedups.

### Toy example
```python
params = jax.tree_util.tree_map(lambda x: x.astype(jnp.bfloat16), params)
params["scheduler"] = scheduler_state
```
This section adjusts the data types of the model parameters.
We convert all parameters to `bfloat16` to speed up the computation with model weights.
**Note** that the scheduler parameters are **not** converted to `bfloat16`, as the loss in precision would degrade the pipeline's performance too significantly.
**3. Define Inputs to Pipeline**
For this we will be using a JAX feature called [Ahead of Time (AOT) compilation](https://jax.readthedocs.io/en/latest/aot.html).
In [sdxl_single_aot.py](./sdxl_single_aot.py) we give a simple example of how to write our own parallelization logic for a text-to-image generation pipeline in JAX, using [Stability AI's Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
We add an `aot_compile` function that compiles the `pipeline._generate` function, telling JAX which input arguments are static, that is, arguments that are known at compile time and won't change. In our case, these are `num_inference_steps`, `height`, `width` and `return_latents`.
Once the function is compiled, these parameters are baked in: they are omitted from future calls and cannot be changed without modifying the code and recompiling.
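A self-contained sketch of the same AOT pattern on a toy function (not this repo's helper), showing how a static argument is fixed at compile time:

```python
import jax
import jax.numpy as jnp

def scale(x, n):
    # `n` is static: changing it changes the traced computation.
    return x * n

x = jnp.arange(4.0)

# Lower and compile ahead of time with n fixed to 3.
compiled = jax.jit(scale, static_argnums=1).lower(x, 3).compile()

# The static argument is now baked in and omitted from the call.
print(compiled(x))  # [0. 3. 6. 9.]
```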
```python
# Inside generate(): build the per-prompt guidance-scale array and run the
# sharded generation call.
g = jnp.array([guidance_scale] * prompt_ids.shape[0], dtype=jnp.float32)
g = g[:, None]

images = p_generate(
    prompt_ids,
    p_params,
    rng,
    g,
    None,
    neg_prompt_ids,
)
```
The first forward pass after AOT compilation still takes longer than subsequent passes. This is because on the first pass, JAX uses Python dispatch, which fills the C++ dispatch cache.
When using `jit`, this extra step is done automatically, but when using AOT compilation, it doesn't happen until the function call is made.
```python
from accelerate.utils import write_basic_config

write_basic_config()
```
When running `accelerate config`, setting torch compile mode to True can give dramatic speedups.

## Circle filling dataset
```bash
accelerate launch train_t2i_adapter_sdxl.py ...
```
To better track our training experiments, we're using the following flags in the command above:

* `report_to="wandb"` will ensure the training runs are tracked on Weights and Biases. To use it, be sure to install `wandb` with `pip install wandb`.
* `validation_image`, `validation_prompt`, and `validation_steps` allow the script to do a few validation inference runs, which lets us qualitatively check whether training is progressing as expected.

Our experiments were conducted on a single 40GB A100 GPU.
Note also that we use the PEFT library as backend for LoRA training, so make sure to have it installed in your environment.
### Pokemon example

You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.

You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
```bash
accelerate launch --mixed_precision="fp16" train_text_to_image.py \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model"
```
<!-- accelerate_snippet_end -->
```bash
accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model"
```
We support training with the Min-SNR weighting strategy proposed in [Efficient Diffusion Training via Min-SNR Weighting Strategy](https://arxiv.org/abs/2303.09556), which helps achieve faster convergence by rebalancing the loss. To use it, set the `--snr_gamma` argument; the recommended value is 5.0.
You can find [this project on Weights and Biases](https://wandb.ai/sayakpaul/text2image-finetune-minsnr) that compares the loss surfaces of the following setups:
For our small Pokemons dataset, the effects of the Min-SNR weighting strategy might not appear pronounced, but for larger datasets we believe the effects will be more pronounced.
Also, note that in this example we either predict `epsilon` (i.e., the noise) or `v_prediction`; the formulation of the Min-SNR weighting strategy we use holds for both cases.
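For intuition, a hedged sketch of the per-timestep loss weight this strategy applies, following the paper's formulation for both prediction types:

```python
import torch

def min_snr_weight(snr: torch.Tensor, gamma: float = 5.0, prediction_type: str = "epsilon") -> torch.Tensor:
    # Clamp SNR at gamma, then normalize by the target's natural weighting.
    clamped = torch.clamp(snr, max=gamma)
    if prediction_type == "epsilon":
        return clamped / snr
    if prediction_type == "v_prediction":
        return clamped / (snr + 1.0)
    raise ValueError(f"unknown prediction_type: {prediction_type}")
```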
## Training with LoRA

LoRA makes it possible to run fine-tuning on consumer GPUs like the Tesla T4 or Tesla V100.
### Training

First, you need to set up your development environment as is explained in the [installation section](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Stable Diffusion v1-4](https://hf.co/CompVis/stable-diffusion-v1-4) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions).

**___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"
```
For this example we want to directly store the trained LoRA embeddings on the Hub, so we need to be logged in and add the `--push_to_hub` flag.
The above command will also run inference as fine-tuning progresses and log the results to Weights and Biases.
The final LoRA embedding weights have been uploaded to [sayakpaul/sd-model-finetuned-lora-t4](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4). **___Note: [The final weights](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/pytorch_lora_weights.bin) are only 3 MB in size, which is orders of magnitude smaller than the original model.___**

You can check some inference samples that were logged during the course of the fine-tuning process [here](https://wandb.ai/sayakpaul/text2image-fine-tune/runs/q4lc0xsw).
### Inference

Once you have trained a model using the above command, inference can be done simply using the `StableDiffusionPipeline` after loading the trained LoRA weights. You need to pass the `output_dir` for loading the LoRA weights, which in this case is `sd-pokemon-model-lora`.
```python
# ...
image.save("pokemon.png")
```
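A fuller sketch of the elided inference code, assuming the training `output_dir` above and the `load_attn_procs` LoRA-loading API:

```python
import torch
from diffusers import StableDiffusionPipeline

model_path = "sd-pokemon-model-lora"  # the training output_dir
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe.unet.load_attn_procs(model_path)  # load the trained LoRA weights
pipe.to("cuda")

image = pipe("A pokemon with blue eyes.", num_inference_steps=25, guidance_scale=7.5).images[0]
image.save("pokemon.png")
```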
If you are loading the LoRA parameters from the Hub and if the Hub repository has a `base_model` tag (such as [this](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/README.md?code=true#L4)), then you can do:
```py
from huggingface_hub.repocard import RepoCard

lora_model_id = "sayakpaul/sd-model-finetuned-lora-t4"
card = RepoCard.load(lora_model_id)
base_model_id = card.data.to_dict()["base_model"]  # e.g. "CompVis/stable-diffusion-v1-4"
```
```bash
python train_text_to_image_flax.py \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --output_dir="sd-pokemon-model"
```
To run on your own training files, prepare the dataset according to the format required by `datasets`. You can find the instructions for how to do that in this [document](https://huggingface.co/docs/datasets/v2.4.0/en/image_load#imagefolder-with-metadata).
## Stable Diffusion XL

* We support fine-tuning the UNet shipped in [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) via the `train_text_to_image_sdxl.py` script. Please refer to the docs [here](./README_sdxl.md).
* We also support fine-tuning of the UNet and Text Encoder shipped in [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) with LoRA via the `train_text_to_image_lora_sdxl.py` script. Please refer to the docs [here](./README_sdxl.md).
[Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples.
The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion.

## Running on Colab

Colab for training
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)

Colab for inference
```bash
accelerate launch textual_inversion.py ...
```
A full training run takes ~1 hour on one V100 GPU.
**Note**: As described in [the official paper](https://arxiv.org/abs/2208.01618), only one embedding vector is used for the placeholder token, *e.g.* `"<cat-toy>"`.
However, one can also add multiple embedding vectors for the placeholder token to increase the number of fine-tunable parameters. This can help the model to learn more complex details. To use multiple embedding vectors, set `--num_vectors` to a number larger than one, *e.g.*:
```bash
--num_vectors 5
```
And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```
### Unconditional Flowers

The command to train a DDPM UNet model on the Oxford Flowers dataset:
A full training run takes 2 hours on 4xV100 GPUs.

<img src="https://user-images.githubusercontent.com/26864830/180248660-a0b143d0-b89a-42c5-8656-2ebf6ece7e52.png" width="700" />
### Unconditional Pokemon

The command to train a DDPM UNet model on the Pokemon dataset:
```bash
accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \
  --logger="wandb"
```
To be able to use Weights and Biases (`wandb`) as a logger you need to install the library: `pip install wandb`.

### Using your own data
In a nutshell, LoRA allows adapting pretrained models by adding pairs of rank-decomposition matrices to existing weights and training only those newly added weights.
### Prior Training

First, you need to set up your development environment as explained in the [installation](#Running-locally-with-PyTorch) section. Make sure to set the `DATASET_NAME` environment variable. Here, we will use the [Pokemon captions dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions).
```bash
export DATASET_NAME="lambdalabs/pokemon-blip-captions"
```