Unverified Commit 51941387 authored by Parag Ekbote's avatar Parag Ekbote Committed by GitHub
Browse files

Notebooks for Community Scripts-7 (#10846)

Add 5 Notebooks, improve their example
scripts and update the missing links for the
example README.
parent c7a8c439
...@@ -40,9 +40,9 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie ...@@ -40,9 +40,9 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie
| [**Text-to-Image fine-tuning**](./text_to_image) | ✅ | ✅ | | [**Text-to-Image fine-tuning**](./text_to_image) | ✅ | ✅ |
| [**Textual Inversion**](./textual_inversion) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) | [**Textual Inversion**](./textual_inversion) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)
| [**Dreambooth**](./dreambooth) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb) | [**Dreambooth**](./dreambooth) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb)
| [**ControlNet**](./controlnet) | ✅ | ✅ | - | [**ControlNet**](./controlnet) | ✅ | ✅ | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/controlnet.ipynb)
| [**InstructPix2Pix**](./instruct_pix2pix) | ✅ | ✅ | - | [**InstructPix2Pix**](./instruct_pix2pix) | ✅ | ✅ | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/InstructPix2Pix_using_diffusers.ipynb)
| [**Reinforcement Learning for Control**](./reinforcement_learning) | - | - | coming soon. | [**Reinforcement Learning for Control**](./reinforcement_learning) | - | - | [Notebook1](https://github.com/huggingface/notebooks/blob/main/diffusers/reinforcement_learning_for_control.ipynb), [Notebook2](https://github.com/huggingface/notebooks/blob/main/diffusers/reinforcement_learning_with_diffusers.ipynb)
## Community ## Community
......
...@@ -27,25 +27,25 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif ...@@ -27,25 +27,25 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif
| Seed Resizing Stable Diffusion | Stable Diffusion Pipeline that supports resizing an image and retaining the concepts of the 512 by 512 generation. | [Seed Resizing](#seed-resizing) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/seed_resizing.ipynb) | [Mark Rich](https://github.com/MarkRich) | | Seed Resizing Stable Diffusion | Stable Diffusion Pipeline that supports resizing an image and retaining the concepts of the 512 by 512 generation. | [Seed Resizing](#seed-resizing) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/seed_resizing.ipynb) | [Mark Rich](https://github.com/MarkRich) |
| Imagic Stable Diffusion | Stable Diffusion Pipeline that enables writing a text prompt to edit an existing image | [Imagic Stable Diffusion](#imagic-stable-diffusion) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/imagic_stable_diffusion.ipynb) | [Mark Rich](https://github.com/MarkRich) | | Imagic Stable Diffusion | Stable Diffusion Pipeline that enables writing a text prompt to edit an existing image | [Imagic Stable Diffusion](#imagic-stable-diffusion) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/imagic_stable_diffusion.ipynb) | [Mark Rich](https://github.com/MarkRich) |
| Multilingual Stable Diffusion | Stable Diffusion Pipeline that supports prompts in 50 different languages. | [Multilingual Stable Diffusion](#multilingual-stable-diffusion-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/multilingual_stable_diffusion.ipynb) | [Juan Carlos Piñeros](https://github.com/juancopi81) | | Multilingual Stable Diffusion | Stable Diffusion Pipeline that supports prompts in 50 different languages. | [Multilingual Stable Diffusion](#multilingual-stable-diffusion-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/multilingual_stable_diffusion.ipynb) | [Juan Carlos Piñeros](https://github.com/juancopi81) |
| GlueGen Stable Diffusion | Stable Diffusion Pipeline that supports prompts in different languages using GlueGen adapter. | [GlueGen Stable Diffusion](#gluegen-stable-diffusion-pipeline) | - | [Phạm Hồng Vinh](https://github.com/rootonchair) | | GlueGen Stable Diffusion | Stable Diffusion Pipeline that supports prompts in different languages using GlueGen adapter. | [GlueGen Stable Diffusion](#gluegen-stable-diffusion-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/gluegen_stable_diffusion.ipynb) | [Phạm Hồng Vinh](https://github.com/rootonchair) |
| Image to Image Inpainting Stable Diffusion | Stable Diffusion Pipeline that enables the overlaying of two images and subsequent inpainting | [Image to Image Inpainting Stable Diffusion](#image-to-image-inpainting-stable-diffusion) | - | [Alex McKinney](https://github.com/vvvm23) | | Image to Image Inpainting Stable Diffusion | Stable Diffusion Pipeline that enables the overlaying of two images and subsequent inpainting | [Image to Image Inpainting Stable Diffusion](#image-to-image-inpainting-stable-diffusion) | - | [Alex McKinney](https://github.com/vvvm23) |
| Text Based Inpainting Stable Diffusion | Stable Diffusion Inpainting Pipeline that enables passing a text prompt to generate the mask for inpainting | [Text Based Inpainting Stable Diffusion](#text-based-inpainting-stable-diffusion) | - | [Dhruv Karan](https://github.com/unography) | | Text Based Inpainting Stable Diffusion | Stable Diffusion Inpainting Pipeline that enables passing a text prompt to generate the mask for inpainting | [Text Based Inpainting Stable Diffusion](#text-based-inpainting-stable-diffusion) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/text_based_inpainting_stable_dffusion.ipynb) | [Dhruv Karan](https://github.com/unography) |
| Bit Diffusion | Diffusion on discrete data | [Bit Diffusion](#bit-diffusion) | - | [Stuti R.](https://github.com/kingstut) | | Bit Diffusion | Diffusion on discrete data | [Bit Diffusion](#bit-diffusion) | - | [Stuti R.](https://github.com/kingstut) |
| K-Diffusion Stable Diffusion | Run Stable Diffusion with any of [K-Diffusion's samplers](https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/sampling.py) | [Stable Diffusion with K Diffusion](#stable-diffusion-with-k-diffusion) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) | | K-Diffusion Stable Diffusion | Run Stable Diffusion with any of [K-Diffusion's samplers](https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/sampling.py) | [Stable Diffusion with K Diffusion](#stable-diffusion-with-k-diffusion) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) |
| Checkpoint Merger Pipeline | Diffusion Pipeline that enables merging of saved model checkpoints | [Checkpoint Merger Pipeline](#checkpoint-merger-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) | | Checkpoint Merger Pipeline | Diffusion Pipeline that enables merging of saved model checkpoints | [Checkpoint Merger Pipeline](#checkpoint-merger-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) |
| Stable Diffusion v1.1-1.4 Comparison | Run all 4 model checkpoints for Stable Diffusion and compare their results together | [Stable Diffusion Comparison](#stable-diffusion-comparisons) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/stable_diffusion_comparison.ipynb) | [Suvaditya Mukherjee](https://github.com/suvadityamuk) | | Stable Diffusion v1.1-1.4 Comparison | Run all 4 model checkpoints for Stable Diffusion and compare their results together | [Stable Diffusion Comparison](#stable-diffusion-comparisons) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/stable_diffusion_comparison.ipynb) | [Suvaditya Mukherjee](https://github.com/suvadityamuk) |
| MagicMix | Diffusion Pipeline for semantic mixing of an image and a text prompt | [MagicMix](#magic-mix) | - | [Partho Das](https://github.com/daspartho) | | MagicMix | Diffusion Pipeline for semantic mixing of an image and a text prompt | [MagicMix](#magic-mix) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/magic_mix.ipynb) | [Partho Das](https://github.com/daspartho) |
| Stable UnCLIP | Diffusion Pipeline for combining prior model (generate clip image embedding from text, UnCLIPPipeline `"kakaobrain/karlo-v1-alpha"`) and decoder pipeline (decode clip image embedding to image, StableDiffusionImageVariationPipeline `"lambdalabs/sd-image-variations-diffusers"` ). | [Stable UnCLIP](#stable-unclip) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/stable_unclip.ipynb) | [Ray Wang](https://wrong.wang) | | Stable UnCLIP | Diffusion Pipeline for combining prior model (generate clip image embedding from text, UnCLIPPipeline `"kakaobrain/karlo-v1-alpha"`) and decoder pipeline (decode clip image embedding to image, StableDiffusionImageVariationPipeline `"lambdalabs/sd-image-variations-diffusers"` ). | [Stable UnCLIP](#stable-unclip) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/stable_unclip.ipynb) | [Ray Wang](https://wrong.wang) |
| UnCLIP Text Interpolation Pipeline | Diffusion Pipeline that allows passing two prompts and produces images while interpolating between the text-embeddings of the two prompts | [UnCLIP Text Interpolation Pipeline](#unclip-text-interpolation-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/unclip_text_interpolation.ipynb)| [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) | | UnCLIP Text Interpolation Pipeline | Diffusion Pipeline that allows passing two prompts and produces images while interpolating between the text-embeddings of the two prompts | [UnCLIP Text Interpolation Pipeline](#unclip-text-interpolation-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/unclip_text_interpolation.ipynb)| [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) |
| UnCLIP Image Interpolation Pipeline | Diffusion Pipeline that allows passing two images/image_embeddings and produces images while interpolating between their image-embeddings | [UnCLIP Image Interpolation Pipeline](#unclip-image-interpolation-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/unclip_image_interpolation.ipynb)| [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) | | UnCLIP Image Interpolation Pipeline | Diffusion Pipeline that allows passing two images/image_embeddings and produces images while interpolating between their image-embeddings | [UnCLIP Image Interpolation Pipeline](#unclip-image-interpolation-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/unclip_image_interpolation.ipynb)| [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) |
| DDIM Noise Comparative Analysis Pipeline | Investigating how the diffusion models learn visual concepts from each noise level (which is a contribution of [P2 weighting (CVPR 2022)](https://arxiv.org/abs/2204.00227)) | [DDIM Noise Comparative Analysis Pipeline](#ddim-noise-comparative-analysis-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/ddim_noise_comparative_analysis.ipynb)| [Aengus (Duc-Anh)](https://github.com/aengusng8) | | DDIM Noise Comparative Analysis Pipeline | Investigating how the diffusion models learn visual concepts from each noise level (which is a contribution of [P2 weighting (CVPR 2022)](https://arxiv.org/abs/2204.00227)) | [DDIM Noise Comparative Analysis Pipeline](#ddim-noise-comparative-analysis-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/ddim_noise_comparative_analysis.ipynb)| [Aengus (Duc-Anh)](https://github.com/aengusng8) |
| CLIP Guided Img2Img Stable Diffusion Pipeline | Doing CLIP guidance for image to image generation with Stable Diffusion | [CLIP Guided Img2Img Stable Diffusion](#clip-guided-img2img-stable-diffusion) | - | [Nipun Jindal](https://github.com/nipunjindal/) | | CLIP Guided Img2Img Stable Diffusion Pipeline | Doing CLIP guidance for image to image generation with Stable Diffusion | [CLIP Guided Img2Img Stable Diffusion](#clip-guided-img2img-stable-diffusion) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/clip_guided_img2img_stable_diffusion.ipynb) | [Nipun Jindal](https://github.com/nipunjindal/) |
| TensorRT Stable Diffusion Text to Image Pipeline | Accelerates the Stable Diffusion Text2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Text to Image Pipeline](#tensorrt-text2image-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) | | TensorRT Stable Diffusion Text to Image Pipeline | Accelerates the Stable Diffusion Text2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Text to Image Pipeline](#tensorrt-text2image-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) |
| EDICT Image Editing Pipeline | Diffusion pipeline for text-guided image editing | [EDICT Image Editing Pipeline](#edict-image-editing-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/edict_image_pipeline.ipynb) | [Joqsan Azocar](https://github.com/Joqsan) | | EDICT Image Editing Pipeline | Diffusion pipeline for text-guided image editing | [EDICT Image Editing Pipeline](#edict-image-editing-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/edict_image_pipeline.ipynb) | [Joqsan Azocar](https://github.com/Joqsan) |
| Stable Diffusion RePaint | Stable Diffusion pipeline using [RePaint](https://arxiv.org/abs/2201.09865) for inpainting. | [Stable Diffusion RePaint](#stable-diffusion-repaint )|[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/stable_diffusion_repaint.ipynb)| [Markus Pobitzer](https://github.com/Markus-Pobitzer) | | Stable Diffusion RePaint | Stable Diffusion pipeline using [RePaint](https://arxiv.org/abs/2201.09865) for inpainting. | [Stable Diffusion RePaint](#stable-diffusion-repaint )|[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/stable_diffusion_repaint.ipynb)| [Markus Pobitzer](https://github.com/Markus-Pobitzer) |
| TensorRT Stable Diffusion Image to Image Pipeline | Accelerates the Stable Diffusion Image2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Image to Image Pipeline](#tensorrt-image2image-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) | | TensorRT Stable Diffusion Image to Image Pipeline | Accelerates the Stable Diffusion Image2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Image to Image Pipeline](#tensorrt-image2image-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) |
| Stable Diffusion IPEX Pipeline | Accelerate Stable Diffusion inference pipeline with BF16/FP32 precision on Intel Xeon CPUs with [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [Stable Diffusion on IPEX](#stable-diffusion-on-ipex) | - | [Yingjie Han](https://github.com/yingjie-han/) | | Stable Diffusion IPEX Pipeline | Accelerate Stable Diffusion inference pipeline with BF16/FP32 precision on Intel Xeon CPUs with [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [Stable Diffusion on IPEX](#stable-diffusion-on-ipex) | - | [Yingjie Han](https://github.com/yingjie-han/) |
| CLIP Guided Images Mixing Stable Diffusion Pipeline | Сombine images using usual diffusion models. | [CLIP Guided Images Mixing Using Stable Diffusion](#clip-guided-images-mixing-with-stable-diffusion) | - | [Karachev Denis](https://github.com/TheDenk) | | CLIP Guided Images Mixing Stable Diffusion Pipeline | Сombine images using usual diffusion models. | [CLIP Guided Images Mixing Using Stable Diffusion](#clip-guided-images-mixing-with-stable-diffusion) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/clip_guided_images_mixing_with_stable_diffusion.ipynb) | [Karachev Denis](https://github.com/TheDenk) |
| TensorRT Stable Diffusion Inpainting Pipeline | Accelerates the Stable Diffusion Inpainting Pipeline using TensorRT | [TensorRT Stable Diffusion Inpainting Pipeline](#tensorrt-inpainting-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) | | TensorRT Stable Diffusion Inpainting Pipeline | Accelerates the Stable Diffusion Inpainting Pipeline using TensorRT | [TensorRT Stable Diffusion Inpainting Pipeline](#tensorrt-inpainting-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) |
| IADB Pipeline | Implementation of [Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model](https://arxiv.org/abs/2305.03486) | [IADB Pipeline](#iadb-pipeline) | - | [Thomas Chambon](https://github.com/tchambon) | IADB Pipeline | Implementation of [Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model](https://arxiv.org/abs/2305.03486) | [IADB Pipeline](#iadb-pipeline) | - | [Thomas Chambon](https://github.com/tchambon)
| Zero1to3 Pipeline | Implementation of [Zero-1-to-3: Zero-shot One Image to 3D Object](https://arxiv.org/abs/2303.11328) | [Zero1to3 Pipeline](#zero1to3-pipeline) | - | [Xin Kong](https://github.com/kxhit) | | Zero1to3 Pipeline | Implementation of [Zero-1-to-3: Zero-shot One Image to 3D Object](https://arxiv.org/abs/2303.11328) | [Zero1to3 Pipeline](#zero1to3-pipeline) | - | [Xin Kong](https://github.com/kxhit) |
...@@ -81,6 +81,7 @@ PIXART-α Controlnet pipeline | Implementation of the controlnet model for pixar ...@@ -81,6 +81,7 @@ PIXART-α Controlnet pipeline | Implementation of the controlnet model for pixar
| HunyuanDiT Differential Diffusion Pipeline | Applies [Differential Diffusion](https://github.com/exx8/differential-diffusion) to [HunyuanDiT](https://github.com/huggingface/diffusers/pull/8240). | [HunyuanDiT with Differential Diffusion](#hunyuandit-with-differential-diffusion) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing) | [Monjoy Choudhury](https://github.com/MnCSSJ4x) | | HunyuanDiT Differential Diffusion Pipeline | Applies [Differential Diffusion](https://github.com/exx8/differential-diffusion) to [HunyuanDiT](https://github.com/huggingface/diffusers/pull/8240). | [HunyuanDiT with Differential Diffusion](#hunyuandit-with-differential-diffusion) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing) | [Monjoy Choudhury](https://github.com/MnCSSJ4x) |
| [🪆Matryoshka Diffusion Models](https://huggingface.co/papers/2310.15111) | A diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small scale inputs are nested within those of the large scales. See [original codebase](https://github.com/apple/ml-mdm). | [🪆Matryoshka Diffusion Models](#matryoshka-diffusion-models) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/pcuenq/mdm) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/tolgacangoz/1f54875fc7aeaabcf284ebde64820966/matryoshka_hf.ipynb) | [M. Tolga Cangöz](https://github.com/tolgacangoz) | | [🪆Matryoshka Diffusion Models](https://huggingface.co/papers/2310.15111) | A diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small scale inputs are nested within those of the large scales. See [original codebase](https://github.com/apple/ml-mdm). | [🪆Matryoshka Diffusion Models](#matryoshka-diffusion-models) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/pcuenq/mdm) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/tolgacangoz/1f54875fc7aeaabcf284ebde64820966/matryoshka_hf.ipynb) | [M. Tolga Cangöz](https://github.com/tolgacangoz) |
| Stable Diffusion XL Attentive Eraser Pipeline |[[AAAI2025 Oral] Attentive Eraser](https://github.com/Anonym0u3/AttentiveEraser) is a novel tuning-free method that enhances object removal capabilities in pre-trained diffusion models.|[Stable Diffusion XL Attentive Eraser Pipeline](#stable-diffusion-xl-attentive-eraser-pipeline)|-|[Wenhao Sun](https://github.com/Anonym0u3) and [Benlei Cui](https://github.com/Benny079)| | Stable Diffusion XL Attentive Eraser Pipeline |[[AAAI2025 Oral] Attentive Eraser](https://github.com/Anonym0u3/AttentiveEraser) is a novel tuning-free method that enhances object removal capabilities in pre-trained diffusion models.|[Stable Diffusion XL Attentive Eraser Pipeline](#stable-diffusion-xl-attentive-eraser-pipeline)|-|[Wenhao Sun](https://github.com/Anonym0u3) and [Benlei Cui](https://github.com/Benny079)|
| Perturbed-Attention Guidance |StableDiffusionPAGPipeline is a modification of StableDiffusionPipeline to support Perturbed-Attention Guidance (PAG).|[Perturbed-Attention Guidance](#perturbed-attention-guidance)|[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/perturbed_attention_guidance.ipynb)|[Hyoungwon Cho](https://github.com/HyoungwonCho)|
To load a custom pipeline you just need to pass the `custom_pipeline` argument to `DiffusionPipeline`, as one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines, we will merge them quickly. To load a custom pipeline you just need to pass the `custom_pipeline` argument to `DiffusionPipeline`, as one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines, we will merge them quickly.
...@@ -1106,38 +1107,100 @@ GlueGen is a minimal adapter that allows alignment between any encoder (Text Enc ...@@ -1106,38 +1107,100 @@ GlueGen is a minimal adapter that allows alignment between any encoder (Text Enc
Make sure you downloaded `gluenet_French_clip_overnorm_over3_noln.ckpt` for French (there are also pre-trained weights for Chinese, Italian, Japanese, Spanish or train your own) at [GlueGen's official repo](https://github.com/salesforce/GlueGen/tree/main). Make sure you downloaded `gluenet_French_clip_overnorm_over3_noln.ckpt` for French (there are also pre-trained weights for Chinese, Italian, Japanese, Spanish or train your own) at [GlueGen's official repo](https://github.com/salesforce/GlueGen/tree/main).
```python ```python
from PIL import Image import os
import gc
import urllib.request
import torch import torch
from transformers import XLMRobertaTokenizer, XLMRobertaForMaskedLM, CLIPTokenizer, CLIPTextModel
from diffusers import DiffusionPipeline
from transformers import AutoModel, AutoTokenizer # Download checkpoints
CHECKPOINTS = [
"https://storage.googleapis.com/sfr-gluegen-data-research/checkpoints_all/gluenet_checkpoint/gluenet_Chinese_clip_overnorm_over3_noln.ckpt",
"https://storage.googleapis.com/sfr-gluegen-data-research/checkpoints_all/gluenet_checkpoint/gluenet_French_clip_overnorm_over3_noln.ckpt",
"https://storage.googleapis.com/sfr-gluegen-data-research/checkpoints_all/gluenet_checkpoint/gluenet_Italian_clip_overnorm_over3_noln.ckpt",
"https://storage.googleapis.com/sfr-gluegen-data-research/checkpoints_all/gluenet_checkpoint/gluenet_Japanese_clip_overnorm_over3_noln.ckpt",
"https://storage.googleapis.com/sfr-gluegen-data-research/checkpoints_all/gluenet_checkpoint/gluenet_Spanish_clip_overnorm_over3_noln.ckpt",
"https://storage.googleapis.com/sfr-gluegen-data-research/checkpoints_all/gluenet_checkpoint/gluenet_sound2img_audioclip_us8k.ckpt"
]
from diffusers import DiffusionPipeline LANGUAGE_PROMPTS = {
"French": "une voiture sur la plage",
#"Chinese": "海滩上的一辆车",
#"Italian": "una macchina sulla spiaggia",
#"Japanese": "浜辺の車",
#"Spanish": "un coche en la playa"
}
if __name__ == "__main__": def download_checkpoints(checkpoint_dir):
device = "cuda" os.makedirs(checkpoint_dir, exist_ok=True)
for url in CHECKPOINTS:
filename = os.path.join(checkpoint_dir, os.path.basename(url))
if not os.path.exists(filename):
print(f"Downloading {filename}...")
urllib.request.urlretrieve(url, filename)
print(f"Downloaded {filename}")
else:
print(f"Checkpoint {filename} already exists, skipping download.")
return checkpoint_dir
def load_checkpoint(pipeline, checkpoint_path, device):
state_dict = torch.load(checkpoint_path, map_location=device)
state_dict = state_dict.get("state_dict", state_dict)
missing_keys, unexpected_keys = pipeline.unet.load_state_dict(state_dict, strict=False)
return pipeline
def generate_image(pipeline, prompt, device, output_path):
with torch.inference_mode():
image = pipeline(
prompt,
generator=torch.Generator(device=device).manual_seed(42),
num_inference_steps=50
).images[0]
image.save(output_path)
print(f"Image saved to {output_path}")
checkpoint_dir = download_checkpoints("./checkpoints_all/gluenet_checkpoint")
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
lm_model_id = "xlm-roberta-large" tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base", use_fast=False)
token_max_length = 77 model = XLMRobertaForMaskedLM.from_pretrained("xlm-roberta-base").to(device)
inputs = tokenizer("Ceci est une phrase incomplète avec un [MASK].", return_tensors="pt").to(device)
with torch.inference_mode():
_ = model(**inputs)
text_encoder = AutoModel.from_pretrained(lm_model_id)
tokenizer = AutoTokenizer.from_pretrained(lm_model_id, model_max_length=token_max_length, use_fast=False)
tensor_norm = torch.Tensor([[43.8203],[28.3668],[27.9345],[28.0084],[28.2958],[28.2576],[28.3373],[28.2695],[28.4097],[28.2790],[28.2825],[28.2807],[28.2775],[28.2708],[28.2682],[28.2624],[28.2589],[28.2611],[28.2616],[28.2639],[28.2613],[28.2566],[28.2615],[28.2665],[28.2799],[28.2885],[28.2852],[28.2863],[28.2780],[28.2818],[28.2764],[28.2532],[28.2412],[28.2336],[28.2514],[28.2734],[28.2763],[28.2977],[28.2971],[28.2948],[28.2818],[28.2676],[28.2831],[28.2890],[28.2979],[28.2999],[28.3117],[28.3363],[28.3554],[28.3626],[28.3589],[28.3597],[28.3543],[28.3660],[28.3731],[28.3717],[28.3812],[28.3753],[28.3810],[28.3777],[28.3693],[28.3713],[28.3670],[28.3691],[28.3679],[28.3624],[28.3703],[28.3703],[28.3720],[28.3594],[28.3576],[28.3562],[28.3438],[28.3376],[28.3389],[28.3433],[28.3191]]) clip_tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device)
pipeline = DiffusionPipeline.from_pretrained( # Initialize pipeline
"stable-diffusion-v1-5/stable-diffusion-v1-5", pipeline = DiffusionPipeline.from_pretrained(
text_encoder=text_encoder, "stable-diffusion-v1-5/stable-diffusion-v1-5",
tokenizer=tokenizer, text_encoder=clip_text_encoder,
custom_pipeline="gluegen" tokenizer=clip_tokenizer,
).to(device) custom_pipeline="gluegen",
pipeline.load_language_adapter("gluenet_French_clip_overnorm_over3_noln.ckpt", num_token=token_max_length, dim=1024, dim_out=768, tensor_norm=tensor_norm) safety_checker=None
).to(device)
os.makedirs("outputs", exist_ok=True)
prompt = "une voiture sur la plage" # Generate images
for language, prompt in LANGUAGE_PROMPTS.items():
generator = torch.Generator(device=device).manual_seed(42) checkpoint_file = f"gluenet_{language}_clip_overnorm_over3_noln.ckpt"
image = pipeline(prompt, generator=generator).images[0] checkpoint_path = os.path.join(checkpoint_dir, checkpoint_file)
image.save("gluegen_output_fr.png") try:
pipeline = load_checkpoint(pipeline, checkpoint_path, device)
output_path = f"outputs/gluegen_output_{language.lower()}.png"
generate_image(pipeline, prompt, device, output_path)
except Exception as e:
print(f"Error processing {language} model: {e}")
continue
if torch.cuda.is_available():
torch.cuda.empty_cache()
gc.collect()
``` ```
Which will produce: Which will produce:
...@@ -1188,28 +1251,49 @@ Currently uses the CLIPSeg model for mask generation, then calls the standard St ...@@ -1188,28 +1251,49 @@ Currently uses the CLIPSeg model for mask generation, then calls the standard St
```python ```python
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation
from diffusers import DiffusionPipeline from diffusers import DiffusionPipeline
from PIL import Image from PIL import Image
import requests import requests
import torch
# Load CLIPSeg model and processor
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined") processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined") model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined").to("cuda")
# Load Stable Diffusion Inpainting Pipeline with custom pipeline
pipe = DiffusionPipeline.from_pretrained( pipe = DiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-inpainting", "runwayml/stable-diffusion-inpainting",
custom_pipeline="text_inpainting", custom_pipeline="text_inpainting",
segmentation_model=model, segmentation_model=model,
segmentation_processor=processor segmentation_processor=processor
) ).to("cuda")
pipe = pipe.to("cuda")
# Load input image
url = "https://github.com/timojl/clipseg/blob/master/example_image.jpg?raw=true" url = "https://github.com/timojl/clipseg/blob/master/example_image.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw).resize((512, 512)) image = Image.open(requests.get(url, stream=True).raw)
text = "a glass" # will mask out this text
prompt = "a cup" # the masked out region will be replaced with this # Step 1: Resize input image for CLIPSeg (224x224)
segmentation_input = image.resize((224, 224))
image = pipe(image=image, text=text, prompt=prompt).images[0] # Step 2: Generate segmentation mask
text = "a glass" # Object to mask
inputs = processor(text=text, images=segmentation_input, return_tensors="pt").to("cuda")
with torch.no_grad():
mask = model(**inputs).logits.sigmoid() # Get segmentation mask
# Resize mask back to 512x512 for SD inpainting
mask = torch.nn.functional.interpolate(mask.unsqueeze(0), size=(512, 512), mode="bilinear").squeeze(0)
# Step 3: Resize input image for Stable Diffusion
image = image.resize((512, 512))
# Step 4: Run inpainting with Stable Diffusion
prompt = "a cup" # The masked-out region will be replaced with this
result = pipe(image=image, mask=mask, prompt=prompt,text=text).images[0]
# Save output
result.save("inpainting_output.png")
print("Inpainting completed. Image saved as 'inpainting_output.png'.")
``` ```
### Bit Diffusion ### Bit Diffusion
...@@ -1385,8 +1469,10 @@ There are 3 parameters for the method- ...@@ -1385,8 +1469,10 @@ There are 3 parameters for the method-
Here is an example usage- Here is an example usage-
```python ```python
import requests
from diffusers import DiffusionPipeline, DDIMScheduler from diffusers import DiffusionPipeline, DDIMScheduler
from PIL import Image from PIL import Image
from io import BytesIO
pipe = DiffusionPipeline.from_pretrained( pipe = DiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4", "CompVis/stable-diffusion-v1-4",
...@@ -1394,9 +1480,11 @@ pipe = DiffusionPipeline.from_pretrained( ...@@ -1394,9 +1480,11 @@ pipe = DiffusionPipeline.from_pretrained(
scheduler=DDIMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler"), scheduler=DDIMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler"),
).to('cuda') ).to('cuda')
img = Image.open('phone.jpg') url = "https://user-images.githubusercontent.com/59410571/209578593-141467c7-d831-4792-8b9a-b17dc5e47816.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content)).convert("RGB") # Convert to RGB to avoid issues
mix_img = pipe( mix_img = pipe(
img, image,
prompt='bed', prompt='bed',
kmin=0.3, kmin=0.3,
kmax=0.5, kmax=0.5,
...@@ -1657,37 +1745,51 @@ from diffusers import DiffusionPipeline ...@@ -1657,37 +1745,51 @@ from diffusers import DiffusionPipeline
from PIL import Image from PIL import Image
from transformers import CLIPImageProcessor, CLIPModel from transformers import CLIPImageProcessor, CLIPModel
# Load CLIP model and feature extractor
feature_extractor = CLIPImageProcessor.from_pretrained( feature_extractor = CLIPImageProcessor.from_pretrained(
"laion/CLIP-ViT-B-32-laion2B-s34B-b79K" "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
) )
clip_model = CLIPModel.from_pretrained( clip_model = CLIPModel.from_pretrained(
"laion/CLIP-ViT-B-32-laion2B-s34B-b79K", torch_dtype=torch.float16 "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", torch_dtype=torch.float16
) )
# Load guided pipeline
guided_pipeline = DiffusionPipeline.from_pretrained( guided_pipeline = DiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4", "CompVis/stable-diffusion-v1-4",
# custom_pipeline="clip_guided_stable_diffusion", custom_pipeline="clip_guided_stable_diffusion_img2img",
custom_pipeline="/home/njindal/diffusers/examples/community/clip_guided_stable_diffusion.py",
clip_model=clip_model, clip_model=clip_model,
feature_extractor=feature_extractor, feature_extractor=feature_extractor,
torch_dtype=torch.float16, torch_dtype=torch.float16,
) )
guided_pipeline.enable_attention_slicing() guided_pipeline.enable_attention_slicing()
guided_pipeline = guided_pipeline.to("cuda") guided_pipeline = guided_pipeline.to("cuda")
# Define prompt and fetch image
prompt = "fantasy book cover, full moon, fantasy forest landscape, golden vector elements, fantasy magic, dark light night, intricate, elegant, sharp focus, illustration, highly detailed, digital painting, concept art, matte, art by WLOP and Artgerm and Albert Bierstadt, masterpiece" prompt = "fantasy book cover, full moon, fantasy forest landscape, golden vector elements, fantasy magic, dark light night, intricate, elegant, sharp focus, illustration, highly detailed, digital painting, concept art, matte, art by WLOP and Artgerm and Albert Bierstadt, masterpiece"
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url) response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB") edit_image = Image.open(BytesIO(response.content)).convert("RGB")
# Run the pipeline
image = guided_pipeline( image = guided_pipeline(
prompt=prompt, prompt=prompt,
num_inference_steps=30, height=512, # Height of the output image
image=init_image, width=512, # Width of the output image
strength=0.75, image=edit_image, # Input image to guide the diffusion
guidance_scale=7.5, strength=0.75, # How much to transform the input image
clip_guidance_scale=100, num_inference_steps=30, # Number of diffusion steps
num_cutouts=4, guidance_scale=7.5, # Scale of the classifier-free guidance
use_cutouts=False, clip_guidance_scale=100, # Scale of the CLIP guidance
num_images_per_prompt=1, # Generate one image per prompt
eta=0.0, # Noise scheduling parameter
num_cutouts=4, # Number of cutouts for CLIP guidance
use_cutouts=False, # Whether to use cutouts
output_type="pil", # Output as PIL image
).images[0] ).images[0]
display(image)
# Display the generated image
image.show()
``` ```
Init Image Init Image
...@@ -2264,81 +2366,15 @@ CLIP guided stable diffusion images mixing pipeline allows to combine two images ...@@ -2264,81 +2366,15 @@ CLIP guided stable diffusion images mixing pipeline allows to combine two images
This approach is using (optional) CoCa model to avoid writing image description. This approach is using (optional) CoCa model to avoid writing image description.
[More code examples](https://github.com/TheDenk/images_mixing) [More code examples](https://github.com/TheDenk/images_mixing)
### Stable Diffusion XL Long Weighted Prompt Pipeline
This SDXL pipeline supports unlimited length prompt and negative prompt, compatible with A1111 prompt weighted style.
You can provide both `prompt` and `prompt_2`. If only one prompt is provided, `prompt_2` will be a copy of the provided `prompt`. Here is a sample code to use this pipeline.
```python
from diffusers import DiffusionPipeline
from diffusers.utils import load_image
import torch
pipe = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0"
, torch_dtype = torch.float16
, use_safetensors = True
, variant = "fp16"
, custom_pipeline = "lpw_stable_diffusion_xl",
)
prompt = "photo of a cute (white) cat running on the grass" * 20
prompt2 = "chasing (birds:1.5)" * 20
prompt = f"{prompt},{prompt2}"
neg_prompt = "blur, low quality, carton, animate"
pipe.to("cuda")
# text2img
t2i_images = pipe(
prompt=prompt,
negative_prompt=neg_prompt,
).images # alternatively, you can call the .text2img() function
# img2img
input_image = load_image("/path/to/local/image.png") # or URL to your input image
i2i_images = pipe.img2img(
prompt=prompt,
negative_prompt=neg_prompt,
image=input_image,
strength=0.8, # higher strength will result in more variation compared to original image
).images
# inpaint
input_mask = load_image("/path/to/local/mask.png") # or URL to your input inpainting mask
inpaint_images = pipe.inpaint(
prompt="photo of a cute (black) cat running on the grass" * 20,
negative_prompt=neg_prompt,
image=input_image,
mask=input_mask,
strength=0.6, # higher strength will result in more variation compared to original image
).images
pipe.to("cpu")
torch.cuda.empty_cache()
from IPython.display import display # assuming you are using this code in a notebook
display(t2i_images[0])
display(i2i_images[0])
display(inpaint_images[0])
```
In the above code, the `prompt2` is appended to the `prompt`, which is more than 77 tokens. "birds" are showing up in the result.
![Stable Diffusion XL Long Weighted Prompt Pipeline sample](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_long_weighted_prompt.png)
For more results, checkout [PR #6114](https://github.com/huggingface/diffusers/pull/6114).
### Example Images Mixing (with CoCa) ### Example Images Mixing (with CoCa)
```python ```python
import requests
from io import BytesIO
import PIL import PIL
import torch import torch
import requests
import open_clip import open_clip
from open_clip import SimpleTokenizer from open_clip import SimpleTokenizer
from io import BytesIO
from diffusers import DiffusionPipeline from diffusers import DiffusionPipeline
from transformers import CLIPImageProcessor, CLIPModel from transformers import CLIPImageProcessor, CLIPModel
...@@ -2401,10 +2437,79 @@ pipe_images = mixing_pipeline( ...@@ -2401,10 +2437,79 @@ pipe_images = mixing_pipeline(
clip_guidance_scale=100, clip_guidance_scale=100,
generator=generator, generator=generator,
).images ).images
output_path = "mixed_output.jpg"
pipe_images[0].save(output_path)
print(f"Image saved successfully at {output_path}")
``` ```
![image_mixing_result](https://huggingface.co/datasets/TheDenk/images_mixing/resolve/main/boromir_gigachad.png) ![image_mixing_result](https://huggingface.co/datasets/TheDenk/images_mixing/resolve/main/boromir_gigachad.png)
### Stable Diffusion XL Long Weighted Prompt Pipeline
This SDXL pipeline supports unlimited length prompt and negative prompt, compatible with A1111 prompt weighted style.
You can provide both `prompt` and `prompt_2`. If only one prompt is provided, `prompt_2` will be a copy of the provided `prompt`. Here is a sample code to use this pipeline.
```python
from diffusers import DiffusionPipeline
from diffusers.utils import load_image
import torch
pipe = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0"
, torch_dtype = torch.float16
, use_safetensors = True
, variant = "fp16"
, custom_pipeline = "lpw_stable_diffusion_xl",
)
prompt = "photo of a cute (white) cat running on the grass" * 20
prompt2 = "chasing (birds:1.5)" * 20
prompt = f"{prompt},{prompt2}"
neg_prompt = "blur, low quality, carton, animate"
pipe.to("cuda")
# text2img
t2i_images = pipe(
prompt=prompt,
negative_prompt=neg_prompt,
).images # alternatively, you can call the .text2img() function
# img2img
input_image = load_image("/path/to/local/image.png") # or URL to your input image
i2i_images = pipe.img2img(
prompt=prompt,
negative_prompt=neg_prompt,
image=input_image,
strength=0.8, # higher strength will result in more variation compared to original image
).images
# inpaint
input_mask = load_image("/path/to/local/mask.png") # or URL to your input inpainting mask
inpaint_images = pipe.inpaint(
prompt="photo of a cute (black) cat running on the grass" * 20,
negative_prompt=neg_prompt,
image=input_image,
mask=input_mask,
strength=0.6, # higher strength will result in more variation compared to original image
).images
pipe.to("cpu")
torch.cuda.empty_cache()
from IPython.display import display # assuming you are using this code in a notebook
display(t2i_images[0])
display(i2i_images[0])
display(inpaint_images[0])
```
In the above code, the `prompt2` is appended to the `prompt`, which is more than 77 tokens. "birds" are showing up in the result.
![Stable Diffusion XL Long Weighted Prompt Pipeline sample](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_long_weighted_prompt.png)
For more results, checkout [PR #6114](https://github.com/huggingface/diffusers/pull/6114).
### Stable Diffusion Mixture Tiling Pipeline SD 1.5 ### Stable Diffusion Mixture Tiling Pipeline SD 1.5
This pipeline uses the Mixture. Refer to the [Mixture](https://arxiv.org/abs/2302.02412) paper for more details. This pipeline uses the Mixture. Refer to the [Mixture](https://arxiv.org/abs/2302.02412) paper for more details.
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment