Unverified commit 06b01ea8 authored by YiYi Xu, committed by GitHub

[ip-adapter] refactor `prepare_ip_adapter_image_embeds` and skip load `image_encoder` (#7016)



* add
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
parent f4fc7503
@@ -234,6 +234,39 @@ export_to_gif(frames, "gummy_bear.gif")

> [!TIP]
> While calling `load_ip_adapter()`, pass `low_cpu_mem_usage=True` to speed up the loading time.

All the pipelines supporting IP-Adapter accept an `ip_adapter_image_embeds` argument. If you need to run the IP-Adapter multiple times with the same image, you can encode the image once and save the embeddings to disk.

```py
import torch

# encode the image once; with `do_classifier_free_guidance=True` the returned
# embeddings also include the negative (unconditional) image embeddings
image_embeds = pipeline.prepare_ip_adapter_image_embeds(
    ip_adapter_image=image,
    ip_adapter_image_embeds=None,
    device="cuda",
    num_images_per_prompt=1,
    do_classifier_free_guidance=True,
)

torch.save(image_embeds, "image_embeds.ipadpt")
```
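
Because `do_classifier_free_guidance=True`, each element of the returned list stacks the negative and positive embeddings along the first dimension, so a quick sanity check looks like this (the exact sizes are illustrative and depend on the adapter and batch settings):

```py
print(image_embeds[0].shape)  # e.g. torch.Size([2, 1, 1024]): negative embeddings first, then positive
```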

Load the image embeddings and pass them to the pipeline as `ip_adapter_image_embeds`.

> [!TIP]
> ComfyUI image embeddings for IP-Adapters are fully compatible with Diffusers and should work out of the box.

```py
image_embeds = torch.load("image_embeds.ipadpt")
images = pipeline(
prompt="a polar bear sitting in a chair drinking a milkshake",
ip_adapter_image_embeds=image_embeds,
negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
num_inference_steps=100,
generator=generator,
).images
```
> [!TIP]
> If you use IP-Adapter with `ip_adapter_image_embeds` instead of `ip_adapter_image`, you can skip loading an image encoder by passing `image_encoder_folder=None` to `load_ip_adapter()`.
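
For example, the following minimal sketch (assuming the commonly used `h94/IP-Adapter` checkpoint) loads only the adapter weights:

```py
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="models",
    weight_name="ip-adapter_sd15.bin",
    image_encoder_folder=None,  # skip loading the CLIP image encoder
)
```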

## Specific use cases

IP-Adapter's image prompting and compatibility with other adapters and models make it a versatile tool for a variety of use cases. This section covers some of the more popular applications of IP-Adapter, and we can't wait to see what you come up with!

...
@@ -799,8 +799,10 @@ class AnimateDiffControlNetPipeline(
            ip_adapter_image (`PipelineImageInput`, *optional*):
                Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
                should contain the negative image embeddings if `do_classifier_free_guidance` is set to `True`. If
                not provided, embeddings are computed from the `ip_adapter_image` input argument.
            conditioning_frames (`List[PipelineImageInput]`, *optional*):
                The ControlNet input condition to provide guidance to the `unet` for generation. If multiple ControlNets
                are specified, images must be passed as a list such that each element of the list can be correctly
...
@@ -798,8 +798,10 @@ class AnimateDiffImgToVideoPipeline(
            ip_adapter_image (`PipelineImageInput`, *optional*):
                Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
                should contain the negative image embeddings if `do_classifier_free_guidance` is set to `True`. If
                not provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated video. Choose between `torch.FloatTensor`, `PIL.Image` or
                `np.array`.
...
@@ -13,7 +13,7 @@
# limitations under the License.

from pathlib import Path
from typing import Dict, List, Optional, Union

import torch
from huggingface_hub.utils import validate_hf_hub_args

@@ -52,11 +52,12 @@ class IPAdapterMixin:
        pretrained_model_name_or_path_or_dict: Union[str, List[str], Dict[str, torch.Tensor]],
        subfolder: Union[str, List[str]],
        weight_name: Union[str, List[str]],
        image_encoder_folder: Optional[str] = "image_encoder",
        **kwargs,
    ):
        """
        Parameters:
            pretrained_model_name_or_path_or_dict (`str` or `List[str]` or `os.PathLike` or `List[os.PathLike]` or `dict` or `List[dict]`):
                Can be either:

                - A string, the *model id* (for example `google/ddpm-celebahq-256`) of a pretrained model hosted on
@@ -65,7 +66,18 @@ class IPAdapterMixin:
                  with [`ModelMixin.save_pretrained`].
                - A [torch state
                  dict](https://pytorch.org/tutorials/beginner/saving_loading_models.html#what-is-a-state-dict).

            subfolder (`str` or `List[str]`):
                The subfolder location of a model file within a larger model repository on the Hub or locally. If a
                list is passed, it should have the same length as `weight_name`.
            weight_name (`str` or `List[str]`):
                The name of the weight file to load. If a list is passed, it should have the same length as
                `subfolder`.
            image_encoder_folder (`str`, *optional*, defaults to `"image_encoder"`):
                The subfolder location of the image encoder within a larger model repository on the Hub or locally.
                Pass `None` to not load the image encoder. If the image encoder is located in a folder inside
                `subfolder`, you only need to pass the name of the folder that contains the image encoder weights,
                e.g. `image_encoder_folder="image_encoder"`. If the image encoder is located in a folder other than
                `subfolder`, you should pass the path to the folder that contains the image encoder weights, for
                example, `image_encoder_folder="different_subfolder/image_encoder"`.
            cache_dir (`Union[str, os.PathLike]`, *optional*):
                Path to a directory where a downloaded pretrained model configuration is cached if the standard cache
                is not used.
@@ -87,8 +99,6 @@ class IPAdapterMixin:
            revision (`str`, *optional*, defaults to `"main"`):
                The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier
                allowed by Git.
            low_cpu_mem_usage (`bool`, *optional*, defaults to `True` if torch version >= 1.9.0 else `False`):
                Speed up model loading by only loading the pretrained weights and not initializing the weights. This
                also tries to not use more than 1x model size in CPU memory (including peak memory) while loading the
                model.
@@ -184,16 +194,29 @@ class IPAdapterMixin:
        # load CLIP image encoder here if it has not been registered to the pipeline yet
        if hasattr(self, "image_encoder") and getattr(self, "image_encoder", None) is None:
            if image_encoder_folder is not None:
                if not isinstance(pretrained_model_name_or_path_or_dict, dict):
                    logger.info(f"loading image_encoder from {pretrained_model_name_or_path_or_dict}")
                    if image_encoder_folder.count("/") == 0:
                        image_encoder_subfolder = Path(subfolder, image_encoder_folder).as_posix()
                    else:
                        image_encoder_subfolder = Path(image_encoder_folder).as_posix()

                    image_encoder = CLIPVisionModelWithProjection.from_pretrained(
                        pretrained_model_name_or_path_or_dict,
                        subfolder=image_encoder_subfolder,
                        low_cpu_mem_usage=low_cpu_mem_usage,
                    ).to(self.device, dtype=self.dtype)
                    self.register_modules(image_encoder=image_encoder)
                else:
                    raise ValueError(
                        "`image_encoder` cannot be loaded because `pretrained_model_name_or_path_or_dict` is a state dict."
                    )
            else:
                logger.warning(
                    "`image_encoder` was not loaded because `image_encoder_folder=None` was passed. You will not be able to use `ip_adapter_image` when calling the pipeline with IP-Adapter."
                    " Use `ip_adapter_image_embeds` to pass pre-generated image embeddings instead."
                )

        # create feature extractor if it has not been registered to the pipeline yet
        if hasattr(self, "feature_extractor") and getattr(self, "feature_extractor", None) is None:
...
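The `count("/")` check above means a bare folder name is resolved relative to `subfolder`, while a name containing `/` is used as a path in its own right. Both call styles below are a sketch assuming the standard layout of the `h94/IP-Adapter` repository:

```py
# bare name: resolves to "models/image_encoder" inside the repository
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="models",
    weight_name="ip-adapter_sd15.bin",
    image_encoder_folder="image_encoder",
)

# name with "/": used as given (illustrative; pair the encoder that matches your adapter)
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="models",
    weight_name="ip-adapter_sd15.bin",
    image_encoder_folder="sdxl_models/image_encoder",
)
```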
@@ -370,7 +370,7 @@ class AnimateDiffPipeline(
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -394,13 +394,23 @@ class AnimateDiffPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )

                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
            image_embeds = []
            for single_image_embeds in ip_adapter_image_embeds:
                if do_classifier_free_guidance:
                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                else:
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                image_embeds.append(single_image_embeds)

        return image_embeds

    # Copied from diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_synth.TextToVideoSDPipeline.decode_latents
@@ -494,6 +504,16 @@ class AnimateDiffPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )

        if ip_adapter_image_embeds is not None:
            if not isinstance(ip_adapter_image_embeds, list):
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
                )
            elif ip_adapter_image_embeds[0].ndim != 3:
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
                )

    # Copied from diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_synth.TextToVideoSDPipeline.prepare_latents
    def prepare_latents(
        self, batch_size, num_channels_latents, num_frames, height, width, dtype, device, generator, latents=None
@@ -612,8 +632,10 @@ class AnimateDiffPipeline(
            ip_adapter_image (`PipelineImageInput`, *optional*):
                Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
                should contain the negative image embeddings if `do_classifier_free_guidance` is set to `True`. If
                not provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated video. Choose between `torch.FloatTensor`, `PIL.Image` or
                `np.array`.
@@ -717,7 +739,11 @@ class AnimateDiffPipeline(
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
                ip_adapter_image,
                ip_adapter_image_embeds,
                device,
                batch_size * num_videos_per_prompt,
                self.do_classifier_free_guidance,
            )

        # 4. Prepare timesteps
...
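The new `else` branch defines the contract for user-supplied embeddings: with classifier-free guidance enabled, each list element must be a 3D tensor packing the negative embeddings before the positive ones along the first dimension, which `chunk(2)` then splits and repeats per prompt. A minimal sketch with dummy tensors (shapes illustrative):

```py
import torch

batch_size, num_images, emb_dim = 1, 1, 1024
negative = torch.zeros(batch_size, num_images, emb_dim)  # negative (unconditional) image embeddings
positive = torch.randn(batch_size, num_images, emb_dim)  # positive image embeddings

# with do_classifier_free_guidance=True: negative first, then positive, along dim 0
single_image_embeds = torch.cat([negative, positive], dim=0)  # (2 * batch_size, num_images, emb_dim)
ip_adapter_image_embeds = [single_image_embeds]  # one entry per loaded IP-Adapter
```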
@@ -448,7 +448,7 @@ class AnimateDiffVideoToVideoPipeline(
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -472,13 +472,23 @@ class AnimateDiffVideoToVideoPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )

                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
            image_embeds = []
            for single_image_embeds in ip_adapter_image_embeds:
                if do_classifier_free_guidance:
                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                else:
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                image_embeds.append(single_image_embeds)

        return image_embeds

    # Copied from diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_synth.TextToVideoSDPipeline.decode_latents
@@ -523,6 +533,8 @@ class AnimateDiffVideoToVideoPipeline(
        negative_prompt=None,
        prompt_embeds=None,
        negative_prompt_embeds=None,
        ip_adapter_image=None,
        ip_adapter_image_embeds=None,
        callback_on_step_end_tensor_inputs=None,
    ):
        if strength < 0 or strength > 1:
@@ -567,6 +579,21 @@ class AnimateDiffVideoToVideoPipeline(
        if video is not None and latents is not None:
            raise ValueError("Only one of `video` or `latents` should be provided")

        if ip_adapter_image is not None and ip_adapter_image_embeds is not None:
            raise ValueError(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )

        if ip_adapter_image_embeds is not None:
            if not isinstance(ip_adapter_image_embeds, list):
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
                )
            elif ip_adapter_image_embeds[0].ndim != 3:
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
                )

    def get_timesteps(self, num_inference_steps, timesteps, strength, device):
        # get the original timestep using init_timestep
        init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
@@ -765,8 +792,10 @@ class AnimateDiffVideoToVideoPipeline(
            ip_adapter_image (`PipelineImageInput`, *optional*):
                Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
                should contain the negative image embeddings if `do_classifier_free_guidance` is set to `True`. If
                not provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated video. Choose between `torch.FloatTensor`, `PIL.Image` or
                `np.array`.
@@ -814,6 +843,8 @@ class AnimateDiffVideoToVideoPipeline(
            negative_prompt_embeds=negative_prompt_embeds,
            video=video,
            latents=latents,
            ip_adapter_image=ip_adapter_image,
            ip_adapter_image_embeds=ip_adapter_image_embeds,
            callback_on_step_end_tensor_inputs=callback_on_step_end_tensor_inputs,
        )
@@ -855,7 +886,11 @@ class AnimateDiffVideoToVideoPipeline(
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
                ip_adapter_image,
                ip_adapter_image_embeds,
                device,
                batch_size * num_videos_per_prompt,
                self.do_classifier_free_guidance,
            )

        # 4. Prepare timesteps
...
@@ -480,7 +480,7 @@ class StableDiffusionControlNetPipeline(
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -504,13 +504,23 @@ class StableDiffusionControlNetPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )

                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
            image_embeds = []
            for single_image_embeds in ip_adapter_image_embeds:
                if do_classifier_free_guidance:
                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                else:
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                image_embeds.append(single_image_embeds)

        return image_embeds

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.run_safety_checker
@@ -711,6 +721,16 @@ class StableDiffusionControlNetPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )

        if ip_adapter_image_embeds is not None:
            if not isinstance(ip_adapter_image_embeds, list):
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
                )
            elif ip_adapter_image_embeds[0].ndim != 3:
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
                )

    def check_image(self, image, prompt, prompt_embeds):
        image_is_pil = isinstance(image, PIL.Image.Image)
        image_is_tensor = isinstance(image, torch.Tensor)
@@ -933,8 +953,10 @@ class StableDiffusionControlNetPipeline(
                not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
            ip_adapter_image (`PipelineImageInput`, *optional*): Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
                should contain the negative image embeddings if `do_classifier_free_guidance` is set to `True`. If
                not provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated image. Choose between `PIL.Image` or `np.array`.
            return_dict (`bool`, *optional*, defaults to `True`):
@@ -1076,7 +1098,11 @@ class StableDiffusionControlNetPipeline(
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
                ip_adapter_image,
                ip_adapter_image_embeds,
                device,
                batch_size * num_images_per_prompt,
                self.do_classifier_free_guidance,
            )

        # 4. Prepare image
...
@@ -473,7 +473,7 @@ class StableDiffusionControlNetImg2ImgPipeline(
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -497,13 +497,23 @@ class StableDiffusionControlNetImg2ImgPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )

                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
            image_embeds = []
            for single_image_embeds in ip_adapter_image_embeds:
                if do_classifier_free_guidance:
                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                else:
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                image_embeds.append(single_image_embeds)

        return image_embeds

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.run_safety_checker
@@ -698,6 +708,16 @@ class StableDiffusionControlNetImg2ImgPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )

        if ip_adapter_image_embeds is not None:
            if not isinstance(ip_adapter_image_embeds, list):
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
                )
            elif ip_adapter_image_embeds[0].ndim != 3:
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
                )

    # Copied from diffusers.pipelines.controlnet.pipeline_controlnet.StableDiffusionControlNetPipeline.check_image
    def check_image(self, image, prompt, prompt_embeds):
        image_is_pil = isinstance(image, PIL.Image.Image)
@@ -951,8 +971,10 @@ class StableDiffusionControlNetImg2ImgPipeline(
                not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
            ip_adapter_image (`PipelineImageInput`, *optional*): Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
                should contain the negative image embeddings if `do_classifier_free_guidance` is set to `True`. If
                not provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated image. Choose between `PIL.Image` or `np.array`.
            return_dict (`bool`, *optional*, defaults to `True`):
@@ -1088,7 +1110,11 @@ class StableDiffusionControlNetImg2ImgPipeline(
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
                ip_adapter_image,
                ip_adapter_image_embeds,
                device,
                batch_size * num_images_per_prompt,
                self.do_classifier_free_guidance,
            )

        # 4. Prepare image
...
@@ -598,7 +598,7 @@ class StableDiffusionControlNetInpaintPipeline(
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -622,13 +622,23 @@ class StableDiffusionControlNetInpaintPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )

                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
            image_embeds = []
            for single_image_embeds in ip_adapter_image_embeds:
                if do_classifier_free_guidance:
                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                else:
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                image_embeds.append(single_image_embeds)

        return image_embeds

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.run_safety_checker
@@ -856,6 +866,16 @@ class StableDiffusionControlNetInpaintPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )

        if ip_adapter_image_embeds is not None:
            if not isinstance(ip_adapter_image_embeds, list):
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
                )
            elif ip_adapter_image_embeds[0].ndim != 3:
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
                )

    # Copied from diffusers.pipelines.controlnet.pipeline_controlnet.StableDiffusionControlNetPipeline.check_image
    def check_image(self, image, prompt, prompt_embeds):
        image_is_pil = isinstance(image, PIL.Image.Image)
@@ -1180,8 +1200,10 @@ class StableDiffusionControlNetInpaintPipeline(
                not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
            ip_adapter_image (`PipelineImageInput`, *optional*): Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
                should contain the negative image embeddings if `do_classifier_free_guidance` is set to `True`. If
                not provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated image. Choose between `PIL.Image` or `np.array`.
            return_dict (`bool`, *optional*, defaults to `True`):
@@ -1330,7 +1352,11 @@ class StableDiffusionControlNetInpaintPipeline(
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
                ip_adapter_image,
                ip_adapter_image_embeds,
                device,
                batch_size * num_images_per_prompt,
                self.do_classifier_free_guidance,
            )

        # 4. Prepare image
...
@@ -507,7 +507,7 @@ class StableDiffusionXLControlNetInpaintPipeline(
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -531,13 +531,23 @@ class StableDiffusionXLControlNetInpaintPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )

                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
            image_embeds = []
            for single_image_embeds in ip_adapter_image_embeds:
                if do_classifier_free_guidance:
                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                else:
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                image_embeds.append(single_image_embeds)

        return image_embeds

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_extra_step_kwargs
@@ -802,6 +812,16 @@ class StableDiffusionXLControlNetInpaintPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )

        if ip_adapter_image_embeds is not None:
            if not isinstance(ip_adapter_image_embeds, list):
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
                )
            elif ip_adapter_image_embeds[0].ndim != 3:
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
                )

    def prepare_control_image(
        self,
        image,
@@ -1220,8 +1240,10 @@ class StableDiffusionXLControlNetInpaintPipeline(
                argument.
            ip_adapter_image (`PipelineImageInput`, *optional*): Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
                should contain the negative image embeddings if `do_classifier_free_guidance` is set to `True`. If
                not provided, embeddings are computed from the `ip_adapter_image` input argument.
            pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
                Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting.
                If not provided, pooled text embeddings will be generated from `prompt` input argument.
@@ -1411,7 +1433,11 @@ class StableDiffusionXLControlNetInpaintPipeline(
        # 3.1 Encode ip_adapter_image
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
                ip_adapter_image,
                ip_adapter_image_embeds,
                device,
                batch_size * num_images_per_prompt,
                self.do_classifier_free_guidance,
            )

        # 4. set timesteps
...
@@ -485,7 +485,7 @@ class StableDiffusionXLControlNetPipeline(
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -509,13 +509,23 @@ class StableDiffusionXLControlNetPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )

                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
            image_embeds = []
            for single_image_embeds in ip_adapter_image_embeds:
                if do_classifier_free_guidance:
                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                else:
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                image_embeds.append(single_image_embeds)

        return image_embeds

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_extra_step_kwargs
@@ -715,6 +725,16 @@ class StableDiffusionXLControlNetPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )

        if ip_adapter_image_embeds is not None:
            if not isinstance(ip_adapter_image_embeds, list):
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
                )
            elif ip_adapter_image_embeds[0].ndim != 3:
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
                )

    # Copied from diffusers.pipelines.controlnet.pipeline_controlnet.StableDiffusionControlNetPipeline.check_image
    def check_image(self, image, prompt, prompt_embeds):
        image_is_pil = isinstance(image, PIL.Image.Image)
@@ -998,8 +1018,10 @@ class StableDiffusionXLControlNetPipeline(
                argument.
            ip_adapter_image (`PipelineImageInput`, *optional*): Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
                should contain the negative image embeddings if `do_classifier_free_guidance` is set to `True`. If
                not provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated image. Choose between `PIL.Image` or `np.array`.
            return_dict (`bool`, *optional*, defaults to `True`):
@@ -1171,7 +1193,11 @@ class StableDiffusionXLControlNetPipeline(
        # 3.2 Encode ip_adapter_image
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
                ip_adapter_image,
                ip_adapter_image_embeds,
                device,
                batch_size * num_images_per_prompt,
                self.do_classifier_free_guidance,
            )

        # 4. Prepare image
...
@@ -537,7 +537,7 @@ class StableDiffusionXLControlNetImg2ImgPipeline(
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -561,13 +561,23 @@ class StableDiffusionXLControlNetImg2ImgPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )

                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
            image_embeds = []
            for single_image_embeds in ip_adapter_image_embeds:
                if do_classifier_free_guidance:
                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                else:
                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
                image_embeds.append(single_image_embeds)

        return image_embeds

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_extra_step_kwargs
@@ -779,6 +789,16 @@ class StableDiffusionXLControlNetImg2ImgPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )

        if ip_adapter_image_embeds is not None:
            if not isinstance(ip_adapter_image_embeds, list):
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
                )
            elif ip_adapter_image_embeds[0].ndim != 3:
                raise ValueError(
                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
                )

    # Copied from diffusers.pipelines.controlnet.pipeline_controlnet_sd_xl.StableDiffusionXLControlNetPipeline.check_image
    def check_image(self, image, prompt, prompt_embeds):
        image_is_pil = isinstance(image, PIL.Image.Image)
@@ -1149,8 +1169,10 @@ class StableDiffusionXLControlNetImg2ImgPipeline(
                input argument.
            ip_adapter_image (`PipelineImageInput`, *optional*): Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
                should contain the negative image embeddings if `do_classifier_free_guidance` is set to `True`. If
                not provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated image. Choose between
                [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
@@ -1334,7 +1356,11 @@ class StableDiffusionXLControlNetImg2ImgPipeline(
        # 3.2 Encode ip_adapter_image
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
                ip_adapter_image,
                ip_adapter_image_embeds,
                device,
                batch_size * num_images_per_prompt,
                self.do_classifier_free_guidance,
            )

        # 4. Prepare image and controlnet_conditioning_image
...
@@ -423,7 +423,7 @@ class LatentConsistencyModelImg2ImgPipeline(
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
-        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt
+        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -447,13 +447,23 @@ class LatentConsistencyModelImg2ImgPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )
-                if self.do_classifier_free_guidance:
+                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                    single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
-            image_embeds = ip_adapter_image_embeds
+            image_embeds = []
+            for single_image_embeds in ip_adapter_image_embeds:
+                if do_classifier_free_guidance:
+                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
+                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
+                else:
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                image_embeds.append(single_image_embeds)
        return image_embeds
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.run_safety_checker
@@ -632,6 +642,16 @@ class LatentConsistencyModelImg2ImgPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )
+        if ip_adapter_image_embeds is not None:
+            if not isinstance(ip_adapter_image_embeds, list):
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
+                )
+            elif ip_adapter_image_embeds[0].ndim != 3:
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
+                )
    @property
    def guidance_scale(self):
        return self._guidance_scale
@@ -720,8 +740,10 @@ class LatentConsistencyModelImg2ImgPipeline(
            ip_adapter_image: (`PipelineImageInput`, *optional*):
                Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
-                Pre-generated image embeddings for IP-Adapter. If not
-                provided, embeddings are computed from the `ip_adapter_image` input argument.
+                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
+                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
+                should contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not
+                provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated image. Choose between `PIL.Image` or `np.array`.
            return_dict (`bool`, *optional*, defaults to `True`):
@@ -794,7 +816,11 @@ class LatentConsistencyModelImg2ImgPipeline(
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
-                ip_adapter_image, ip_adapter_image_embeds, device, batch_size * num_images_per_prompt
+                ip_adapter_image,
+                ip_adapter_image_embeds,
+                device,
+                batch_size * num_images_per_prompt,
+                self.do_classifier_free_guidance,
            )
        # 3. Encode input prompt
...
@@ -407,7 +407,7 @@ class LatentConsistencyModelPipeline(
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
-        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt
+        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -431,13 +431,23 @@ class LatentConsistencyModelPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )
-                if self.do_classifier_free_guidance:
+                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                    single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
-            image_embeds = ip_adapter_image_embeds
+            image_embeds = []
+            for single_image_embeds in ip_adapter_image_embeds:
+                if do_classifier_free_guidance:
+                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
+                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
+                else:
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                image_embeds.append(single_image_embeds)
        return image_embeds
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.run_safety_checker
@@ -527,6 +537,8 @@ class LatentConsistencyModelPipeline(
        width: int,
        callback_steps: int,
        prompt_embeds: Optional[torch.FloatTensor] = None,
+        ip_adapter_image=None,
+        ip_adapter_image_embeds=None,
        callback_on_step_end_tensor_inputs=None,
    ):
        if height % 8 != 0 or width % 8 != 0:
@@ -557,6 +569,21 @@ class LatentConsistencyModelPipeline(
        elif prompt is not None and (not isinstance(prompt, str) and not isinstance(prompt, list)):
            raise ValueError(f"`prompt` has to be of type `str` or `list` but is {type(prompt)}")
+        if ip_adapter_image is not None and ip_adapter_image_embeds is not None:
+            raise ValueError(
+                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
+            )
+        if ip_adapter_image_embeds is not None:
+            if not isinstance(ip_adapter_image_embeds, list):
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
+                )
+            elif ip_adapter_image_embeds[0].ndim != 3:
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
+                )
    @property
    def guidance_scale(self):
        return self._guidance_scale
@@ -645,8 +672,10 @@ class LatentConsistencyModelPipeline(
            ip_adapter_image: (`PipelineImageInput`, *optional*):
                Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
-                Pre-generated image embeddings for IP-Adapter. If not
-                provided, embeddings are computed from the `ip_adapter_image` input argument.
+                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
+                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
+                should contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not
+                provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated image. Choose between `PIL.Image` or `np.array`.
            return_dict (`bool`, *optional*, defaults to `True`):
@@ -699,7 +728,16 @@ class LatentConsistencyModelPipeline(
        width = width or self.unet.config.sample_size * self.vae_scale_factor
        # 1. Check inputs. Raise error if not correct
-        self.check_inputs(prompt, height, width, callback_steps, prompt_embeds, callback_on_step_end_tensor_inputs)
+        self.check_inputs(
+            prompt,
+            height,
+            width,
+            callback_steps,
+            prompt_embeds,
+            ip_adapter_image,
+            ip_adapter_image_embeds,
+            callback_on_step_end_tensor_inputs,
+        )
        self._guidance_scale = guidance_scale
        self._clip_skip = clip_skip
        self._cross_attention_kwargs = cross_attention_kwargs
@@ -716,7 +754,11 @@ class LatentConsistencyModelPipeline(
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
-                ip_adapter_image, ip_adapter_image_embeds, device, batch_size * num_images_per_prompt
+                ip_adapter_image,
+                ip_adapter_image_embeds,
+                device,
+                batch_size * num_images_per_prompt,
+                self.do_classifier_free_guidance,
            )
        # 3. Encode input prompt
...
@@ -577,9 +577,19 @@ class PIAPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )
+        if ip_adapter_image_embeds is not None:
+            if not isinstance(ip_adapter_image_embeds, list):
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
+                )
+            elif ip_adapter_image_embeds[0].ndim != 3:
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
+                )
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
-        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt
+        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -603,13 +613,23 @@ class PIAPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )
-                if self.do_classifier_free_guidance:
+                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                    single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
-            image_embeds = ip_adapter_image_embeds
+            image_embeds = []
+            for single_image_embeds in ip_adapter_image_embeds:
+                if do_classifier_free_guidance:
+                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
+                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
+                else:
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                image_embeds.append(single_image_embeds)
        return image_embeds
    # Copied from diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_synth.TextToVideoSDPipeline.prepare_latents
@@ -798,8 +818,10 @@ class PIAPipeline(
            ip_adapter_image: (`PipelineImageInput`, *optional*):
                Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
-                Pre-generated image embeddings for IP-Adapter. If not
-                provided, embeddings are computed from the `ip_adapter_image` input argument.
+                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
+                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
+                should contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not
+                provided, embeddings are computed from the `ip_adapter_image` input argument.
            motion_scale: (`int`, *optional*, defaults to 0):
                Parameter that controls the amount and type of motion that is added to the image. Increasing the value
                increases the amount of motion, while specific ranges of values control the type of motion that is
                added. Must be between 0 and 8.
@@ -891,7 +913,11 @@ class PIAPipeline(
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
-                ip_adapter_image, ip_adapter_image_embeds, device, batch_size * num_videos_per_prompt
+                ip_adapter_image,
+                ip_adapter_image_embeds,
+                device,
+                batch_size * num_videos_per_prompt,
+                self.do_classifier_free_guidance,
            )
        # 4. Prepare timesteps
...
@@ -490,7 +490,7 @@ class StableDiffusionPipeline(
        return image_embeds, uncond_image_embeds
    def prepare_ip_adapter_image_embeds(
-        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt
+        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -514,13 +514,23 @@ class StableDiffusionPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )
-                if self.do_classifier_free_guidance:
+                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                    single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
-            image_embeds = ip_adapter_image_embeds
+            image_embeds = []
+            for single_image_embeds in ip_adapter_image_embeds:
+                if do_classifier_free_guidance:
+                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
+                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
+                else:
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                image_embeds.append(single_image_embeds)
        return image_embeds
    def run_safety_checker(self, image, device, dtype):
@@ -624,6 +634,16 @@ class StableDiffusionPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )
+        if ip_adapter_image_embeds is not None:
+            if not isinstance(ip_adapter_image_embeds, list):
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
+                )
+            elif ip_adapter_image_embeds[0].ndim != 3:
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
+                )
    def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None):
        shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)
        if isinstance(generator, list) and len(generator) != batch_size:
@@ -772,8 +792,10 @@ class StableDiffusionPipeline(
                not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
            ip_adapter_image: (`PipelineImageInput`, *optional*): Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
-                Pre-generated image embeddings for IP-Adapter. If not
-                provided, embeddings are computed from the `ip_adapter_image` input argument.
+                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
+                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
+                should contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not
+                provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated image. Choose between `PIL.Image` or `np.array`.
            return_dict (`bool`, *optional*, defaults to `True`):
@@ -885,7 +907,11 @@ class StableDiffusionPipeline(
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
-                ip_adapter_image, ip_adapter_image_embeds, device, batch_size * num_images_per_prompt
+                ip_adapter_image,
+                ip_adapter_image_embeds,
+                device,
+                batch_size * num_images_per_prompt,
+                self.do_classifier_free_guidance,
            )
        # 4. Prepare timesteps
...
@@ -534,7 +534,7 @@ class StableDiffusionImg2ImgPipeline(
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
-        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt
+        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -558,13 +558,23 @@ class StableDiffusionImg2ImgPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )
-                if self.do_classifier_free_guidance:
+                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                    single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
-            image_embeds = ip_adapter_image_embeds
+            image_embeds = []
+            for single_image_embeds in ip_adapter_image_embeds:
+                if do_classifier_free_guidance:
+                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
+                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
+                else:
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                image_embeds.append(single_image_embeds)
        return image_embeds
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.run_safety_checker
@@ -670,6 +680,16 @@ class StableDiffusionImg2ImgPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )
+        if ip_adapter_image_embeds is not None:
+            if not isinstance(ip_adapter_image_embeds, list):
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
+                )
+            elif ip_adapter_image_embeds[0].ndim != 3:
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
+                )
    def get_timesteps(self, num_inference_steps, strength, device):
        # get the original timestep using init_timestep
        init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
@@ -868,8 +888,10 @@ class StableDiffusionImg2ImgPipeline(
                not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
            ip_adapter_image: (`PipelineImageInput`, *optional*): Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
-                Pre-generated image embeddings for IP-Adapter. If not
-                provided, embeddings are computed from the `ip_adapter_image` input argument.
+                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
+                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
+                should contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not
+                provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated image. Choose between `PIL.Image` or `np.array`.
            return_dict (`bool`, *optional*, defaults to `True`):
@@ -967,7 +989,11 @@ class StableDiffusionImg2ImgPipeline(
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
-                ip_adapter_image, ip_adapter_image_embeds, device, batch_size * num_images_per_prompt
+                ip_adapter_image,
+                ip_adapter_image_embeds,
+                device,
+                batch_size * num_images_per_prompt,
+                self.do_classifier_free_guidance,
            )
        # 4. Preprocess image
...
@@ -606,7 +606,7 @@ class StableDiffusionInpaintPipeline(
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
-        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt
+        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -630,13 +630,23 @@ class StableDiffusionInpaintPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )
-                if self.do_classifier_free_guidance:
+                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                    single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
-            image_embeds = ip_adapter_image_embeds
+            image_embeds = []
+            for single_image_embeds in ip_adapter_image_embeds:
+                if do_classifier_free_guidance:
+                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
+                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
+                else:
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                image_embeds.append(single_image_embeds)
        return image_embeds
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.run_safety_checker
@@ -752,6 +762,16 @@ class StableDiffusionInpaintPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )
+        if ip_adapter_image_embeds is not None:
+            if not isinstance(ip_adapter_image_embeds, list):
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
+                )
+            elif ip_adapter_image_embeds[0].ndim != 3:
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
+                )
    def prepare_latents(
        self,
        batch_size,
@@ -1037,8 +1057,10 @@ class StableDiffusionInpaintPipeline(
                not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
            ip_adapter_image: (`PipelineImageInput`, *optional*): Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
-                Pre-generated image embeddings for IP-Adapter. If not
-                provided, embeddings are computed from the `ip_adapter_image` input argument.
+                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
+                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
+                should contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not
+                provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated image. Choose between `PIL.Image` or `np.array`.
            return_dict (`bool`, *optional*, defaults to `True`):
@@ -1175,7 +1197,11 @@ class StableDiffusionInpaintPipeline(
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
-                ip_adapter_image, ip_adapter_image_embeds, device, batch_size * num_images_per_prompt
+                ip_adapter_image,
+                ip_adapter_image_embeds,
+                device,
+                batch_size * num_images_per_prompt,
+                self.do_classifier_free_guidance,
            )
        # 4. set timesteps
...
@@ -410,8 +410,9 @@ class StableDiffusionLDM3DPipeline(
        return image_embeds, uncond_image_embeds
+    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
-        self, ip_adapter_image, ip_adapter_image_embeds, do_classifier_free_guidance, device, num_images_per_prompt
+        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -441,7 +442,17 @@ class StableDiffusionLDM3DPipeline(
                image_embeds.append(single_image_embeds)
        else:
-            image_embeds = ip_adapter_image_embeds
+            image_embeds = []
+            for single_image_embeds in ip_adapter_image_embeds:
+                if do_classifier_free_guidance:
+                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
+                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
+                else:
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                image_embeds.append(single_image_embeds)
        return image_embeds
    def run_safety_checker(self, image, device, dtype):
@@ -537,6 +548,16 @@ class StableDiffusionLDM3DPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )
+        if ip_adapter_image_embeds is not None:
+            if not isinstance(ip_adapter_image_embeds, list):
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
+                )
+            elif ip_adapter_image_embeds[0].ndim != 3:
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
+                )
    def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None):
        shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)
        if isinstance(generator, list) and len(generator) != batch_size:
@@ -619,8 +640,10 @@ class StableDiffusionLDM3DPipeline(
            ip_adapter_image: (`PipelineImageInput`, *optional*):
                Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
-                Pre-generated image embeddings for IP-Adapter. If not
-                provided, embeddings are computed from the `ip_adapter_image` input argument.
+                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
+                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
+                should contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not
+                provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated image. Choose between `PIL.Image` or `np.array`.
            return_dict (`bool`, *optional*, defaults to `True`):
@@ -682,9 +705,9 @@ class StableDiffusionLDM3DPipeline(
            image_embeds = self.prepare_ip_adapter_image_embeds(
                ip_adapter_image,
                ip_adapter_image_embeds,
-                do_classifier_free_guidance,
                device,
                batch_size * num_images_per_prompt,
+                do_classifier_free_guidance,
            )
        # 3. Encode input prompt
...
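Note that the LDM3D hunk above also fixes an argument-order inconsistency: `do_classifier_free_guidance` previously sat in the middle of the parameter list, while every other pipeline now takes it last. A hypothetical call site written with keyword arguments (variable names here are illustrative) sidesteps this class of positional mix-up entirely:

```py
# Hypothetical call site; the keyword names match the refactored signature.
image_embeds = pipeline.prepare_ip_adapter_image_embeds(
    ip_adapter_image=ip_adapter_image,
    ip_adapter_image_embeds=None,
    device=device,
    num_images_per_prompt=batch_size * num_images_per_prompt,
    do_classifier_free_guidance=pipeline.do_classifier_free_guidance,
)
```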
@@ -384,7 +384,7 @@ class StableDiffusionPanoramaPipeline(
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
    def prepare_ip_adapter_image_embeds(
-        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt
+        self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
    ):
        if ip_adapter_image_embeds is None:
            if not isinstance(ip_adapter_image, list):
@@ -408,13 +408,23 @@ class StableDiffusionPanoramaPipeline(
                    [single_negative_image_embeds] * num_images_per_prompt, dim=0
                )
-                if self.do_classifier_free_guidance:
+                if do_classifier_free_guidance:
                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
                    single_image_embeds = single_image_embeds.to(device)
                image_embeds.append(single_image_embeds)
        else:
-            image_embeds = ip_adapter_image_embeds
+            image_embeds = []
+            for single_image_embeds in ip_adapter_image_embeds:
+                if do_classifier_free_guidance:
+                    single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
+                    single_negative_image_embeds = single_negative_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                    single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
+                else:
+                    single_image_embeds = single_image_embeds.repeat(num_images_per_prompt, 1, 1)
+                image_embeds.append(single_image_embeds)
        return image_embeds
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.run_safety_checker
@@ -535,6 +545,16 @@ class StableDiffusionPanoramaPipeline(
                "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
            )
+        if ip_adapter_image_embeds is not None:
+            if not isinstance(ip_adapter_image_embeds, list):
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
+                )
+            elif ip_adapter_image_embeds[0].ndim != 3:
+                raise ValueError(
+                    f"`ip_adapter_image_embeds` has to be a list of 3D tensors but is {ip_adapter_image_embeds[0].ndim}D"
+                )
    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_latents
    def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None):
        shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)
@@ -644,8 +664,10 @@ class StableDiffusionPanoramaPipeline(
            ip_adapter_image: (`PipelineImageInput`, *optional*):
                Optional image input to work with IP Adapters.
            ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
-                Pre-generated image embeddings for IP-Adapter. If not
-                provided, embeddings are computed from the `ip_adapter_image` input argument.
+                Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number
+                of IP-Adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It
+                should contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not
+                provided, embeddings are computed from the `ip_adapter_image` input argument.
            output_type (`str`, *optional*, defaults to `"pil"`):
                The output format of the generated image. Choose between `PIL.Image` or `np.array`.
            return_dict (`bool`, *optional*, defaults to `True`):
@@ -710,7 +732,11 @@ class StableDiffusionPanoramaPipeline(
        if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
            image_embeds = self.prepare_ip_adapter_image_embeds(
-                ip_adapter_image, ip_adapter_image_embeds, device, batch_size * num_images_per_prompt
+                ip_adapter_image,
+                ip_adapter_image_embeds,
+                device,
+                batch_size * num_images_per_prompt,
+                self.do_classifier_free_guidance,
            )
        # 3. Encode input prompt
...