Unverified commit be4afa0b authored by Mark Van Aken, committed by GitHub

#7535 Update FloatTensor type hints to Tensor (#7883)

* find & replace all FloatTensors to Tensor

* apply formatting

* Update torch.FloatTensor to torch.Tensor in the remaining files

* formatting

* Fix the rest of the places where FloatTensor is used as well as in documentation

* formatting

* Update new file from FloatTensor to Tensor
parent 04f4bd54
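The commit message does not explain why the hints changed. One likely reason is offered here as an assumption, not as the author's stated intent. `torch.FloatTensor` names a single legacy tensor type, a float32 tensor on the CPU, rather than tensors in general. The old hints were therefore narrower than what these pipelines are actually given, such as fp16 tensors. Likewise, a float32 tensor on a GPU is a `torch.cuda.FloatTensor`, not a `torch.FloatTensor`. The short check below illustrates the difference with `isinstance` and needs only PyTorch.

```python
import torch

# A default-dtype CPU tensor: float32, the only case torch.FloatTensor describes.
cpu_fp32 = torch.zeros(2, 3)
# A half-precision tensor, as produced when a pipeline runs in torch.float16.
cpu_fp16 = torch.zeros(2, 3, dtype=torch.float16)

print(isinstance(cpu_fp32, torch.FloatTensor))  # legacy type check matches CPU float32
print(isinstance(cpu_fp16, torch.FloatTensor))  # a half tensor is not a FloatTensor
print(isinstance(cpu_fp16, torch.Tensor))       # but every tensor is a torch.Tensor
```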
@@ -123,15 +123,15 @@ class KandinskyV22Pipeline(DiffusionPipeline):
     @replace_example_docstring(EXAMPLE_DOC_STRING)
     def __call__(
         self,
-        image_embeds: Union[torch.FloatTensor, List[torch.FloatTensor]],
-        negative_image_embeds: Union[torch.FloatTensor, List[torch.FloatTensor]],
+        image_embeds: Union[torch.Tensor, List[torch.Tensor]],
+        negative_image_embeds: Union[torch.Tensor, List[torch.Tensor]],
         height: int = 512,
         width: int = 512,
         num_inference_steps: int = 100,
         guidance_scale: float = 4.0,
         num_images_per_prompt: int = 1,
         generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
-        latents: Optional[torch.FloatTensor] = None,
+        latents: Optional[torch.Tensor] = None,
         output_type: Optional[str] = "pil",
         return_dict: bool = True,
         callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
@@ -142,9 +142,9 @@ class KandinskyV22Pipeline(DiffusionPipeline):
         Function invoked when calling the pipeline for generation.
         Args:
-            image_embeds (`torch.FloatTensor` or `List[torch.FloatTensor]`):
+            image_embeds (`torch.Tensor` or `List[torch.Tensor]`):
                 The clip image embeddings for text prompt, that will be used to condition the image generation.
-            negative_image_embeds (`torch.FloatTensor` or `List[torch.FloatTensor]`):
+            negative_image_embeds (`torch.Tensor` or `List[torch.Tensor]`):
                 The clip image embeddings for negative text prompt, will be used to condition the image generation.
             height (`int`, *optional*, defaults to 512):
                 The height in pixels of the generated image.
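`height` and `width` are given in pixels, but denoising runs on a smaller latent grid. The sketch below shows one way to compute that grid size. It mirrors, as far as I recall, a module-level helper named `downscale_height_and_width` in the Kandinsky 2.2 pipeline files. Both that name and the MoVQ scale factor of 8 are assumptions, not something this diff shows.

```python
def downscale_height_and_width(height: int, width: int, scale_factor: int = 8) -> tuple:
    """Map a pixel height/width to the latent grid size (assumed behaviour, see lead-in).

    The pixel size is divided by scale_factor**2 and rounded up, then multiplied
    back by scale_factor, so the result is always a multiple of scale_factor.
    """
    block = scale_factor ** 2
    # Ceiling division using integers only.
    new_height = -(-height // block)
    new_width = -(-width // block)
    return new_height * scale_factor, new_width * scale_factor


print(downscale_height_and_width(512, 512))  # the pipelines' default size
print(downscale_height_and_width(768, 512))  # a non-square request
print(downscale_height_and_width(500, 500))  # rounded up to the next full block
```

Under these assumptions, the default 512 x 512 request maps to a 64 x 64 latent grid.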
@@ -164,7 +164,7 @@ class KandinskyV22Pipeline(DiffusionPipeline):
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
                 One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
                 to make generation deterministic.
-            latents (`torch.FloatTensor`, *optional*):
+            latents (`torch.Tensor`, *optional*):
                 Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
                 generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
                 tensor will ge generated by sampling using the supplied random `generator`.
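The `latents` docstring suggests pre-generating the starting noise so that the same generation can be rerun with different prompts. In the minimal sketch below, seeding a `torch.Generator` makes that noise reproducible, and the resulting tensor is what would be passed as `latents=`. It assumes a latent shape of `(1, 4, 64, 64)` for a single 512 x 512 image; the channel count of 4 is an assumption about the UNet configuration. `make_latents` is a hypothetical helper.

```python
import torch


def make_latents(seed: int, shape=(1, 4, 64, 64)) -> torch.Tensor:
    """Hypothetical helper: draw Gaussian starting noise from a seeded CPU generator."""
    generator = torch.Generator(device="cpu").manual_seed(seed)
    return torch.randn(shape, generator=generator)


first = make_latents(0)
again = make_latents(0)
other = make_latents(1)

# Same seed, same noise: reusing `first` as `latents=` pins the starting point.
print(torch.equal(first, again))
# A different seed gives different noise.
print(torch.equal(first, other))
```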
......
@@ -213,9 +213,9 @@ class KandinskyV22CombinedPipeline(DiffusionPipeline):
         prior_guidance_scale: float = 4.0,
         prior_num_inference_steps: int = 25,
         generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
-        latents: Optional[torch.FloatTensor] = None,
+        latents: Optional[torch.Tensor] = None,
         output_type: Optional[str] = "pil",
-        callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,
+        callback: Optional[Callable[[int, int, torch.Tensor], None]] = None,
         callback_steps: int = 1,
         return_dict: bool = True,
         prior_callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
@@ -259,7 +259,7 @@ class KandinskyV22CombinedPipeline(DiffusionPipeline):
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
                 One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
                 to make generation deterministic.
-            latents (`torch.FloatTensor`, *optional*):
+            latents (`torch.Tensor`, *optional*):
                 Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
                 generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
                 tensor will ge generated by sampling using the supplied random `generator`.
@@ -442,7 +442,7 @@ class KandinskyV22Img2ImgCombinedPipeline(DiffusionPipeline):
     def __call__(
         self,
         prompt: Union[str, List[str]],
-        image: Union[torch.FloatTensor, PIL.Image.Image, List[torch.FloatTensor], List[PIL.Image.Image]],
+        image: Union[torch.Tensor, PIL.Image.Image, List[torch.Tensor], List[PIL.Image.Image]],
         negative_prompt: Optional[Union[str, List[str]]] = None,
         num_inference_steps: int = 100,
         guidance_scale: float = 4.0,
@@ -453,9 +453,9 @@ class KandinskyV22Img2ImgCombinedPipeline(DiffusionPipeline):
         prior_guidance_scale: float = 4.0,
         prior_num_inference_steps: int = 25,
         generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
-        latents: Optional[torch.FloatTensor] = None,
+        latents: Optional[torch.Tensor] = None,
         output_type: Optional[str] = "pil",
-        callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,
+        callback: Optional[Callable[[int, int, torch.Tensor], None]] = None,
         callback_steps: int = 1,
         return_dict: bool = True,
         prior_callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
@@ -469,7 +469,7 @@ class KandinskyV22Img2ImgCombinedPipeline(DiffusionPipeline):
         Args:
             prompt (`str` or `List[str]`):
                 The prompt or prompts to guide the image generation.
-            image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
+            image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
                 `Image`, or tensor representing an image batch, that will be used as the starting point for the
                 process. Can also accept image latents as `image`, if passing latents directly, it will not be encoded
                 again.
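The `image` argument accepts PIL images, NumPy arrays, or tensors. To illustrate the kind of preprocessing a pipeline applies before encoding, the NumPy-only sketch below turns an `H x W x 3` uint8 image into a `(1, 3, H, W)` float32 batch scaled to [-1, 1]. The [-1, 1] range and the channel-first layout are assumptions about what the image encoder expects; this diff does not state them. Resizing is omitted, and `to_model_batch` is a hypothetical name.

```python
import numpy as np


def to_model_batch(rgb: np.ndarray) -> np.ndarray:
    """Hypothetical helper: H x W x 3 uint8 image -> (1, 3, H, W) float32 in [-1, 1]."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 array")
    arr = rgb.astype(np.float32) / 127.5 - 1.0  # 0 -> -1.0, 255 -> 1.0
    arr = np.transpose(arr, (2, 0, 1))          # HWC -> CHW
    return arr[None, ...]                       # add the batch dimension


image = np.zeros((64, 48, 3), dtype=np.uint8)
image[:, 24:, :] = 255  # right half white, left half black
batch = to_model_batch(image)
print(batch.shape, batch.dtype, float(batch.min()), float(batch.max()))
```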
@@ -509,7 +509,7 @@ class KandinskyV22Img2ImgCombinedPipeline(DiffusionPipeline):
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
                 One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
                 to make generation deterministic.
-            latents (`torch.FloatTensor`, *optional*):
+            latents (`torch.Tensor`, *optional*):
                 Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
                 generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
                 tensor will ge generated by sampling using the supplied random `generator`.
@@ -518,7 +518,7 @@ class KandinskyV22Img2ImgCombinedPipeline(DiffusionPipeline):
                 (`np.array`) or `"pt"` (`torch.Tensor`).
             callback (`Callable`, *optional*):
                 A function that calls every `callback_steps` steps during inference. The function is called with the
-                following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
+                following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
             callback_steps (`int`, *optional*, defaults to 1):
                 The frequency at which the `callback` function is called. If not specified, the callback is called at
                 every step.
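The `callback` / `callback_steps` pair documented above can be simulated without loading a model. The loop below is a hypothetical stand-in for a pipeline's denoising loop: the timesteps are made up, and a plain list stands in for the latents tensor. It demonstrates one gating rule, which is my understanding of how these pipelines apply it: the callback fires on steps whose index is divisible by `callback_steps`.

```python
from typing import Callable, List, Optional, Tuple

# Stand-in type: a real pipeline passes a torch.Tensor; a list of floats is used here.
Latents = List[float]


def run_loop(
    timesteps: List[int],
    callback: Optional[Callable[[int, int, Latents], None]] = None,
    callback_steps: int = 1,
) -> Latents:
    """Hypothetical denoising loop that only demonstrates how callback_steps gates calls."""
    latents: Latents = [0.0]
    for i, t in enumerate(timesteps):
        latents = [x - 0.1 for x in latents]  # placeholder for one scheduler step
        if callback is not None and i % callback_steps == 0:
            callback(i, t, latents)
    return latents


calls: List[Tuple[int, int]] = []


def record(step: int, timestep: int, latents: Latents) -> None:
    calls.append((step, timestep))


run_loop([999, 749, 499, 249, 0], callback=record, callback_steps=2)
print(calls)
```

With `callback_steps=2` the callback sees steps 0, 2 and 4. With the default of 1 it would see every step.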
@@ -681,8 +681,8 @@ class KandinskyV22InpaintCombinedPipeline(DiffusionPipeline):
     def __call__(
         self,
         prompt: Union[str, List[str]],
-        image: Union[torch.FloatTensor, PIL.Image.Image, List[torch.FloatTensor], List[PIL.Image.Image]],
-        mask_image: Union[torch.FloatTensor, PIL.Image.Image, List[torch.FloatTensor], List[PIL.Image.Image]],
+        image: Union[torch.Tensor, PIL.Image.Image, List[torch.Tensor], List[PIL.Image.Image]],
+        mask_image: Union[torch.Tensor, PIL.Image.Image, List[torch.Tensor], List[PIL.Image.Image]],
         negative_prompt: Optional[Union[str, List[str]]] = None,
         num_inference_steps: int = 100,
         guidance_scale: float = 4.0,
@@ -692,7 +692,7 @@ class KandinskyV22InpaintCombinedPipeline(DiffusionPipeline):
         prior_guidance_scale: float = 4.0,
         prior_num_inference_steps: int = 25,
         generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
-        latents: Optional[torch.FloatTensor] = None,
+        latents: Optional[torch.Tensor] = None,
         output_type: Optional[str] = "pil",
         return_dict: bool = True,
         prior_callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
@@ -707,7 +707,7 @@ class KandinskyV22InpaintCombinedPipeline(DiffusionPipeline):
         Args:
             prompt (`str` or `List[str]`):
                 The prompt or prompts to guide the image generation.
-            image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
+            image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
                 `Image`, or tensor representing an image batch, that will be used as the starting point for the
                 process. Can also accept image latents as `image`, if passing latents directly, it will not be encoded
                 again.
@@ -746,7 +746,7 @@ class KandinskyV22InpaintCombinedPipeline(DiffusionPipeline):
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
                 One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
                 to make generation deterministic.
-            latents (`torch.FloatTensor`, *optional*):
+            latents (`torch.Tensor`, *optional*):
                 Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
                 generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
                 tensor will ge generated by sampling using the supplied random `generator`.
......
@@ -151,18 +151,18 @@ class KandinskyV22ControlnetPipeline(DiffusionPipeline):
     @torch.no_grad()
     def __call__(
         self,
-        image_embeds: Union[torch.FloatTensor, List[torch.FloatTensor]],
-        negative_image_embeds: Union[torch.FloatTensor, List[torch.FloatTensor]],
-        hint: torch.FloatTensor,
+        image_embeds: Union[torch.Tensor, List[torch.Tensor]],
+        negative_image_embeds: Union[torch.Tensor, List[torch.Tensor]],
+        hint: torch.Tensor,
         height: int = 512,
         width: int = 512,
         num_inference_steps: int = 100,
         guidance_scale: float = 4.0,
         num_images_per_prompt: int = 1,
         generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
-        latents: Optional[torch.FloatTensor] = None,
+        latents: Optional[torch.Tensor] = None,
         output_type: Optional[str] = "pil",
-        callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,
+        callback: Optional[Callable[[int, int, torch.Tensor], None]] = None,
         callback_steps: int = 1,
         return_dict: bool = True,
     ):
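In the signature above, `guidance_scale` and `negative_image_embeds` work together through classifier-free guidance. The final noise prediction starts from the prediction conditioned on the negative embeddings and moves toward the prediction conditioned on `image_embeds`, scaled by `guidance_scale`. The NumPy sketch below applies that standard combination rule to dummy arrays. The arrays are illustrative stand-ins for noise predictions, and the way a pipeline batches the two branches is omitted. Pipelines of this kind typically skip the negative branch entirely when `guidance_scale` is 1 or lower.

```python
import numpy as np


def classifier_free_guidance(
    noise_pred_uncond: np.ndarray, noise_pred_cond: np.ndarray, guidance_scale: float
) -> np.ndarray:
    """Standard classifier-free guidance combination of two noise predictions."""
    return noise_pred_uncond + guidance_scale * (noise_pred_cond - noise_pred_uncond)


uncond = np.array([0.0, 1.0, 2.0])  # stand-in: prediction under negative_image_embeds
cond = np.array([1.0, 1.0, 0.0])    # stand-in: prediction under image_embeds

print(classifier_free_guidance(uncond, cond, 1.0))  # scale 1 reproduces the conditional prediction
print(classifier_free_guidance(uncond, cond, 4.0))  # the default guidance_scale in these pipelines
```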
@@ -172,11 +172,11 @@ class KandinskyV22ControlnetPipeline(DiffusionPipeline):
         Args:
             prompt (`str` or `List[str]`):
                 The prompt or prompts to guide the image generation.
-            hint (`torch.FloatTensor`):
+            hint (`torch.Tensor`):
                 The controlnet condition.
-            image_embeds (`torch.FloatTensor` or `List[torch.FloatTensor]`):
+            image_embeds (`torch.Tensor` or `List[torch.Tensor]`):
                 The clip image embeddings for text prompt, that will be used to condition the image generation.
-            negative_image_embeds (`torch.FloatTensor` or `List[torch.FloatTensor]`):
+            negative_image_embeds (`torch.Tensor` or `List[torch.Tensor]`):
                 The clip image embeddings for negative text prompt, will be used to condition the image generation.
             negative_prompt (`str` or `List[str]`, *optional*):
                 The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored
@@ -199,7 +199,7 @@ class KandinskyV22ControlnetPipeline(DiffusionPipeline):
             generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
                 One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
                 to make generation deterministic.
-            latents (`torch.FloatTensor`, *optional*):
+            latents (`torch.Tensor`, *optional*):
                 Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
                 generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
                 tensor will ge generated by sampling using the supplied random `generator`.
@@ -208,7 +208,7 @@ class KandinskyV22ControlnetPipeline(DiffusionPipeline):
                 (`np.array`) or `"pt"` (`torch.Tensor`).
             callback (`Callable`, *optional*):
                 A function that calls every `callback_steps` steps during inference. The function is called with the
-                following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
+                following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
             callback_steps (`int`, *optional*, defaults to 1):
                 The frequency at which the `callback` function is called. If not specified, the callback is called at
                 every step.
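This pipeline's docstring describes `hint` only as "the controlnet condition". For the depth-conditioned variant, one common recipe is to replicate a single-channel depth map into three channels, scale it to [0, 1], and lay it out as `(1, 3, H, W)`. That recipe is assumed here from documentation examples; it does not come from this diff. The NumPy sketch below follows it. The final conversion to a `torch.Tensor`, for example with `torch.from_numpy`, is left out, and `make_hint` is a hypothetical name.

```python
import numpy as np


def make_hint(depth: np.ndarray) -> np.ndarray:
    """Hypothetical helper: H x W uint8 depth map -> (1, 3, H, W) float32 hint in [0, 1]."""
    if depth.ndim != 2:
        raise ValueError("expected a single-channel H x W depth map")
    rgb = np.repeat(depth[:, :, None], 3, axis=2)  # H x W x 3, same value in every channel
    hint = rgb.astype(np.float32) / 255.0           # scale to [0, 1]
    hint = np.transpose(hint, (2, 0, 1))            # HWC -> CHW
    return hint[None, ...]                          # add the batch dimension


depth = np.linspace(0, 255, num=16 * 16).reshape(16, 16).astype(np.uint8)
hint = make_hint(depth)
print(hint.shape, hint.dtype, float(hint.min()), float(hint.max()))
```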
......
@@ -206,10 +206,10 @@ class KandinskyV22ControlnetImg2ImgPipeline(DiffusionPipeline):
     @torch.no_grad()
     def __call__(
         self,
-        image_embeds: Union[torch.FloatTensor, List[torch.FloatTensor]],
-        image: Union[torch.FloatTensor, PIL.Image.Image, List[torch.FloatTensor], List[PIL.Image.Image]],
-        negative_image_embeds: Union[torch.FloatTensor, List[torch.FloatTensor]],
-        hint: torch.FloatTensor,
+        image_embeds: Union[torch.Tensor, List[torch.Tensor]],
+        image: Union[torch.Tensor, PIL.Image.Image, List[torch.Tensor], List[PIL.Image.Image]],
+        negative_image_embeds: Union[torch.Tensor, List[torch.Tensor]],
+        hint: torch.Tensor,
         height: int = 512,
         width: int = 512,
         num_inference_steps: int = 100,
@@ -218,7 +218,7 @@ class KandinskyV22ControlnetImg2ImgPipeline(DiffusionPipeline):
         num_images_per_prompt: int = 1,
         generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
         output_type: Optional[str] = "pil",
-        callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,
+        callback: Optional[Callable[[int, int, torch.Tensor], None]] = None,
         callback_steps: int = 1,
         return_dict: bool = True,
     ):
@@ -226,9 +226,9 @@ class KandinskyV22ControlnetImg2ImgPipeline(DiffusionPipeline):
         Function invoked when calling the pipeline for generation.
         Args:
-            image_embeds (`torch.FloatTensor` or `List[torch.FloatTensor]`):
+            image_embeds (`torch.Tensor` or `List[torch.Tensor]`):
                 The clip image embeddings for text prompt, that will be used to condition the image generation.
-            image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
+            image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
                 `Image`, or tensor representing an image batch, that will be used as the starting point for the
                 process. Can also accept image latents as `image`, if passing latents directly, it will not be encoded
                 again.
@@ -238,9 +238,9 @@ class KandinskyV22ControlnetImg2ImgPipeline(DiffusionPipeline):
                 denoising steps depends on the amount of noise initially added. When `strength` is 1, added noise will
                 be maximum and the denoising process will run for the full number of iterations specified in
                 `num_inference_steps`. A value of 1, therefore, essentially ignores `image`.
-            hint (`torch.FloatTensor`):
+            hint (`torch.Tensor`):
                 The controlnet condition.
-            negative_image_embeds (`torch.FloatTensor` or `List[torch.FloatTensor]`):
+            negative_image_embeds (`torch.Tensor` or `List[torch.Tensor]`):
                 The clip image embeddings for negative text prompt, will be used to condition the image generation.
             height (`int`, *optional*, defaults to 512):
                 The height in pixels of the generated image.
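The `strength` docstring says that the amount of noise added determines how many denoising steps actually run. Img2img pipelines commonly implement this by keeping only the last `int(num_inference_steps * strength)` entries of the scheduler's descending timestep schedule. The plain-Python sketch below shows that approach; it is an assumption, not a copy of this pipeline's code, and the ten-entry schedule is illustrative.

```python
from typing import List


def img2img_timesteps(timesteps: List[int], num_inference_steps: int, strength: float) -> List[int]:
    """Return the tail of a descending timestep schedule that img2img would run (assumed rule)."""
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return timesteps[t_start:]


schedule = [900, 800, 700, 600, 500, 400, 300, 200, 100, 0]  # 10 illustrative timesteps
print(img2img_timesteps(schedule, 10, 0.3))       # light edit: only the last 3 steps run
print(len(img2img_timesteps(schedule, 10, 1.0)))  # full schedule, so `image` is essentially ignored
```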
@@ -265,7 +265,7 @@ class KandinskyV22ControlnetImg2ImgPipeline(DiffusionPipeline):
                 (`np.array`) or `"pt"` (`torch.Tensor`).
             callback (`Callable`, *optional*):
                 A function that calls every `callback_steps` steps during inference. The function is called with the
-                following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
+                following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
             callback_steps (`int`, *optional*, defaults to 1):
                 The frequency at which the `callback` function is called. If not specified, the callback is called at
                 every step.
......
@@ -190,9 +190,9 @@ class KandinskyV22Img2ImgPipeline(DiffusionPipeline):
     @torch.no_grad()
     def __call__(
         self,
-        image_embeds: Union[torch.FloatTensor, List[torch.FloatTensor]],
-        image: Union[torch.FloatTensor, PIL.Image.Image, List[torch.FloatTensor], List[PIL.Image.Image]],
-        negative_image_embeds: Union[torch.FloatTensor, List[torch.FloatTensor]],
+        image_embeds: Union[torch.Tensor, List[torch.Tensor]],
+        image: Union[torch.Tensor, PIL.Image.Image, List[torch.Tensor], List[PIL.Image.Image]],
+        negative_image_embeds: Union[torch.Tensor, List[torch.Tensor]],
         height: int = 512,
         width: int = 512,
         num_inference_steps: int = 100,
@@ -210,9 +210,9 @@ class KandinskyV22Img2ImgPipeline(DiffusionPipeline):
         Function invoked when calling the pipeline for generation.
         Args:
-            image_embeds (`torch.FloatTensor` or `List[torch.FloatTensor]`):
+            image_embeds (`torch.Tensor` or `List[torch.Tensor]`):
                 The clip image embeddings for text prompt, that will be used to condition the image generation.
-            image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
+            image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
                 `Image`, or tensor representing an image batch, that will be used as the starting point for the
                 process. Can also accept image latents as `image`, if passing latents directly, it will not be encoded
                 again.
@@ -222,7 +222,7 @@ class KandinskyV22Img2ImgPipeline(DiffusionPipeline):
                 denoising steps depends on the amount of noise initially added. When `strength` is 1, added noise will
                 be maximum and the denoising process will run for the full number of iterations specified in
                 `num_inference_steps`. A value of 1, therefore, essentially ignores `image`.
-            negative_image_embeds (`torch.FloatTensor` or `List[torch.FloatTensor]`):
+            negative_image_embeds (`torch.Tensor` or `List[torch.Tensor]`):
                 The clip image embeddings for negative text prompt, will be used to condition the image generation.
             height (`int`, *optional*, defaults to 512):
                 The height in pixels of the generated image.
......
@@ -294,17 +294,17 @@ class KandinskyV22InpaintPipeline(DiffusionPipeline):
     @torch.no_grad()
     def __call__(
         self,
-        image_embeds: Union[torch.FloatTensor, List[torch.FloatTensor]],
-        image: Union[torch.FloatTensor, PIL.Image.Image],
-        mask_image: Union[torch.FloatTensor, PIL.Image.Image, np.ndarray],
-        negative_image_embeds: Union[torch.FloatTensor, List[torch.FloatTensor]],
+        image_embeds: Union[torch.Tensor, List[torch.Tensor]],
+        image: Union[torch.Tensor, PIL.Image.Image],
+        mask_image: Union[torch.Tensor, PIL.Image.Image, np.ndarray],
+        negative_image_embeds: Union[torch.Tensor, List[torch.Tensor]],
         height: int = 512,
         width: int = 512,
         num_inference_steps: int = 100,
         guidance_scale: float = 4.0,
         num_images_per_prompt: int = 1,
         generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
-        latents: Optional[torch.FloatTensor] = None,
+        latents: Optional[torch.Tensor] = None,
         output_type: Optional[str] = "pil",
         return_dict: bool = True,
         callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
@@ -315,7 +315,7 @@ class KandinskyV22InpaintPipeline(DiffusionPipeline):
         Function invoked when calling the pipeline for generation.
         Args:
-            image_embeds (`torch.FloatTensor` or `List[torch.FloatTensor]`):
+            image_embeds (`torch.Tensor` or `List[torch.Tensor]`):
                 The clip image embeddings for text prompt, that will be used to condition the image generation.
             image (`PIL.Image.Image`):
                 `Image`, or tensor representing an image batch which will be inpainted, *i.e.* parts of the image will
...@@ -325,7 +325,7 @@ class KandinskyV22InpaintPipeline(DiffusionPipeline): ...@@ -325,7 +325,7 @@ class KandinskyV22InpaintPipeline(DiffusionPipeline):
black pixels will be preserved. If `mask_image` is a PIL image, it will be converted to a single black pixels will be preserved. If `mask_image` is a PIL image, it will be converted to a single
channel (luminance) before use. If it's a tensor, it should contain one color channel (L) instead of 3, channel (luminance) before use. If it's a tensor, it should contain one color channel (L) instead of 3,
so the expected shape would be `(B, H, W, 1)`. so the expected shape would be `(B, H, W, 1)`.
negative_image_embeds (`torch.FloatTensor` or `List[torch.FloatTensor]`): negative_image_embeds (`torch.Tensor` or `List[torch.Tensor]`):
The clip image embeddings for negative text prompt, will be used to condition the image generation. The clip image embeddings for negative text prompt, will be used to condition the image generation.
height (`int`, *optional*, defaults to 512): height (`int`, *optional*, defaults to 512):
The height in pixels of the generated image. The height in pixels of the generated image.
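The `mask_image` convention in the hunk above reduces a PIL mask to a single luminance channel with shape `(B, H, W, 1)`. White pixels are repainted and black pixels are preserved. The sketch below is a plain-Python illustration of that convention for one image, not the pipeline's code. The helper name `rgb_mask_to_binary` and the 0.5 binarization threshold are assumptions. The luminance weights are the ITU-R 601-2 coefficients that Pillow documents for `convert("L")`.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # one 8-bit RGB pixel


def rgb_mask_to_binary(mask: List[List[Pixel]], threshold: float = 0.5) -> List[List[List[float]]]:
    """Hypothetical helper: reduce an H x W RGB mask to an H x W x 1 binary mask.

    1.0 marks a pixel to repaint (white), 0.0 a pixel to preserve (black).
    """
    out = []
    for row in mask:
        out_row = []
        for r, g, b in row:
            # ITU-R 601-2 luma transform (the weights Pillow uses for mode "L"), scaled to [0, 1].
            lum = (r * 299 + g * 587 + b * 114) / 1000 / 255
            # Keep a trailing channel axis of size 1, as in the (B, H, W, 1) layout.
            out_row.append([1.0 if lum >= threshold else 0.0])
        out.append(out_row)
    return out


print(rgb_mask_to_binary([[(255, 255, 255), (0, 0, 0)]]))  # [[[1.0], [0.0]]]
```

A batch in the `(B, H, W, 1)` layout would be a list of such per-image results.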
...@@ -345,7 +345,7 @@ class KandinskyV22InpaintPipeline(DiffusionPipeline): ...@@ -345,7 +345,7 @@ class KandinskyV22InpaintPipeline(DiffusionPipeline):
generator (`torch.Generator` or `List[torch.Generator]`, *optional*): generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
to make generation deterministic. to make generation deterministic.
latents (`torch.FloatTensor`, *optional*): latents (`torch.Tensor`, *optional*):
Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor will ge generated by sampling using the supplied random `generator`. tensor will ge generated by sampling using the supplied random `generator`.
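The `latents` and `generator` arguments describe a common pattern. Gaussian noise is drawn once from a seeded generator and then reused, so a change of prompt alone changes the result. The stdlib-only sketch below assumes a few things. `random.Random` stands in for `torch.Generator`, a flat list stands in for the latents tensor, and the helper name `sample_latents` is hypothetical.

```python
import random
from typing import List, Optional


def sample_latents(n: int, generator: Optional[random.Random] = None,
                   latents: Optional[List[float]] = None) -> List[float]:
    """Return caller-supplied latents unchanged, otherwise sample n values from N(0, 1)."""
    if latents is not None:
        # Pre-generated latents are reused as-is, so different prompts share one starting point.
        return list(latents)
    rng = generator if generator is not None else random.Random()
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


a = sample_latents(4, generator=random.Random(0))
b = sample_latents(4, generator=random.Random(0))
assert a == b  # the same seed gives identical latents
```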
......
...@@ -132,12 +132,12 @@ class KandinskyV22PriorPipeline(DiffusionPipeline): ...@@ -132,12 +132,12 @@ class KandinskyV22PriorPipeline(DiffusionPipeline):
@replace_example_docstring(EXAMPLE_INTERPOLATE_DOC_STRING) @replace_example_docstring(EXAMPLE_INTERPOLATE_DOC_STRING)
def interpolate( def interpolate(
self, self,
images_and_prompts: List[Union[str, PIL.Image.Image, torch.FloatTensor]], images_and_prompts: List[Union[str, PIL.Image.Image, torch.Tensor]],
weights: List[float], weights: List[float],
num_images_per_prompt: int = 1, num_images_per_prompt: int = 1,
num_inference_steps: int = 25, num_inference_steps: int = 25,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None, latents: Optional[torch.Tensor] = None,
negative_prior_prompt: Optional[str] = None, negative_prior_prompt: Optional[str] = None,
negative_prompt: str = "", negative_prompt: str = "",
guidance_scale: float = 4.0, guidance_scale: float = 4.0,
...@@ -147,7 +147,7 @@ class KandinskyV22PriorPipeline(DiffusionPipeline): ...@@ -147,7 +147,7 @@ class KandinskyV22PriorPipeline(DiffusionPipeline):
Function invoked when using the prior pipeline for interpolation. Function invoked when using the prior pipeline for interpolation.
Args: Args:
images_and_prompts (`List[Union[str, PIL.Image.Image, torch.FloatTensor]]`): images_and_prompts (`List[Union[str, PIL.Image.Image, torch.Tensor]]`):
list of prompts and images to guide the image generation. list of prompts and images to guide the image generation.
weights: (`List[float]`): weights: (`List[float]`):
list of weights for each condition in `images_and_prompts` list of weights for each condition in `images_and_prompts`
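`interpolate` takes one weight per entry of `images_and_prompts`. The sketch below shows one way to combine per-condition embeddings. It assumes the pipeline has already encoded each prompt or image into an embedding vector and that the combination is a plain weighted sum. Both are assumptions for illustration, and `combine_embeddings` is a hypothetical name.

```python
from typing import List, Sequence


def combine_embeddings(embeddings: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of equally sized embedding vectors, one weight per condition."""
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per condition")
    dim = len(embeddings[0])
    out = [0.0] * dim
    for emb, w in zip(embeddings, weights):
        if len(emb) != dim:
            raise ValueError("all embeddings must have the same dimension")
        for i, v in enumerate(emb):
            out[i] += w * v
    return out


print(combine_embeddings([[1.0, 0.0], [0.0, 1.0]], [0.3, 0.7]))  # [0.3, 0.7]
```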
...@@ -159,7 +159,7 @@ class KandinskyV22PriorPipeline(DiffusionPipeline): ...@@ -159,7 +159,7 @@ class KandinskyV22PriorPipeline(DiffusionPipeline):
generator (`torch.Generator` or `List[torch.Generator]`, *optional*): generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
to make generation deterministic. to make generation deterministic.
latents (`torch.FloatTensor`, *optional*): latents (`torch.Tensor`, *optional*):
Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor will ge generated by sampling using the supplied random `generator`. tensor will ge generated by sampling using the supplied random `generator`.
...@@ -376,7 +376,7 @@ class KandinskyV22PriorPipeline(DiffusionPipeline): ...@@ -376,7 +376,7 @@ class KandinskyV22PriorPipeline(DiffusionPipeline):
num_images_per_prompt: int = 1, num_images_per_prompt: int = 1,
num_inference_steps: int = 25, num_inference_steps: int = 25,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None, latents: Optional[torch.Tensor] = None,
guidance_scale: float = 4.0, guidance_scale: float = 4.0,
output_type: Optional[str] = "pt", # pt only output_type: Optional[str] = "pt", # pt only
return_dict: bool = True, return_dict: bool = True,
...@@ -400,7 +400,7 @@ class KandinskyV22PriorPipeline(DiffusionPipeline): ...@@ -400,7 +400,7 @@ class KandinskyV22PriorPipeline(DiffusionPipeline):
generator (`torch.Generator` or `List[torch.Generator]`, *optional*): generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
to make generation deterministic. to make generation deterministic.
latents (`torch.FloatTensor`, *optional*): latents (`torch.Tensor`, *optional*):
Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor will ge generated by sampling using the supplied random `generator`. tensor will ge generated by sampling using the supplied random `generator`.
......
...@@ -156,12 +156,12 @@ class KandinskyV22PriorEmb2EmbPipeline(DiffusionPipeline): ...@@ -156,12 +156,12 @@ class KandinskyV22PriorEmb2EmbPipeline(DiffusionPipeline):
@replace_example_docstring(EXAMPLE_INTERPOLATE_DOC_STRING) @replace_example_docstring(EXAMPLE_INTERPOLATE_DOC_STRING)
def interpolate( def interpolate(
self, self,
images_and_prompts: List[Union[str, PIL.Image.Image, torch.FloatTensor]], images_and_prompts: List[Union[str, PIL.Image.Image, torch.Tensor]],
weights: List[float], weights: List[float],
num_images_per_prompt: int = 1, num_images_per_prompt: int = 1,
num_inference_steps: int = 25, num_inference_steps: int = 25,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None, latents: Optional[torch.Tensor] = None,
negative_prior_prompt: Optional[str] = None, negative_prior_prompt: Optional[str] = None,
negative_prompt: str = "", negative_prompt: str = "",
guidance_scale: float = 4.0, guidance_scale: float = 4.0,
...@@ -171,7 +171,7 @@ class KandinskyV22PriorEmb2EmbPipeline(DiffusionPipeline): ...@@ -171,7 +171,7 @@ class KandinskyV22PriorEmb2EmbPipeline(DiffusionPipeline):
Function invoked when using the prior pipeline for interpolation. Function invoked when using the prior pipeline for interpolation.
Args: Args:
images_and_prompts (`List[Union[str, PIL.Image.Image, torch.FloatTensor]]`): images_and_prompts (`List[Union[str, PIL.Image.Image, torch.Tensor]]`):
list of prompts and images to guide the image generation. list of prompts and images to guide the image generation.
weights: (`List[float]`): weights: (`List[float]`):
list of weights for each condition in `images_and_prompts` list of weights for each condition in `images_and_prompts`
...@@ -183,7 +183,7 @@ class KandinskyV22PriorEmb2EmbPipeline(DiffusionPipeline): ...@@ -183,7 +183,7 @@ class KandinskyV22PriorEmb2EmbPipeline(DiffusionPipeline):
generator (`torch.Generator` or `List[torch.Generator]`, *optional*): generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
to make generation deterministic. to make generation deterministic.
latents (`torch.FloatTensor`, *optional*): latents (`torch.Tensor`, *optional*):
Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor will ge generated by sampling using the supplied random `generator`. tensor will ge generated by sampling using the supplied random `generator`.
...@@ -418,7 +418,7 @@ class KandinskyV22PriorEmb2EmbPipeline(DiffusionPipeline): ...@@ -418,7 +418,7 @@ class KandinskyV22PriorEmb2EmbPipeline(DiffusionPipeline):
Conceptually, indicates how much to transform the reference `emb`. Must be between 0 and 1. `image` Conceptually, indicates how much to transform the reference `emb`. Must be between 0 and 1. `image`
will be used as a starting point, adding more noise to it the larger the `strength`. The number of will be used as a starting point, adding more noise to it the larger the `strength`. The number of
denoising steps depends on the amount of noise initially added. denoising steps depends on the amount of noise initially added.
emb (`torch.FloatTensor`): emb (`torch.Tensor`):
The image embedding. The image embedding.
negative_prompt (`str` or `List[str]`, *optional*): negative_prompt (`str` or `List[str]`, *optional*):
The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored
......
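The `strength` parameter of `KandinskyV22PriorEmb2EmbPipeline` (hunk above) must lie between 0 and 1, and the number of denoising steps depends on how much noise is added. The arithmetic below is a sketch. It assumes the convention common in diffusers img2img-style pipelines: only the last `int(num_inference_steps * strength)` scheduler steps run. The helper name `steps_to_run` is hypothetical.

```python
def steps_to_run(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually executed for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Higher strength -> more noise added -> more of the schedule is run.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # Index of the first scheduler timestep that is used; earlier ones are skipped.
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


print(steps_to_run(25, 0.3))  # 7
```

At `strength=1.0` the full schedule runs. At `strength=0.0` no denoising step runs and the reference embedding passes through unchanged.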
...@@ -87,11 +87,11 @@ class Kandinsky3Pipeline(DiffusionPipeline, LoraLoaderMixin): ...@@ -87,11 +87,11 @@ class Kandinsky3Pipeline(DiffusionPipeline, LoraLoaderMixin):
num_images_per_prompt=1, num_images_per_prompt=1,
device=None, device=None,
negative_prompt=None, negative_prompt=None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
_cut_context=False, _cut_context=False,
attention_mask: Optional[torch.FloatTensor] = None, attention_mask: Optional[torch.Tensor] = None,
negative_attention_mask: Optional[torch.FloatTensor] = None, negative_attention_mask: Optional[torch.Tensor] = None,
): ):
r""" r"""
Encodes the prompt into text encoder hidden states. Encodes the prompt into text encoder hidden states.
...@@ -109,16 +109,16 @@ class Kandinsky3Pipeline(DiffusionPipeline, LoraLoaderMixin): ...@@ -109,16 +109,16 @@ class Kandinsky3Pipeline(DiffusionPipeline, LoraLoaderMixin):
The prompt or prompts not to guide the image generation. If not defined, one has to pass The prompt or prompts not to guide the image generation. If not defined, one has to pass
`negative_prompt_embeds`. instead. If not defined, one has to pass `negative_prompt_embeds`. instead. `negative_prompt_embeds`. instead. If not defined, one has to pass `negative_prompt_embeds`. instead.
Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`). Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`).
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument. provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument. argument.
attention_mask (`torch.FloatTensor`, *optional*): attention_mask (`torch.Tensor`, *optional*):
Pre-generated attention mask. Must provide if passing `prompt_embeds` directly. Pre-generated attention mask. Must provide if passing `prompt_embeds` directly.
negative_attention_mask (`torch.FloatTensor`, *optional*): negative_attention_mask (`torch.Tensor`, *optional*):
Pre-generated negative attention mask. Must provide if passing `negative_prompt_embeds` directly. Pre-generated negative attention mask. Must provide if passing `negative_prompt_embeds` directly.
""" """
if prompt is not None and negative_prompt is not None: if prompt is not None and negative_prompt is not None:
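The `encode_prompt` docstring above requires `attention_mask` whenever `prompt_embeds` is passed directly. Without raw text, the pipeline has no tokenizer output from which to derive which positions are real tokens. The sketch below is a generic illustration of what such a mask encodes. It assumes right-padded token ids with a known padding id, and both the helper name and `pad_id=0` are hypothetical.

```python
from typing import List


def padding_attention_mask(token_ids: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """1 for real tokens, 0 for padding positions, for each sequence in the batch."""
    return [[0 if tok == pad_id else 1 for tok in seq] for seq in token_ids]


batch = [[101, 2023, 2003, 0, 0], [101, 7592, 0, 0, 0]]
print(padding_attention_mask(batch))  # [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]
```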
...@@ -334,10 +334,10 @@ class Kandinsky3Pipeline(DiffusionPipeline, LoraLoaderMixin): ...@@ -334,10 +334,10 @@ class Kandinsky3Pipeline(DiffusionPipeline, LoraLoaderMixin):
height: Optional[int] = 1024, height: Optional[int] = 1024,
width: Optional[int] = 1024, width: Optional[int] = 1024,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
attention_mask: Optional[torch.FloatTensor] = None, attention_mask: Optional[torch.Tensor] = None,
negative_attention_mask: Optional[torch.FloatTensor] = None, negative_attention_mask: Optional[torch.Tensor] = None,
output_type: Optional[str] = "pil", output_type: Optional[str] = "pil",
return_dict: bool = True, return_dict: bool = True,
latents=None, latents=None,
...@@ -380,16 +380,16 @@ class Kandinsky3Pipeline(DiffusionPipeline, LoraLoaderMixin): ...@@ -380,16 +380,16 @@ class Kandinsky3Pipeline(DiffusionPipeline, LoraLoaderMixin):
generator (`torch.Generator` or `List[torch.Generator]`, *optional*): generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
to make generation deterministic. to make generation deterministic.
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument. provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument. argument.
attention_mask (`torch.FloatTensor`, *optional*): attention_mask (`torch.Tensor`, *optional*):
Pre-generated attention mask. Must provide if passing `prompt_embeds` directly. Pre-generated attention mask. Must provide if passing `prompt_embeds` directly.
negative_attention_mask (`torch.FloatTensor`, *optional*): negative_attention_mask (`torch.Tensor`, *optional*):
Pre-generated negative attention mask. Must provide if passing `negative_prompt_embeds` directly. Pre-generated negative attention mask. Must provide if passing `negative_prompt_embeds` directly.
output_type (`str`, *optional*, defaults to `"pil"`): output_type (`str`, *optional*, defaults to `"pil"`):
The output format of the generate image. Choose between The output format of the generate image. Choose between
...@@ -398,7 +398,7 @@ class Kandinsky3Pipeline(DiffusionPipeline, LoraLoaderMixin): ...@@ -398,7 +398,7 @@ class Kandinsky3Pipeline(DiffusionPipeline, LoraLoaderMixin):
Whether or not to return a [`~pipelines.stable_diffusion.IFPipelineOutput`] instead of a plain tuple. Whether or not to return a [`~pipelines.stable_diffusion.IFPipelineOutput`] instead of a plain tuple.
callback (`Callable`, *optional*): callback (`Callable`, *optional*):
A function that will be called every `callback_steps` steps during inference. The function will be A function that will be called every `callback_steps` steps during inference. The function will be
called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`. called with the following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
callback_steps (`int`, *optional*, defaults to 1): callback_steps (`int`, *optional*, defaults to 1):
The frequency at which the `callback` function will be called. If not specified, the callback will be The frequency at which the `callback` function will be called. If not specified, the callback will be
called at every step. called at every step.
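The `callback` / `callback_steps` pair above calls `callback(step, timestep, latents)` every `callback_steps` steps. The loop below is a minimal sketch of that schedule. It assumes the callback fires when the step index is a multiple of `callback_steps`. The descending integer timesteps and the list standing in for latents are placeholders, not the pipeline's scheduler.

```python
from typing import Callable, List, Optional


def run_denoising(num_steps: int,
                  callback: Optional[Callable[[int, int, List[float]], None]] = None,
                  callback_steps: int = 1) -> List[float]:
    latents = [0.0]  # placeholder state standing in for the latents tensor
    timesteps = list(range(num_steps - 1, -1, -1))  # hypothetical descending timesteps
    for i, t in enumerate(timesteps):
        latents = [x + 1.0 for x in latents]  # stand-in for one denoising update
        if callback is not None and i % callback_steps == 0:
            callback(i, t, latents)
    return latents


calls = []
run_denoising(6, lambda step, t, lat: calls.append((step, t)), callback_steps=2)
print(calls)  # [(0, 5), (2, 3), (4, 1)]
```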
......
...@@ -112,11 +112,11 @@ class Kandinsky3Img2ImgPipeline(DiffusionPipeline, LoraLoaderMixin): ...@@ -112,11 +112,11 @@ class Kandinsky3Img2ImgPipeline(DiffusionPipeline, LoraLoaderMixin):
num_images_per_prompt=1, num_images_per_prompt=1,
device=None, device=None,
negative_prompt=None, negative_prompt=None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
_cut_context=False, _cut_context=False,
attention_mask: Optional[torch.FloatTensor] = None, attention_mask: Optional[torch.Tensor] = None,
negative_attention_mask: Optional[torch.FloatTensor] = None, negative_attention_mask: Optional[torch.Tensor] = None,
): ):
r""" r"""
Encodes the prompt into text encoder hidden states. Encodes the prompt into text encoder hidden states.
...@@ -134,16 +134,16 @@ class Kandinsky3Img2ImgPipeline(DiffusionPipeline, LoraLoaderMixin): ...@@ -134,16 +134,16 @@ class Kandinsky3Img2ImgPipeline(DiffusionPipeline, LoraLoaderMixin):
The prompt or prompts not to guide the image generation. If not defined, one has to pass The prompt or prompts not to guide the image generation. If not defined, one has to pass
`negative_prompt_embeds`. instead. If not defined, one has to pass `negative_prompt_embeds`. instead. `negative_prompt_embeds`. instead. If not defined, one has to pass `negative_prompt_embeds`. instead.
Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`). Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`).
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument. provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument. argument.
attention_mask (`torch.FloatTensor`, *optional*): attention_mask (`torch.Tensor`, *optional*):
Pre-generated attention mask. Must provide if passing `prompt_embeds` directly. Pre-generated attention mask. Must provide if passing `prompt_embeds` directly.
negative_attention_mask (`torch.FloatTensor`, *optional*): negative_attention_mask (`torch.Tensor`, *optional*):
Pre-generated negative attention mask. Must provide if passing `negative_prompt_embeds` directly. Pre-generated negative attention mask. Must provide if passing `negative_prompt_embeds` directly.
""" """
if prompt is not None and negative_prompt is not None: if prompt is not None and negative_prompt is not None:
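The `negative_prompt` text notes that guidance is ignored when `guidance_scale` is below 1. The sketch below shows the classifier-free guidance combination that diffusers pipelines commonly apply, gated on `guidance_scale > 1`. Plain lists stand in for the unconditional and text-conditioned noise predictions, and the function name `apply_cfg` is hypothetical.

```python
from typing import List


def apply_cfg(noise_uncond: List[float], noise_text: List[float], guidance_scale: float) -> List[float]:
    """Classifier-free guidance: move the prediction away from the unconditional one."""
    if guidance_scale <= 1.0:
        # Guidance disabled: only the conditional prediction is used.
        return list(noise_text)
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_text)]


print(apply_cfg([0.0, 1.0], [1.0, 1.0], 3.0))  # [3.0, 1.0]
```

When guidance is off, only one prediction is needed per step. This is why the negative prompt is ignored in that case.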
...@@ -403,17 +403,17 @@ class Kandinsky3Img2ImgPipeline(DiffusionPipeline, LoraLoaderMixin): ...@@ -403,17 +403,17 @@ class Kandinsky3Img2ImgPipeline(DiffusionPipeline, LoraLoaderMixin):
def __call__( def __call__(
self, self,
prompt: Union[str, List[str]] = None, prompt: Union[str, List[str]] = None,
image: Union[torch.FloatTensor, PIL.Image.Image, List[torch.FloatTensor], List[PIL.Image.Image]] = None, image: Union[torch.Tensor, PIL.Image.Image, List[torch.Tensor], List[PIL.Image.Image]] = None,
strength: float = 0.3, strength: float = 0.3,
num_inference_steps: int = 25, num_inference_steps: int = 25,
guidance_scale: float = 3.0, guidance_scale: float = 3.0,
negative_prompt: Optional[Union[str, List[str]]] = None, negative_prompt: Optional[Union[str, List[str]]] = None,
num_images_per_prompt: Optional[int] = 1, num_images_per_prompt: Optional[int] = 1,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
attention_mask: Optional[torch.FloatTensor] = None, attention_mask: Optional[torch.Tensor] = None,
negative_attention_mask: Optional[torch.FloatTensor] = None, negative_attention_mask: Optional[torch.Tensor] = None,
output_type: Optional[str] = "pil", output_type: Optional[str] = "pil",
return_dict: bool = True, return_dict: bool = True,
callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None, callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
...@@ -427,7 +427,7 @@ class Kandinsky3Img2ImgPipeline(DiffusionPipeline, LoraLoaderMixin): ...@@ -427,7 +427,7 @@ class Kandinsky3Img2ImgPipeline(DiffusionPipeline, LoraLoaderMixin):
prompt (`str` or `List[str]`, *optional*): prompt (`str` or `List[str]`, *optional*):
The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`. The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`.
instead. instead.
image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`): image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
`Image`, or tensor representing an image batch, that will be used as the starting point for the `Image`, or tensor representing an image batch, that will be used as the starting point for the
process. process.
strength (`float`, *optional*, defaults to 0.8): strength (`float`, *optional*, defaults to 0.8):
...@@ -454,16 +454,16 @@ class Kandinsky3Img2ImgPipeline(DiffusionPipeline, LoraLoaderMixin): ...@@ -454,16 +454,16 @@ class Kandinsky3Img2ImgPipeline(DiffusionPipeline, LoraLoaderMixin):
generator (`torch.Generator` or `List[torch.Generator]`, *optional*): generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
to make generation deterministic. to make generation deterministic.
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument. provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument. argument.
attention_mask (`torch.FloatTensor`, *optional*): attention_mask (`torch.Tensor`, *optional*):
Pre-generated attention mask. Must provide if passing `prompt_embeds` directly. Pre-generated attention mask. Must provide if passing `prompt_embeds` directly.
negative_attention_mask (`torch.FloatTensor`, *optional*): negative_attention_mask (`torch.Tensor`, *optional*):
Pre-generated negative attention mask. Must provide if passing `negative_prompt_embeds` directly. Pre-generated negative attention mask. Must provide if passing `negative_prompt_embeds` directly.
output_type (`str`, *optional*, defaults to `"pil"`): output_type (`str`, *optional*, defaults to `"pil"`):
The output format of the generate image. Choose between The output format of the generate image. Choose between
......
...@@ -237,8 +237,8 @@ class LatentConsistencyModelImg2ImgPipeline( ...@@ -237,8 +237,8 @@ class LatentConsistencyModelImg2ImgPipeline(
num_images_per_prompt, num_images_per_prompt,
do_classifier_free_guidance, do_classifier_free_guidance,
negative_prompt=None, negative_prompt=None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
lora_scale: Optional[float] = None, lora_scale: Optional[float] = None,
clip_skip: Optional[int] = None, clip_skip: Optional[int] = None,
): ):
...@@ -258,10 +258,10 @@ class LatentConsistencyModelImg2ImgPipeline( ...@@ -258,10 +258,10 @@ class LatentConsistencyModelImg2ImgPipeline(
The prompt or prompts not to guide the image generation. If not defined, one has to pass The prompt or prompts not to guide the image generation. If not defined, one has to pass
`negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
less than `1`). less than `1`).
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument. provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument. argument.
...@@ -565,7 +565,7 @@ class LatentConsistencyModelImg2ImgPipeline( ...@@ -565,7 +565,7 @@ class LatentConsistencyModelImg2ImgPipeline(
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding # Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding( def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32 self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor: ) -> torch.Tensor:
""" """
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298 See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
...@@ -578,7 +578,7 @@ class LatentConsistencyModelImg2ImgPipeline( ...@@ -578,7 +578,7 @@ class LatentConsistencyModelImg2ImgPipeline(
Data type of the generated embeddings. Data type of the generated embeddings.
Returns: Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`. `torch.Tensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
""" """
assert len(w.shape) == 1 assert len(w.shape) == 1
w = w * 1000.0 w = w * 1000.0
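`get_guidance_scale_embedding` maps each guidance value `w` to a vector of length `embedding_dim`, following the VDM reference linked in its docstring. The visible lines show `w` being scaled by 1000. The stdlib sketch below handles a single scalar and assumes the usual sinusoidal recipe after that scaling. That recipe uses sine and cosine features over geometrically spaced frequencies, with one zero of padding when `embedding_dim` is odd. These steps are assumptions, since the hunk is truncated, and the name `guidance_scale_embedding` is hypothetical.

```python
import math
from typing import List


def guidance_scale_embedding(w: float, embedding_dim: int = 512) -> List[float]:
    """Sinusoidal embedding of one guidance value; requires embedding_dim >= 4."""
    w = w * 1000.0
    half_dim = embedding_dim // 2
    # Geometric frequency ladder from 1 down to 1/10000 (needs half_dim >= 2).
    step = math.log(10000.0) / (half_dim - 1)
    freqs = [math.exp(-step * i) for i in range(half_dim)]
    args = [w * f for f in freqs]
    emb = [math.sin(a) for a in args] + [math.cos(a) for a in args]
    if embedding_dim % 2 == 1:
        emb.append(0.0)  # pad odd dimensions with a single zero
    return emb


print(guidance_scale_embedding(0.0, 8))  # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
```

Applying this per element of a 1-D `w` would give the `(len(w), embedding_dim)` shape stated in the docstring.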
...@@ -628,7 +628,7 @@ class LatentConsistencyModelImg2ImgPipeline( ...@@ -628,7 +628,7 @@ class LatentConsistencyModelImg2ImgPipeline(
prompt: Union[str, List[str]], prompt: Union[str, List[str]],
strength: float, strength: float,
callback_steps: int, callback_steps: int,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
ip_adapter_image=None, ip_adapter_image=None,
ip_adapter_image_embeds=None, ip_adapter_image_embeds=None,
callback_on_step_end_tensor_inputs=None, callback_on_step_end_tensor_inputs=None,
...@@ -709,10 +709,10 @@ class LatentConsistencyModelImg2ImgPipeline( ...@@ -709,10 +709,10 @@ class LatentConsistencyModelImg2ImgPipeline(
guidance_scale: float = 8.5, guidance_scale: float = 8.5,
num_images_per_prompt: Optional[int] = 1, num_images_per_prompt: Optional[int] = 1,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None, latents: Optional[torch.Tensor] = None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
ip_adapter_image: Optional[PipelineImageInput] = None, ip_adapter_image: Optional[PipelineImageInput] = None,
ip_adapter_image_embeds: Optional[List[torch.FloatTensor]] = None, ip_adapter_image_embeds: Optional[List[torch.Tensor]] = None,
output_type: Optional[str] = "pil", output_type: Optional[str] = "pil",
return_dict: bool = True, return_dict: bool = True,
cross_attention_kwargs: Optional[Dict[str, Any]] = None, cross_attention_kwargs: Optional[Dict[str, Any]] = None,
...@@ -754,16 +754,16 @@ class LatentConsistencyModelImg2ImgPipeline( ...@@ -754,16 +754,16 @@ class LatentConsistencyModelImg2ImgPipeline(
generator (`torch.Generator` or `List[torch.Generator]`, *optional*): generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
generation deterministic. generation deterministic.
latents (`torch.FloatTensor`, *optional*): latents (`torch.Tensor`, *optional*):
Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor is generated by sampling using the supplied random `generator`. tensor is generated by sampling using the supplied random `generator`.
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not
provided, text embeddings are generated from the `prompt` input argument. provided, text embeddings are generated from the `prompt` input argument.
ip_adapter_image: (`PipelineImageInput`, *optional*): ip_adapter_image: (`PipelineImageInput`, *optional*):
Optional image input to work with IP Adapters. Optional image input to work with IP Adapters.
ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*): ip_adapter_image_embeds (`List[torch.Tensor]`, *optional*):
Pre-generated image embeddings for IP-Adapter. It should be a list of length same as number of Pre-generated image embeddings for IP-Adapter. It should be a list of length same as number of
IP-adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It should IP-adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It should
contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not
......
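The `ip_adapter_image_embeds` docstring above states a shape contract: one tensor per IP-Adapter, each 3-D `(batch_size, num_images, emb_dim)`, with the negative embedding included when classifier-free guidance is on. The following is a minimal, dependency-free sketch of checking that contract. `ip_adapter_embeds_problems` is a hypothetical helper, not a diffusers API. Shapes are plain tuples standing in for `tensor.shape`. It also assumes that under guidance the negative embeddings are stacked along dim 0, which doubles that dimension.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]


def ip_adapter_embeds_problems(
    embeds_shapes: Sequence[Shape],
    num_ip_adapters: int,
    batch_size: int,
    do_classifier_free_guidance: bool,
) -> List[str]:
    """Return human-readable problems with the given shapes (empty list if valid).

    Hypothetical helper: each shape is a plain tuple standing in for `tensor.shape`.
    """
    problems: List[str] = []
    # One embedding tensor is expected per loaded IP-Adapter.
    if len(embeds_shapes) != num_ip_adapters:
        problems.append(
            f"expected {num_ip_adapters} tensors (one per IP-Adapter), got {len(embeds_shapes)}"
        )
    # Assumption: with classifier-free guidance the negative embeddings are stacked
    # in front of the positive ones along dim 0, doubling that dimension.
    expected_first_dim = 2 * batch_size if do_classifier_free_guidance else batch_size
    for i, shape in enumerate(embeds_shapes):
        if len(shape) != 3:
            problems.append(f"tensor {i}: expected 3 dims (batch_size, num_images, emb_dim), got {shape}")
        elif shape[0] != expected_first_dim:
            problems.append(f"tensor {i}: dim 0 is {shape[0]}, expected {expected_first_dim}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every mismatch at once.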
...@@ -221,8 +221,8 @@ class LatentConsistencyModelPipeline( ...@@ -221,8 +221,8 @@ class LatentConsistencyModelPipeline(
num_images_per_prompt, num_images_per_prompt,
do_classifier_free_guidance, do_classifier_free_guidance,
negative_prompt=None, negative_prompt=None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
lora_scale: Optional[float] = None, lora_scale: Optional[float] = None,
clip_skip: Optional[int] = None, clip_skip: Optional[int] = None,
): ):
...@@ -242,10 +242,10 @@ class LatentConsistencyModelPipeline( ...@@ -242,10 +242,10 @@ class LatentConsistencyModelPipeline(
The prompt or prompts not to guide the image generation. If not defined, one has to pass The prompt or prompts not to guide the image generation. If not defined, one has to pass
`negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
less than `1`). less than `1`).
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument. provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument. argument.
...@@ -512,7 +512,7 @@ class LatentConsistencyModelPipeline( ...@@ -512,7 +512,7 @@ class LatentConsistencyModelPipeline(
def get_guidance_scale_embedding( def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32 self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor: ) -> torch.Tensor:
""" """
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298 See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
...@@ -525,7 +525,7 @@ class LatentConsistencyModelPipeline( ...@@ -525,7 +525,7 @@ class LatentConsistencyModelPipeline(
Data type of the generated embeddings. Data type of the generated embeddings.
Returns: Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`. `torch.Tensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
""" """
assert len(w.shape) == 1 assert len(w.shape) == 1
w = w * 1000.0 w = w * 1000.0
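The hunk shows only the opening lines of `get_guidance_scale_embedding` (a 1-D `w`, scaled by 1000) and points to the VDM reference code. The following is a dependency-free sketch of the sinusoidal embedding such a function is assumed to compute. It uses log-spaced frequencies, concatenates a sine half and a cosine half, and zero-pads when `embedding_dim` is odd. It uses plain Python lists instead of `torch` tensors and is not the diffusers implementation. The output shape follows the documented `(len(w), embedding_dim)`.

```python
import math
from typing import List, Sequence


def guidance_scale_embedding(w: Sequence[float], embedding_dim: int = 512) -> List[List[float]]:
    """Sketch: map each guidance scale in `w` to a sinusoidal vector of length `embedding_dim`."""
    if embedding_dim < 4:
        raise ValueError("embedding_dim must be at least 4")
    half_dim = embedding_dim // 2
    # Log-spaced frequencies from 1 down to 1/10000, as in transformer position embeddings.
    scale = math.log(10000.0) / (half_dim - 1)
    freqs = [math.exp(-scale * k) for k in range(half_dim)]
    rows = []
    for value in w:
        args = [value * 1000.0 * f for f in freqs]  # same x1000 scaling as the hunk above
        row = [math.sin(a) for a in args] + [math.cos(a) for a in args]
        if embedding_dim % 2 == 1:
            row.append(0.0)  # zero-pad so odd dimensions still match embedding_dim
        rows.append(row)
    return rows
```

Scaling by 1000 spreads typical guidance scales (single digits) across many periods of the high-frequency components, so nearby scales still get distinguishable embeddings.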
...@@ -565,7 +565,7 @@ class LatentConsistencyModelPipeline( ...@@ -565,7 +565,7 @@ class LatentConsistencyModelPipeline(
height: int, height: int,
width: int, width: int,
callback_steps: int, callback_steps: int,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
ip_adapter_image=None, ip_adapter_image=None,
ip_adapter_image_embeds=None, ip_adapter_image_embeds=None,
callback_on_step_end_tensor_inputs=None, callback_on_step_end_tensor_inputs=None,
...@@ -646,10 +646,10 @@ class LatentConsistencyModelPipeline( ...@@ -646,10 +646,10 @@ class LatentConsistencyModelPipeline(
guidance_scale: float = 8.5, guidance_scale: float = 8.5,
num_images_per_prompt: Optional[int] = 1, num_images_per_prompt: Optional[int] = 1,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None, latents: Optional[torch.Tensor] = None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
ip_adapter_image: Optional[PipelineImageInput] = None, ip_adapter_image: Optional[PipelineImageInput] = None,
ip_adapter_image_embeds: Optional[List[torch.FloatTensor]] = None, ip_adapter_image_embeds: Optional[List[torch.Tensor]] = None,
output_type: Optional[str] = "pil", output_type: Optional[str] = "pil",
return_dict: bool = True, return_dict: bool = True,
cross_attention_kwargs: Optional[Dict[str, Any]] = None, cross_attention_kwargs: Optional[Dict[str, Any]] = None,
...@@ -691,16 +691,16 @@ class LatentConsistencyModelPipeline( ...@@ -691,16 +691,16 @@ class LatentConsistencyModelPipeline(
generator (`torch.Generator` or `List[torch.Generator]`, *optional*): generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
generation deterministic. generation deterministic.
latents (`torch.FloatTensor`, *optional*): latents (`torch.Tensor`, *optional*):
Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor is generated by sampling using the supplied random `generator`. tensor is generated by sampling using the supplied random `generator`.
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not
provided, text embeddings are generated from the `prompt` input argument. provided, text embeddings are generated from the `prompt` input argument.
ip_adapter_image: (`PipelineImageInput`, *optional*): ip_adapter_image: (`PipelineImageInput`, *optional*):
Optional image input to work with IP Adapters. Optional image input to work with IP Adapters.
ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*): ip_adapter_image_embeds (`List[torch.Tensor]`, *optional*):
Pre-generated image embeddings for IP-Adapter. It should be a list of length same as number of Pre-generated image embeddings for IP-Adapter. It should be a list of length same as number of
IP-adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It should IP-adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It should
contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not
......
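The `latents` docstring says that if no latents are passed, they are sampled from a Gaussian using the supplied `generator`, and that reusing them lets the same generation be tweaked with different prompts. The following is a minimal, dependency-free sketch of that behavior. `random.Random` plays the role of `torch.Generator`, latents are a flat list, and the VAE downsampling factor of 8 is an assumption typical of Stable Diffusion-style models. `latents_shape` and `prepare_latents` are hypothetical helpers.

```python
import math
import random
from typing import List, Optional, Tuple


def latents_shape(
    batch_size: int, num_channels: int, height: int, width: int, vae_scale_factor: int = 8
) -> Tuple[int, int, int, int]:
    # Latents live at the VAE's downsampled resolution (factor 8 assumed).
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError(f"height and width must be divisible by {vae_scale_factor}")
    return (batch_size, num_channels, height // vae_scale_factor, width // vae_scale_factor)


def prepare_latents(
    shape: Tuple[int, ...], seed: int, latents: Optional[List[float]] = None
) -> List[float]:
    """Return flattened latents: reuse the caller's if given, else sample N(0, 1)."""
    numel = math.prod(shape)
    if latents is not None:
        if len(latents) != numel:
            raise ValueError(f"expected {numel} latent values, got {len(latents)}")
        return latents
    rng = random.Random(seed)  # seeded RNG stands in for `torch.Generator`
    return [rng.gauss(0.0, 1.0) for _ in range(numel)]
```

Passing the same latents back in with a different prompt keeps the starting noise fixed, so differences in the output come from the prompt alone.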
...@@ -74,7 +74,7 @@ class LDMTextToImagePipeline(DiffusionPipeline): ...@@ -74,7 +74,7 @@ class LDMTextToImagePipeline(DiffusionPipeline):
guidance_scale: Optional[float] = 1.0, guidance_scale: Optional[float] = 1.0,
eta: Optional[float] = 0.0, eta: Optional[float] = 0.0,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None, latents: Optional[torch.Tensor] = None,
output_type: Optional[str] = "pil", output_type: Optional[str] = "pil",
return_dict: bool = True, return_dict: bool = True,
**kwargs, **kwargs,
...@@ -98,7 +98,7 @@ class LDMTextToImagePipeline(DiffusionPipeline): ...@@ -98,7 +98,7 @@ class LDMTextToImagePipeline(DiffusionPipeline):
generator (`torch.Generator`, *optional*): generator (`torch.Generator`, *optional*):
A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
generation deterministic. generation deterministic.
latents (`torch.FloatTensor`, *optional*): latents (`torch.Tensor`, *optional*):
Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor is generated by sampling using the supplied random `generator`. tensor is generated by sampling using the supplied random `generator`.
...@@ -465,17 +465,17 @@ class LDMBertEncoderLayer(nn.Module): ...@@ -465,17 +465,17 @@ class LDMBertEncoderLayer(nn.Module):
def forward( def forward(
self, self,
hidden_states: torch.FloatTensor, hidden_states: torch.Tensor,
attention_mask: torch.FloatTensor, attention_mask: torch.Tensor,
layer_head_mask: torch.FloatTensor, layer_head_mask: torch.Tensor,
output_attentions: Optional[bool] = False, output_attentions: Optional[bool] = False,
) -> Tuple[torch.FloatTensor, Optional[torch.FloatTensor]]: ) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
""" """
Args: Args:
hidden_states (`torch.FloatTensor`): input to the layer of shape `(seq_len, batch, embed_dim)` hidden_states (`torch.Tensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
attention_mask (`torch.FloatTensor`): attention mask of size attention_mask (`torch.Tensor`): attention mask of size
`(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values. `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
layer_head_mask (`torch.FloatTensor`): mask for attention heads in a given layer of size layer_head_mask (`torch.Tensor`): mask for attention heads in a given layer of size
`(encoder_attention_heads,)`. `(encoder_attention_heads,)`.
output_attentions (`bool`, *optional*): output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under Whether or not to return the attentions tensors of all attention layers. See `attentions` under
...@@ -587,7 +587,7 @@ class LDMBertEncoder(LDMBertPreTrainedModel): ...@@ -587,7 +587,7 @@ class LDMBertEncoder(LDMBertPreTrainedModel):
attention_mask: Optional[torch.Tensor] = None, attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None, position_ids: Optional[torch.LongTensor] = None,
head_mask: Optional[torch.Tensor] = None, head_mask: Optional[torch.Tensor] = None,
inputs_embeds: Optional[torch.FloatTensor] = None, inputs_embeds: Optional[torch.Tensor] = None,
output_attentions: Optional[bool] = None, output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None, output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None, return_dict: Optional[bool] = None,
...@@ -615,7 +615,7 @@ class LDMBertEncoder(LDMBertPreTrainedModel): ...@@ -615,7 +615,7 @@ class LDMBertEncoder(LDMBertPreTrainedModel):
- 1 indicates the head is **not masked**, - 1 indicates the head is **not masked**,
- 0 indicates the head is **masked**. - 0 indicates the head is **masked**.
inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): inputs_embeds (`torch.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
This is useful if you want more control over how to convert `input_ids` indices into associated vectors This is useful if you want more control over how to convert `input_ids` indices into associated vectors
than the model's internal embedding lookup matrix. than the model's internal embedding lookup matrix.
......
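`LDMBertEncoderLayer.forward` documents its `attention_mask` as `(batch, 1, tgt_len, src_len)`, with padding marked by very large negative values. This means the mask is added to the attention scores before softmax. The following is a minimal, dependency-free sketch of building such a mask from a 1/0 padding mask and of its effect on softmax. The constant `NEG_LARGE = -1e9` is an assumption; transformers-style code typically uses the minimum value of the tensor's dtype instead. Both helpers are hypothetical.

```python
import math
from typing import List, Sequence

NEG_LARGE = -1e9  # assumed stand-in for "very large negative value"


def expand_padding_mask(
    padding_mask: Sequence[Sequence[int]], tgt_len: int
) -> List[List[List[List[float]]]]:
    """(batch, src_len) mask of 1=keep / 0=pad -> additive (batch, 1, tgt_len, src_len) mask."""
    return [
        [[[0.0 if keep else NEG_LARGE for keep in row] for _ in range(tgt_len)]]
        for row in padding_mask
    ]


def masked_softmax(scores: Sequence[float], additive_mask: Sequence[float]) -> List[float]:
    # Adding the mask pushes padded logits so low that exp() underflows to 0.
    logits = [s + m for s, m in zip(scores, additive_mask)]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

An additive mask fits into a single `scores + mask` operation and broadcasts over heads through the singleton dimension. A boolean mask would need a separate `where`-style select.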
...@@ -502,8 +502,8 @@ class LEditsPPPipelineStableDiffusion( ...@@ -502,8 +502,8 @@ class LEditsPPPipelineStableDiffusion(
enable_edit_guidance, enable_edit_guidance,
negative_prompt=None, negative_prompt=None,
editing_prompt=None, editing_prompt=None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
editing_prompt_embeds: Optional[torch.FloatTensor] = None, editing_prompt_embeds: Optional[torch.Tensor] = None,
lora_scale: Optional[float] = None, lora_scale: Optional[float] = None,
clip_skip: Optional[int] = None, clip_skip: Optional[int] = None,
): ):
...@@ -523,10 +523,10 @@ class LEditsPPPipelineStableDiffusion( ...@@ -523,10 +523,10 @@ class LEditsPPPipelineStableDiffusion(
less than `1`). less than `1`).
editing_prompt (`str` or `List[str]`, *optional*): editing_prompt (`str` or `List[str]`, *optional*):
Editing prompt(s) to be encoded. If not defined, one has to pass `editing_prompt_embeds` instead. Editing prompt(s) to be encoded. If not defined, one has to pass `editing_prompt_embeds` instead.
editing_prompt_embeds (`torch.FloatTensor`, *optional*): editing_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument. provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument. argument.
...@@ -704,13 +704,13 @@ class LEditsPPPipelineStableDiffusion( ...@@ -704,13 +704,13 @@ class LEditsPPPipelineStableDiffusion(
return_dict: bool = True, return_dict: bool = True,
editing_prompt: Optional[Union[str, List[str]]] = None, editing_prompt: Optional[Union[str, List[str]]] = None,
editing_prompt_embeds: Optional[torch.Tensor] = None, editing_prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
reverse_editing_direction: Optional[Union[bool, List[bool]]] = False, reverse_editing_direction: Optional[Union[bool, List[bool]]] = False,
edit_guidance_scale: Optional[Union[float, List[float]]] = 5, edit_guidance_scale: Optional[Union[float, List[float]]] = 5,
edit_warmup_steps: Optional[Union[int, List[int]]] = 0, edit_warmup_steps: Optional[Union[int, List[int]]] = 0,
edit_cooldown_steps: Optional[Union[int, List[int]]] = None, edit_cooldown_steps: Optional[Union[int, List[int]]] = None,
edit_threshold: Optional[Union[float, List[float]]] = 0.9, edit_threshold: Optional[Union[float, List[float]]] = 0.9,
user_mask: Optional[torch.FloatTensor] = None, user_mask: Optional[torch.Tensor] = None,
sem_guidance: Optional[List[torch.Tensor]] = None, sem_guidance: Optional[List[torch.Tensor]] = None,
use_cross_attn_mask: bool = False, use_cross_attn_mask: bool = False,
use_intersect_mask: bool = True, use_intersect_mask: bool = True,
...@@ -748,7 +748,7 @@ class LEditsPPPipelineStableDiffusion( ...@@ -748,7 +748,7 @@ class LEditsPPPipelineStableDiffusion(
editing_prompt_embeds (`torch.Tensor>`, *optional*): editing_prompt_embeds (`torch.Tensor>`, *optional*):
editing_prompt_embeds (`torch.Tensor`, *optional*): editing_prompt_embeds (`torch.Tensor`, *optional*):
Pre-computed embeddings to use for guiding the image generation. Guidance direction of embedding should Pre-computed embeddings to use for guiding the image generation. Guidance direction of embedding should
be specified via `reverse_editing_direction`. be specified via `reverse_editing_direction`.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If
not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument. not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
reverse_editing_direction (`bool` or `List[bool]`, *optional*, defaults to `False`): reverse_editing_direction (`bool` or `List[bool]`, *optional*, defaults to `False`):
...@@ -765,7 +765,7 @@ class LEditsPPPipelineStableDiffusion( ...@@ -765,7 +765,7 @@ class LEditsPPPipelineStableDiffusion(
Masking threshold of guidance. Threshold should be proportional to the image region that is modified. Masking threshold of guidance. Threshold should be proportional to the image region that is modified.
'edit_threshold' is defined as 'λ' of equation 12 of [LEDITS++ 'edit_threshold' is defined as 'λ' of equation 12 of [LEDITS++
Paper](https://arxiv.org/abs/2301.12247). Paper](https://arxiv.org/abs/2301.12247).
user_mask (`torch.FloatTensor`, *optional*): user_mask (`torch.Tensor`, *optional*):
User-provided mask for even better control over the editing process. This is helpful when LEDITS++'s User-provided mask for even better control over the editing process. This is helpful when LEDITS++'s
implicit masks do not meet user preferences. implicit masks do not meet user preferences.
sem_guidance (`List[torch.Tensor]`, *optional*): sem_guidance (`List[torch.Tensor]`, *optional*):
......
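The `edit_threshold` docstring calls λ a masking threshold, proportional to the region being modified. The following is a minimal sketch under the assumption that λ acts as a quantile of guidance magnitudes: values whose magnitude reaches the λ-quantile are kept, and the rest are zeroed. A larger λ therefore keeps fewer values and edits a smaller region. The quantile uses linear interpolation, like `torch.quantile`'s default. This sketch works on a flat list, while the actual pipeline operates on latent tensors and may compute quantiles per channel and per editing prompt. `threshold_guidance` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def quantile_linear(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between order statistics."""
    if not values or not 0.0 <= q <= 1.0:
        raise ValueError("need non-empty values and 0 <= q <= 1")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def threshold_guidance(guidance: Sequence[float], edit_threshold: float) -> List[float]:
    # Keep only guidance entries whose magnitude reaches the edit_threshold quantile.
    cutoff = quantile_linear([abs(g) for g in guidance], edit_threshold)
    return [g if abs(g) >= cutoff else 0.0 for g in guidance]
```

With the default of 0.9 from the signature above, roughly the top 10% of guidance magnitudes would survive under this assumption.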
...@@ -409,14 +409,14 @@ class LEditsPPPipelineStableDiffusionXL( ...@@ -409,14 +409,14 @@ class LEditsPPPipelineStableDiffusionXL(
num_images_per_prompt: int = 1, num_images_per_prompt: int = 1,
negative_prompt: Optional[str] = None, negative_prompt: Optional[str] = None,
negative_prompt_2: Optional[str] = None, negative_prompt_2: Optional[str] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
negative_pooled_prompt_embeds: Optional[torch.FloatTensor] = None, negative_pooled_prompt_embeds: Optional[torch.Tensor] = None,
lora_scale: Optional[float] = None, lora_scale: Optional[float] = None,
clip_skip: Optional[int] = None, clip_skip: Optional[int] = None,
enable_edit_guidance: bool = True, enable_edit_guidance: bool = True,
editing_prompt: Optional[str] = None, editing_prompt: Optional[str] = None,
editing_prompt_embeds: Optional[torch.FloatTensor] = None, editing_prompt_embeds: Optional[torch.Tensor] = None,
editing_pooled_prompt_embeds: Optional[torch.FloatTensor] = None, editing_pooled_prompt_embeds: Optional[torch.Tensor] = None,
) -> object: ) -> object:
r""" r"""
Encodes the prompt into text encoder hidden states. Encodes the prompt into text encoder hidden states.
...@@ -432,11 +432,11 @@ class LEditsPPPipelineStableDiffusionXL( ...@@ -432,11 +432,11 @@ class LEditsPPPipelineStableDiffusionXL(
negative_prompt_2 (`str` or `List[str]`, *optional*): negative_prompt_2 (`str` or `List[str]`, *optional*):
The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and
`text_encoder_2`. If not defined, `negative_prompt` is used in both text-encoders `text_encoder_2`. If not defined, `negative_prompt` is used in both text-encoders
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument. argument.
negative_pooled_prompt_embeds (`torch.FloatTensor`, *optional*): negative_pooled_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, pooled negative_prompt_embeds will be generated from `negative_prompt` weighting. If not provided, pooled negative_prompt_embeds will be generated from `negative_prompt`
input argument. input argument.
...@@ -450,11 +450,11 @@ class LEditsPPPipelineStableDiffusionXL( ...@@ -450,11 +450,11 @@ class LEditsPPPipelineStableDiffusionXL(
editing_prompt (`str` or `List[str]`, *optional*): editing_prompt (`str` or `List[str]`, *optional*):
Editing prompt(s) to be encoded. If not defined and 'enable_edit_guidance' is True, one has to pass Editing prompt(s) to be encoded. If not defined and 'enable_edit_guidance' is True, one has to pass
`editing_prompt_embeds` instead. `editing_prompt_embeds` instead.
editing_prompt_embeds (`torch.FloatTensor`, *optional*): editing_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated edit text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. Pre-generated edit text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting.
If not provided and 'enable_edit_guidance' is True, editing_prompt_embeds will be generated from If not provided and 'enable_edit_guidance' is True, editing_prompt_embeds will be generated from
`editing_prompt` input argument. `editing_prompt` input argument.
editing_pooled_prompt_embeds (`torch.FloatTensor`, *optional*): editing_pooled_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated edit pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated edit pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, pooled editing_pooled_prompt_embeds will be generated from `editing_prompt` weighting. If not provided, pooled editing_pooled_prompt_embeds will be generated from `editing_prompt`
input argument. input argument.
...@@ -713,7 +713,7 @@ class LEditsPPPipelineStableDiffusionXL( ...@@ -713,7 +713,7 @@ class LEditsPPPipelineStableDiffusionXL(
# Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding # Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
def get_guidance_scale_embedding( def get_guidance_scale_embedding(
self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32 self, w: torch.Tensor, embedding_dim: int = 512, dtype: torch.dtype = torch.float32
) -> torch.FloatTensor: ) -> torch.Tensor:
""" """
See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298 See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
...@@ -726,7 +726,7 @@ class LEditsPPPipelineStableDiffusionXL( ...@@ -726,7 +726,7 @@ class LEditsPPPipelineStableDiffusionXL(
Data type of the generated embeddings. Data type of the generated embeddings.
Returns: Returns:
`torch.FloatTensor`: Embedding vectors with shape `(len(w), embedding_dim)`. `torch.Tensor`: Embedding vectors with shape `(len(w), embedding_dim)`.
""" """
assert len(w.shape) == 1 assert len(w.shape) == 1
w = w * 1000.0 w = w * 1000.0
...@@ -804,8 +804,8 @@ class LEditsPPPipelineStableDiffusionXL( ...@@ -804,8 +804,8 @@ class LEditsPPPipelineStableDiffusionXL(
denoising_end: Optional[float] = None, denoising_end: Optional[float] = None,
negative_prompt: Optional[Union[str, List[str]]] = None, negative_prompt: Optional[Union[str, List[str]]] = None,
negative_prompt_2: Optional[Union[str, List[str]]] = None, negative_prompt_2: Optional[Union[str, List[str]]] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
negative_pooled_prompt_embeds: Optional[torch.FloatTensor] = None, negative_pooled_prompt_embeds: Optional[torch.Tensor] = None,
ip_adapter_image: Optional[PipelineImageInput] = None, ip_adapter_image: Optional[PipelineImageInput] = None,
output_type: Optional[str] = "pil", output_type: Optional[str] = "pil",
return_dict: bool = True, return_dict: bool = True,
...@@ -824,7 +824,7 @@ class LEditsPPPipelineStableDiffusionXL( ...@@ -824,7 +824,7 @@ class LEditsPPPipelineStableDiffusionXL(
sem_guidance: Optional[List[torch.Tensor]] = None, sem_guidance: Optional[List[torch.Tensor]] = None,
use_cross_attn_mask: bool = False, use_cross_attn_mask: bool = False,
use_intersect_mask: bool = False, use_intersect_mask: bool = False,
user_mask: Optional[torch.FloatTensor] = None, user_mask: Optional[torch.Tensor] = None,
attn_store_steps: Optional[List[int]] = [], attn_store_steps: Optional[List[int]] = [],
store_averaged_over_steps: bool = True, store_averaged_over_steps: bool = True,
clip_skip: Optional[int] = None, clip_skip: Optional[int] = None,
...@@ -851,11 +851,11 @@ class LEditsPPPipelineStableDiffusionXL( ...@@ -851,11 +851,11 @@ class LEditsPPPipelineStableDiffusionXL(
negative_prompt_2 (`str` or `List[str]`, *optional*): negative_prompt_2 (`str` or `List[str]`, *optional*):
The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and
`text_encoder_2`. If not defined, `negative_prompt` is used in both text-encoders `text_encoder_2`. If not defined, `negative_prompt` is used in both text-encoders
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument. argument.
negative_pooled_prompt_embeds (`torch.FloatTensor`, *optional*): negative_pooled_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, pooled negative_prompt_embeds will be generated from `negative_prompt` weighting. If not provided, pooled negative_prompt_embeds will be generated from `negative_prompt`
input argument. input argument.
...@@ -869,7 +869,7 @@ class LEditsPPPipelineStableDiffusionXL( ...@@ -869,7 +869,7 @@ class LEditsPPPipelineStableDiffusionXL(
of a plain tuple. of a plain tuple.
callback (`Callable`, *optional*): callback (`Callable`, *optional*):
A function that will be called every `callback_steps` steps during inference. The function will be A function that will be called every `callback_steps` steps during inference. The function will be
called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`. called with the following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
callback_steps (`int`, *optional*, defaults to 1): callback_steps (`int`, *optional*, defaults to 1):
The frequency at which the `callback` function will be called. If not specified, the callback will be The frequency at which the `callback` function will be called. If not specified, the callback will be
called at every step. called at every step.
......
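The `callback` docstring describes a function called as `callback(step, timestep, latents)` every `callback_steps` steps, defaulting to every step. The following is a minimal sketch of that loop structure with a stand-in denoising step. `run_denoising_loop` and `step_fn` are hypothetical, and the real pipeline may also adjust the reported step index for multi-order schedulers.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def run_denoising_loop(
    timesteps: Sequence[int],
    latents: T,
    step_fn: Callable[[T, int], T],
    callback: Optional[Callable[[int, int, T], None]] = None,
    callback_steps: int = 1,
) -> T:
    """Apply `step_fn` once per timestep, invoking `callback` every `callback_steps` steps."""
    if callback_steps <= 0:
        raise ValueError("callback_steps must be a positive integer")
    for i, t in enumerate(timesteps):
        latents = step_fn(latents, t)
        # Step 0 always fires; then every callback_steps-th step after it.
        if callback is not None and i % callback_steps == 0:
            callback(i, t, latents)
    return latents
```

For example, with four timesteps and `callback_steps=2`, the callback fires after steps 0 and 2.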
...@@ -120,8 +120,8 @@ class MusicLDMPipeline(DiffusionPipeline, StableDiffusionMixin): ...@@ -120,8 +120,8 @@ class MusicLDMPipeline(DiffusionPipeline, StableDiffusionMixin):
num_waveforms_per_prompt, num_waveforms_per_prompt,
do_classifier_free_guidance, do_classifier_free_guidance,
negative_prompt=None, negative_prompt=None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
): ):
r""" r"""
Encodes the prompt into text encoder hidden states. Encodes the prompt into text encoder hidden states.
...@@ -139,10 +139,10 @@ class MusicLDMPipeline(DiffusionPipeline, StableDiffusionMixin): ...@@ -139,10 +139,10 @@ class MusicLDMPipeline(DiffusionPipeline, StableDiffusionMixin):
The prompt or prompts not to guide the audio generation. If not defined, one has to pass The prompt or prompts not to guide the audio generation. If not defined, one has to pass
`negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
less than `1`). less than `1`).
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument. provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument. argument.
...@@ -427,11 +427,11 @@ class MusicLDMPipeline(DiffusionPipeline, StableDiffusionMixin): ...@@ -427,11 +427,11 @@ class MusicLDMPipeline(DiffusionPipeline, StableDiffusionMixin):
num_waveforms_per_prompt: Optional[int] = 1, num_waveforms_per_prompt: Optional[int] = 1,
eta: float = 0.0, eta: float = 0.0,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None, latents: Optional[torch.Tensor] = None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
return_dict: bool = True, return_dict: bool = True,
callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None, callback: Optional[Callable[[int, int, torch.Tensor], None]] = None,
callback_steps: Optional[int] = 1, callback_steps: Optional[int] = 1,
cross_attention_kwargs: Optional[Dict[str, Any]] = None, cross_attention_kwargs: Optional[Dict[str, Any]] = None,
output_type: Optional[str] = "np", output_type: Optional[str] = "np",
...@@ -465,21 +465,21 @@ class MusicLDMPipeline(DiffusionPipeline, StableDiffusionMixin): ...@@ -465,21 +465,21 @@ class MusicLDMPipeline(DiffusionPipeline, StableDiffusionMixin):
generator (`torch.Generator` or `List[torch.Generator]`, *optional*): generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
generation deterministic. generation deterministic.
latents (`torch.FloatTensor`, *optional*): latents (`torch.Tensor`, *optional*):
Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor is generated by sampling using the supplied random `generator`. tensor is generated by sampling using the supplied random `generator`.
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not
provided, text embeddings are generated from the `prompt` input argument. provided, text embeddings are generated from the `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If
not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument. not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
return_dict (`bool`, *optional*, defaults to `True`): return_dict (`bool`, *optional*, defaults to `True`):
Whether or not to return a [`~pipelines.AudioPipelineOutput`] instead of a plain tuple. Whether or not to return a [`~pipelines.AudioPipelineOutput`] instead of a plain tuple.
callback (`Callable`, *optional*): callback (`Callable`, *optional*):
A function that calls every `callback_steps` steps during inference. The function is called with the A function that calls every `callback_steps` steps during inference. The function is called with the
following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`. following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
callback_steps (`int`, *optional*, defaults to 1): callback_steps (`int`, *optional*, defaults to 1):
The frequency at which the `callback` function is called. If not specified, the callback is called at The frequency at which the `callback` function is called. If not specified, the callback is called at
every step. every step.
......
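The MusicLDM docstrings note that the negative prompt is ignored when guidance is off. A minimal sketch of classifier-free guidance on plain Python lists, assuming the common `guidance_scale > 1` check that enables guidance and the usual `uncond + scale * (text - uncond)` combination. The helper names are hypothetical, and real pipelines apply this to noise-prediction tensors, not lists.

```python
from typing import List, Sequence


def uses_classifier_free_guidance(guidance_scale: float) -> bool:
    # Assumed convention: guidance (and therefore the negative prompt) is only active above 1.
    return guidance_scale > 1.0


def combine_noise_predictions(
    noise_uncond: Sequence[float],
    noise_text: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Classifier-free guidance: uncond + scale * (text - uncond), element-wise."""
    if len(noise_uncond) != len(noise_text):
        raise ValueError("predictions must have the same length")
    if not uses_classifier_free_guidance(guidance_scale):
        # Without guidance only the conditional prediction is used.
        return list(noise_text)
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_text)]


print(combine_noise_predictions([0.0, 1.0], [1.0, 1.0], guidance_scale=2.5))  # [2.5, 1.0]
```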
...@@ -266,7 +266,7 @@ class PaintByExamplePipeline(DiffusionPipeline, StableDiffusionMixin): ...@@ -266,7 +266,7 @@ class PaintByExamplePipeline(DiffusionPipeline, StableDiffusionMixin):
and not isinstance(image, list) and not isinstance(image, list)
): ):
raise ValueError( raise ValueError(
"`image` has to be of type `torch.FloatTensor` or `PIL.Image.Image` or `List[PIL.Image.Image]` but is" "`image` has to be of type `torch.Tensor` or `PIL.Image.Image` or `List[PIL.Image.Image]` but is"
f" {type(image)}" f" {type(image)}"
) )
...@@ -393,9 +393,9 @@ class PaintByExamplePipeline(DiffusionPipeline, StableDiffusionMixin): ...@@ -393,9 +393,9 @@ class PaintByExamplePipeline(DiffusionPipeline, StableDiffusionMixin):
@torch.no_grad() @torch.no_grad()
def __call__( def __call__(
self, self,
example_image: Union[torch.FloatTensor, PIL.Image.Image], example_image: Union[torch.Tensor, PIL.Image.Image],
image: Union[torch.FloatTensor, PIL.Image.Image], image: Union[torch.Tensor, PIL.Image.Image],
mask_image: Union[torch.FloatTensor, PIL.Image.Image], mask_image: Union[torch.Tensor, PIL.Image.Image],
height: Optional[int] = None, height: Optional[int] = None,
width: Optional[int] = None, width: Optional[int] = None,
num_inference_steps: int = 50, num_inference_steps: int = 50,
...@@ -404,22 +404,22 @@ class PaintByExamplePipeline(DiffusionPipeline, StableDiffusionMixin): ...@@ -404,22 +404,22 @@ class PaintByExamplePipeline(DiffusionPipeline, StableDiffusionMixin):
num_images_per_prompt: Optional[int] = 1, num_images_per_prompt: Optional[int] = 1,
eta: float = 0.0, eta: float = 0.0,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None, latents: Optional[torch.Tensor] = None,
output_type: Optional[str] = "pil", output_type: Optional[str] = "pil",
return_dict: bool = True, return_dict: bool = True,
callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None, callback: Optional[Callable[[int, int, torch.Tensor], None]] = None,
callback_steps: int = 1, callback_steps: int = 1,
): ):
r""" r"""
The call function to the pipeline for generation. The call function to the pipeline for generation.
Args: Args:
example_image (`torch.FloatTensor` or `PIL.Image.Image` or `List[PIL.Image.Image]`): example_image (`torch.Tensor` or `PIL.Image.Image` or `List[PIL.Image.Image]`):
An example image to guide image generation. An example image to guide image generation.
image (`torch.FloatTensor` or `PIL.Image.Image` or `List[PIL.Image.Image]`): image (`torch.Tensor` or `PIL.Image.Image` or `List[PIL.Image.Image]`):
`Image` or tensor representing an image batch to be inpainted (parts of the image are masked out with `Image` or tensor representing an image batch to be inpainted (parts of the image are masked out with
`mask_image` and repainted according to `prompt`). `mask_image` and repainted according to `prompt`).
mask_image (`torch.FloatTensor` or `PIL.Image.Image` or `List[PIL.Image.Image]`): mask_image (`torch.Tensor` or `PIL.Image.Image` or `List[PIL.Image.Image]`):
`Image` or tensor representing an image batch to mask `image`. White pixels in the mask are repainted, `Image` or tensor representing an image batch to mask `image`. White pixels in the mask are repainted,
while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a single channel while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a single channel
(luminance) before use. If it's a tensor, it should contain one color channel (L) instead of 3, so the (luminance) before use. If it's a tensor, it should contain one color channel (L) instead of 3, so the
...@@ -445,7 +445,7 @@ class PaintByExamplePipeline(DiffusionPipeline, StableDiffusionMixin): ...@@ -445,7 +445,7 @@ class PaintByExamplePipeline(DiffusionPipeline, StableDiffusionMixin):
generator (`torch.Generator` or `List[torch.Generator]`, *optional*): generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
generation deterministic. generation deterministic.
latents (`torch.FloatTensor`, *optional*): latents (`torch.Tensor`, *optional*):
Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor is generated by sampling using the supplied random `generator`. tensor is generated by sampling using the supplied random `generator`.
...@@ -456,7 +456,7 @@ class PaintByExamplePipeline(DiffusionPipeline, StableDiffusionMixin): ...@@ -456,7 +456,7 @@ class PaintByExamplePipeline(DiffusionPipeline, StableDiffusionMixin):
plain tuple. plain tuple.
callback (`Callable`, *optional*): callback (`Callable`, *optional*):
A function that calls every `callback_steps` steps during inference. The function is called with the A function that calls every `callback_steps` steps during inference. The function is called with the
following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`. following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
callback_steps (`int`, *optional*, defaults to 1): callback_steps (`int`, *optional*, defaults to 1):
The frequency at which the `callback` function is called. If not specified, the callback is called at The frequency at which the `callback` function is called. If not specified, the callback is called at
every step. every step.
......
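The `mask_image` docstring says a PIL mask is converted to a single luminance channel, with white pixels repainted and black pixels preserved. A minimal sketch of that idea on nested lists of RGB tuples: it uses the ITU-R 601-2 luma weights that PIL documents for `convert("L")`. The 0.5 binarization threshold and the helper names are assumptions, not taken from the pipeline.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def rgb_to_luminance(pixel: RGB) -> float:
    # ITU-R 601-2 luma transform, the formula PIL documents for convert("L").
    r, g, b = pixel
    return r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000


def binarize_mask(rows: Sequence[Sequence[RGB]], threshold: float = 0.5) -> List[List[int]]:
    """Return 1 where a pixel should be repainted (white) and 0 where it is kept (black)."""
    # The threshold is an assumed cut-off on luminance normalized to [0, 1].
    return [
        [1 if rgb_to_luminance(p) / 255 >= threshold else 0 for p in row]
        for row in rows
    ]


mask = binarize_mask([[(255, 255, 255), (0, 0, 0)], [(128, 128, 128), (255, 0, 0)]])
print(mask)  # [[1, 0], [1, 0]]
```

Pure red maps to a luminance of about 0.3, so it counts as "preserve" under this threshold even though it is a bright color.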
...@@ -207,8 +207,8 @@ class PIAPipeline( ...@@ -207,8 +207,8 @@ class PIAPipeline(
num_images_per_prompt, num_images_per_prompt,
do_classifier_free_guidance, do_classifier_free_guidance,
negative_prompt=None, negative_prompt=None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
lora_scale: Optional[float] = None, lora_scale: Optional[float] = None,
clip_skip: Optional[int] = None, clip_skip: Optional[int] = None,
): ):
...@@ -228,10 +228,10 @@ class PIAPipeline( ...@@ -228,10 +228,10 @@ class PIAPipeline(
The prompt or prompts not to guide the image generation. If not defined, one has to pass The prompt or prompts not to guide the image generation. If not defined, one has to pass
`negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
less than `1`). less than `1`).
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument. provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument. argument.
...@@ -680,11 +680,11 @@ class PIAPipeline( ...@@ -680,11 +680,11 @@ class PIAPipeline(
num_videos_per_prompt: Optional[int] = 1, num_videos_per_prompt: Optional[int] = 1,
eta: float = 0.0, eta: float = 0.0,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None, latents: Optional[torch.Tensor] = None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
ip_adapter_image: Optional[PipelineImageInput] = None, ip_adapter_image: Optional[PipelineImageInput] = None,
ip_adapter_image_embeds: Optional[List[torch.FloatTensor]] = None, ip_adapter_image_embeds: Optional[List[torch.Tensor]] = None,
motion_scale: int = 0, motion_scale: int = 0,
output_type: Optional[str] = "pil", output_type: Optional[str] = "pil",
return_dict: bool = True, return_dict: bool = True,
...@@ -725,20 +725,20 @@ class PIAPipeline( ...@@ -725,20 +725,20 @@ class PIAPipeline(
generator (`torch.Generator` or `List[torch.Generator]`, *optional*): generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
generation deterministic. generation deterministic.
latents (`torch.FloatTensor`, *optional*): latents (`torch.Tensor`, *optional*):
Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for video Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for video
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor is generated by sampling using the supplied random `generator`. Latents should be of shape tensor is generated by sampling using the supplied random `generator`. Latents should be of shape
`(batch_size, num_channel, num_frames, height, width)`. `(batch_size, num_channel, num_frames, height, width)`.
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not
provided, text embeddings are generated from the `prompt` input argument. provided, text embeddings are generated from the `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If
not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument. not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
ip_adapter_image: (`PipelineImageInput`, *optional*): ip_adapter_image: (`PipelineImageInput`, *optional*):
Optional image input to work with IP Adapters. Optional image input to work with IP Adapters.
ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*): ip_adapter_image_embeds (`List[torch.Tensor]`, *optional*):
Pre-generated image embeddings for IP-Adapter. It should be a list of length same as number of Pre-generated image embeddings for IP-Adapter. It should be a list of length same as number of
IP-adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It should IP-adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It should
contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not
...@@ -749,8 +749,7 @@ class PIAPipeline( ...@@ -749,8 +749,7 @@ class PIAPipeline(
added. Must be between 0 and 8. Set between 0-2 to only increase the amount of motion. Set between 3-5 added. Must be between 0 and 8. Set between 0-2 to only increase the amount of motion. Set between 3-5
to create looping motion. Set between 6-8 to perform motion with image style transfer. to create looping motion. Set between 6-8 to perform motion with image style transfer.
output_type (`str`, *optional*, defaults to `"pil"`): output_type (`str`, *optional*, defaults to `"pil"`):
The output format of the generated video. Choose between `torch.FloatTensor`, `PIL.Image` or The output format of the generated video. Choose between `torch.Tensor`, `PIL.Image` or `np.array`.
`np.array`.
return_dict (`bool`, *optional*, defaults to `True`): return_dict (`bool`, *optional*, defaults to `True`):
Whether or not to return a [`~pipelines.text_to_video_synthesis.TextToVideoSDPipelineOutput`] instead Whether or not to return a [`~pipelines.text_to_video_synthesis.TextToVideoSDPipelineOutput`] instead
of a plain tuple. of a plain tuple.
......
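The PIA `motion_scale` docstring defines three bands within an allowed range of 0 to 8. A small hypothetical validator that encodes exactly those documented bands. The function name and return strings are illustrative, not part of the pipeline.

```python
def describe_motion_scale(motion_scale: int) -> str:
    """Map a PIA-style motion_scale to the behaviour its docstring describes."""
    if not isinstance(motion_scale, int) or not 0 <= motion_scale <= 8:
        raise ValueError(f"motion_scale must be an integer between 0 and 8, got {motion_scale!r}")
    if motion_scale <= 2:
        return "increase the amount of motion"
    if motion_scale <= 5:
        return "looping motion"
    return "motion with image style transfer"


for scale in (0, 4, 7):
    print(scale, describe_motion_scale(scale))
```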
...@@ -296,10 +296,10 @@ class PixArtAlphaPipeline(DiffusionPipeline): ...@@ -296,10 +296,10 @@ class PixArtAlphaPipeline(DiffusionPipeline):
negative_prompt: str = "", negative_prompt: str = "",
num_images_per_prompt: int = 1, num_images_per_prompt: int = 1,
device: Optional[torch.device] = None, device: Optional[torch.device] = None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
prompt_attention_mask: Optional[torch.FloatTensor] = None, prompt_attention_mask: Optional[torch.Tensor] = None,
negative_prompt_attention_mask: Optional[torch.FloatTensor] = None, negative_prompt_attention_mask: Optional[torch.Tensor] = None,
clean_caption: bool = False, clean_caption: bool = False,
max_sequence_length: int = 120, max_sequence_length: int = 120,
**kwargs, **kwargs,
...@@ -320,10 +320,10 @@ class PixArtAlphaPipeline(DiffusionPipeline): ...@@ -320,10 +320,10 @@ class PixArtAlphaPipeline(DiffusionPipeline):
number of images that should be generated per prompt number of images that should be generated per prompt
device: (`torch.device`, *optional*): device: (`torch.device`, *optional*):
torch device to place the resulting embeddings on torch device to place the resulting embeddings on
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument. provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. For PixArt-Alpha, it's should be the embeddings of the "" Pre-generated negative text embeddings. For PixArt-Alpha, it's should be the embeddings of the ""
string. string.
clean_caption (`bool`, defaults to `False`): clean_caption (`bool`, defaults to `False`):
...@@ -694,14 +694,14 @@ class PixArtAlphaPipeline(DiffusionPipeline): ...@@ -694,14 +694,14 @@ class PixArtAlphaPipeline(DiffusionPipeline):
width: Optional[int] = None, width: Optional[int] = None,
eta: float = 0.0, eta: float = 0.0,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None, latents: Optional[torch.Tensor] = None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
prompt_attention_mask: Optional[torch.FloatTensor] = None, prompt_attention_mask: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_attention_mask: Optional[torch.FloatTensor] = None, negative_prompt_attention_mask: Optional[torch.Tensor] = None,
output_type: Optional[str] = "pil", output_type: Optional[str] = "pil",
return_dict: bool = True, return_dict: bool = True,
callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None, callback: Optional[Callable[[int, int, torch.Tensor], None]] = None,
callback_steps: int = 1, callback_steps: int = 1,
clean_caption: bool = True, clean_caption: bool = True,
use_resolution_binning: bool = True, use_resolution_binning: bool = True,
...@@ -748,18 +748,18 @@ class PixArtAlphaPipeline(DiffusionPipeline): ...@@ -748,18 +748,18 @@ class PixArtAlphaPipeline(DiffusionPipeline):
generator (`torch.Generator` or `List[torch.Generator]`, *optional*): generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
to make generation deterministic. to make generation deterministic.
latents (`torch.FloatTensor`, *optional*): latents (`torch.Tensor`, *optional*):
Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor will ge generated by sampling using the supplied random `generator`. tensor will ge generated by sampling using the supplied random `generator`.
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument. provided, text embeddings will be generated from `prompt` input argument.
prompt_attention_mask (`torch.FloatTensor`, *optional*): Pre-generated attention mask for text embeddings. prompt_attention_mask (`torch.Tensor`, *optional*): Pre-generated attention mask for text embeddings.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. For PixArt-Alpha this negative prompt should be "". If not Pre-generated negative text embeddings. For PixArt-Alpha this negative prompt should be "". If not
provided, negative_prompt_embeds will be generated from `negative_prompt` input argument. provided, negative_prompt_embeds will be generated from `negative_prompt` input argument.
negative_prompt_attention_mask (`torch.FloatTensor`, *optional*): negative_prompt_attention_mask (`torch.Tensor`, *optional*):
Pre-generated attention mask for negative text embeddings. Pre-generated attention mask for negative text embeddings.
output_type (`str`, *optional*, defaults to `"pil"`): output_type (`str`, *optional*, defaults to `"pil"`):
The output format of the generate image. Choose between The output format of the generate image. Choose between
...@@ -768,7 +768,7 @@ class PixArtAlphaPipeline(DiffusionPipeline): ...@@ -768,7 +768,7 @@ class PixArtAlphaPipeline(DiffusionPipeline):
Whether or not to return a [`~pipelines.stable_diffusion.IFPipelineOutput`] instead of a plain tuple. Whether or not to return a [`~pipelines.stable_diffusion.IFPipelineOutput`] instead of a plain tuple.
callback (`Callable`, *optional*): callback (`Callable`, *optional*):
A function that will be called every `callback_steps` steps during inference. The function will be A function that will be called every `callback_steps` steps during inference. The function will be
called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`. called with the following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
callback_steps (`int`, *optional*, defaults to 1): callback_steps (`int`, *optional*, defaults to 1):
The frequency at which the `callback` function will be called. If not specified, the callback will be The frequency at which the `callback` function will be called. If not specified, the callback will be
called at every step. called at every step.
......
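PixArt-Alpha's prompt encoding takes `max_sequence_length` (default 120) alongside `prompt_attention_mask`. A minimal sketch of how a fixed-length attention mask relates to token ids, assuming the tokenizer truncates and pads to `max_sequence_length`. The helper name and the pad id of 0 are assumptions for illustration; the real pipeline obtains the mask from its tokenizer.

```python
from typing import List, Sequence, Tuple


def pad_with_attention_mask(
    token_ids: Sequence[int],
    max_sequence_length: int = 120,
    pad_id: int = 0,  # assumed pad token id
) -> Tuple[List[int], List[int]]:
    """Truncate/pad token ids to a fixed length and build the matching attention mask."""
    ids = list(token_ids[:max_sequence_length])
    mask = [1] * len(ids)  # 1 marks a real token
    padding = max_sequence_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 marks padding


ids, mask = pad_with_attention_mask([37, 1782, 5], max_sequence_length=6)
print(ids)   # [37, 1782, 5, 0, 0, 0]
print(mask)  # [1, 1, 1, 0, 0, 0]
```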
...@@ -222,10 +222,10 @@ class PixArtSigmaPipeline(DiffusionPipeline): ...@@ -222,10 +222,10 @@ class PixArtSigmaPipeline(DiffusionPipeline):
negative_prompt: str = "", negative_prompt: str = "",
num_images_per_prompt: int = 1, num_images_per_prompt: int = 1,
device: Optional[torch.device] = None, device: Optional[torch.device] = None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
prompt_attention_mask: Optional[torch.FloatTensor] = None, prompt_attention_mask: Optional[torch.Tensor] = None,
negative_prompt_attention_mask: Optional[torch.FloatTensor] = None, negative_prompt_attention_mask: Optional[torch.Tensor] = None,
clean_caption: bool = False, clean_caption: bool = False,
max_sequence_length: int = 120, max_sequence_length: int = 120,
**kwargs, **kwargs,
...@@ -246,10 +246,10 @@ class PixArtSigmaPipeline(DiffusionPipeline): ...@@ -246,10 +246,10 @@ class PixArtSigmaPipeline(DiffusionPipeline):
number of images that should be generated per prompt number of images that should be generated per prompt
device: (`torch.device`, *optional*): device: (`torch.device`, *optional*):
torch device to place the resulting embeddings on torch device to place the resulting embeddings on
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument. provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. For PixArt-Alpha, it's should be the embeddings of the "" Pre-generated negative text embeddings. For PixArt-Alpha, it's should be the embeddings of the ""
string. string.
clean_caption (`bool`, defaults to `False`): clean_caption (`bool`, defaults to `False`):
...@@ -621,14 +621,14 @@ class PixArtSigmaPipeline(DiffusionPipeline): ...@@ -621,14 +621,14 @@ class PixArtSigmaPipeline(DiffusionPipeline):
width: Optional[int] = None, width: Optional[int] = None,
eta: float = 0.0, eta: float = 0.0,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None, latents: Optional[torch.Tensor] = None,
prompt_embeds: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.Tensor] = None,
prompt_attention_mask: Optional[torch.FloatTensor] = None, prompt_attention_mask: Optional[torch.Tensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.Tensor] = None,
negative_prompt_attention_mask: Optional[torch.FloatTensor] = None, negative_prompt_attention_mask: Optional[torch.Tensor] = None,
output_type: Optional[str] = "pil", output_type: Optional[str] = "pil",
return_dict: bool = True, return_dict: bool = True,
callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None, callback: Optional[Callable[[int, int, torch.Tensor], None]] = None,
callback_steps: int = 1, callback_steps: int = 1,
clean_caption: bool = True, clean_caption: bool = True,
use_resolution_binning: bool = True, use_resolution_binning: bool = True,
...@@ -675,18 +675,18 @@ class PixArtSigmaPipeline(DiffusionPipeline): ...@@ -675,18 +675,18 @@ class PixArtSigmaPipeline(DiffusionPipeline):
generator (`torch.Generator` or `List[torch.Generator]`, *optional*): generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
to make generation deterministic. to make generation deterministic.
latents (`torch.FloatTensor`, *optional*): latents (`torch.Tensor`, *optional*):
Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor will ge generated by sampling using the supplied random `generator`. tensor will ge generated by sampling using the supplied random `generator`.
prompt_embeds (`torch.FloatTensor`, *optional*): prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument. provided, text embeddings will be generated from `prompt` input argument.
prompt_attention_mask (`torch.FloatTensor`, *optional*): Pre-generated attention mask for text embeddings. prompt_attention_mask (`torch.Tensor`, *optional*): Pre-generated attention mask for text embeddings.
negative_prompt_embeds (`torch.FloatTensor`, *optional*): negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. For PixArt-Sigma this negative prompt should be "". If not Pre-generated negative text embeddings. For PixArt-Sigma this negative prompt should be "". If not
provided, negative_prompt_embeds will be generated from `negative_prompt` input argument. provided, negative_prompt_embeds will be generated from `negative_prompt` input argument.
negative_prompt_attention_mask (`torch.FloatTensor`, *optional*): negative_prompt_attention_mask (`torch.Tensor`, *optional*):
Pre-generated attention mask for negative text embeddings. Pre-generated attention mask for negative text embeddings.
output_type (`str`, *optional*, defaults to `"pil"`): output_type (`str`, *optional*, defaults to `"pil"`):
The output format of the generate image. Choose between The output format of the generate image. Choose between
...@@ -695,7 +695,7 @@ class PixArtSigmaPipeline(DiffusionPipeline): ...@@ -695,7 +695,7 @@ class PixArtSigmaPipeline(DiffusionPipeline):
Whether or not to return a [`~pipelines.stable_diffusion.IFPipelineOutput`] instead of a plain tuple. Whether or not to return a [`~pipelines.stable_diffusion.IFPipelineOutput`] instead of a plain tuple.
callback (`Callable`, *optional*): callback (`Callable`, *optional*):
A function that will be called every `callback_steps` steps during inference. The function will be A function that will be called every `callback_steps` steps during inference. The function will be
called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`. called with the following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
callback_steps (`int`, *optional*, defaults to 1): callback_steps (`int`, *optional*, defaults to 1):
The frequency at which the `callback` function will be called. If not specified, the callback will be The frequency at which the `callback` function will be called. If not specified, the callback will be
called at every step. called at every step.
......