"prompt_template_encode":"<|im_start|>system\nDescribe the key features of the input image (color, shape, size, texture, objects, background), then explain how the user's text instruction should alter or modify the image. Generate a new image that meets the user's requirements while maintaining consistency with the original input where appropriate.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>{}<|im_end|>\n<|im_start|>assistant\n",
"prompt_template_encode":"<|im_start|>system\nDescribe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>\n<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n",
deprecation_message = "The `scale` argument is deprecated and will be ignored. Please remove it, as passing it will raise an error in the future. `scale` should directly be passed while calling the underlying pipeline component, i.e., via `cross_attention_kwargs`."
deprecate("scale", "1.0.0", deprecation_message)
hidden_states = self.proj(hidden_states)
if is_torch_npu_available():
# using torch_npu.npu_geglu can run faster and save memory on NPU.
# "feed_forward_chunk_size" can be used to save memory
if hidden_states.shape[chunk_dim] % chunk_size != 0:
    raise ValueError(
        f"`hidden_states` dimension to be chunked: {hidden_states.shape[chunk_dim]} has to be divisible by chunk size: {chunk_size}. Make sure to set an appropriate `chunk_size` when calling `unet.enable_forward_chunking`."
    )
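The divisibility check above guards a chunked feed-forward pass. As a minimal sketch (the helper name and the generic `ff` callable are assumptions, not the library's exact API), the pattern looks like this:

```python
import torch

def chunked_feed_forward(ff, hidden_states, chunk_dim, chunk_size):
    # The guard from above: the chunked dimension must divide evenly.
    if hidden_states.shape[chunk_dim] % chunk_size != 0:
        raise ValueError(
            f"`hidden_states` dimension to be chunked: {hidden_states.shape[chunk_dim]} "
            f"has to be divisible by chunk size: {chunk_size}."
        )
    num_chunks = hidden_states.shape[chunk_dim] // chunk_size
    # Run the feed-forward on one slice at a time so only one chunk's worth of
    # intermediate activations is alive at once, then stitch the results back.
    return torch.cat(
        [ff(chunk) for chunk in hidden_states.chunk(num_chunks, dim=chunk_dim)],
        dim=chunk_dim,
    )
```

This trades a small amount of Python-loop overhead for a roughly `num_chunks`-fold reduction in peak activation memory inside the feed-forward.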
raise ValueError(f"`norm_type` is set to {norm_type}, but `num_embeds_ada_norm` is not defined. Please make sure to define `num_embeds_ada_norm` if setting `norm_type` to {norm_type}.")
deprecation_message = "The `scale` argument is deprecated and will be ignored. Please remove it, as passing it will raise an error in the future. `scale` should directly be passed while calling the underlying pipeline component, i.e., via `cross_attention_kwargs`."
raise RuntimeError(f"Flash Attention backend '{backend.value}' is not usable because of missing package or the version is too old. Please install `flash-attn>={_REQUIRED_FLASH_VERSION}`.")
raise RuntimeError(f"Flash Attention 3 backend '{backend.value}' is not usable because of missing package or the version is too old. Please build FA3 beta release from source.")
raise RuntimeError(
    f"Sage Attention backend '{backend.value}' is not usable because of missing package or the version is too old. Please install `sageattention>={_REQUIRED_SAGE_VERSION}`."
)
elif backend == AttentionBackendName.FLEX:
    if not _CAN_USE_FLEX_ATTN:
        raise RuntimeError(f"Flex Attention backend '{backend.value}' is not usable because of missing package or the version is too old. Please install `torch>=2.5.0`.")
elif backend == AttentionBackendName._NATIVE_NPU:
    if not _CAN_USE_NPU_ATTN:
        raise RuntimeError(f"NPU Attention backend '{backend.value}' is not usable because of missing package or the version is too old. Please install `torch_npu`.")
elif backend == AttentionBackendName._NATIVE_XLA:
    if not _CAN_USE_XLA_ATTN:
        raise RuntimeError(f"XLA Attention backend '{backend.value}' is not usable because of missing package or the version is too old. Please install `torch_xla>={_REQUIRED_XLA_VERSION}`.")
elif backend == AttentionBackendName.XFORMERS:
    if not _CAN_USE_XFORMERS_ATTN:
        raise RuntimeError(
            f"Xformers Attention backend '{backend.value}' is not usable because of missing package or the version is too old. Please install `xformers>={_REQUIRED_XFORMERS_VERSION}`."
        )
The embedding dimension of inputs. It must be divisible by 16.
spatial_size (`int` or `Tuple[int, int]`):
The spatial dimension of positional embeddings. If an integer is provided, the same size is applied to both
spatial dimensions (height and width).
temporal_size (`int`):
The temporal dimension of positional embeddings (number of frames).
spatial_interpolation_scale (`float`, defaults to 1.0):
Scale factor for spatial grid interpolation.
temporal_interpolation_scale (`float`, defaults to 1.0):
Scale factor for temporal grid interpolation.
Returns:
`np.ndarray`:
The 3D sinusoidal positional embeddings of shape `[temporal_size, spatial_size[0] * spatial_size[1],
embed_dim]`.
"""
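The shape contract in this docstring (a `[temporal_size, spatial_size[0] * spatial_size[1], embed_dim]` table) can be sketched by composing 1D sin/cos tables. Note the 3/4-spatial / 1/4-temporal channel split and the temporal-then-spatial concatenation order are assumptions for illustration, not guaranteed to match the library's exact layout:

```python
import torch

def sincos_1d(embed_dim: int, pos: torch.Tensor) -> torch.Tensor:
    # Classic transformer table: geometric frequencies, sin half then cos half.
    omega = 1.0 / 10000 ** (
        torch.arange(embed_dim // 2, dtype=torch.float64) / (embed_dim / 2)
    )
    angles = pos.double().reshape(-1)[:, None] * omega[None, :]      # (M, D/2)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=1)  # (M, D)

def sincos_3d(embed_dim: int, spatial_size, temporal_size: int) -> torch.Tensor:
    # Assumed channel split: 3/4 of channels encode space, 1/4 encodes time.
    dim_s, dim_t = embed_dim * 3 // 4, embed_dim // 4
    h, w = (spatial_size, spatial_size) if isinstance(spatial_size, int) else spatial_size
    # 2D spatial table: half of dim_s for the height axis, half for width.
    emb_h = sincos_1d(dim_s // 2, torch.arange(h))                   # (H, dim_s/2)
    emb_w = sincos_1d(dim_s // 2, torch.arange(w))                   # (W, dim_s/2)
    spatial = torch.cat(
        [emb_h[:, None, :].expand(h, w, -1), emb_w[None, :, :].expand(h, w, -1)],
        dim=-1,
    ).reshape(h * w, dim_s)                                          # (H*W, dim_s)
    temporal = sincos_1d(dim_t, torch.arange(temporal_size))         # (T, dim_t)
    # Broadcast both tables to (T, H*W, embed_dim) and concatenate channels.
    return torch.cat(
        [
            temporal[:, None, :].expand(-1, h * w, -1),
            spatial[None, :, :].expand(temporal_size, -1, -1),
        ],
        dim=-1,
    )
```

The "divisible by 16" requirement in the docstring ensures every sub-table above gets an even channel count.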
deprecation_message = "`get_3d_sincos_pos_embed` uses `torch` and supports `device`. `from_numpy` is no longer required. Pass `output_type='pt'` to use the new version now."
Shape is either `[grid_size * grid_size, embed_dim]` if not using cls_token, or `[1 + grid_size*grid_size,
embed_dim]` if using cls_token
"""
if output_type == "np":
    deprecation_message = "`get_2d_sincos_pos_embed` uses `torch` and supports `device`. `from_numpy` is no longer required. Pass `output_type='pt'` to use the new version now."
This function generates 2D sinusoidal positional embeddings from a grid.
Args:
embed_dim (`int`): The embedding dimension.
grid (`torch.Tensor`): Grid of positions with shape `(H * W,)`.
Returns:
`torch.Tensor`: The 2D sinusoidal positional embeddings with shape `(H * W, embed_dim)`
"""
if output_type == "np":
    deprecation_message = "`get_2d_sincos_pos_embed_from_grid` uses `torch` and supports `device`. `from_numpy` is no longer required. Pass `output_type='pt'` to use the new version now."
This function generates 1D positional embeddings from a grid.
Args:
embed_dim (`int`): The embedding dimension `D`
pos (`torch.Tensor`): 1D tensor of positions with shape `(M,)`
Returns:
`torch.Tensor`: Sinusoidal positional embeddings of shape `(M, D)`.
"""
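The `(M,) -> (M, D)` mapping this docstring describes is the standard sinusoidal table. A minimal sketch (the sin-first, cos-second channel order matches the common reference implementation, but treat it as an assumption):

```python
import torch

def get_1d_sincos_from_grid(embed_dim: int, pos: torch.Tensor) -> torch.Tensor:
    # Frequencies 10000^(-2i/D): the usual transformer geometric progression.
    omega = 1.0 / 10000 ** (
        torch.arange(embed_dim // 2, dtype=torch.float64) / (embed_dim / 2)
    )
    angles = pos.double()[:, None] * omega[None, :]  # (M, D/2)
    # First half of the channels is sin, second half is cos.
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=1)  # (M, D)
```

A quick sanity check: at position 0, the sin half is all zeros and the cos half is all ones.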
if output_type == "np":
    deprecation_message = "`get_1d_sincos_pos_embed_from_grid` uses `torch` and supports `device`. `from_numpy` is no longer required. Pass `output_type='pt'` to use the new version now."
"It is currently not possible to generate videos at a different resolution than the defaults. This should only be the case with 'THUDM/CogVideoX-5b-I2V'."
"If you think this is incorrect, please open an issue at https://github.com/huggingface/diffusers/issues."
The top-left and bottom-right coordinates of the crop.
grid_size (`Tuple[int]`):
The grid size of the positional embedding.
use_real (`bool`):
If True, return real part and imaginary part separately. Otherwise, return complex numbers.
device (`torch.device`, *optional*):
The device used to create tensors.
Returns:
`torch.Tensor`: positional embedding with shape `(grid_size * grid_size, embed_dim/2)`.
"""
if output_type == "np":
    deprecation_message = "`get_2d_sincos_pos_embed` uses `torch` and supports `device`. `from_numpy` is no longer required. Pass `output_type='pt'` to use the new version now."
# 1. a tensor (deprecated) with shape [batch_size, embed_dim] or [batch_size, sequence_length, embed_dim]
# 2. list of `n` tensors where `n` is the number of ip-adapters, each tensor can have shape [batch_size, num_images, embed_dim] or [batch_size, num_images, sequence_length, embed_dim]
if not isinstance(image_embeds, list):
deprecation_message=(
"You have passed a tensor as `image_embeds`. This is deprecated and will be removed in a future release."
" Please make sure to update your script to pass `image_embeds` as a list of tensors to suppress this warning."
)
deprecate("image_embeds not a list", "1.0.0", deprecation_message, standard_warn=False)
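The deprecation path above normalizes a legacy single-tensor input into the supported list-of-tensors form. A standalone sketch of that normalization (the helper name is hypothetical, and plain `warnings.warn` stands in for diffusers' own `deprecate` utility):

```python
import warnings
import torch

def normalize_image_embeds(image_embeds):
    # Legacy callers passed one tensor; the list form (one entry per
    # IP-Adapter) is the supported input, so wrap the tensor and warn.
    if not isinstance(image_embeds, list):
        warnings.warn(
            "You have passed a tensor as `image_embeds`. This is deprecated and "
            "will be removed in a future release. Please pass a list of tensors.",
            FutureWarning,
        )
        image_embeds = [image_embeds]
    return image_embeds
```

Wrapping instead of raising keeps old scripts working while the warning steers users toward the list form.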
deprecation_message = "Importing and using `FluxPosEmbed` from `diffusers.models.embeddings` is deprecated. Please import it from `diffusers.models.transformers.transformer_flux`."