Unverified commit 6a05b274 authored by M. Tolga Cangöz, committed by GitHub

Fix Typos (#7325)

* Fix PyTorch's convention for inplace functions

* Fix import structure in __init__.py and update config loading logic in test_config.py

* Update configuration access

* Fix typos

* Trim trailing white spaces

* Fix typo in logger name

* Revert "Fix PyTorch's convention for inplace functions"

This reverts commit f65dc4afcb57ceb43d5d06389229d47bafb10d2d.

* Fix typo in step_index property description

* Revert "Update configuration access"

This reverts commit 8d44e870b8c1ad08802e3e904c34baeca1b598f8.

* Revert "Fix import structure in __init__.py and update config loading logic in test_config.py"

This reverts commit 2ad5e8bca25aede3b912da22bd57285b598fe171.

* Fix typos

* Fix typos

* Fix typos

* Fix a typo: tranform -> transform
parent 98d46a3f
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# Memory and speed
We present a few techniques and ideas for optimizing 🤗 Diffusers *inference* for memory or speed.
In general, we recommend using [xFormers](https://github.com/facebookresearch/xformers) for memory-efficient attention, so take a look at the recommended [installation instructions](xformers) and install it.
We explain how the following settings affect performance and memory.
@@ -27,7 +27,7 @@ specific language governing permissions and limitations under the License.
| memory-efficient attention | 2.63s | x3.61 |
<em>
A single 512x512 image was generated with the prompt "a photo of an astronaut riding a horse on mars" and 50 DDIM steps on an NVIDIA TITAN RTX.
</em>
## Enabling the cuDNN auto-tuner
@@ -44,11 +44,11 @@ torch.backends.cudnn.benchmark = True
### Using tf32 instead of fp32 (on Ampere and later CUDA devices)
On Ampere and later CUDA devices, matrix multiplications and convolutions can use TensorFloat32 (TF32) mode, which is faster but slightly less accurate.
By default, PyTorch enables TF32 mode for convolutions but not for matrix multiplications.
Unless your network requires full float32 precision, we recommend enabling this setting for matrix multiplications as well.
It can significantly speed up computation, usually with a negligible loss of numerical accuracy.
You can read more about it [here](https://huggingface.co/docs/transformers/v4.18.0/en/performance#tf32).
All you need to do is add the following before running inference:
```python
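# A minimal sketch of the two flags discussed above (assumes a CUDA-capable machine):
import torch

torch.backends.cudnn.benchmark = True         # cuDNN auto-tuner: benchmark convolution algorithms
torch.backends.cuda.matmul.allow_tf32 = True  # allow TF32 for matrix multiplications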
@@ -59,13 +59,13 @@ torch.backends.cuda.matmul.allow_tf32 = True
## Half-precision weights
To save more GPU memory and gain more speed, you can load and run the model weights directly in half precision.
This involves loading the float16 version of the weights, which are stored in a branch named `fp16`, and telling PyTorch to use the `float16` type when doing so.
```Python
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
@@ -75,7 +75,7 @@ image = pipe(prompt).images[0]
```
<Tip warning={true}>
We recommend not using [`torch.autocast`](https://pytorch.org/docs/stable/amp.html#torch.autocast) in any of the pipelines, as it can produce black images and is always slower than pure float16 precision.
</Tip>
## Sliced attention for additional memory savings
@@ -95,7 +95,7 @@ from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
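The call that actually enables sliced attention falls outside the lines shown in this hunk; a minimal sketch, assuming the `pipe` object loaded above:

```Python
# Split the attention computation into several steps instead of one large matmul,
# trading a little speed for a noticeably smaller memory peak.
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```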
@@ -122,7 +122,7 @@ from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
@@ -148,7 +148,7 @@ from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16,
)
@@ -165,7 +165,7 @@ image = pipe(prompt).images[0]
Consider using another optimization method, <a href="#model_offloading">model offloading</a>, instead. It is much faster, but the memory savings are not as large.
</Tip>
It can also be combined with attention slicing so that it runs with minimal memory (< 2GB).
```Python
@@ -174,7 +174,7 @@ from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16,
)
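# A minimal sketch of how such a setup is typically completed (the exact lines of the
# collapsed hunk are not visible here): sequential CPU offloading combined with
# attention slicing (slice size 1) is what keeps peak memory below ~2GB.
pipe.enable_sequential_cpu_offload()
pipe.enable_attention_slicing(1)

image = pipe("a photo of an astronaut riding a horse on mars").images[0]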
@@ -204,7 +204,7 @@ import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16,
)
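# A minimal sketch, assuming this hunk belongs to the model-offloading section
# referenced earlier in this document: model offloading is enabled with a single call.
pipe.enable_model_cpu_offload()

image = pipe("a photo of an astronaut riding a horse on mars").images[0]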
@@ -387,7 +387,7 @@ with torch.inference_mode():
| A100-SXM4-40GB | 18.6it/s | 29.it/s |
| A100-SXM-80GB | 18.7it/s | 29.5it/s |
To take advantage of this, make sure the following requirements are met (see the sketch after this list):
- PyTorch > 1.12
- CUDA available
- [the xformers library installed](xformers)
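A minimal sketch of toggling memory-efficient attention, assuming the requirements above are met and the Stable Diffusion checkpoint used throughout this document:

```Python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()
image = pipe("a photo of an astronaut riding a horse on mars").images[0]

# the optimization can be turned off again if needed
pipe.disable_xformers_memory_efficient_attention()
```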
...
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.
[[open-in-colab]]
🧨 Diffusers is designed to be a user-friendly and flexible toolbox for building diffusion systems tailored to your use case. The core of this toolbox is its models and schedulers. While the [`DiffusionPipeline`] bundles these components together for convenience, you can also unbundle the pipeline and use the models and schedulers separately to build new diffusion systems.
In this tutorial, you will learn how to use models and schedulers to assemble a diffusion system for inference, starting with a basic pipeline and working up to the Stable Diffusion pipeline.
@@ -36,7 +36,7 @@ specific language governing permissions and limitations under the License.
That was really easy, but how did the pipeline do it? Let's break the pipeline down and look at what is happening under the hood.
In the example above, the pipeline contains a [`UNet2DModel`] model and a [`DDPMScheduler`]. The pipeline denoises an image by taking random noise of the desired output size and passing it through the model several times. At each timestep, the model predicts the *noise residual*, and the scheduler uses it to predict a less noisy image. The pipeline repeats this process until it reaches the specified number of inference steps.
To recreate the pipeline using the model and scheduler separately, let's write our own denoising process.
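The loop described above comes down to a few lines; here is a minimal sketch, assuming the `google/ddpm-cat-256` checkpoint and 50 inference steps (both illustrative):

```python
import torch
from diffusers import DDPMScheduler, UNet2DModel

scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda")
scheduler.set_timesteps(50)

# start from random noise of the model's expected sample size
sample = torch.randn(1, 3, model.config.sample_size, model.config.sample_size).to("cuda")

for t in scheduler.timesteps:
    with torch.no_grad():
        noise_residual = model(sample, t).sample  # the model predicts the noise residual
    # the scheduler uses the residual to compute a slightly less noisy sample
    sample = scheduler.step(noise_residual, t, sample).prev_sample
```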
...
@@ -440,7 +440,7 @@ def betas_for_alpha_bar(
return math.exp(t * -12.0)
else:
- raise ValueError(f"Unsupported alpha_tranform_type: {alpha_transform_type}")
+ raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")
betas = []
for i in range(num_diffusion_timesteps):
...
@@ -348,7 +348,7 @@ def betas_for_alpha_bar(
return math.exp(t * -12.0)
else:
- raise ValueError(f"Unsupported alpha_tranform_type: {alpha_transform_type}")
+ raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")
betas = []
for i in range(num_diffusion_timesteps):
...
@@ -206,7 +206,7 @@ def prepare_mask_and_masked_image(image, mask, height, width, return_image: bool
dimensions: ``batch x channels x height x width``.
"""
- # checkpoint. TOD(Yiyi) - need to clean this up later
+ # checkpoint. #TODO(Yiyi) - need to clean this up later
if image is None:
raise ValueError("`image` input cannot be undefined.")
@@ -277,7 +277,7 @@ def prepare_mask_and_masked_image(image, mask, height, width, return_image: bool
# images are in latent space and thus can't
# be masked set masked_image to None
# we assume that the checkpoint is not an inpainting
- # checkpoint. TOD(Yiyi) - need to clean this up later
+ # checkpoint. #TODO(Yiyi) - need to clean this up later
masked_image = None
else:
masked_image = image * (mask < 0.5)
...
@@ -81,7 +81,7 @@ def betas_for_alpha_bar(
return math.exp(t * -12.0)
else:
- raise ValueError(f"Unsupported alpha_tranform_type: {alpha_transform_type}")
+ raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")
betas = []
for i in range(num_diffusion_timesteps):
...
@@ -424,7 +424,7 @@ class Attention(nn.Module):
# If doesn't apply LoRA do `add_k_proj` or `add_v_proj`
is_lora_activated.pop("add_k_proj", None)
is_lora_activated.pop("add_v_proj", None)
- # 2. else it is not posssible that only some layers have LoRA activated
+ # 2. else it is not possible that only some layers have LoRA activated
if not all(is_lora_activated.values()):
raise ValueError(
f"Make sure that either all layers or no layers have LoRA activated, but have {is_lora_activated}"
@@ -2098,7 +2098,7 @@ class LoRAAttnAddedKVProcessor(nn.Module):
class IPAdapterAttnProcessor(nn.Module):
r"""
- Attention processor for Multiple IP-Adapater.
+ Attention processor for Multiple IP-Adapters.
Args:
hidden_size (`int`):
@@ -2152,8 +2152,8 @@ class IPAdapterAttnProcessor(nn.Module):
encoder_hidden_states, ip_hidden_states = encoder_hidden_states
else:
deprecation_message = (
- "You have passed a tensor as `encoder_hidden_states`.This is deprecated and will be removed in a future release."
- " Please make sure to update your script to pass `encoder_hidden_states` as a tuple to supress this warning."
+ "You have passed a tensor as `encoder_hidden_states`. This is deprecated and will be removed in a future release."
+ " Please make sure to update your script to pass `encoder_hidden_states` as a tuple to suppress this warning."
)
deprecate("encoder_hidden_states not a tuple", "1.0.0", deprecation_message, standard_warn=False)
end_pos = encoder_hidden_states.shape[1] - self.num_tokens[0]
@@ -2253,7 +2253,7 @@ class IPAdapterAttnProcessor(nn.Module):
class IPAdapterAttnProcessor2_0(torch.nn.Module):
r"""
- Attention processor for IP-Adapater for PyTorch 2.0.
+ Attention processor for IP-Adapter for PyTorch 2.0.
Args:
hidden_size (`int`):
@@ -2312,8 +2312,8 @@ class IPAdapterAttnProcessor2_0(torch.nn.Module):
encoder_hidden_states, ip_hidden_states = encoder_hidden_states
else:
deprecation_message = (
- "You have passed a tensor as `encoder_hidden_states`.This is deprecated and will be removed in a future release."
- " Please make sure to update your script to pass `encoder_hidden_states` as a tuple to supress this warning."
+ "You have passed a tensor as `encoder_hidden_states`. This is deprecated and will be removed in a future release."
+ " Please make sure to update your script to pass `encoder_hidden_states` as a tuple to suppress this warning."
)
deprecate("encoder_hidden_states not a tuple", "1.0.0", deprecation_message, standard_warn=False)
end_pos = encoder_hidden_states.shape[1] - self.num_tokens[0]
...
@@ -281,7 +281,7 @@ class ControlNetModel(ModelMixin, ConfigMixin, FromOriginalControlNetMixin):
elif encoder_hid_dim_type == "text_image_proj":
# image_embed_dim DOESN'T have to be `cross_attention_dim`. To not clutter the __init__ too much
# they are set to `cross_attention_dim` here as this is exactly the required dimension for the currently only use
- # case when `addition_embed_type == "text_image_proj"` (Kadinsky 2.1)`
+ # case when `addition_embed_type == "text_image_proj"` (Kandinsky 2.1)`
self.encoder_hid_proj = TextImageProjection(
text_embed_dim=encoder_hid_dim,
image_embed_dim=cross_attention_dim,
@@ -330,7 +330,7 @@ class ControlNetModel(ModelMixin, ConfigMixin, FromOriginalControlNetMixin):
elif addition_embed_type == "text_image":
# text_embed_dim and image_embed_dim DON'T have to be `cross_attention_dim`. To not clutter the __init__ too much
# they are set to `cross_attention_dim` here as this is exactly the required dimension for the currently only use
- # case when `addition_embed_type == "text_image"` (Kadinsky 2.1)`
+ # case when `addition_embed_type == "text_image"` (Kandinsky 2.1)`
self.add_embedding = TextImageTimeEmbedding(
text_embed_dim=cross_attention_dim, image_embed_dim=cross_attention_dim, time_embed_dim=time_embed_dim
)
...
@@ -20,15 +20,15 @@ from .transformers.transformer_temporal import (
class TransformerTemporalModelOutput(TransformerTemporalModelOutput):
- deprecation_message = "Importing `TransformerTemporalModelOutput` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModelOutput`, instead."
+ deprecation_message = "Importing `TransformerTemporalModelOutput` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.transformer_temporal import TransformerTemporalModelOutput`, instead."
deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
class TransformerTemporalModel(TransformerTemporalModel):
- deprecation_message = "Importing `TransformerTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerTemporalModel`, instead."
+ deprecation_message = "Importing `TransformerTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.transformer_temporal import TransformerTemporalModel`, instead."
deprecate("TransformerTemporalModel", "0.29", deprecation_message)
class TransformerSpatioTemporalModel(TransformerSpatioTemporalModel):
- deprecation_message = "Importing `TransformerSpatioTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.tranformer_temporal import TransformerSpatioTemporalModel`, instead."
+ deprecation_message = "Importing `TransformerSpatioTemporalModel` from `diffusers.models.transformer_temporal` is deprecated and this will be removed in a future version. Please use `from diffusers.models.transformers.transformer_temporal import TransformerSpatioTemporalModel`, instead."
deprecate("TransformerTemporalModelOutput", "0.29", deprecation_message)
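The corrected deprecation messages point users at the new module path; the replacement import, taken directly from the message text, is:

```python
# New import location named in the deprecation messages above
from diffusers.models.transformers.transformer_temporal import (
    TransformerSpatioTemporalModel,
    TransformerTemporalModel,
    TransformerTemporalModelOutput,
)
```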
@@ -129,7 +129,7 @@ class Transformer2DModel(ModelMixin, ConfigMixin):
if norm_type == "layer_norm" and num_embeds_ada_norm is not None:
deprecation_message = (
f"The configuration file of this model: {self.__class__} is outdated. `norm_type` is either not set or"
- " incorrectly set to `'layer_norm'`.Make sure to set `norm_type` to `'ada_norm'` in the config."
+ " incorrectly set to `'layer_norm'`. Make sure to set `norm_type` to `'ada_norm'` in the config."
" Please make sure to update the config accordingly as leaving `norm_type` might led to incorrect"
" results in future versions. If you have downloaded this checkpoint from the Hugging Face Hub, it"
" would be very nice if you could open a Pull request for the `transformer/config.json` file"
...
@@ -580,7 +580,7 @@ class UNet2DConditionModel(ModelMixin, ConfigMixin, UNet2DConditionLoadersMixin,
elif encoder_hid_dim_type == "text_image_proj":
# image_embed_dim DOESN'T have to be `cross_attention_dim`. To not clutter the __init__ too much
# they are set to `cross_attention_dim` here as this is exactly the required dimension for the currently only use
- # case when `addition_embed_type == "text_image_proj"` (Kadinsky 2.1)`
+ # case when `addition_embed_type == "text_image_proj"` (Kandinsky 2.1)`
self.encoder_hid_proj = TextImageProjection(
text_embed_dim=encoder_hid_dim,
image_embed_dim=cross_attention_dim,
@@ -660,7 +660,7 @@ class UNet2DConditionModel(ModelMixin, ConfigMixin, UNet2DConditionLoadersMixin,
elif addition_embed_type == "text_image":
# text_embed_dim and image_embed_dim DON'T have to be `cross_attention_dim`. To not clutter the __init__ too much
# they are set to `cross_attention_dim` here as this is exactly the required dimension for the currently only use
- # case when `addition_embed_type == "text_image"` (Kadinsky 2.1)`
+ # case when `addition_embed_type == "text_image"` (Kandinsky 2.1)`
self.add_embedding = TextImageTimeEmbedding(
text_embed_dim=cross_attention_dim, image_embed_dim=cross_attention_dim, time_embed_dim=time_embed_dim
)
@@ -1010,7 +1010,7 @@ class UNet2DConditionModel(ModelMixin, ConfigMixin, UNet2DConditionLoadersMixin,
if self.encoder_hid_proj is not None and self.config.encoder_hid_dim_type == "text_proj":
encoder_hidden_states = self.encoder_hid_proj(encoder_hidden_states)
elif self.encoder_hid_proj is not None and self.config.encoder_hid_dim_type == "text_image_proj":
- # Kadinsky 2.1 - style
+ # Kandinsky 2.1 - style
if "image_embeds" not in added_cond_kwargs:
raise ValueError(
f"{self.__class__} has the config param `encoder_hid_dim_type` set to 'text_image_proj' which requires the keyword argument `image_embeds` to be passed in `added_conditions`"
...
@@ -1171,7 +1171,7 @@ class StableDiffusionControlNetInpaintPipeline(
`padding_mask_crop` is not `None`, it will first find a rectangular region with the same aspect ration of the image and
contains all masked area, and then expand that area based on `padding_mask_crop`. The image and mask_image will then be cropped based on
the expanded area before resizing to the original image size for inpainting. This is useful when the masked area is small while the image is large
- and contain information inreleant for inpainging, such as background.
+ and contain information irrelevant for inpainting, such as background.
strength (`float`, *optional*, defaults to 1.0):
Indicates extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a
starting point and more noise is added the higher the `strength`. The number of denoising steps depends
...
@@ -1198,7 +1198,7 @@ class StableDiffusionXLControlNetInpaintPipeline(
`padding_mask_crop` is not `None`, it will first find a rectangular region with the same aspect ration of the image and
contains all masked area, and then expand that area based on `padding_mask_crop`. The image and mask_image will then be cropped based on
the expanded area before resizing to the original image size for inpainting. This is useful when the masked area is small while the image is large
- and contain information inreleant for inpainging, such as background.
+ and contain information irrelevant for inpainting, such as background.
strength (`float`, *optional*, defaults to 0.9999):
Conceptually, indicates how much to transform the masked portion of the reference `image`. Must be
between 0 and 1. `image` will be used as a starting point, adding more noise to it the larger the
...
@@ -531,7 +531,7 @@ class UNetFlatConditionModel(ModelMixin, ConfigMixin):
elif encoder_hid_dim_type == "text_image_proj":
# image_embed_dim DOESN'T have to be `cross_attention_dim`. To not clutter the __init__ too much
# they are set to `cross_attention_dim` here as this is exactly the required dimension for the currently only use
- # case when `addition_embed_type == "text_image_proj"` (Kadinsky 2.1)`
+ # case when `addition_embed_type == "text_image_proj"` (Kandinsky 2.1)`
self.encoder_hid_proj = TextImageProjection(
text_embed_dim=encoder_hid_dim,
image_embed_dim=cross_attention_dim,
@@ -591,7 +591,7 @@ class UNetFlatConditionModel(ModelMixin, ConfigMixin):
elif addition_embed_type == "text_image":
# text_embed_dim and image_embed_dim DON'T have to be `cross_attention_dim`. To not clutter the __init__ too much
# they are set to `cross_attention_dim` here as this is exactly the required dimension for the currently only use
- # case when `addition_embed_type == "text_image"` (Kadinsky 2.1)`
+ # case when `addition_embed_type == "text_image"` (Kandinsky 2.1)`
self.add_embedding = TextImageTimeEmbedding(
text_embed_dim=cross_attention_dim, image_embed_dim=cross_attention_dim, time_embed_dim=time_embed_dim
)
@@ -1257,7 +1257,7 @@ class UNetFlatConditionModel(ModelMixin, ConfigMixin):
if self.encoder_hid_proj is not None and self.config.encoder_hid_dim_type == "text_proj":
encoder_hidden_states = self.encoder_hid_proj(encoder_hidden_states)
elif self.encoder_hid_proj is not None and self.config.encoder_hid_dim_type == "text_image_proj":
- # Kadinsky 2.1 - style
+ # Kandinsky 2.1 - style
if "image_embeds" not in added_cond_kwargs:
raise ValueError(
f"{self.__class__} has the config param `encoder_hid_dim_type` set to 'text_image_proj' which requires the keyword argument `image_embeds` to be passed in `added_conditions`"
...
@@ -1026,7 +1026,7 @@ class StableDiffusionInpaintPipeline(
`padding_mask_crop` is not `None`, it will first find a rectangular region with the same aspect ration of the image and
contains all masked area, and then expand that area based on `padding_mask_crop`. The image and mask_image will then be cropped based on
the expanded area before resizing to the original image size for inpainting. This is useful when the masked area is small while the image is large
- and contain information inreleant for inpainging, such as background.
+ and contain information irrelevant for inpainting, such as background.
strength (`float`, *optional*, defaults to 1.0):
Indicates extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a
starting point and more noise is added the higher the `strength`. The number of denoising steps depends
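The behaviour described in the corrected docstring is easier to see with a concrete call; a minimal sketch, with illustrative file names, prompt, and a 32-pixel crop margin:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("photo.png")  # image with a small region to repaint
mask_image = load_image("mask.png")   # white where the image should be repainted

image = pipe(
    prompt="a red park bench",
    image=init_image,
    mask_image=mask_image,
    padding_mask_crop=32,  # crop to the masked region plus a 32px margin before inpainting
).images[0]
```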
...
@@ -1259,7 +1259,7 @@ class StableDiffusionXLInpaintPipeline(
`padding_mask_crop` is not `None`, it will first find a rectangular region with the same aspect ration of the image and
contains all masked area, and then expand that area based on `padding_mask_crop`. The image and mask_image will then be cropped based on
the expanded area before resizing to the original image size for inpainting. This is useful when the masked area is small while the image is large
- and contain information inreleant for inpainging, such as background.
+ and contain information irrelevant for inpainting, such as background.
strength (`float`, *optional*, defaults to 0.9999):
Conceptually, indicates how much to transform the masked portion of the reference `image`. Must be
between 0 and 1. `image` will be used as a starting point, adding more noise to it the larger the
...
@@ -45,7 +45,7 @@ def betas_for_alpha_bar(
return math.exp(t * -12.0)
else:
- raise ValueError(f"Unsupported alpha_tranform_type: {alpha_transform_type}")
+ raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")
betas = []
for i in range(num_diffusion_timesteps):
...
@@ -104,7 +104,7 @@ class CMStochasticIterativeScheduler(SchedulerMixin, ConfigMixin):
@property
def step_index(self):
"""
- The index counter for current timestep. It will increae 1 after each scheduler step.
+ The index counter for current timestep. It will increase 1 after each scheduler step.
"""
return self._step_index
...
@@ -82,7 +82,7 @@ def betas_for_alpha_bar(
return math.exp(t * -12.0)
else:
- raise ValueError(f"Unsupported alpha_tranform_type: {alpha_transform_type}")
+ raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")
betas = []
for i in range(num_diffusion_timesteps):
...
@@ -80,7 +80,7 @@ def betas_for_alpha_bar(
return math.exp(t * -12.0)
else:
- raise ValueError(f"Unsupported alpha_tranform_type: {alpha_transform_type}")
+ raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")
betas = []
for i in range(num_diffusion_timesteps):