Unverified commit 45f6d52b authored by YiYi Xu, committed by GitHub

Add Shap-E (#3742)



* refactor prior_transformer

adding conversion script

add pipeline

add step_index from pipeline and remove permute

add zero pad token

remove `# Copied from` statement for betas_for_alpha_bar function

* add

* add

* update conversion script for renderer model

* refactor camera a little bit

* clean up

* style

* fix copies

* Update src/diffusers/schedulers/scheduling_heun_discrete.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* alpha_transform_type

* remove step_index argument

* remove get_sigmas_karras

* remove _yiyi_sigma_to_t

* move the rescale prompt_embeds from prior_transformer to pipeline

* replace baddbmm with einsum to match original repo

* Revert "replace baddbmm with einsum to match original repo"

This reverts commit 3f6b435d65dad3e5514cad2f5dd9e4419ca78e0b.

* add step_index to scale_model_input

* Revert "move the rescale prompt_embeds from prior_transformer to pipeline"

This reverts commit 5b5a8e6be918fefd114a2945ed89d8e8fa8be21b.

* move rescale from prior_transformer to pipeline

* correct step_index in scale_model_input

* remove print lines

* refactor prior - reduce arguments

* make style

* add prior_image

* arg embedding_proj_norm -> norm_embedding_proj

* add pre-norm for proj_embedding

* move rescale prompt from pipeline to _encode_prompt

* add img2img pipeline

* style

* copies

* Update src/diffusers/models/prior_transformer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/models/prior_transformer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/models/prior_transformer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/models/prior_transformer.py

add arg: encoder_hid_proj
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/models/prior_transformer.py

add new config: norm_in_type
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/models/prior_transformer.py

add new config: added_emb_type
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/models/prior_transformer.py

rename out_dim -> clip_embed_dim
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/models/prior_transformer.py

rename config: out_dim -> clip_embed_dim
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/models/prior_transformer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/models/prior_transformer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* finish refactor of prior_transformer

* make style

* refactor renderer

* fix

* make style

* refactor img2img

* remove params_proj

* add test

* add upcast_softmax to prior_transformer

* enable num_images_per_prompt, add save_gif utility

* add

* add fast test

* make style

* add slow test

* style

* add test for img2img

* refactor

* enable batching

* style

* refactor scheduler

* update test

* style

* attempt to solve batch related tests timeout

* add doc

* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/shap_e/pipeline_shap_e_img2img.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* hardcode rendering related config

* update betas_for_alpha_bar on ddpm_scheduler

* fix copies

* fix

* export_to_gif

* style

* second attempt to speed up batching tests

* add doc page to index

* Remove intermediate clipping

* 3rd attempt to speed up batching tests

* Remove time index

* simplify scheduler

* Fix more

* Fix more

* fix more

* make style

* fix schedulers

* fix some more tests

* finish

* add one more test

* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* style

* apply feedback

* style

* fix copies

* add one example

* style

* add example for img2img

* fix doc

* fix more doc strings

* size -> frame_size

* style

* update doc

* style

* fix on doc

* update repo name

* improve the usage example in shap-e img2img

* add usage examples in the shap-e docs.

* consolidate examples.

* minor fix.

* update doc

* Apply suggestions from code review

* Apply suggestions from code review

* remove upcast

* Make sure background is white

* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py

* Apply suggestions from code review

* Finish

* Apply suggestions from code review

* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py

* Make style

---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
parent 74621567
@@ -29,7 +29,11 @@ logger = logging.get_logger(__name__)  # pylint: disable=invalid-name

 # Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar
-def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999):
+def betas_for_alpha_bar(
+    num_diffusion_timesteps,
+    max_beta=0.999,
+    alpha_transform_type="cosine",
+):
     """
     Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of
     (1-beta) over time from t = [0,1].
@@ -42,19 +46,30 @@ def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999):
         num_diffusion_timesteps (`int`): the number of betas to produce.
         max_beta (`float`): the maximum beta to use; use values lower than 1 to
             prevent singularities.
+        alpha_transform_type (`str`, *optional*, defaults to `cosine`): the type of noise schedule for alpha_bar.
+            Choose from `cosine` or `exp`

     Returns:
         betas (`np.ndarray`): the betas used by the scheduler to step the model outputs
     """
+    if alpha_transform_type == "cosine":

-    def alpha_bar(time_step):
-        return math.cos((time_step + 0.008) / 1.008 * math.pi / 2) ** 2
+        def alpha_bar_fn(t):
+            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
+
+    elif alpha_transform_type == "exp":
+
+        def alpha_bar_fn(t):
+            return math.exp(t * -12.0)
+
+    else:
+        raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")

     betas = []
     for i in range(num_diffusion_timesteps):
         t1 = i / num_diffusion_timesteps
         t2 = (i + 1) / num_diffusion_timesteps
-        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
+        betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
     return torch.tensor(betas, dtype=torch.float32)
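The hunk above adds an `exp` alpha_bar transform alongside the existing `cosine` one. A minimal plain-Python sketch of the new schedule logic (illustrative only: it returns a list of floats instead of a torch tensor):

```python
import math


def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999, alpha_transform_type="cosine"):
    # Select the alpha_bar transform: "cosine" (the Glide schedule) or the new "exp" variant.
    if alpha_transform_type == "cosine":
        def alpha_bar_fn(t):
            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
    elif alpha_transform_type == "exp":
        def alpha_bar_fn(t):
            return math.exp(t * -12.0)
    else:
        raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")

    betas = []
    for i in range(num_diffusion_timesteps):
        t1 = i / num_diffusion_timesteps
        t2 = (i + 1) / num_diffusion_timesteps
        # beta_t = 1 - alpha_bar(t+1) / alpha_bar(t), capped at max_beta to avoid singularities
        betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
    return betas
```

Note that for the `exp` transform the ratio `alpha_bar(t2) / alpha_bar(t1)` is constant, so every beta equals `1 - exp(-12 / N)`; only the cosine transform produces an increasing schedule.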
@@ -47,7 +47,11 @@ class EulerAncestralDiscreteSchedulerOutput(BaseOutput):

 # Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar
-def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999) -> torch.Tensor:
+def betas_for_alpha_bar(
+    num_diffusion_timesteps,
+    max_beta=0.999,
+    alpha_transform_type="cosine",
+):
     """
     Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of
     (1-beta) over time from t = [0,1].
@@ -60,19 +64,30 @@ def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999) -> torch.Tensor
         num_diffusion_timesteps (`int`): the number of betas to produce.
         max_beta (`float`): the maximum beta to use; use values lower than 1 to
             prevent singularities.
+        alpha_transform_type (`str`, *optional*, defaults to `cosine`): the type of noise schedule for alpha_bar.
+            Choose from `cosine` or `exp`

     Returns:
         betas (`np.ndarray`): the betas used by the scheduler to step the model outputs
     """
+    if alpha_transform_type == "cosine":

-    def alpha_bar(time_step):
-        return math.cos((time_step + 0.008) / 1.008 * math.pi / 2) ** 2
+        def alpha_bar_fn(t):
+            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
+
+    elif alpha_transform_type == "exp":
+
+        def alpha_bar_fn(t):
+            return math.exp(t * -12.0)
+
+    else:
+        raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")

     betas = []
     for i in range(num_diffusion_timesteps):
         t1 = i / num_diffusion_timesteps
         t2 = (i + 1) / num_diffusion_timesteps
-        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
+        betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
     return torch.tensor(betas, dtype=torch.float32)
@@ -47,7 +47,11 @@ class EulerDiscreteSchedulerOutput(BaseOutput):

 # Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar
-def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999):
+def betas_for_alpha_bar(
+    num_diffusion_timesteps,
+    max_beta=0.999,
+    alpha_transform_type="cosine",
+):
     """
     Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of
     (1-beta) over time from t = [0,1].
@@ -60,19 +64,30 @@ def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999):
         num_diffusion_timesteps (`int`): the number of betas to produce.
         max_beta (`float`): the maximum beta to use; use values lower than 1 to
             prevent singularities.
+        alpha_transform_type (`str`, *optional*, defaults to `cosine`): the type of noise schedule for alpha_bar.
+            Choose from `cosine` or `exp`

     Returns:
         betas (`np.ndarray`): the betas used by the scheduler to step the model outputs
     """
+    if alpha_transform_type == "cosine":

-    def alpha_bar(time_step):
-        return math.cos((time_step + 0.008) / 1.008 * math.pi / 2) ** 2
+        def alpha_bar_fn(t):
+            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
+
+    elif alpha_transform_type == "exp":
+
+    def alpha_bar_fn(t):
+            return math.exp(t * -12.0)
+
+    else:
+        raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")

     betas = []
     for i in range(num_diffusion_timesteps):
         t1 = i / num_diffusion_timesteps
         t2 = (i + 1) / num_diffusion_timesteps
-        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
+        betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
     return torch.tensor(betas, dtype=torch.float32)
@@ -13,6 +13,7 @@
 # limitations under the License.

 import math
+from collections import defaultdict
 from typing import List, Optional, Tuple, Union

 import numpy as np
@@ -23,7 +24,11 @@ from .scheduling_utils import KarrasDiffusionSchedulers, SchedulerMixin, Schedul

 # Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar
-def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999) -> torch.Tensor:
+def betas_for_alpha_bar(
+    num_diffusion_timesteps,
+    max_beta=0.999,
+    alpha_transform_type="cosine",
+):
     """
     Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of
     (1-beta) over time from t = [0,1].
@@ -36,19 +41,30 @@ def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999) -> torch.Tensor
         num_diffusion_timesteps (`int`): the number of betas to produce.
         max_beta (`float`): the maximum beta to use; use values lower than 1 to
             prevent singularities.
+        alpha_transform_type (`str`, *optional*, defaults to `cosine`): the type of noise schedule for alpha_bar.
+            Choose from `cosine` or `exp`

     Returns:
         betas (`np.ndarray`): the betas used by the scheduler to step the model outputs
     """
+    if alpha_transform_type == "cosine":

-    def alpha_bar(time_step):
-        return math.cos((time_step + 0.008) / 1.008 * math.pi / 2) ** 2
+        def alpha_bar_fn(t):
+            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
+
+    elif alpha_transform_type == "exp":
+
+        def alpha_bar_fn(t):
+            return math.exp(t * -12.0)
+
+    else:
+        raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")

     betas = []
     for i in range(num_diffusion_timesteps):
         t1 = i / num_diffusion_timesteps
         t2 = (i + 1) / num_diffusion_timesteps
-        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
+        betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
     return torch.tensor(betas, dtype=torch.float32)
@@ -74,6 +90,10 @@ class HeunDiscreteScheduler(SchedulerMixin, ConfigMixin):
             prediction type of the scheduler function, one of `epsilon` (predicting the noise of the diffusion
             process), `sample` (directly predicting the noisy sample) or `v_prediction` (see section 2.4
             https://imagen.research.google/video/paper.pdf).
+        clip_sample (`bool`, default `True`):
+            option to clip predicted sample for numerical stability.
+        clip_sample_range (`float`, default `1.0`):
+            the maximum magnitude for sample clipping. Valid only when `clip_sample=True`.
         use_karras_sigmas (`bool`, *optional*, defaults to `False`):
             This parameter controls whether to use Karras sigmas (Karras et al. (2022) scheme) for step sizes in the
             noise schedule during the sampling process. If True, the sigmas will be determined according to a sequence
@@ -100,6 +120,8 @@ class HeunDiscreteScheduler(SchedulerMixin, ConfigMixin):
         trained_betas: Optional[Union[np.ndarray, List[float]]] = None,
         prediction_type: str = "epsilon",
         use_karras_sigmas: Optional[bool] = False,
+        clip_sample: Optional[bool] = False,
+        clip_sample_range: float = 1.0,
         timestep_spacing: str = "linspace",
         steps_offset: int = 0,
     ):
@@ -114,7 +136,9 @@ class HeunDiscreteScheduler(SchedulerMixin, ConfigMixin):
             )
         elif beta_schedule == "squaredcos_cap_v2":
             # Glide cosine schedule
-            self.betas = betas_for_alpha_bar(num_train_timesteps)
+            self.betas = betas_for_alpha_bar(num_train_timesteps, alpha_transform_type="cosine")
+        elif beta_schedule == "exp":
+            self.betas = betas_for_alpha_bar(num_train_timesteps, alpha_transform_type="exp")
         else:
             raise NotImplementedError(f"{beta_schedule} is not implemented for {self.__class__}")
@@ -131,10 +155,16 @@ class HeunDiscreteScheduler(SchedulerMixin, ConfigMixin):
         indices = (schedule_timesteps == timestep).nonzero()

-        if self.state_in_first_order:
-            pos = -1
+        # The sigma index that is taken for the **very** first `step`
+        # is always the second index (or the last index if there is only 1)
+        # This way we can ensure we don't accidentally skip a sigma in
+        # case we start in the middle of the denoising schedule (e.g. for image-to-image)
+        if len(self._index_counter) == 0:
+            pos = 1 if len(indices) > 1 else 0
         else:
-            pos = 0
+            timestep_int = timestep.cpu().item() if torch.is_tensor(timestep) else timestep
+            pos = self._index_counter[timestep_int]

         return indices[pos].item()
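The `index_for_timestep` hunk above replaces the old first/second-order flag with a per-timestep counter: Heun is a two-stage method, so every timestep appears twice in the schedule, and the counter picks the right occurrence on each lookup. A standalone sketch of that logic (hypothetical helper; plain lists and ints stand in for tensors):

```python
from collections import defaultdict


def index_for_timestep(timestep, schedule_timesteps, index_counter):
    # All positions where this timestep occurs (Heun-style schedules repeat timesteps).
    indices = [i for i, t in enumerate(schedule_timesteps) if t == timestep]
    if len(index_counter) == 0:
        # Very first `step` call: take the second occurrence (or the only one),
        # so no sigma is skipped when denoising starts mid-schedule (e.g. img2img).
        return indices[1 if len(indices) > 1 else 0]
    # Afterwards, how often this timestep has been stepped selects the occurrence.
    return indices[index_counter[timestep]]


schedule = [800, 600, 600, 400, 400, 200]
counter = defaultdict(int)
visited = []
for ts in schedule:
    visited.append(index_for_timestep(ts, schedule, counter))
    counter[ts] += 1  # the scheduler's `step` advances the counter after each lookup
# visited covers every sigma index exactly once, in order
```

With a fresh (empty) counter, looking up a repeated timestep such as 600 returns its second occurrence, which is what makes starting mid-schedule safe.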
     @property
@@ -207,7 +237,7 @@ class HeunDiscreteScheduler(SchedulerMixin, ConfigMixin):
         log_sigmas = np.log(sigmas)
         sigmas = np.interp(timesteps, np.arange(0, len(sigmas)), sigmas)

-        if self.use_karras_sigmas:
+        if self.config.use_karras_sigmas:
             sigmas = self._convert_to_karras(in_sigmas=sigmas, num_inference_steps=self.num_inference_steps)
             timesteps = np.array([self._sigma_to_t(sigma, log_sigmas) for sigma in sigmas])
@@ -228,6 +258,10 @@ class HeunDiscreteScheduler(SchedulerMixin, ConfigMixin):
         self.prev_derivative = None
         self.dt = None

+        # for exp beta schedules, such as the one for `pipeline_shap_e.py`
+        # we need an index counter
+        self._index_counter = defaultdict(int)
+
     # Copied from diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler._sigma_to_t
     def _sigma_to_t(self, sigma, log_sigmas):
         # get log sigma
@@ -292,6 +326,10 @@ class HeunDiscreteScheduler(SchedulerMixin, ConfigMixin):
         """
         step_index = self.index_for_timestep(timestep)

+        # advance index counter by 1
+        timestep_int = timestep.cpu().item() if torch.is_tensor(timestep) else timestep
+        self._index_counter[timestep_int] += 1
+
         if self.state_in_first_order:
             sigma = self.sigmas[step_index]
             sigma_next = self.sigmas[step_index + 1]
@@ -316,12 +354,17 @@ class HeunDiscreteScheduler(SchedulerMixin, ConfigMixin):
                 sample / (sigma_input**2 + 1)
             )
         elif self.config.prediction_type == "sample":
-            raise NotImplementedError("prediction_type not implemented yet: sample")
+            pred_original_sample = model_output
         else:
             raise ValueError(
                 f"prediction_type given as {self.config.prediction_type} must be one of `epsilon`, or `v_prediction`"
             )

+        if self.config.clip_sample:
+            pred_original_sample = pred_original_sample.clamp(
+                -self.config.clip_sample_range, self.config.clip_sample_range
+            )
+
         if self.state_in_first_order:
             # 2. Convert to an ODE derivative for 1st order
             derivative = (sample - pred_original_sample) / sigma_hat
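The new `clip_sample` branch above is plain clamping of the predicted original sample into `[-clip_sample_range, clip_sample_range]`. A tiny illustrative sketch of that behavior (hypothetical function name; a float stands in for `Tensor.clamp`, and `clip_sample=True` here is only for demonstration):

```python
def clip_pred_original_sample(pred, clip_sample=True, clip_sample_range=1.0):
    # Mirrors the scheduler's new branch: clamp the model's x0 prediction
    # into [-clip_sample_range, clip_sample_range] for numerical stability.
    if not clip_sample:
        return pred
    return max(-clip_sample_range, min(clip_sample_range, pred))
```

When `clip_sample` is disabled (the constructor default in this diff), the prediction passes through unchanged.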
@@ -13,6 +13,7 @@
 # limitations under the License.

 import math
+from collections import defaultdict
 from typing import List, Optional, Tuple, Union

 import numpy as np
@@ -24,7 +25,11 @@ from .scheduling_utils import KarrasDiffusionSchedulers, SchedulerMixin, Schedul

 # Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar
-def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999) -> torch.Tensor:
+def betas_for_alpha_bar(
+    num_diffusion_timesteps,
+    max_beta=0.999,
+    alpha_transform_type="cosine",
+):
     """
     Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of
     (1-beta) over time from t = [0,1].
@@ -37,19 +42,30 @@ def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999) -> torch.Tensor
         num_diffusion_timesteps (`int`): the number of betas to produce.
         max_beta (`float`): the maximum beta to use; use values lower than 1 to
             prevent singularities.
+        alpha_transform_type (`str`, *optional*, defaults to `cosine`): the type of noise schedule for alpha_bar.
+            Choose from `cosine` or `exp`

     Returns:
         betas (`np.ndarray`): the betas used by the scheduler to step the model outputs
     """
+    if alpha_transform_type == "cosine":

-    def alpha_bar(time_step):
-        return math.cos((time_step + 0.008) / 1.008 * math.pi / 2) ** 2
+        def alpha_bar_fn(t):
+            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
+
+    elif alpha_transform_type == "exp":
+
+        def alpha_bar_fn(t):
+            return math.exp(t * -12.0)
+
+    else:
+        raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")

     betas = []
     for i in range(num_diffusion_timesteps):
         t1 = i / num_diffusion_timesteps
         t2 = (i + 1) / num_diffusion_timesteps
-        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
+        betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
     return torch.tensor(betas, dtype=torch.float32)
@@ -130,10 +146,16 @@ class KDPM2AncestralDiscreteScheduler(SchedulerMixin, ConfigMixin):
         indices = (schedule_timesteps == timestep).nonzero()

-        if self.state_in_first_order:
-            pos = -1
+        # The sigma index that is taken for the **very** first `step`
+        # is always the second index (or the last index if there is only 1)
+        # This way we can ensure we don't accidentally skip a sigma in
+        # case we start in the middle of the denoising schedule (e.g. for image-to-image)
+        if len(self._index_counter) == 0:
+            pos = 1 if len(indices) > 1 else 0
         else:
-            pos = 0
+            timestep_int = timestep.cpu().item() if torch.is_tensor(timestep) else timestep
+            pos = self._index_counter[timestep_int]

         return indices[pos].item()

     @property
@@ -245,6 +267,10 @@ class KDPM2AncestralDiscreteScheduler(SchedulerMixin, ConfigMixin):
         self.sample = None

+        # for exp beta schedules, such as the one for `pipeline_shap_e.py`
+        # we need an index counter
+        self._index_counter = defaultdict(int)
+
     def sigma_to_t(self, sigma):
         # get log sigma
         log_sigma = sigma.log()
@@ -295,6 +321,10 @@ class KDPM2AncestralDiscreteScheduler(SchedulerMixin, ConfigMixin):
         """
         step_index = self.index_for_timestep(timestep)

+        # advance index counter by 1
+        timestep_int = timestep.cpu().item() if torch.is_tensor(timestep) else timestep
+        self._index_counter[timestep_int] += 1
+
         if self.state_in_first_order:
             sigma = self.sigmas[step_index]
             sigma_interpol = self.sigmas_interpol[step_index]
...@@ -13,6 +13,7 @@ ...@@ -13,6 +13,7 @@
# limitations under the License. # limitations under the License.
import math import math
from collections import defaultdict
from typing import List, Optional, Tuple, Union from typing import List, Optional, Tuple, Union
import numpy as np import numpy as np
...@@ -23,7 +24,11 @@ from .scheduling_utils import KarrasDiffusionSchedulers, SchedulerMixin, Schedul ...@@ -23,7 +24,11 @@ from .scheduling_utils import KarrasDiffusionSchedulers, SchedulerMixin, Schedul
# Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar # Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar
def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999) -> torch.Tensor: def betas_for_alpha_bar(
num_diffusion_timesteps,
max_beta=0.999,
alpha_transform_type="cosine",
):
""" """
Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of
(1-beta) over time from t = [0,1]. (1-beta) over time from t = [0,1].
...@@ -36,19 +41,30 @@ def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999) -> torch.Tensor ...@@ -36,19 +41,30 @@ def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999) -> torch.Tensor
num_diffusion_timesteps (`int`): the number of betas to produce. num_diffusion_timesteps (`int`): the number of betas to produce.
max_beta (`float`): the maximum beta to use; use values lower than 1 to max_beta (`float`): the maximum beta to use; use values lower than 1 to
prevent singularities. prevent singularities.
alpha_transform_type (`str`, *optional*, default to `cosine`): the type of noise schedule for alpha_bar.
Choose from `cosine` or `exp`
Returns: Returns:
betas (`np.ndarray`): the betas used by the scheduler to step the model outputs betas (`np.ndarray`): the betas used by the scheduler to step the model outputs
""" """
if alpha_transform_type == "cosine":
def alpha_bar(time_step): def alpha_bar_fn(t):
return math.cos((time_step + 0.008) / 1.008 * math.pi / 2) ** 2 return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
elif alpha_transform_type == "exp":
def alpha_bar_fn(t):
return math.exp(t * -12.0)
else:
raise ValueError(f"Unsupported alpha_tranform_type: {alpha_transform_type}")
betas = [] betas = []
for i in range(num_diffusion_timesteps): for i in range(num_diffusion_timesteps):
t1 = i / num_diffusion_timesteps t1 = i / num_diffusion_timesteps
t2 = (i + 1) / num_diffusion_timesteps t2 = (i + 1) / num_diffusion_timesteps
betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta)) betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
return torch.tensor(betas, dtype=torch.float32) return torch.tensor(betas, dtype=torch.float32)
...@@ -129,10 +145,16 @@ class KDPM2DiscreteScheduler(SchedulerMixin, ConfigMixin): ...@@ -129,10 +145,16 @@ class KDPM2DiscreteScheduler(SchedulerMixin, ConfigMixin):
indices = (schedule_timesteps == timestep).nonzero() indices = (schedule_timesteps == timestep).nonzero()
if self.state_in_first_order: # The sigma index that is taken for the **very** first `step`
pos = -1 # is always the second index (or the last index if there is only 1)
# This way we can ensure we don't accidentally skip a sigma in
# case we start in the middle of the denoising schedule (e.g. for image-to-image)
if len(self._index_counter) == 0:
pos = 1 if len(indices) > 1 else 0
else: else:
pos = 0 timestep_int = timestep.cpu().item() if torch.is_tensor(timestep) else timestep
pos = self._index_counter[timestep_int]
return indices[pos].item() return indices[pos].item()
@property @property
...@@ -234,6 +256,10 @@ class KDPM2DiscreteScheduler(SchedulerMixin, ConfigMixin): ...@@ -234,6 +256,10 @@ class KDPM2DiscreteScheduler(SchedulerMixin, ConfigMixin):
self.sample = None self.sample = None
# for exp beta schedules, such as the one for `pipeline_shap_e.py`
# we need an index counter
self._index_counter = defaultdict(int)
def sigma_to_t(self, sigma): def sigma_to_t(self, sigma):
# get log sigma # get log sigma
log_sigma = sigma.log() log_sigma = sigma.log()
@@ -283,6 +309,10 @@ class KDPM2DiscreteScheduler(SchedulerMixin, ConfigMixin):
         """
         step_index = self.index_for_timestep(timestep)

+        # advance index counter by 1
+        timestep_int = timestep.cpu().item() if torch.is_tensor(timestep) else timestep
+        self._index_counter[timestep_int] += 1
         if self.state_in_first_order:
             sigma = self.sigmas[step_index]
             sigma_interpol = self.sigmas_interpol[step_index + 1]
...
@@ -45,7 +45,11 @@ class LMSDiscreteSchedulerOutput(BaseOutput):
 # Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar
-def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999):
+def betas_for_alpha_bar(
+    num_diffusion_timesteps,
+    max_beta=0.999,
+    alpha_transform_type="cosine",
+):
     """
     Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of
     (1-beta) over time from t = [0,1].
@@ -58,19 +62,30 @@ def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999):
         num_diffusion_timesteps (`int`): the number of betas to produce.
         max_beta (`float`): the maximum beta to use; use values lower than 1 to
                     prevent singularities.
+        alpha_transform_type (`str`, *optional*, defaults to `cosine`): the type of noise schedule for alpha_bar.
+            Choose from `cosine` or `exp`

     Returns:
         betas (`np.ndarray`): the betas used by the scheduler to step the model outputs
     """
+    if alpha_transform_type == "cosine":

-    def alpha_bar(time_step):
-        return math.cos((time_step + 0.008) / 1.008 * math.pi / 2) ** 2
+        def alpha_bar_fn(t):
+            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

+    elif alpha_transform_type == "exp":
+
+        def alpha_bar_fn(t):
+            return math.exp(t * -12.0)
+
+    else:
+        raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")

     betas = []
     for i in range(num_diffusion_timesteps):
         t1 = i / num_diffusion_timesteps
         t2 = (i + 1) / num_diffusion_timesteps
-        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
+        betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
     return torch.tensor(betas, dtype=torch.float32)
...
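The same change is repeated across every scheduler file that copies this helper. To make the behavior of the new `alpha_transform_type` argument concrete, here is a dependency-free sketch that returns a plain list instead of a `torch.Tensor` (otherwise mirroring the diff above):

```python
import math

def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999, alpha_transform_type="cosine"):
    # select how the cumulative alpha_bar(t) is computed before it is
    # discretized into per-step betas
    if alpha_transform_type == "cosine":
        def alpha_bar_fn(t):
            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
    elif alpha_transform_type == "exp":
        def alpha_bar_fn(t):
            return math.exp(t * -12.0)
    else:
        raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")

    betas = []
    for i in range(num_diffusion_timesteps):
        t1 = i / num_diffusion_timesteps
        t2 = (i + 1) / num_diffusion_timesteps
        # beta_i = 1 - alpha_bar(t2) / alpha_bar(t1), clamped to max_beta
        betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
    return betas

cosine = betas_for_alpha_bar(10)
exp = betas_for_alpha_bar(10, alpha_transform_type="exp")
```

With the `exp` transform the ratio `alpha_bar_fn(t2) / alpha_bar_fn(t1)` is constant, so every beta equals `1 - exp(-12 / N)`; the `cosine` transform instead produces betas that grow toward `max_beta` at the end of the schedule.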
@@ -25,7 +25,11 @@ from .scheduling_utils import KarrasDiffusionSchedulers, SchedulerMixin, Schedul
 # Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar
-def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999):
+def betas_for_alpha_bar(
+    num_diffusion_timesteps,
+    max_beta=0.999,
+    alpha_transform_type="cosine",
+):
     """
     Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of
     (1-beta) over time from t = [0,1].
@@ -38,19 +42,30 @@ def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999):
         num_diffusion_timesteps (`int`): the number of betas to produce.
         max_beta (`float`): the maximum beta to use; use values lower than 1 to
                     prevent singularities.
+        alpha_transform_type (`str`, *optional*, defaults to `cosine`): the type of noise schedule for alpha_bar.
+            Choose from `cosine` or `exp`

     Returns:
         betas (`np.ndarray`): the betas used by the scheduler to step the model outputs
     """
+    if alpha_transform_type == "cosine":

-    def alpha_bar(time_step):
-        return math.cos((time_step + 0.008) / 1.008 * math.pi / 2) ** 2
+        def alpha_bar_fn(t):
+            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

+    elif alpha_transform_type == "exp":
+
+        def alpha_bar_fn(t):
+            return math.exp(t * -12.0)
+
+    else:
+        raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")

     betas = []
     for i in range(num_diffusion_timesteps):
         t1 = i / num_diffusion_timesteps
         t2 = (i + 1) / num_diffusion_timesteps
-        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
+        betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
     return torch.tensor(betas, dtype=torch.float32)
...
@@ -43,7 +43,11 @@ class RePaintSchedulerOutput(BaseOutput):
 # Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar
-def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999):
+def betas_for_alpha_bar(
+    num_diffusion_timesteps,
+    max_beta=0.999,
+    alpha_transform_type="cosine",
+):
     """
     Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of
     (1-beta) over time from t = [0,1].
@@ -56,19 +60,30 @@ def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999):
         num_diffusion_timesteps (`int`): the number of betas to produce.
         max_beta (`float`): the maximum beta to use; use values lower than 1 to
                     prevent singularities.
+        alpha_transform_type (`str`, *optional*, defaults to `cosine`): the type of noise schedule for alpha_bar.
+            Choose from `cosine` or `exp`

     Returns:
         betas (`np.ndarray`): the betas used by the scheduler to step the model outputs
     """
+    if alpha_transform_type == "cosine":

-    def alpha_bar(time_step):
-        return math.cos((time_step + 0.008) / 1.008 * math.pi / 2) ** 2
+        def alpha_bar_fn(t):
+            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

+    elif alpha_transform_type == "exp":
+
+        def alpha_bar_fn(t):
+            return math.exp(t * -12.0)
+
+    else:
+        raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")

     betas = []
     for i in range(num_diffusion_timesteps):
         t1 = i / num_diffusion_timesteps
         t2 = (i + 1) / num_diffusion_timesteps
-        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
+        betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
     return torch.tensor(betas, dtype=torch.float32)
...
@@ -44,7 +44,11 @@ class UnCLIPSchedulerOutput(BaseOutput):
 # Copied from diffusers.schedulers.scheduling_ddpm.betas_for_alpha_bar
-def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999):
+def betas_for_alpha_bar(
+    num_diffusion_timesteps,
+    max_beta=0.999,
+    alpha_transform_type="cosine",
+):
     """
     Create a beta schedule that discretizes the given alpha_t_bar function, which defines the cumulative product of
     (1-beta) over time from t = [0,1].
@@ -57,19 +61,30 @@ def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999):
         num_diffusion_timesteps (`int`): the number of betas to produce.
         max_beta (`float`): the maximum beta to use; use values lower than 1 to
                     prevent singularities.
+        alpha_transform_type (`str`, *optional*, defaults to `cosine`): the type of noise schedule for alpha_bar.
+            Choose from `cosine` or `exp`

     Returns:
         betas (`np.ndarray`): the betas used by the scheduler to step the model outputs
     """
+    if alpha_transform_type == "cosine":

-    def alpha_bar(time_step):
-        return math.cos((time_step + 0.008) / 1.008 * math.pi / 2) ** 2
+        def alpha_bar_fn(t):
+            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

+    elif alpha_transform_type == "exp":
+
+        def alpha_bar_fn(t):
+            return math.exp(t * -12.0)
+
+    else:
+        raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}")

     betas = []
     for i in range(num_diffusion_timesteps):
         t1 = i / num_diffusion_timesteps
         t2 = (i + 1) / num_diffusion_timesteps
-        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
+        betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta))
     return torch.tensor(betas, dtype=torch.float32)
...
@@ -104,7 +104,7 @@ if is_torch_available():
     )
     from .torch_utils import maybe_allow_in_graph

-    from .testing_utils import export_to_video
+    from .testing_utils import export_to_gif, export_to_video

 logger = get_logger(__name__)
...
@@ -377,6 +377,36 @@ class SemanticStableDiffusionPipeline(metaclass=DummyObject):
         requires_backends(cls, ["torch", "transformers"])


+class ShapEImg2ImgPipeline(metaclass=DummyObject):
+    _backends = ["torch", "transformers"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch", "transformers"])
+
+    @classmethod
+    def from_config(cls, *args, **kwargs):
+        requires_backends(cls, ["torch", "transformers"])
+
+    @classmethod
+    def from_pretrained(cls, *args, **kwargs):
+        requires_backends(cls, ["torch", "transformers"])
+
+
+class ShapEPipeline(metaclass=DummyObject):
+    _backends = ["torch", "transformers"]
+
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch", "transformers"])
+
+    @classmethod
+    def from_config(cls, *args, **kwargs):
+        requires_backends(cls, ["torch", "transformers"])
+
+    @classmethod
+    def from_pretrained(cls, *args, **kwargs):
+        requires_backends(cls, ["torch", "transformers"])
+
+
 class StableDiffusionAttendAndExcitePipeline(metaclass=DummyObject):
     _backends = ["torch", "transformers"]
...
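These placeholder classes follow the dummy-object pattern `diffusers.utils` uses for optional backends: importing the pipeline names always succeeds, but using them without torch/transformers installed raises a clear error. A simplified sketch of the mechanism (`requires_backends` and `AVAILABLE_BACKENDS` below are illustrative stand-ins, not the library's real implementation):

```python
AVAILABLE_BACKENDS = {"numpy"}  # pretend torch/transformers are absent

def requires_backends(obj, backends):
    # raise if any required backend is not installed
    missing = [b for b in backends if b not in AVAILABLE_BACKENDS]
    if missing:
        raise ImportError(f"{obj.__name__} requires {missing}")

class DummyObject(type):
    # metaclass: any attempt to instantiate the dummy class fails loudly
    def __call__(cls, *args, **kwargs):
        requires_backends(cls, cls._backends)

class ShapEPipeline(metaclass=DummyObject):
    _backends = ["torch", "transformers"]

# importing/defining the class worked; instantiating it raises
try:
    ShapEPipeline()
except ImportError as e:
    message = str(e)
```

The payoff is that `from diffusers import ShapEPipeline` never breaks an environment that lacks the heavy backends; the failure is deferred to first use, with a message naming what is missing.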
@@ -300,6 +300,21 @@ def preprocess_image(image: PIL.Image, batch_size: int):
     return 2.0 * image - 1.0


+def export_to_gif(image: List[PIL.Image.Image], output_gif_path: str = None) -> str:
+    if output_gif_path is None:
+        output_gif_path = tempfile.NamedTemporaryFile(suffix=".gif").name
+
+    image[0].save(
+        output_gif_path,
+        save_all=True,
+        append_images=image[1:],
+        optimize=False,
+        duration=100,
+        loop=0,
+    )
+    return output_gif_path
+
+
 def export_to_video(video_frames: List[np.ndarray], output_video_path: str = None) -> str:
     if is_opencv_available():
         import cv2
...
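The new `export_to_gif` helper can be exercised end to end. A hedged usage sketch (the helper body is restated so the snippet is self-contained; assumes Pillow is installed, which diffusers already requires — the frames here are solid-color placeholders standing in for rendered Shap-E views):

```python
import tempfile
from PIL import Image

def export_to_gif(image, output_gif_path=None):
    # same logic as the helper added in the diff above
    if output_gif_path is None:
        output_gif_path = tempfile.NamedTemporaryFile(suffix=".gif").name

    image[0].save(
        output_gif_path,
        save_all=True,            # write an animated, multi-frame GIF
        append_images=image[1:],  # remaining frames after the first
        optimize=False,
        duration=100,             # 100 ms per frame -> 10 fps
        loop=0,                   # loop forever
    )
    return output_gif_path

# 20 dummy frames, matching the 20 views a Shap-E render produces
frames = [Image.new("RGB", (8, 8), (i * 12, 0, 0)) for i in range(20)]
path = export_to_gif(frames)
```

`duration` and `loop` are forwarded to Pillow's GIF writer, so a pipeline's 20 rendered frames come out as a two-second looping animation.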
# Copyright 2023 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import gc
import unittest
import numpy as np
import torch
from transformers import CLIPTextConfig, CLIPTextModelWithProjection, CLIPTokenizer
from diffusers import HeunDiscreteScheduler, PriorTransformer, ShapEPipeline
from diffusers.pipelines.shap_e import ShapERenderer
from diffusers.utils import load_numpy, slow
from diffusers.utils.testing_utils import require_torch_gpu, torch_device
from ..test_pipelines_common import PipelineTesterMixin, assert_mean_pixel_difference
class ShapEPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
    pipeline_class = ShapEPipeline
    params = ["prompt"]
    batch_params = ["prompt"]
    required_optional_params = [
        "num_images_per_prompt",
        "num_inference_steps",
        "generator",
        "latents",
        "guidance_scale",
        "frame_size",
        "output_type",
        "return_dict",
    ]
    test_xformers_attention = False

    @property
    def text_embedder_hidden_size(self):
        return 32

    @property
    def time_input_dim(self):
        return 32

    @property
    def time_embed_dim(self):
        return self.time_input_dim * 4

    @property
    def renderer_dim(self):
        return 8

    @property
    def dummy_tokenizer(self):
        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        return tokenizer

    @property
    def dummy_text_encoder(self):
        torch.manual_seed(0)
        config = CLIPTextConfig(
            bos_token_id=0,
            eos_token_id=2,
            hidden_size=self.text_embedder_hidden_size,
            projection_dim=self.text_embedder_hidden_size,
            intermediate_size=37,
            layer_norm_eps=1e-05,
            num_attention_heads=4,
            num_hidden_layers=5,
            pad_token_id=1,
            vocab_size=1000,
        )
        return CLIPTextModelWithProjection(config)

    @property
    def dummy_prior(self):
        torch.manual_seed(0)

        model_kwargs = {
            "num_attention_heads": 2,
            "attention_head_dim": 16,
            "embedding_dim": self.time_input_dim,
            "num_embeddings": 32,
            "embedding_proj_dim": self.text_embedder_hidden_size,
            "time_embed_dim": self.time_embed_dim,
            "num_layers": 1,
            "clip_embed_dim": self.time_input_dim * 2,
            "additional_embeddings": 0,
            "time_embed_act_fn": "gelu",
            "norm_in_type": "layer",
            "encoder_hid_proj_type": None,
            "added_emb_type": None,
        }

        model = PriorTransformer(**model_kwargs)
        return model
    @property
    def dummy_renderer(self):
        torch.manual_seed(0)

        model_kwargs = {
            "param_shapes": (
                (self.renderer_dim, 93),
                (self.renderer_dim, 8),
                (self.renderer_dim, 8),
                (self.renderer_dim, 8),
            ),
            "d_latent": self.time_input_dim,
            "d_hidden": self.renderer_dim,
            "n_output": 12,
            "background": (
                0.1,
                0.1,
                0.1,
            ),
        }

        model = ShapERenderer(**model_kwargs)
        return model

    def get_dummy_components(self):
        prior = self.dummy_prior
        text_encoder = self.dummy_text_encoder
        tokenizer = self.dummy_tokenizer
        renderer = self.dummy_renderer

        scheduler = HeunDiscreteScheduler(
            beta_schedule="exp",
            num_train_timesteps=1024,
            prediction_type="sample",
            use_karras_sigmas=True,
            clip_sample=True,
            clip_sample_range=1.0,
        )

        components = {
            "prior": prior,
            "text_encoder": text_encoder,
            "tokenizer": tokenizer,
            "renderer": renderer,
            "scheduler": scheduler,
        }

        return components

    def get_dummy_inputs(self, device, seed=0):
        if str(device).startswith("mps"):
            generator = torch.manual_seed(seed)
        else:
            generator = torch.Generator(device=device).manual_seed(seed)

        inputs = {
            "prompt": "horse",
            "generator": generator,
            "num_inference_steps": 1,
            "frame_size": 32,
            "output_type": "np",
        }

        return inputs
    def test_shap_e(self):
        device = "cpu"

        components = self.get_dummy_components()
        pipe = self.pipeline_class(**components)
        pipe = pipe.to(device)
        pipe.set_progress_bar_config(disable=None)

        output = pipe(**self.get_dummy_inputs(device))
        image = output.images[0]
        image_slice = image[0, -3:, -3:, -1]

        assert image.shape == (20, 32, 32, 3)

        expected_slice = np.array(
            [
                0.00039216,
                0.00039216,
                0.00039216,
                0.00039216,
                0.00039216,
                0.00039216,
                0.00039216,
                0.00039216,
                0.00039216,
            ]
        )

        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2

    def test_inference_batch_consistent(self):
        # NOTE: Larger batch sizes cause this test to timeout, only test on smaller batches
        self._test_inference_batch_consistent(batch_sizes=[1, 2])

    def test_inference_batch_single_identical(self):
        test_max_difference = torch_device == "cpu"
        relax_max_difference = True

        self._test_inference_batch_single_identical(
            batch_size=2,
            test_max_difference=test_max_difference,
            relax_max_difference=relax_max_difference,
        )

    def test_num_images_per_prompt(self):
        components = self.get_dummy_components()
        pipe = self.pipeline_class(**components)
        pipe = pipe.to(torch_device)
        pipe.set_progress_bar_config(disable=None)

        batch_size = 1
        num_images_per_prompt = 2
        inputs = self.get_dummy_inputs(torch_device)

        for key in inputs.keys():
            if key in self.batch_params:
                inputs[key] = batch_size * [inputs[key]]

        images = pipe(**inputs, num_images_per_prompt=num_images_per_prompt)[0]

        assert images.shape[0] == batch_size * num_images_per_prompt
@slow
@require_torch_gpu
class ShapEPipelineIntegrationTests(unittest.TestCase):
    def tearDown(self):
        # clean up the VRAM after each test
        super().tearDown()
        gc.collect()
        torch.cuda.empty_cache()

    def test_shap_e(self):
        expected_image = load_numpy(
            "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
            "/shap_e/test_shap_e_np_out.npy"
        )
        pipe = ShapEPipeline.from_pretrained("openai/shap-e")
        pipe = pipe.to(torch_device)
        pipe.set_progress_bar_config(disable=None)

        generator = torch.Generator(device=torch_device).manual_seed(0)

        images = pipe(
            "a shark",
            generator=generator,
            guidance_scale=15.0,
            num_inference_steps=64,
            frame_size=64,
            output_type="np",
        ).images[0]

        assert images.shape == (20, 64, 64, 3)
        assert_mean_pixel_difference(images, expected_image)
# Copyright 2023 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import gc
import random
import unittest
import numpy as np
import torch
from transformers import CLIPImageProcessor, CLIPVisionConfig, CLIPVisionModel
from diffusers import HeunDiscreteScheduler, PriorTransformer, ShapEImg2ImgPipeline
from diffusers.pipelines.shap_e import ShapERenderer
from diffusers.utils import floats_tensor, load_image, load_numpy, slow
from diffusers.utils.testing_utils import require_torch_gpu, torch_device
from ..test_pipelines_common import PipelineTesterMixin, assert_mean_pixel_difference
class ShapEImg2ImgPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
    pipeline_class = ShapEImg2ImgPipeline
    params = ["image"]
    batch_params = ["image"]
    required_optional_params = [
        "num_images_per_prompt",
        "num_inference_steps",
        "generator",
        "latents",
        "guidance_scale",
        "frame_size",
        "output_type",
        "return_dict",
    ]
    test_xformers_attention = False

    @property
    def text_embedder_hidden_size(self):
        return 32

    @property
    def time_input_dim(self):
        return 32

    @property
    def time_embed_dim(self):
        return self.time_input_dim * 4

    @property
    def renderer_dim(self):
        return 8

    @property
    def dummy_image_encoder(self):
        torch.manual_seed(0)
        config = CLIPVisionConfig(
            hidden_size=self.text_embedder_hidden_size,
            image_size=64,
            projection_dim=self.text_embedder_hidden_size,
            intermediate_size=37,
            num_attention_heads=4,
            num_channels=3,
            num_hidden_layers=5,
            patch_size=1,
        )

        model = CLIPVisionModel(config)
        return model

    @property
    def dummy_image_processor(self):
        image_processor = CLIPImageProcessor(
            crop_size=224,
            do_center_crop=True,
            do_normalize=True,
            do_resize=True,
            image_mean=[0.48145466, 0.4578275, 0.40821073],
            image_std=[0.26862954, 0.26130258, 0.27577711],
            resample=3,
            size=224,
        )

        return image_processor

    @property
    def dummy_prior(self):
        torch.manual_seed(0)

        model_kwargs = {
            "num_attention_heads": 2,
            "attention_head_dim": 16,
            "embedding_dim": self.time_input_dim,
            "num_embeddings": 32,
            "embedding_proj_dim": self.text_embedder_hidden_size,
            "time_embed_dim": self.time_embed_dim,
            "num_layers": 1,
            "clip_embed_dim": self.time_input_dim * 2,
            "additional_embeddings": 0,
            "time_embed_act_fn": "gelu",
            "norm_in_type": "layer",
            "embedding_proj_norm_type": "layer",
            "encoder_hid_proj_type": None,
            "added_emb_type": None,
        }

        model = PriorTransformer(**model_kwargs)
        return model
    @property
    def dummy_renderer(self):
        torch.manual_seed(0)

        model_kwargs = {
            "param_shapes": (
                (self.renderer_dim, 93),
                (self.renderer_dim, 8),
                (self.renderer_dim, 8),
                (self.renderer_dim, 8),
            ),
            "d_latent": self.time_input_dim,
            "d_hidden": self.renderer_dim,
            "n_output": 12,
            "background": (
                0.1,
                0.1,
                0.1,
            ),
        }

        model = ShapERenderer(**model_kwargs)
        return model

    def get_dummy_components(self):
        prior = self.dummy_prior
        image_encoder = self.dummy_image_encoder
        image_processor = self.dummy_image_processor
        renderer = self.dummy_renderer

        scheduler = HeunDiscreteScheduler(
            beta_schedule="exp",
            num_train_timesteps=1024,
            prediction_type="sample",
            use_karras_sigmas=True,
            clip_sample=True,
            clip_sample_range=1.0,
        )

        components = {
            "prior": prior,
            "image_encoder": image_encoder,
            "image_processor": image_processor,
            "renderer": renderer,
            "scheduler": scheduler,
        }

        return components

    def get_dummy_inputs(self, device, seed=0):
        input_image = floats_tensor((1, 3, 64, 64), rng=random.Random(seed)).to(device)

        if str(device).startswith("mps"):
            generator = torch.manual_seed(seed)
        else:
            generator = torch.Generator(device=device).manual_seed(seed)

        inputs = {
            "image": input_image,
            "generator": generator,
            "num_inference_steps": 1,
            "frame_size": 32,
            "output_type": "np",
        }

        return inputs
    def test_shap_e(self):
        device = "cpu"

        components = self.get_dummy_components()
        pipe = self.pipeline_class(**components)
        pipe = pipe.to(device)
        pipe.set_progress_bar_config(disable=None)

        output = pipe(**self.get_dummy_inputs(device))
        image = output.images[0]
        image_slice = image[0, -3:, -3:, -1]

        assert image.shape == (20, 32, 32, 3)

        expected_slice = np.array(
            [
                0.00039216,
                0.00039216,
                0.00039216,
                0.00039216,
                0.00039216,
                0.00039216,
                0.00039216,
                0.00039216,
                0.00039216,
            ]
        )

        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2

    def test_inference_batch_consistent(self):
        # NOTE: Larger batch sizes cause this test to timeout, only test on smaller batches
        self._test_inference_batch_consistent(batch_sizes=[1, 2])

    def test_inference_batch_single_identical(self):
        test_max_difference = torch_device == "cpu"
        relax_max_difference = True

        self._test_inference_batch_single_identical(
            batch_size=2,
            test_max_difference=test_max_difference,
            relax_max_difference=relax_max_difference,
        )

    def test_num_images_per_prompt(self):
        components = self.get_dummy_components()
        pipe = self.pipeline_class(**components)
        pipe = pipe.to(torch_device)
        pipe.set_progress_bar_config(disable=None)

        batch_size = 1
        num_images_per_prompt = 2
        inputs = self.get_dummy_inputs(torch_device)

        for key in inputs.keys():
            if key in self.batch_params:
                inputs[key] = batch_size * [inputs[key]]

        images = pipe(**inputs, num_images_per_prompt=num_images_per_prompt)[0]

        assert images.shape[0] == batch_size * num_images_per_prompt
@slow
@require_torch_gpu
class ShapEImg2ImgPipelineIntegrationTests(unittest.TestCase):
    def tearDown(self):
        # clean up the VRAM after each test
        super().tearDown()
        gc.collect()
        torch.cuda.empty_cache()

    def test_shap_e_img2img(self):
        input_image = load_image(
            "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
            "/shap_e/corgi.png"
        )
        expected_image = load_numpy(
            "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
            "/shap_e/test_shap_e_img2img_out.npy"
        )
        pipe = ShapEImg2ImgPipeline.from_pretrained("openai/shap-e-img2img")
        pipe = pipe.to(torch_device)
        pipe.set_progress_bar_config(disable=None)

        generator = torch.Generator(device=torch_device).manual_seed(0)

        images = pipe(
            input_image,
            generator=generator,
            guidance_scale=3.0,
            num_inference_steps=64,
            frame_size=64,
            output_type="np",
        ).images[0]

        assert images.shape == (20, 64, 64, 3)
        assert_mean_pixel_difference(images, expected_image)
@@ -30,11 +30,15 @@ class HeunDiscreteSchedulerTest(SchedulerCommonTest):
             self.check_over_configs(beta_start=beta_start, beta_end=beta_end)

     def test_schedules(self):
-        for schedule in ["linear", "scaled_linear"]:
+        for schedule in ["linear", "scaled_linear", "exp"]:
             self.check_over_configs(beta_schedule=schedule)

+    def test_clip_sample(self):
+        for clip_sample_range in [1.0, 2.0, 3.0]:
+            self.check_over_configs(clip_sample_range=clip_sample_range, clip_sample=True)
+
     def test_prediction_type(self):
-        for prediction_type in ["epsilon", "v_prediction"]:
+        for prediction_type in ["epsilon", "v_prediction", "sample"]:
             self.check_over_configs(prediction_type=prediction_type)

     def test_full_loop_no_noise(self):
...