Unverified commit 16ad13b6 authored by Steven Liu, committed by GitHub

[docs] Clean scheduler api (#4204)

* clean scheduler mixin

* up to dpmsolvermultistep

* finish cleaning

* first draft

* fix overview table

* apply feedback

* update reference code
parent da0e2fce
@@ -10,15 +10,17 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# ScoreSdeVpScheduler

`ScoreSdeVpScheduler` is a variance preserving stochastic differential equation (SDE) scheduler. It was introduced in the [Score-Based Generative Modeling through Stochastic Differential Equations](https://huggingface.co/papers/2011.13456) paper by Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole.

The abstract from the paper is:

*Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a. score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.*

<Tip warning={true}>

🚧 This scheduler is under construction!

</Tip>
@@ -10,11 +10,26 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# DPMSolverSinglestepScheduler

`DPMSolverSinglestepScheduler` is a single-step scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.

DPMSolver (and the improved version DPMSolver++) is a fast, dedicated high-order solver for diffusion ODEs with a convergence order guarantee. Empirically, DPMSolver sampling with only 20 steps can generate high-quality samples, and it can generate quite good samples even in 10 steps.

The original implementation can be found at [LuChengTHU/dpm-solver](https://github.com/LuChengTHU/dpm-solver).

## Tips

It is recommended to set `solver_order=2` for guided sampling and `solver_order=3` for unconditional sampling.

Dynamic thresholding from [Imagen](https://huggingface.co/papers/2205.11487) is supported, and for pixel-space diffusion models, you can set both `algorithm_type="dpmsolver++"` and `thresholding=True` to use dynamic thresholding. This thresholding method is unsuitable for latent-space diffusion models such as Stable Diffusion.
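To see what dynamic thresholding does, here is an illustrative NumPy reimplementation of the idea from the Imagen paper — a sketch of the technique, not the scheduler's internal code (the `ratio` argument plays the role of the schedulers' `dynamic_thresholding_ratio` option):

```python
import numpy as np

def dynamic_threshold(x0_pred, ratio=0.995):
    """Dynamic thresholding sketch, following the Imagen paper.

    Per sample, pick the `ratio` quantile s of |x0_pred|; if s > 1, clip the
    prediction to [-s, s] and rescale by s so values stay within [-1, 1].
    """
    flat = np.abs(x0_pred).reshape(x0_pred.shape[0], -1)
    s = np.quantile(flat, ratio, axis=1)             # per-sample threshold
    s = np.maximum(s, 1.0)                           # never shrink the usual [-1, 1] range
    s = s.reshape(-1, *([1] * (x0_pred.ndim - 1)))   # broadcast over remaining dims
    return np.clip(x0_pred, -s, s) / s

# A batch of two toy "predictions"; the first contains a large outlier.
batch = np.array([[0.5, -0.25, 3.0, 0.1], [0.2, -0.2, 0.4, -0.9]])
out = dynamic_threshold(batch, ratio=1.0)  # ratio=1.0 -> s is the per-sample max
```

With `ratio=1.0`, the first sample is squashed by its outlier magnitude 3.0, while the second sample (already inside `[-1, 1]`) is left untouched — which is why the method preserves well-behaved pixel-space samples but distorts latent-space ones.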
## DPMSolverSinglestepScheduler

[[autodoc]] DPMSolverSinglestepScheduler
## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
\ No newline at end of file
@@ -10,11 +10,12 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# KarrasVeScheduler

`KarrasVeScheduler` is a stochastic sampler tailored to variance-exploding (VE) models. It is based on the [Elucidating the Design Space of Diffusion-Based Generative Models](https://huggingface.co/papers/2206.00364) and [Score-based generative modeling through stochastic differential equations](https://huggingface.co/papers/2011.13456) papers.

## KarrasVeScheduler

[[autodoc]] KarrasVeScheduler
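The noise levels this sampler steps through follow the sigma schedule from the Elucidating the Design Space paper. As a minimal NumPy sketch (an illustrative reimplementation with the paper's defaults for `sigma_min`, `sigma_max`, and `rho`, not the scheduler's internal code):

```python
import numpy as np

def karras_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Karras et al. noise schedule (eq. 5 of the paper).

    Interpolates between sigma_max and sigma_min in rho-th-root space,
    which concentrates steps near the low-noise end of the schedule.
    """
    ramp = np.linspace(0, 1, n)
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    return (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho

sigmas = karras_sigmas(10)  # monotonically decreasing from 80.0 down to 0.002
```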
## KarrasVeOutput
[[autodoc]] schedulers.scheduling_karras_ve.KarrasVeOutput
\ No newline at end of file
@@ -10,15 +10,28 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# UniPCMultistepScheduler

`UniPCMultistepScheduler` is a training-free framework designed for fast sampling of diffusion models. It was introduced in [UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models](https://huggingface.co/papers/2302.04867) by Wenliang Zhao, Lujia Bai, Yongming Rao, Jie Zhou, and Jiwen Lu.

It consists of a corrector (UniC) and a predictor (UniP) that share a unified analytical form and support arbitrary orders.

UniPC is by design model-agnostic, supporting pixel-space/latent-space DPMs on unconditional/conditional sampling. It can also be applied to both noise prediction and data prediction models. The corrector UniC can also be applied after any off-the-shelf solvers to increase the order of accuracy.

The abstract from the paper is:

*Diffusion probabilistic models (DPMs) have demonstrated a very promising ability in high-resolution image synthesis. However, sampling from a pre-trained DPM usually requires hundreds of model evaluations, which is computationally expensive. Despite recent progress in designing high-order solvers for DPMs, there still exists room for further speedup, especially in extremely few steps (e.g., 5~10 steps). Inspired by the predictor-corrector for ODE solvers, we develop a unified corrector (UniC) that can be applied after any existing DPM sampler to increase the order of accuracy without extra model evaluations, and derive a unified predictor (UniP) that supports arbitrary order as a byproduct. Combining UniP and UniC, we propose a unified predictor-corrector framework called UniPC for the fast sampling of DPMs, which has a unified analytical form for any order and can significantly improve the sampling quality over previous methods. We evaluate our methods through extensive experiments including both unconditional and conditional sampling using pixel-space and latent-space DPMs. Our UniPC can achieve 3.87 FID on CIFAR10 (unconditional) and 7.51 FID on ImageNet 256×256 (conditional) with only 10 function evaluations. Code is available at https://github.com/wl-zhao/UniPC.*

The original codebase can be found at [wl-zhao/UniPC](https://github.com/wl-zhao/UniPC).

## Tips

It is recommended to set `solver_order=2` for guided sampling and `solver_order=3` for unconditional sampling.

Dynamic thresholding from [Imagen](https://huggingface.co/papers/2205.11487) is supported, and for pixel-space diffusion models, you can set both `predict_x0=True` and `thresholding=True` to use dynamic thresholding. This thresholding method is unsuitable for latent-space diffusion models such as Stable Diffusion.
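To make the predictor-corrector pattern concrete, here is a toy sketch in the classic Heun style: a cheap predictor proposes the next state and a corrector refines it with one more evaluation. This illustrates only the general pattern UniPC builds on, not the UniPC update rule itself.

```python
import math

def predictor_corrector_solve(f, y0, t0, t1, n_steps):
    """Integrate dy/dt = f(t, y) with an Euler predictor + trapezoidal corrector."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        # Predictor: first-order (Euler) guess for y at t + h
        y_pred = y + h * f(t, y)
        # Corrector: reuse the guess to form a second-order (trapezoidal) update
        y = y + 0.5 * h * (f(t, y) + f(t + h, y_pred))
        t += h
    return y

# dy/dt = -y with y(0) = 1 has the exact solution y(1) = exp(-1)
approx = predictor_corrector_solve(lambda t, y: -y, 1.0, 0.0, 1.0, 50)  # close to math.exp(-1)
```

As in UniPC, the corrector raises the order of accuracy without changing which model (here, `f`) is being evaluated.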
## UniPCMultistepScheduler

[[autodoc]] UniPCMultistepScheduler
## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
\ No newline at end of file
@@ -12,9 +12,14 @@ specific language governing permissions and limitations under the License.

# VQDiffusionScheduler

`VQDiffusionScheduler` converts the transformer model's output into a sample for the unnoised image at the previous diffusion timestep. It was introduced in [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://huggingface.co/papers/2111.14822) by Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, and Baining Guo.

The abstract from the paper is:

*We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well-suited for text-to-image generation tasks because it not only eliminates the unidirectional bias with existing methods but also allows us to incorporate a mask-and-replace diffusion strategy to avoid the accumulation of errors, which is a serious problem with existing methods. Our experiments show that the VQ-Diffusion produces significantly better text-to-image generation results when compared with conventional autoregressive (AR) models with similar numbers of parameters. Compared with previous GAN-based text-to-image methods, our VQ-Diffusion can handle more complex scenes and improve the synthesized image quality by a large margin. Finally, we show that the image generation computation in our method can be made highly efficient by reparameterization. With traditional AR methods, the text-to-image generation time increases linearly with the output image resolution and hence is quite time consuming even for normal size images. The VQ-Diffusion allows us to achieve a better trade-off between quality and speed. Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.*

## VQDiffusionScheduler

[[autodoc]] VQDiffusionScheduler
## VQDiffusionSchedulerOutput
[[autodoc]] schedulers.scheduling_vq_diffusion.VQDiffusionSchedulerOutput
\ No newline at end of file
@@ -29,11 +29,11 @@ logger = logging.get_logger(__name__)  # pylint: disable=invalid-name
@dataclass
class CMStochasticIterativeSchedulerOutput(BaseOutput):
    """
    Output class for the scheduler's `step` function.

    Args:
        prev_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            Computed sample `(x_{t-1})` of previous timestep. `prev_sample` should be used as next model input in the
            denoising loop.
    """
@@ -42,38 +42,32 @@ class CMStochasticIterativeSchedulerOutput(BaseOutput):
class CMStochasticIterativeScheduler(SchedulerMixin, ConfigMixin):
    """
    Multistep and onestep sampling for consistency models.

    This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic
    methods the library implements for all schedulers such as loading and saving.

    Args:
        num_train_timesteps (`int`, defaults to 40):
            The number of diffusion steps to train the model.
        sigma_min (`float`, defaults to 0.002):
            Minimum noise magnitude in the sigma schedule. Defaults to 0.002 from the original implementation.
        sigma_max (`float`, defaults to 80.0):
            Maximum noise magnitude in the sigma schedule. Defaults to 80.0 from the original implementation.
        sigma_data (`float`, defaults to 0.5):
            The standard deviation of the data distribution from the EDM
            [paper](https://huggingface.co/papers/2206.00364). Defaults to 0.5 from the original implementation.
        s_noise (`float`, defaults to 1.0):
            The amount of additional noise to counteract loss of detail during sampling. A reasonable range is [1.000,
            1.011]. Defaults to 1.0 from the original implementation.
        rho (`float`, defaults to 7.0):
            The parameter for calculating the Karras sigma schedule from the EDM
            [paper](https://huggingface.co/papers/2206.00364). Defaults to 7.0 from the original implementation.
        clip_denoised (`bool`, defaults to `True`):
            Whether to clip the denoised outputs to `(-1, 1)`.
        timesteps (`List` or `np.ndarray` or `torch.Tensor`, *optional*):
            An explicit timestep schedule that can be optionally specified. The timesteps are expected to be in
            increasing order.
    """

    order = 1
@@ -114,13 +108,17 @@ class CMStochasticIterativeScheduler(SchedulerMixin, ConfigMixin):
        self, sample: torch.FloatTensor, timestep: Union[float, torch.FloatTensor]
    ) -> torch.FloatTensor:
        """
        Scales the consistency model input by `(sigma**2 + sigma_data**2) ** 0.5`.

        Args:
            sample (`torch.FloatTensor`):
                The input sample.
            timestep (`float` or `torch.FloatTensor`):
                The current timestep in the diffusion chain.

        Returns:
            `torch.FloatTensor`:
                A scaled input sample.
        """
        # Get sigma corresponding to timestep
        if isinstance(timestep, torch.Tensor):
@@ -135,12 +133,15 @@ class CMStochasticIterativeScheduler(SchedulerMixin, ConfigMixin):
    def sigma_to_t(self, sigmas: Union[float, np.ndarray]):
        """
        Gets scaled timesteps from the Karras sigmas for input to the consistency model.

        Args:
            sigmas (`float` or `np.ndarray`):
                A single Karras sigma or an array of Karras sigmas.

        Returns:
            `float` or `np.ndarray`:
                A scaled input timestep or scaled input timestep array.
        """
        if not isinstance(sigmas, np.ndarray):
            sigmas = np.array(sigmas, dtype=np.float64)
@@ -156,17 +157,17 @@ class CMStochasticIterativeScheduler(SchedulerMixin, ConfigMixin):
        timesteps: Optional[List[int]] = None,
    ):
        """
        Sets the timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model.
            device (`str` or `torch.device`, *optional*):
                The device to which the timesteps should be moved. If `None`, the timesteps are not moved.
            timesteps (`List[int]`, *optional*):
                Custom timesteps used to support arbitrary spacing between timesteps. If `None`, then the default
                timestep spacing strategy of equal spacing between timesteps is used. If `timesteps` is passed,
                `num_inference_steps` must be `None`.
        """
        if num_inference_steps is None and timesteps is None:
            raise ValueError("Exactly one of `num_inference_steps` or `timesteps` must be supplied.")
@@ -241,17 +242,22 @@ class CMStochasticIterativeScheduler(SchedulerMixin, ConfigMixin):
    def get_scalings_for_boundary_condition(self, sigma):
        """
        Gets the scalings used in the consistency model parameterization (from Appendix C of the
        [paper](https://huggingface.co/papers/2303.01469)) to enforce the boundary condition.

        <Tip>

        `epsilon` in the equations for `c_skip` and `c_out` is set to `sigma_min`.

        </Tip>

        Args:
            sigma (`torch.FloatTensor`):
                The current sigma in the Karras sigma schedule.

        Returns:
            `tuple`:
                A two-element tuple where `c_skip` (which weights the current sample) is the first element and `c_out`
                (which weights the consistency model output) is the second element.
        """
        sigma_min = self.config.sigma_min
@@ -270,20 +276,27 @@ class CMStochasticIterativeScheduler(SchedulerMixin, ConfigMixin):
        return_dict: bool = True,
    ) -> Union[CMStochasticIterativeSchedulerOutput, Tuple]:
        """
        Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion
        process from the learned model outputs (most often the predicted noise).

        Args:
            model_output (`torch.FloatTensor`):
                The direct output from the learned diffusion model.
            timestep (`float`):
                The current timestep in the diffusion chain.
            sample (`torch.FloatTensor`):
                A current instance of a sample created by the diffusion process.
            generator (`torch.Generator`, *optional*):
                A random number generator.
            return_dict (`bool`, *optional*, defaults to `True`):
                Whether or not to return a
                [`~schedulers.scheduling_consistency_models.CMStochasticIterativeSchedulerOutput`] or `tuple`.

        Returns:
            [`~schedulers.scheduling_consistency_models.CMStochasticIterativeSchedulerOutput`] or `tuple`:
                If `return_dict` is `True`,
                [`~schedulers.scheduling_consistency_models.CMStochasticIterativeSchedulerOutput`] is returned,
                otherwise a tuple is returned where the first element is the sample tensor.
        """
        if (
@@ -31,14 +31,14 @@ from .scheduling_utils import KarrasDiffusionSchedulers, SchedulerMixin
# Copied from diffusers.schedulers.scheduling_ddpm.DDPMSchedulerOutput with DDPM->DDIM
class DDIMSchedulerOutput(BaseOutput):
    """
    Output class for the scheduler's `step` function output.

    Args:
        prev_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            Computed sample `(x_{t-1})` of previous timestep. `prev_sample` should be used as next model input in the
            denoising loop.
        pred_original_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            The predicted denoised sample `(x_{0})` based on the model output from the current timestep.
            `pred_original_sample` can be used to preview progress or for guidance.
    """
...@@ -129,57 +129,53 @@ def rescale_zero_terminal_snr(betas): ...@@ -129,57 +129,53 @@ def rescale_zero_terminal_snr(betas):
class DDIMScheduler(SchedulerMixin, ConfigMixin): class DDIMScheduler(SchedulerMixin, ConfigMixin):
""" """
Denoising diffusion implicit models is a scheduler that extends the denoising procedure introduced in denoising `DDIMScheduler` extends the denoising procedure introduced in denoising diffusion probabilistic models (DDPMs) with
diffusion probabilistic models (DDPMs) with non-Markovian guidance. non-Markovian guidance.
[`~ConfigMixin`] takes care of storing all config attributes that are passed in the scheduler's `__init__` This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic
function, such as `num_train_timesteps`. They can be accessed via `scheduler.config.num_train_timesteps`. methods the library implements for all schedulers such as loading and saving.
[`SchedulerMixin`] provides general loading and saving functionality via the [`SchedulerMixin.save_pretrained`] and
[`~SchedulerMixin.from_pretrained`] functions.
For more details, see the original paper: https://arxiv.org/abs/2010.02502
Args: Args:
num_train_timesteps (`int`): number of diffusion steps used to train the model. num_train_timesteps (`int`, defaults to 1000):
beta_start (`float`): the starting `beta` value of inference. The number of diffusion steps to train the model.
beta_end (`float`): the final `beta` value. beta_start (`float`, defaults to 0.0001):
beta_schedule (`str`): The starting `beta` value of inference.
the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from beta_end (`float`, defaults to 0.02):
The final `beta` value.
beta_schedule (`str`, defaults to `"linear"`):
The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from
`linear`, `scaled_linear`, or `squaredcos_cap_v2`. `linear`, `scaled_linear`, or `squaredcos_cap_v2`.
trained_betas (`np.ndarray`, optional): trained_betas (`np.ndarray`, *optional*):
option to pass an array of betas directly to the constructor to bypass `beta_start`, `beta_end` etc. Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
clip_sample (`bool`, default `True`): clip_sample (`bool`, defaults to `True`):
option to clip predicted sample for numerical stability. Clip the predicted sample for numerical stability.
clip_sample_range (`float`, default `1.0`): clip_sample_range (`float`, defaults to 1.0):
the maximum magnitude for sample clipping. Valid only when `clip_sample=True`. The maximum magnitude for sample clipping. Valid only when `clip_sample=True`.
set_alpha_to_one (`bool`, default `True`): set_alpha_to_one (`bool`, defaults to `True`):
each diffusion step uses the value of alphas product at that step and at the previous one. For the final Each diffusion step uses the alphas product value at that step and at the previous one. For the final step
step there is no previous alpha. When this option is `True` the previous alpha product is fixed to `1`, there is no previous alpha. When this option is `True` the previous alpha product is fixed to `1`,
otherwise it uses the value of alpha at step 0. otherwise it uses the alpha value at step 0.
steps_offset (`int`, default `0`): steps_offset (`int`, defaults to 0):
an offset added to the inference steps. You can use a combination of `offset=1` and An offset added to the inference steps. You can use a combination of `offset=1` and
            `set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product like in Stable
            Diffusion.
        prediction_type (`str`, defaults to `epsilon`, *optional*):
            Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process),
            `sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of the [Imagen
            Video](https://imagen.research.google/video/paper.pdf) paper).
        thresholding (`bool`, defaults to `False`):
            Whether to use the "dynamic thresholding" method. This is unsuitable for latent-space diffusion models such
            as Stable Diffusion.
        dynamic_thresholding_ratio (`float`, defaults to 0.995):
            The ratio for the dynamic thresholding method. Valid only when `thresholding=True`.
        sample_max_value (`float`, defaults to 1.0):
            The threshold value for dynamic thresholding. Valid only when `thresholding=True`.
        timestep_spacing (`str`, defaults to `"leading"`):
            The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and
            Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) paper for more information.
        rescale_betas_zero_snr (`bool`, defaults to `False`):
            Whether to rescale the betas to have zero terminal SNR. This enables the model to generate very bright and
            dark samples instead of limiting it to samples with medium brightness. Loosely related to
            [`--offset_noise`](https://github.com/huggingface/diffusers/blob/74fd735eb073eb1d774b1ab4154a0876eb82f055/examples/dreambooth/train_dreambooth.py#L506).
    """
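The `rescale_betas_zero_snr` option can be sketched in pure Python. This is a minimal sketch following the algorithm described in the Common Diffusion Noise Schedules and Sample Steps are Flawed paper; the actual `diffusers` implementation operates on `torch` tensors, and the helper name below is only illustrative:

```python
import math

def rescale_to_zero_terminal_snr(betas):
    """Illustrative sketch: shift/scale sqrt(alpha_bar) so the terminal SNR is zero."""
    alphas = [1.0 - b for b in betas]
    alphas_bar, running = [], 1.0
    for a in alphas:
        running *= a
        alphas_bar.append(running)
    ab_sqrt = [math.sqrt(x) for x in alphas_bar]
    first, last = ab_sqrt[0], ab_sqrt[-1]
    # Shift so the final value is exactly 0, then rescale so the first value is preserved
    ab_sqrt = [(x - last) * first / (first - last) for x in ab_sqrt]
    alphas_bar = [x * x for x in ab_sqrt]
    # Recover per-step betas from the adjusted cumulative products
    new_alphas = [alphas_bar[0]] + [
        alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))
    ]
    return [1.0 - a for a in new_alphas]
```

After rescaling, the final beta is 1, so the last training timestep carries no signal (zero SNR), which is what allows the model to produce very bright and very dark samples.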
...@@ -246,11 +242,14 @@ class DDIMScheduler(SchedulerMixin, ConfigMixin):
        current timestep.

        Args:
            sample (`torch.FloatTensor`):
                The input sample.
            timestep (`int`, *optional*):
                The current timestep in the diffusion chain.

        Returns:
            `torch.FloatTensor`:
                A scaled input sample.
        """
        return sample
...@@ -301,11 +300,11 @@ class DDIMScheduler(SchedulerMixin, ConfigMixin):
    def set_timesteps(self, num_inference_steps: int, device: Union[str, torch.device] = None):
        """
        Sets the discrete timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model.
        """
        if num_inference_steps > self.config.num_train_timesteps:
...@@ -356,29 +355,35 @@ class DDIMScheduler(SchedulerMixin, ConfigMixin):
        return_dict: bool = True,
    ) -> Union[DDIMSchedulerOutput, Tuple]:
        """
        Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion
        process from the learned model outputs (most often the predicted noise).

        Args:
            model_output (`torch.FloatTensor`):
                The direct output from the learned diffusion model.
            timestep (`float`):
                The current discrete timestep in the diffusion chain.
            sample (`torch.FloatTensor`):
                A current instance of a sample created by the diffusion process.
            eta (`float`):
                The weight of noise for added noise in a diffusion step.
            use_clipped_model_output (`bool`, defaults to `False`):
                If `True`, computes "corrected" `model_output` from the clipped predicted original sample. Necessary
                because the predicted original sample is clipped to [-1, 1] when `self.config.clip_sample` is `True`.
                If no clipping has happened, the "corrected" `model_output` coincides with the one provided as input
                and `use_clipped_model_output` has no effect.
            generator (`torch.Generator`, *optional*):
                A random number generator.
            variance_noise (`torch.FloatTensor`):
                An alternative to generating noise with `generator` by directly providing the noise for the variance
                itself. Useful for methods such as [`CycleDiffusion`].
            return_dict (`bool`, *optional*, defaults to `True`):
                Whether or not to return a [`~schedulers.scheduling_ddim.DDIMSchedulerOutput`] or `tuple`.

        Returns:
            [`~schedulers.scheduling_ddim.DDIMSchedulerOutput`] or `tuple`:
                If `return_dict` is `True`, [`~schedulers.scheduling_ddim.DDIMSchedulerOutput`] is returned, otherwise
                a tuple is returned where the first element is the sample tensor.
        """
        if self.num_inference_steps is None:
...
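For `prediction_type="epsilon"`, the update inside `step` follows Eq. (12) of the DDIM paper. The core arithmetic can be sketched on scalars as below (illustrative function name; the real method additionally handles clipping, thresholding, and batched tensors):

```python
import math

def ddim_update(x_t, eps, alpha_bar_t, alpha_bar_prev, eta=0.0, noise=0.0):
    # 1. predict x_0 from the noise prediction
    pred_x0 = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    # 2. sigma controls stochasticity: eta=0 gives the deterministic DDIM update
    sigma = eta * math.sqrt((1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)) * math.sqrt(
        1.0 - alpha_bar_t / alpha_bar_prev
    )
    # 3. "direction pointing to x_t" plus the re-scaled x_0 prediction
    direction = math.sqrt(1.0 - alpha_bar_prev - sigma**2) * eps
    return math.sqrt(alpha_bar_prev) * pred_x0 + direction + sigma * noise
```

With the true noise and `eta=0`, stepping all the way to `alpha_bar_prev=1` recovers `x_0` exactly, which is a handy sanity check for the formula.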
...@@ -30,14 +30,14 @@ from diffusers.utils import BaseOutput, deprecate
# Copied from diffusers.schedulers.scheduling_ddpm.DDPMSchedulerOutput with DDPM->DDIM
class DDIMSchedulerOutput(BaseOutput):
    """
    Output class for the scheduler's `step` function output.

    Args:
        prev_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            Computed sample `(x_{t-1})` of previous timestep. `prev_sample` should be used as next model input in the
            denoising loop.
        pred_original_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            The predicted denoised sample `(x_{0})` based on the model output from the current timestep.
            `pred_original_sample` can be used to preview progress or for guidance.
    """
...@@ -129,47 +129,45 @@ def rescale_zero_terminal_snr(betas):
class DDIMInverseScheduler(SchedulerMixin, ConfigMixin):
    """
    `DDIMInverseScheduler` is the reverse scheduler of [`DDIMScheduler`].

    This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic
    methods the library implements for all schedulers such as loading and saving.

    Args:
        num_train_timesteps (`int`, defaults to 1000):
            The number of diffusion steps to train the model.
        beta_start (`float`, defaults to 0.0001):
            The starting `beta` value of inference.
        beta_end (`float`, defaults to 0.02):
            The final `beta` value.
        beta_schedule (`str`, defaults to `"linear"`):
            The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from
            `linear`, `scaled_linear`, or `squaredcos_cap_v2`.
        trained_betas (`np.ndarray`, *optional*):
            Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
        clip_sample (`bool`, defaults to `True`):
            Clip the predicted sample for numerical stability.
        clip_sample_range (`float`, defaults to 1.0):
            The maximum magnitude for sample clipping. Valid only when `clip_sample=True`.
        set_alpha_to_one (`bool`, defaults to `True`):
            Each diffusion step uses the alphas product value at that step and at the previous one. For the final step
            there is no previous alpha. When this option is `True` the previous alpha product is fixed to 0, otherwise
            it uses the alpha value at step `num_train_timesteps - 1`.
        steps_offset (`int`, defaults to 0):
            An offset added to the inference steps. You can use a combination of `offset=1` and
            `set_alpha_to_one=False` to make the last step use `num_train_timesteps - 1` for the previous alpha
            product.
        prediction_type (`str`, defaults to `epsilon`, *optional*):
            Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process),
            `sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of the [Imagen
            Video](https://imagen.research.google/video/paper.pdf) paper).
        timestep_spacing (`str`, defaults to `"leading"`):
            The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and
            Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) paper for more information.
        rescale_betas_zero_snr (`bool`, defaults to `False`):
            Whether to rescale the betas to have zero terminal SNR. This enables the model to generate very bright and
            dark samples instead of limiting it to samples with medium brightness. Loosely related to
            [`--offset_noise`](https://github.com/huggingface/diffusers/blob/74fd735eb073eb1d774b1ab4154a0876eb82f055/examples/dreambooth/train_dreambooth.py#L506).
    """
...@@ -243,21 +241,24 @@ class DDIMInverseScheduler(SchedulerMixin, ConfigMixin):
        current timestep.

        Args:
            sample (`torch.FloatTensor`):
                The input sample.
            timestep (`int`, *optional*):
                The current timestep in the diffusion chain.

        Returns:
            `torch.FloatTensor`:
                A scaled input sample.
        """
        return sample
    def set_timesteps(self, num_inference_steps: int, device: Union[str, torch.device] = None):
        """
        Sets the discrete timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model.
        """
        if num_inference_steps > self.config.num_train_timesteps:
...@@ -302,6 +303,37 @@ class DDIMInverseScheduler(SchedulerMixin, ConfigMixin):
        variance_noise: Optional[torch.FloatTensor] = None,
        return_dict: bool = True,
    ) -> Union[DDIMSchedulerOutput, Tuple]:
        """
        Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion
        process from the learned model outputs (most often the predicted noise).

        Args:
            model_output (`torch.FloatTensor`):
                The direct output from the learned diffusion model.
            timestep (`float`):
                The current discrete timestep in the diffusion chain.
            sample (`torch.FloatTensor`):
                A current instance of a sample created by the diffusion process.
            eta (`float`):
                The weight of noise for added noise in a diffusion step.
            use_clipped_model_output (`bool`, defaults to `False`):
                If `True`, computes "corrected" `model_output` from the clipped predicted original sample. Necessary
                because the predicted original sample is clipped to [-1, 1] when `self.config.clip_sample` is `True`.
                If no clipping has happened, the "corrected" `model_output` coincides with the one provided as input
                and `use_clipped_model_output` has no effect.
            variance_noise (`torch.FloatTensor`):
                An alternative to generating noise with `generator` by directly providing the noise for the variance
                itself. Useful for methods such as [`CycleDiffusion`].
            return_dict (`bool`, *optional*, defaults to `True`):
                Whether or not to return a [`~schedulers.scheduling_ddim_inverse.DDIMInverseSchedulerOutput`] or
                `tuple`.

        Returns:
            [`~schedulers.scheduling_ddim_inverse.DDIMInverseSchedulerOutput`] or `tuple`:
                If `return_dict` is `True`, [`~schedulers.scheduling_ddim_inverse.DDIMInverseSchedulerOutput`] is
                returned, otherwise a tuple is returned where the first element is the sample tensor.
        """
        # 1. get previous step value (=t+1)
        prev_timestep = timestep + self.config.num_train_timesteps // self.num_inference_steps
...
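The inverse `step` documented above runs the deterministic DDIM update toward *higher* noise levels. A scalar sketch for `prediction_type="epsilon"` (illustrative name; the real method works on tensors and supports the other prediction types):

```python
import math

def ddim_inverse_update(x_t, eps, alpha_bar_t, alpha_bar_next):
    # Predict x_0 from the noise prediction, then re-noise it at the next (noisier) level
    pred_x0 = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    return math.sqrt(alpha_bar_next) * pred_x0 + math.sqrt(1.0 - alpha_bar_next) * eps
```

Running this repeatedly with a model's noise predictions maps a clean sample back toward latent noise, which is how DDIM inversion recovers a latent for editing-style pipelines.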
...@@ -31,14 +31,14 @@ from .scheduling_utils import KarrasDiffusionSchedulers, SchedulerMixin
# Copied from diffusers.schedulers.scheduling_ddpm.DDPMSchedulerOutput
class DDIMParallelSchedulerOutput(BaseOutput):
    """
    Output class for the scheduler's `step` function output.

    Args:
        prev_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            Computed sample `(x_{t-1})` of previous timestep. `prev_sample` should be used as next model input in the
            denoising loop.
        pred_original_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            The predicted denoised sample `(x_{0})` based on the model output from the current timestep.
            `pred_original_sample` can be used to preview progress or for guidance.
    """
...@@ -250,11 +250,14 @@ class DDIMParallelScheduler(SchedulerMixin, ConfigMixin):
        current timestep.

        Args:
            sample (`torch.FloatTensor`):
                The input sample.
            timestep (`int`, *optional*):
                The current timestep in the diffusion chain.

        Returns:
            `torch.FloatTensor`:
                A scaled input sample.
        """
        return sample
...@@ -320,11 +323,11 @@ class DDIMParallelScheduler(SchedulerMixin, ConfigMixin):
    # Copied from diffusers.schedulers.scheduling_ddim.DDIMScheduler.set_timesteps
    def set_timesteps(self, num_inference_steps: int, device: Union[str, torch.device] = None):
        """
        Sets the discrete timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model.
        """
        if num_inference_steps > self.config.num_train_timesteps:
...
...@@ -29,14 +29,14 @@ from .scheduling_utils import KarrasDiffusionSchedulers, SchedulerMixin
@dataclass
class DDPMSchedulerOutput(BaseOutput):
    """
    Output class for the scheduler's `step` function output.

    Args:
        prev_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            Computed sample `(x_{t-1})` of previous timestep. `prev_sample` should be used as next model input in the
            denoising loop.
        pred_original_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            The predicted denoised sample `(x_{0})` based on the model output from the current timestep.
            `pred_original_sample` can be used to preview progress or for guidance.
    """
...@@ -90,52 +90,46 @@ def betas_for_alpha_bar(
class DDPMScheduler(SchedulerMixin, ConfigMixin):
    """
    `DDPMScheduler` explores the connections between denoising score matching and Langevin dynamics sampling.

    This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic
    methods the library implements for all schedulers such as loading and saving.

    Args:
        num_train_timesteps (`int`, defaults to 1000):
            The number of diffusion steps to train the model.
        beta_start (`float`, defaults to 0.0001):
            The starting `beta` value of inference.
        beta_end (`float`, defaults to 0.02):
            The final `beta` value.
        beta_schedule (`str`, defaults to `"linear"`):
            The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from
            `linear`, `scaled_linear`, or `squaredcos_cap_v2`.
        trained_betas (`np.ndarray`, *optional*):
            Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
        variance_type (`str`, defaults to `"fixed_small"`):
            Clip the variance when adding noise to the denoised sample. Choose from `fixed_small`, `fixed_small_log`,
            `fixed_large`, `fixed_large_log`, `learned` or `learned_range`.
        clip_sample (`bool`, defaults to `True`):
            Clip the predicted sample for numerical stability.
        clip_sample_range (`float`, defaults to 1.0):
            The maximum magnitude for sample clipping. Valid only when `clip_sample=True`.
        prediction_type (`str`, defaults to `epsilon`, *optional*):
            Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process),
            `sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of the [Imagen
            Video](https://imagen.research.google/video/paper.pdf) paper).
        thresholding (`bool`, defaults to `False`):
            Whether to use the "dynamic thresholding" method. This is unsuitable for latent-space diffusion models such
            as Stable Diffusion.
        dynamic_thresholding_ratio (`float`, defaults to 0.995):
            The ratio for the dynamic thresholding method. Valid only when `thresholding=True`.
        sample_max_value (`float`, defaults to 1.0):
            The threshold value for dynamic thresholding. Valid only when `thresholding=True`.
        timestep_spacing (`str`, defaults to `"leading"`):
            The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and
            Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) paper for more information.
        steps_offset (`int`, defaults to 0):
            An offset added to the inference steps. You can use a combination of `offset=1` and
            `set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product like in Stable
            Diffusion.
    """

    _compatibles = [e.name for e in KarrasDiffusionSchedulers]
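The dynamic thresholding enabled by `thresholding=True` clips each predicted sample to a per-sample quantile `s` of its absolute values (never below `sample_max_value`) and then rescales by `s`. A list-based sketch with an illustrative helper name (the library version operates on batched tensors):

```python
def dynamic_threshold(pred_x0, ratio=0.995, max_value=1.0):
    # s = the `ratio` quantile of |x|, clamped to at least `max_value`
    abs_sorted = sorted(abs(v) for v in pred_x0)
    idx = min(int(ratio * len(abs_sorted)), len(abs_sorted) - 1)
    s = max(abs_sorted[idx], max_value)
    # Clip to [-s, s], then divide by s so the output lies in [-1, 1]
    return [min(max(v, -s), s) / s for v in pred_x0]
```

Values already in `[-1, 1]` pass through unchanged while outliers are squashed rather than hard-clipped; this helps pixel-space models but distorts latent-space ones, which is why the docstring warns against it for Stable Diffusion.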
...@@ -198,11 +192,14 @@ class DDPMScheduler(SchedulerMixin, ConfigMixin):
        current timestep.

        Args:
            sample (`torch.FloatTensor`):
                The input sample.
            timestep (`int`, *optional*):
                The current timestep in the diffusion chain.

        Returns:
            `torch.FloatTensor`:
                A scaled input sample.
        """
        return sample
...@@ -213,18 +210,18 @@ class DDPMScheduler(SchedulerMixin, ConfigMixin):
        timesteps: Optional[List[int]] = None,
    ):
        """
        Sets the discrete timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model. If used,
                `timesteps` must be `None`.
            device (`str` or `torch.device`, *optional*):
                The device to which the timesteps should be moved. If `None`, the timesteps are not moved.
            timesteps (`List[int]`, *optional*):
                Custom timesteps used to support arbitrary spacing between timesteps. If `None`, then the default
                timestep spacing strategy of equal spacing between timesteps is used. If `timesteps` is passed,
                `num_inference_steps` must be `None`.
        """
        if num_inference_steps is not None and timesteps is not None:
...@@ -364,21 +361,25 @@ class DDPMScheduler(SchedulerMixin, ConfigMixin):
        return_dict: bool = True,
    ) -> Union[DDPMSchedulerOutput, Tuple]:
        """
        Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion
        process from the learned model outputs (most often the predicted noise).

        Args:
            model_output (`torch.FloatTensor`):
                The direct output from the learned diffusion model.
            timestep (`float`):
                The current discrete timestep in the diffusion chain.
            sample (`torch.FloatTensor`):
                A current instance of a sample created by the diffusion process.
            generator (`torch.Generator`, *optional*):
                A random number generator.
            return_dict (`bool`, *optional*, defaults to `True`):
                Whether or not to return a [`~schedulers.scheduling_ddpm.DDPMSchedulerOutput`] or `tuple`.

        Returns:
            [`~schedulers.scheduling_ddpm.DDPMSchedulerOutput`] or `tuple`:
                If `return_dict` is `True`, [`~schedulers.scheduling_ddpm.DDPMSchedulerOutput`] is returned, otherwise
                a tuple is returned where the first element is the sample tensor.
        """
        t = timestep
...
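For `prediction_type="epsilon"` and the default `"fixed_small"` variance, the DDPM `step` forms the posterior mean of Eq. (7) in the DDPM paper from the predicted `x_0`. A scalar sketch with illustrative names (the real method also clips or thresholds `pred_x0` and works on tensors):

```python
import math

def ddpm_update(x_t, eps, alpha_bar_t, alpha_bar_prev, noise=0.0):
    alpha_t = alpha_bar_t / alpha_bar_prev  # this step's alpha
    beta_t = 1.0 - alpha_t
    # 1. predict x_0 from the noise prediction
    pred_x0 = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    # 2. posterior mean: weighted combination of pred_x0 and x_t (DDPM Eq. 7)
    coef_x0 = math.sqrt(alpha_bar_prev) * beta_t / (1.0 - alpha_bar_t)
    coef_xt = math.sqrt(alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    mean = coef_x0 * pred_x0 + coef_xt * x_t
    # 3. "fixed_small" posterior variance; the scheduler adds noise except at the final step
    variance = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * beta_t
    return mean + math.sqrt(variance) * noise
```

A useful identity check: with the epsilon-predicted `x_0`, this mean equals the epsilon form `(x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)`.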
...@@ -30,14 +30,14 @@ from .scheduling_utils import KarrasDiffusionSchedulers, SchedulerMixin
# Copied from diffusers.schedulers.scheduling_ddpm.DDPMSchedulerOutput
class DDPMParallelSchedulerOutput(BaseOutput):
    """
    Output class for the scheduler's `step` function output.

    Args:
        prev_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            Computed sample `(x_{t-1})` of previous timestep. `prev_sample` should be used as next model input in the
            denoising loop.
        pred_original_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            The predicted denoised sample `(x_{0})` based on the model output from the current timestep.
            `pred_original_sample` can be used to preview progress or for guidance.
    """
@@ -203,11 +203,14 @@ class DDPMParallelScheduler(SchedulerMixin, ConfigMixin):
        current timestep.

        Args:
            sample (`torch.FloatTensor`):
                The input sample.
            timestep (`int`, *optional*):
                The current timestep in the diffusion chain.

        Returns:
            `torch.FloatTensor`:
                A scaled input sample.
        """
        return sample
@@ -219,18 +222,18 @@ class DDPMParallelScheduler(SchedulerMixin, ConfigMixin):
        timesteps: Optional[List[int]] = None,
    ):
        """
        Sets the discrete timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model. If used,
                `timesteps` must be `None`.
            device (`str` or `torch.device`, *optional*):
                The device to which the timesteps should be moved. If `None`, the timesteps are not moved.
            timesteps (`List[int]`, *optional*):
                Custom timesteps used to support arbitrary spacing between timesteps. If `None`, the default
                timestep spacing strategy of equal spacing between timesteps is used. If `timesteps` is passed,
                `num_inference_steps` must be `None`.
        """
        if num_inference_steps is not None and timesteps is not None:
...
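The two ways of choosing timesteps are mutually exclusive, as the check below the docstring enforces. A minimal numpy sketch of that contract (a hypothetical helper, not the library code; the real default strategy also honors `steps_offset` and other config):

```python
import numpy as np

def make_timesteps(num_train_timesteps=1000, num_inference_steps=None, timesteps=None):
    # Mirror of the documented contract: exactly one of the two may be given.
    if (num_inference_steps is None) == (timesteps is None):
        raise ValueError("Pass exactly one of `num_inference_steps` or `timesteps`.")
    if timesteps is not None:
        # Custom spacing: must be strictly decreasing.
        if any(timesteps[i] <= timesteps[i + 1] for i in range(len(timesteps) - 1)):
            raise ValueError("`timesteps` must be in descending order.")
        return np.array(timesteps, dtype=np.int64)
    # Default strategy: equally spaced steps over the training range, descending.
    step_ratio = num_train_timesteps // num_inference_steps
    return (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].astype(np.int64)

print(make_timesteps(num_inference_steps=4))          # [750 500 250   0]
print(make_timesteps(timesteps=[999, 500, 100, 0]))   # [999 500 100   0]
```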
@@ -72,63 +72,51 @@ def betas_for_alpha_bar(
class DEISMultistepScheduler(SchedulerMixin, ConfigMixin):
    """
    `DEISMultistepScheduler` is a fast high order solver for diffusion ordinary differential equations (ODEs).

    This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic
    methods the library implements for all schedulers such as loading and saving.

    Args:
        num_train_timesteps (`int`, defaults to 1000):
            The number of diffusion steps to train the model.
        beta_start (`float`, defaults to 0.0001):
            The starting `beta` value of inference.
        beta_end (`float`, defaults to 0.02):
            The final `beta` value.
        beta_schedule (`str`, defaults to `"linear"`):
            The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from
            `linear`, `scaled_linear`, or `squaredcos_cap_v2`.
        trained_betas (`np.ndarray`, *optional*):
            Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
        solver_order (`int`, defaults to 2):
            The DEIS order which can be `1` or `2` or `3`. It is recommended to use `solver_order=2` for guided
            sampling, and `solver_order=3` for unconditional sampling.
        prediction_type (`str`, defaults to `epsilon`):
            Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process),
            `sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of [Imagen
            Video](https://imagen.research.google/video/paper.pdf) paper).
        thresholding (`bool`, defaults to `False`):
            Whether to use the "dynamic thresholding" method. This is unsuitable for latent-space diffusion models such
            as Stable Diffusion.
        dynamic_thresholding_ratio (`float`, defaults to 0.995):
            The ratio for the dynamic thresholding method. Valid only when `thresholding=True`.
        sample_max_value (`float`, defaults to 1.0):
            The threshold value for dynamic thresholding. Valid only when `thresholding=True`.
        algorithm_type (`str`, defaults to `deis`):
            The algorithm type for the solver.
        lower_order_final (`bool`, defaults to `True`):
            Whether to use lower-order solvers in the final steps. Only valid for < 15 inference steps.
        use_karras_sigmas (`bool`, *optional*, defaults to `False`):
            Whether to use Karras sigmas for step sizes in the noise schedule during the sampling process. If `True`,
            the sigmas are determined according to a sequence of noise levels {σi}.
        timestep_spacing (`str`, defaults to `"linspace"`):
            The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and
            Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) paper for more information.
        steps_offset (`int`, defaults to 0):
            An offset added to the inference steps. You can use a combination of `offset=1` and
            `set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product like in Stable
            Diffusion.
    """

    _compatibles = [e.name for e in KarrasDiffusionSchedulers]
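The "dynamic thresholding" method referenced by `thresholding` and `dynamic_thresholding_ratio` clips each predicted `x0` to a per-sample quantile `s` of its absolute values and rescales by `s`, keeping pixel predictions in a sane range. A hedged numpy sketch of the idea (not the library's exact implementation):

```python
import numpy as np

def dynamic_threshold(x0, ratio=0.995, sample_max_value=1.0):
    """Clip each sample in the batch to its `ratio` quantile of |x0|, then rescale."""
    batch = x0.reshape(x0.shape[0], -1)
    s = np.quantile(np.abs(batch), ratio, axis=1)        # per-sample threshold
    s = np.clip(s, a_min=sample_max_value, a_max=None)   # never shrink below the static range
    s = s[:, None]
    batch = np.clip(batch, -s, s) / s                    # threshold, then renormalize into [-1, 1]
    return batch.reshape(x0.shape)

x0 = np.array([[0.5, -3.0, 8.0, 0.1]])
out = dynamic_threshold(x0, ratio=1.0)  # ratio 1.0 -> s = max|x0| = 8
print(out)                              # all values now lie in [-1, 1]
```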
@@ -201,13 +189,13 @@ class DEISMultistepScheduler(SchedulerMixin, ConfigMixin):
    def set_timesteps(self, num_inference_steps: int, device: Union[str, torch.device] = None):
        """
        Sets the discrete timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model.
            device (`str` or `torch.device`, *optional*):
                The device to which the timesteps should be moved. If `None`, the timesteps are not moved.
        """
        # "linspace", "leading", "trailing" corresponds to annotation of Table 2. of https://arxiv.org/abs/2305.08891
        if self.config.timestep_spacing == "linspace":
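The three `timestep_spacing` strategies named in the comment differ only in how they place `num_inference_steps` points inside `[0, num_train_timesteps - 1]`. A rough numpy sketch of the common pattern (simplified; the library versions also handle `steps_offset` and per-scheduler rounding details):

```python
import numpy as np

def spaced_timesteps(spacing, num_train_timesteps=1000, num_inference_steps=10):
    if spacing == "linspace":
        # Both endpoints included: first step T-1, last step 0.
        return np.linspace(0, num_train_timesteps - 1, num_inference_steps)[::-1].round().astype(np.int64)
    if spacing == "leading":
        # Anchored at 0; the top of the range is never reached.
        ratio = num_train_timesteps // num_inference_steps
        return (np.arange(num_inference_steps) * ratio)[::-1].astype(np.int64)
    if spacing == "trailing":
        # Anchored at T-1; step 0 is never reached.
        ratio = num_train_timesteps / num_inference_steps
        return np.arange(num_train_timesteps, 0, -ratio).round().astype(np.int64) - 1
    raise ValueError(f"unknown spacing: {spacing}")

print(spaced_timesteps("linspace"))   # starts at 999, ends at 0
print(spaced_timesteps("leading"))    # starts at 900, ends at 0
print(spaced_timesteps("trailing"))   # starts at 999, ends at 99
```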
@@ -296,16 +284,19 @@ class DEISMultistepScheduler(SchedulerMixin, ConfigMixin):
        self, model_output: torch.FloatTensor, timestep: int, sample: torch.FloatTensor
    ) -> torch.FloatTensor:
        """
        Convert the model output to the corresponding type the DEIS algorithm needs.

        Args:
            model_output (`torch.FloatTensor`):
                The direct output from the learned diffusion model.
            timestep (`int`):
                The current discrete timestep in the diffusion chain.
            sample (`torch.FloatTensor`):
                A current instance of a sample created by the diffusion process.

        Returns:
            `torch.FloatTensor`:
                The converted model output.
        """
        if self.config.prediction_type == "epsilon":
            alpha_t, sigma_t = self.alpha_t[timestep], self.sigma_t[timestep]
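For `epsilon` prediction, the conversion to an `x0` estimate follows from the forward process `x_t = alpha_t * x0 + sigma_t * eps`, which is exactly why the branch above looks up `alpha_t` and `sigma_t`. A sketch of that algebra (the `sample` and `v_prediction` branches are analogous):

```python
import numpy as np

def epsilon_to_x0(sample, model_output, alpha_t, sigma_t):
    # x_t = alpha_t * x0 + sigma_t * eps  =>  x0 = (x_t - sigma_t * eps) / alpha_t
    return (sample - sigma_t * model_output) / alpha_t

# Round-trip check: noise a known x0, then recover it from the noise prediction.
rng = np.random.default_rng(0)
x0, eps = rng.normal(size=4), rng.normal(size=4)
alpha_t, sigma_t = 0.8, 0.6
x_t = alpha_t * x0 + sigma_t * eps
print(np.allclose(epsilon_to_x0(x_t, eps, alpha_t, sigma_t), x0))  # True
```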
@@ -341,14 +332,18 @@ class DEISMultistepScheduler(SchedulerMixin, ConfigMixin):
        One step for the first-order DEIS (equivalent to DDIM).

        Args:
            model_output (`torch.FloatTensor`):
                The direct output from the learned diffusion model.
            timestep (`int`):
                The current discrete timestep in the diffusion chain.
            prev_timestep (`int`):
                The previous discrete timestep in the diffusion chain.
            sample (`torch.FloatTensor`):
                A current instance of a sample created by the diffusion process.

        Returns:
            `torch.FloatTensor`:
                The sample tensor at the previous timestep.
        """
        lambda_t, lambda_s = self.lambda_t[prev_timestep], self.lambda_t[timestep]
        alpha_t, alpha_s = self.alpha_t[prev_timestep], self.alpha_t[timestep]
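Since the first-order update is stated to be equivalent to DDIM, it can be sketched as the deterministic DDIM step: estimate `x0` from the noise prediction at the current level, then re-noise it at the previous level. A simplified numpy sketch (the actual implementation works in `lambda = log(alpha/sigma)` space, as the lookups above suggest):

```python
import numpy as np

def first_order_step(sample, eps_pred, alpha_s, sigma_s, alpha_t, sigma_t):
    """One deterministic DDIM-style step from noise level s to the previous level t."""
    x0_pred = (sample - sigma_s * eps_pred) / alpha_s  # invert x_s = alpha_s*x0 + sigma_s*eps
    return alpha_t * x0_pred + sigma_t * eps_pred      # re-noise at the previous level

# Sanity check: with an exact noise prediction, the step lands on the true x_t.
rng = np.random.default_rng(1)
x0, eps = rng.normal(size=3), rng.normal(size=3)
x_s = 0.6 * x0 + 0.8 * eps
x_t = first_order_step(x_s, eps, 0.6, 0.8, 0.9, np.sqrt(1 - 0.81))
print(np.allclose(x_t, 0.9 * x0 + np.sqrt(1 - 0.81) * eps))  # True
```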
@@ -372,14 +367,17 @@ class DEISMultistepScheduler(SchedulerMixin, ConfigMixin):

        Args:
            model_output_list (`List[torch.FloatTensor]`):
                The direct outputs from the learned diffusion model at current and latter timesteps.
            timestep_list (`List[int]`):
                The current and latter discrete timesteps in the diffusion chain.
            prev_timestep (`int`):
                The previous discrete timestep in the diffusion chain.
            sample (`torch.FloatTensor`):
                A current instance of a sample created by the diffusion process.

        Returns:
            `torch.FloatTensor`:
                The sample tensor at the previous timestep.
        """
        t, s0, s1 = prev_timestep, timestep_list[-1], timestep_list[-2]
        m0, m1 = model_output_list[-1], model_output_list[-2]
@@ -414,14 +412,17 @@ class DEISMultistepScheduler(SchedulerMixin, ConfigMixin):

        Args:
            model_output_list (`List[torch.FloatTensor]`):
                The direct outputs from the learned diffusion model at current and latter timesteps.
            timestep_list (`List[int]`):
                The current and latter discrete timesteps in the diffusion chain.
            prev_timestep (`int`):
                The previous discrete timestep in the diffusion chain.
            sample (`torch.FloatTensor`):
                A current instance of a sample created by the diffusion process.

        Returns:
            `torch.FloatTensor`:
                The sample tensor at the previous timestep.
        """
        t, s0, s1, s2 = prev_timestep, timestep_list[-1], timestep_list[-2], timestep_list[-3]
        m0, m1, m2 = model_output_list[-1], model_output_list[-2], model_output_list[-3]
@@ -467,18 +468,23 @@ class DEISMultistepScheduler(SchedulerMixin, ConfigMixin):
        return_dict: bool = True,
    ) -> Union[SchedulerOutput, Tuple]:
        """
        Predict the sample from the previous timestep by reversing the SDE. This function propagates the sample with
        the multistep DEIS.

        Args:
            model_output (`torch.FloatTensor`):
                The direct output from the learned diffusion model.
            timestep (`int`):
                The current discrete timestep in the diffusion chain.
            sample (`torch.FloatTensor`):
                A current instance of a sample created by the diffusion process.
            return_dict (`bool`):
                Whether or not to return a [`~schedulers.scheduling_utils.SchedulerOutput`] or `tuple`.

        Returns:
            [`~schedulers.scheduling_utils.SchedulerOutput`] or `tuple`:
                If `return_dict` is `True`, [`~schedulers.scheduling_utils.SchedulerOutput`] is returned, otherwise a
                tuple is returned where the first element is the sample tensor.
        """
        if self.num_inference_steps is None:
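The `lower_order_final` option documented above can be illustrated with the order-selection pattern typical of multistep solvers: warm up with lower orders while past model outputs accumulate, and (for few-step sampling) wind down to lower orders at the end. A sketch under the assumption that order selection follows this shape (the helper name is invented for illustration):

```python
def solver_orders(num_steps, solver_order=3, lower_order_final=True):
    orders = []
    for i in range(num_steps):
        order = min(solver_order, i + 1)            # warm-up: only i past outputs exist yet
        if lower_order_final and num_steps < 15:
            order = min(order, num_steps - i)       # wind-down: stabilizes few-step sampling
        orders.append(order)
    return orders

print(solver_orders(6))   # [1, 2, 3, 3, 2, 1]
print(solver_orders(3))   # [1, 2, 1]
```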
@@ -533,10 +539,12 @@ class DEISMultistepScheduler(SchedulerMixin, ConfigMixin):
        current timestep.

        Args:
            sample (`torch.FloatTensor`):
                The input sample.

        Returns:
            `torch.FloatTensor`:
                A scaled input sample.
        """
        return sample
...
@@ -123,39 +123,40 @@ def betas_for_alpha_bar(
class DPMSolverSDEScheduler(SchedulerMixin, ConfigMixin):
    """
    DPMSolverSDEScheduler implements the stochastic sampler from the [Elucidating the Design Space of Diffusion-Based
    Generative Models](https://huggingface.co/papers/2206.00364) paper.

    This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic
    methods the library implements for all schedulers such as loading and saving.

    Args:
        num_train_timesteps (`int`, defaults to 1000):
            The number of diffusion steps to train the model.
        beta_start (`float`, defaults to 0.00085):
            The starting `beta` value of inference.
        beta_end (`float`, defaults to 0.012):
            The final `beta` value.
        beta_schedule (`str`, defaults to `"linear"`):
            The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from
            `linear` or `scaled_linear`.
        trained_betas (`np.ndarray`, *optional*):
            Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
        prediction_type (`str`, defaults to `epsilon`, *optional*):
            Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process),
            `sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of [Imagen
            Video](https://imagen.research.google/video/paper.pdf) paper).
        use_karras_sigmas (`bool`, *optional*, defaults to `False`):
            Whether to use Karras sigmas for step sizes in the noise schedule during the sampling process. If `True`,
            the sigmas are determined according to a sequence of noise levels {σi}.
        noise_sampler_seed (`int`, *optional*, defaults to `None`):
            The random seed to use for the noise sampler. If `None`, a random seed is generated.
        timestep_spacing (`str`, defaults to `"linspace"`):
            The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and
            Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) paper for more information.
        steps_offset (`int`, defaults to 0):
            An offset added to the inference steps. You can use a combination of `offset=1` and
            `set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product like in Stable
            Diffusion.
    """

    _compatibles = [e.name for e in KarrasDiffusionSchedulers]
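When `use_karras_sigmas=True`, the noise levels {σi} follow the ρ-spaced schedule from Equation (5) of the Karras et al. paper linked above, with ρ = 7 as the paper's default. A sketch:

```python
import numpy as np

def karras_sigmas(sigma_min, sigma_max, n, rho=7.0):
    """sigma_i = (sigma_max^(1/rho) + i/(n-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho"""
    ramp = np.linspace(0, 1, n)
    min_inv, max_inv = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return (max_inv + ramp * (min_inv - max_inv)) ** rho

sigmas = karras_sigmas(0.1, 10.0, 8)
print(sigmas[0], sigmas[-1])  # endpoints are sigma_max and sigma_min, strictly descending
```

Compared to linearly spaced sigmas, this schedule concentrates steps near the low-noise end, which the paper found improves sample quality at a fixed step budget.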
@@ -232,12 +233,18 @@ class DPMSolverSDEScheduler(SchedulerMixin, ConfigMixin):
        timestep: Union[float, torch.FloatTensor],
    ) -> torch.FloatTensor:
        """
        Ensures interchangeability with schedulers that need to scale the denoising model input depending on the
        current timestep.

        Args:
            sample (`torch.FloatTensor`):
                The input sample.
            timestep (`int`, *optional*):
                The current timestep in the diffusion chain.

        Returns:
            `torch.FloatTensor`:
                A scaled input sample.
        """
        step_index = self.index_for_timestep(timestep)
@@ -253,13 +260,13 @@ class DPMSolverSDEScheduler(SchedulerMixin, ConfigMixin):
        num_train_timesteps: Optional[int] = None,
    ):
        """
        Sets the discrete timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model.
            device (`str` or `torch.device`, *optional*):
                The device to which the timesteps should be moved. If `None`, the timesteps are not moved.
        """
        self.num_inference_steps = num_inference_steps
@@ -384,18 +391,25 @@ class DPMSolverSDEScheduler(SchedulerMixin, ConfigMixin):
        s_noise: float = 1.0,
    ) -> Union[SchedulerOutput, Tuple]:
        """
        Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion
        process from the learned model outputs (most often the predicted noise).

        Args:
            model_output (`torch.FloatTensor` or `np.ndarray`):
                The direct output from the learned diffusion model.
            timestep (`float` or `torch.FloatTensor`):
                The current discrete timestep in the diffusion chain.
            sample (`torch.FloatTensor` or `np.ndarray`):
                A current instance of a sample created by the diffusion process.
            return_dict (`bool`, *optional*, defaults to `True`):
                Whether or not to return a [`~schedulers.scheduling_utils.SchedulerOutput`] or `tuple`.
            s_noise (`float`, *optional*, defaults to 1.0):
                Scaling factor for the noise added to the sample.

        Returns:
            [`~schedulers.scheduling_utils.SchedulerOutput`] or `tuple`:
                If `return_dict` is `True`, [`~schedulers.scheduling_utils.SchedulerOutput`] is returned, otherwise a
                tuple is returned where the first element is the sample tensor.
        """
        step_index = self.index_for_timestep(timestep)
...
@@ -31,14 +31,14 @@ logger = logging.get_logger(__name__)  # pylint: disable=invalid-name
# Copied from diffusers.schedulers.scheduling_ddpm.DDPMSchedulerOutput with DDPM->EulerAncestralDiscrete
class EulerAncestralDiscreteSchedulerOutput(BaseOutput):
    """
    Output class for the scheduler's `step` function output.

    Args:
        prev_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            Computed sample `(x_{t-1})` of previous timestep. `prev_sample` should be used as next model input in the
            denoising loop.
        pred_original_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            The predicted denoised sample `(x_{0})` based on the model output from the current timestep.
            `pred_original_sample` can be used to preview progress or for guidance.
    """
...@@ -93,34 +93,34 @@ def betas_for_alpha_bar( ...@@ -93,34 +93,34 @@ def betas_for_alpha_bar(
class EulerAncestralDiscreteScheduler(SchedulerMixin, ConfigMixin): class EulerAncestralDiscreteScheduler(SchedulerMixin, ConfigMixin):
""" """
Ancestral sampling with Euler method steps. Based on the original k-diffusion implementation by Katherine Crowson: Ancestral sampling with Euler method steps.
    This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic
    methods the library implements for all schedulers such as loading and saving.

    Args:
        num_train_timesteps (`int`, defaults to 1000):
            The number of diffusion steps to train the model.
        beta_start (`float`, defaults to 0.0001):
            The starting `beta` value of inference.
        beta_end (`float`, defaults to 0.02):
            The final `beta` value.
        beta_schedule (`str`, defaults to `"linear"`):
            The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from
            `linear` or `scaled_linear`.
        trained_betas (`np.ndarray`, *optional*):
            Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
        prediction_type (`str`, defaults to `epsilon`, *optional*):
            Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process),
            `sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of the [Imagen
            Video](https://imagen.research.google/video/paper.pdf) paper).
        timestep_spacing (`str`, defaults to `"linspace"`):
            The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and
            Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) paper for more information.
        steps_offset (`int`, defaults to 0):
            An offset added to the inference steps. You can use a combination of `offset=1` and
            `set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product like in Stable
            Diffusion.
    """
    _compatibles = [e.name for e in KarrasDiffusionSchedulers]
@@ -178,14 +178,18 @@ class EulerAncestralDiscreteScheduler(SchedulerMixin, ConfigMixin):
        self, sample: torch.FloatTensor, timestep: Union[float, torch.FloatTensor]
    ) -> torch.FloatTensor:
        """
        Ensures interchangeability with schedulers that need to scale the denoising model input depending on the
        current timestep. Scales the denoising model input by `(sigma**2 + 1) ** 0.5` to match the Euler algorithm.

        Args:
            sample (`torch.FloatTensor`):
                The input sample.
            timestep (`float` or `torch.FloatTensor`):
                The current timestep in the diffusion chain.

        Returns:
            `torch.FloatTensor`:
                A scaled input sample.
        """
        if isinstance(timestep, torch.Tensor):
            timestep = timestep.to(self.timesteps.device)
@@ -197,13 +201,13 @@ class EulerAncestralDiscreteScheduler(SchedulerMixin, ConfigMixin):
    def set_timesteps(self, num_inference_steps: int, device: Union[str, torch.device] = None):
        """
        Sets the discrete timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model.
            device (`str` or `torch.device`, *optional*):
                The device to which the timesteps should be moved. If `None`, the timesteps are not moved.
        """
        self.num_inference_steps = num_inference_steps
@@ -248,21 +252,27 @@ class EulerAncestralDiscreteScheduler(SchedulerMixin, ConfigMixin):
        return_dict: bool = True,
    ) -> Union[EulerAncestralDiscreteSchedulerOutput, Tuple]:
        """
        Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion
        process from the learned model outputs (most often the predicted noise).

        Args:
            model_output (`torch.FloatTensor`):
                The direct output from the learned diffusion model.
            timestep (`float`):
                The current discrete timestep in the diffusion chain.
            sample (`torch.FloatTensor`):
                A current instance of a sample created by the diffusion process.
            generator (`torch.Generator`, *optional*):
                A random number generator.
            return_dict (`bool`):
                Whether or not to return an
                [`~schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteSchedulerOutput`] or tuple.

        Returns:
            [`~schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteSchedulerOutput`] or `tuple`:
                If `return_dict` is `True`,
                [`~schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteSchedulerOutput`] is returned,
                otherwise a tuple is returned where the first element is the sample tensor.
        """
...
@@ -31,14 +31,14 @@ logger = logging.get_logger(__name__)  # pylint: disable=invalid-name

# Copied from diffusers.schedulers.scheduling_ddpm.DDPMSchedulerOutput with DDPM->EulerDiscrete
class EulerDiscreteSchedulerOutput(BaseOutput):
    """
    Output class for the scheduler's `step` function output.

    Args:
        prev_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            Computed sample `(x_{t-1})` of the previous timestep. `prev_sample` should be used as the next model input
            in the denoising loop.
        pred_original_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
            The predicted denoised sample `(x_{0})` based on the model output from the current timestep.
            `pred_original_sample` can be used to preview progress or for guidance.
    """
@@ -93,42 +93,40 @@ def betas_for_alpha_bar(

class EulerDiscreteScheduler(SchedulerMixin, ConfigMixin):
    """
    Euler scheduler.

    This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic
    methods the library implements for all schedulers such as loading and saving.

    Args:
        num_train_timesteps (`int`, defaults to 1000):
            The number of diffusion steps to train the model.
        beta_start (`float`, defaults to 0.0001):
            The starting `beta` value of inference.
        beta_end (`float`, defaults to 0.02):
            The final `beta` value.
        beta_schedule (`str`, defaults to `"linear"`):
            The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from
            `linear` or `scaled_linear`.
        trained_betas (`np.ndarray`, *optional*):
            Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
        prediction_type (`str`, defaults to `epsilon`, *optional*):
            Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process),
            `sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of the [Imagen
            Video](https://imagen.research.google/video/paper.pdf) paper).
        interpolation_type (`str`, defaults to `"linear"`, *optional*):
            The interpolation type to compute intermediate sigmas for the scheduler denoising steps. Should be one of
            `"linear"` or `"log_linear"`.
        use_karras_sigmas (`bool`, *optional*, defaults to `False`):
            Whether to use Karras sigmas for step sizes in the noise schedule during the sampling process. If `True`,
            the sigmas are determined according to a sequence of noise levels {σi}.
        timestep_spacing (`str`, defaults to `"linspace"`):
            The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and
            Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) paper for more information.
        steps_offset (`int`, defaults to 0):
            An offset added to the inference steps. You can use a combination of `offset=1` and
            `set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product like in Stable
            Diffusion.
    """
    _compatibles = [e.name for e in KarrasDiffusionSchedulers]
@@ -189,14 +187,18 @@ class EulerDiscreteScheduler(SchedulerMixin, ConfigMixin):
        self, sample: torch.FloatTensor, timestep: Union[float, torch.FloatTensor]
    ) -> torch.FloatTensor:
        """
        Ensures interchangeability with schedulers that need to scale the denoising model input depending on the
        current timestep. Scales the denoising model input by `(sigma**2 + 1) ** 0.5` to match the Euler algorithm.

        Args:
            sample (`torch.FloatTensor`):
                The input sample.
            timestep (`float` or `torch.FloatTensor`):
                The current timestep in the diffusion chain.

        Returns:
            `torch.FloatTensor`:
                A scaled input sample.
        """
        if isinstance(timestep, torch.Tensor):
            timestep = timestep.to(self.timesteps.device)
@@ -210,13 +212,13 @@ class EulerDiscreteScheduler(SchedulerMixin, ConfigMixin):
    def set_timesteps(self, num_inference_steps: int, device: Union[str, torch.device] = None):
        """
        Sets the discrete timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model.
            device (`str` or `torch.device`, *optional*):
                The device to which the timesteps should be moved. If `None`, the timesteps are not moved.
        """
        self.num_inference_steps = num_inference_steps
@@ -317,26 +319,31 @@ class EulerDiscreteScheduler(SchedulerMixin, ConfigMixin):
        return_dict: bool = True,
    ) -> Union[EulerDiscreteSchedulerOutput, Tuple]:
        """
        Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion
        process from the learned model outputs (most often the predicted noise).

        Args:
            model_output (`torch.FloatTensor`):
                The direct output from the learned diffusion model.
            timestep (`float`):
                The current discrete timestep in the diffusion chain.
            sample (`torch.FloatTensor`):
                A current instance of a sample created by the diffusion process.
            s_churn (`float`):
                The amount of stochasticity ("churn") added at each step.
            s_tmin (`float`):
                The lower bound of the sigma range in which churn is applied.
            s_tmax (`float`):
                The upper bound of the sigma range in which churn is applied.
            s_noise (`float`, defaults to 1.0):
                Scaling factor for noise added to the sample.
            generator (`torch.Generator`, *optional*):
                A random number generator.
            return_dict (`bool`):
                Whether or not to return an [`~schedulers.scheduling_euler_discrete.EulerDiscreteSchedulerOutput`] or
                tuple.

        Returns:
            [`~schedulers.scheduling_euler_discrete.EulerDiscreteSchedulerOutput`] or `tuple`:
                If `return_dict` is `True`, [`~schedulers.scheduling_euler_discrete.EulerDiscreteSchedulerOutput`] is
                returned, otherwise a tuple is returned where the first element is the sample tensor.
        """
        if (
...
@@ -70,41 +70,41 @@ def betas_for_alpha_bar(

class HeunDiscreteScheduler(SchedulerMixin, ConfigMixin):
    """
    Scheduler with Heun steps for discrete beta schedules.

    This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic
    methods the library implements for all schedulers such as loading and saving.

    Args:
        num_train_timesteps (`int`, defaults to 1000):
            The number of diffusion steps to train the model.
        beta_start (`float`, defaults to 0.0001):
            The starting `beta` value of inference.
        beta_end (`float`, defaults to 0.02):
            The final `beta` value.
        beta_schedule (`str`, defaults to `"linear"`):
            The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from
            `linear` or `scaled_linear`.
        trained_betas (`np.ndarray`, *optional*):
            Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
        prediction_type (`str`, defaults to `epsilon`, *optional*):
            Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process),
            `sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of the [Imagen
            Video](https://imagen.research.google/video/paper.pdf) paper).
        clip_sample (`bool`, defaults to `True`):
            Clip the predicted sample for numerical stability.
        clip_sample_range (`float`, defaults to 1.0):
            The maximum magnitude for sample clipping. Valid only when `clip_sample=True`.
        use_karras_sigmas (`bool`, *optional*, defaults to `False`):
            Whether to use Karras sigmas for step sizes in the noise schedule during the sampling process. If `True`,
            the sigmas are determined according to a sequence of noise levels {σi}.
        timestep_spacing (`str`, defaults to `"linspace"`):
            The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and
            Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) paper for more information.
        steps_offset (`int`, defaults to 0):
            An offset added to the inference steps. You can use a combination of `offset=1` and
            `set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product like in Stable
            Diffusion.
    """
    _compatibles = [e.name for e in KarrasDiffusionSchedulers]
@@ -181,12 +181,18 @@ class HeunDiscreteScheduler(SchedulerMixin, ConfigMixin):
        timestep: Union[float, torch.FloatTensor],
    ) -> torch.FloatTensor:
        """
        Ensures interchangeability with schedulers that need to scale the denoising model input depending on the
        current timestep.

        Args:
            sample (`torch.FloatTensor`):
                The input sample.
            timestep (`float` or `torch.FloatTensor`):
                The current timestep in the diffusion chain.

        Returns:
            `torch.FloatTensor`:
                A scaled input sample.
        """
        step_index = self.index_for_timestep(timestep)
@@ -201,13 +207,13 @@ class HeunDiscreteScheduler(SchedulerMixin, ConfigMixin):
        num_train_timesteps: Optional[int] = None,
    ):
        """
        Sets the discrete timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model.
            device (`str` or `torch.device`, *optional*):
                The device to which the timesteps should be moved. If `None`, the timesteps are not moved.
            num_train_timesteps (`int`, *optional*):
                The number of diffusion steps used to train the model. If `None`, the value from the scheduler's
                config is used.
        """
        self.num_inference_steps = num_inference_steps
@@ -312,17 +318,23 @@ class HeunDiscreteScheduler(SchedulerMixin, ConfigMixin):
        return_dict: bool = True,
    ) -> Union[SchedulerOutput, Tuple]:
        """
        Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion
        process from the learned model outputs (most often the predicted noise).

        Args:
            model_output (`torch.FloatTensor`):
                The direct output from the learned diffusion model.
            timestep (`float`):
                The current discrete timestep in the diffusion chain.
            sample (`torch.FloatTensor`):
                A current instance of a sample created by the diffusion process.
            return_dict (`bool`):
                Whether or not to return a [`~schedulers.scheduling_utils.SchedulerOutput`] or tuple.

        Returns:
            [`~schedulers.scheduling_utils.SchedulerOutput`] or `tuple`:
                If `return_dict` is `True`, [`~schedulers.scheduling_utils.SchedulerOutput`] is returned, otherwise a
                tuple is returned where the first element is the sample tensor.
        """
        step_index = self.index_for_timestep(timestep)
...
@@ -24,18 +24,16 @@ from .scheduling_utils import SchedulerMixin, SchedulerOutput

class IPNDMScheduler(SchedulerMixin, ConfigMixin):
    """
    A fourth-order Improved Pseudo Linear Multistep scheduler.

    This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic
    methods the library implements for all schedulers such as loading and saving.

    Args:
        num_train_timesteps (`int`, defaults to 1000):
            The number of diffusion steps to train the model.
        trained_betas (`np.ndarray`, *optional*):
            Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
    """
    order = 1
@@ -60,11 +58,13 @@ class IPNDMScheduler(SchedulerMixin, ConfigMixin):
    def set_timesteps(self, num_inference_steps: int, device: Union[str, torch.device] = None):
        """
        Sets the discrete timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model.
            device (`str` or `torch.device`, *optional*):
                The device to which the timesteps should be moved. If `None`, the timesteps are not moved.
        """
        self.num_inference_steps = num_inference_steps
        steps = torch.linspace(1, 0, num_inference_steps + 1)[:-1]
@@ -90,20 +90,23 @@ class IPNDMScheduler(SchedulerMixin, ConfigMixin):
        return_dict: bool = True,
    ) -> Union[SchedulerOutput, Tuple]:
        """
        Predict the sample from the previous timestep by reversing the SDE. This function propagates the sample with
        the linear multistep method. It performs one forward pass multiple times to approximate the solution.

        Args:
            model_output (`torch.FloatTensor`):
                The direct output from the learned diffusion model.
            timestep (`int`):
                The current discrete timestep in the diffusion chain.
            sample (`torch.FloatTensor`):
                A current instance of a sample created by the diffusion process.
            return_dict (`bool`):
                Whether or not to return a [`~schedulers.scheduling_utils.SchedulerOutput`] or tuple.

        Returns:
            [`~schedulers.scheduling_utils.SchedulerOutput`] or `tuple`:
                If `return_dict` is `True`, [`~schedulers.scheduling_utils.SchedulerOutput`] is returned, otherwise a
                tuple is returned where the first element is the sample tensor.
        """
        if self.num_inference_steps is None:
            raise ValueError(
@@ -138,10 +141,12 @@ class IPNDMScheduler(SchedulerMixin, ConfigMixin):
        current timestep.

        Args:
            sample (`torch.FloatTensor`):
                The input sample.

        Returns:
            `torch.FloatTensor`:
                A scaled input sample.
        """
        return sample
...