Unverified Commit a359ff76 authored by M. Tolga Cangöz, committed by GitHub

[`Docs`] Fix typos and update files at API's Main Classes, Models, and Schedulers pages (#5720)



* Fix typos, update, add Copyright info, and trim trailing whitespaces

* Update docs/source/en/api/loaders.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/models/autoencoder_tiny.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/models/autoencoder_tiny.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
parent 4b45a1e1
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Normalization layers
Customized normalization layers for supporting various models in 🤗 Diffusers.
@@ -10,6 +22,10 @@ Customized normalization layers for supporting various models in 🤗 Diffusers.
[[autodoc]] models.normalization.AdaLayerNormZero
## AdaLayerNormSingle
[[autodoc]] models.normalization.AdaLayerNormSingle
## AdaGroupNorm
[[autodoc]] models.normalization.AdaGroupNorm
\ No newline at end of file
[[autodoc]] models.normalization.AdaGroupNorm
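In practice these layers behave like their vanilla counterparts but take an extra conditioning embedding. A minimal sketch with `AdaGroupNorm` (the constructor signature here is an assumption; see the autodoc entry above for the authoritative one):

```py
import torch
from diffusers.models.normalization import AdaGroupNorm

# Group norm whose scale and shift are predicted from a conditioning embedding.
norm = AdaGroupNorm(embedding_dim=128, out_dim=64, num_groups=8)

x = torch.randn(2, 64, 32, 32)  # feature map: (batch, channels, height, width)
emb = torch.randn(2, 128)       # conditioning embedding, e.g. a timestep embedding
out = norm(x, emb)              # same shape as x, modulated per-sample by emb
```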
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# Outputs
All models outputs are subclasses of [`~utils.BaseOutput`], data structures containing all the information returned by the model. The outputs can also be used as tuples or dictionaries.
All model outputs are subclasses of [`~utils.BaseOutput`], data structures containing all the information returned by the model. The outputs can also be used as tuples or dictionaries.
For example:
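The example itself is collapsed in this diff; a minimal sketch of the pattern, assuming a small unconditional checkpoint such as `google/ddpm-cifar10-32`:

```py
from diffusers import DDIMPipeline

pipeline = DDIMPipeline.from_pretrained("google/ddpm-cifar10-32")
outputs = pipeline()

# A BaseOutput subclass exposes the same data three ways:
image = outputs.images[0]     # attribute access
image = outputs["images"][0]  # dictionary-style access
image = outputs[0][0]         # tuple-style indexing (None fields are skipped)
```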
@@ -64,4 +64,4 @@ To check a specific pipeline or model output, refer to its corresponding API doc
## ImageTextPipelineOutput
[[autodoc]] ImageTextPipelineOutput
\ No newline at end of file
[[autodoc]] ImageTextPipelineOutput
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# CMStochasticIterativeScheduler
[Consistency Models](https://huggingface.co/papers/2303.01469) by Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever introduced a multistep and one-step scheduler (Algorithm 1) that is capable of generating good samples in one or a small number of steps.
The abstract from the paper is:
*Diffusion models have made significant breakthroughs in image, audio, and video generation, but they depend on an iterative generation process that causes slow sampling speed and caps their potential for real-time applications. To overcome this limitation, we propose consistency models, a new family of generative models that achieve high sample quality without adversarial training. They support fast one-step generation by design, while still allowing for few-step sampling to trade compute for sample quality. They also support zero-shot data editing, like image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either as a way to distill pre-trained diffusion models, or as standalone generative models. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step generation. For example, we achieve the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained as standalone generative models, consistency models also outperform single-step, non-adversarial generative models on standard benchmarks like CIFAR-10, ImageNet 64x64 and LSUN 256x256.*
*Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256.*
The original codebase can be found at [openai/consistency_models](https://github.com/openai/consistency_models).
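A minimal sampling sketch (the checkpoint name is an assumption; any consistency-model checkpoint paired with this scheduler works the same way):

```py
import torch
from diffusers import ConsistencyModelPipeline

pipe = ConsistencyModelPipeline.from_pretrained(
    "openai/diffusers-cd_imagenet64_l2", torch_dtype=torch.float16
).to("cuda")

# One-step sampling: a single evaluation of the consistency function.
image = pipe(num_inference_steps=1).images[0]

# Multistep sampling (Algorithm 1): alternate denoising and noise injection.
image = pipe(num_inference_steps=None, timesteps=[22, 0]).images[0]
```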
@@ -12,4 +24,4 @@ The original codebase can be found at [openai/consistency_models](https://github
[[autodoc]] CMStochasticIterativeScheduler
## CMStochasticIterativeSchedulerOutput
[[autodoc]] schedulers.scheduling_consistency_models.CMStochasticIterativeSchedulerOutput
\ No newline at end of file
[[autodoc]] schedulers.scheduling_consistency_models.CMStochasticIterativeSchedulerOutput
@@ -16,13 +16,11 @@ specific language governing permissions and limitations under the License.
The abstract from the paper is:
*Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training,
yet they require simulating a Markov chain for many steps to produce a sample.
*Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample.
To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models
with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process.
with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process.
We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from.
We empirically demonstrate that DDIMs can produce high quality samples 10× to 50× faster in terms of wall-clock time compared to DDPMs, allow us to trade off
computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.*
We empirically demonstrate that DDIMs can produce high quality samples 10× to 50× faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.*
The original codebase of this paper can be found at [ermongroup/ddim](https://github.com/ermongroup/ddim), and you can contact the author on [tsong.me](https://tsong.me/).
@@ -57,13 +55,14 @@ pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, timestep_spaci
4. rescale classifier-free guidance to prevent over-exposure
```py
image = pipeline(prompt, guidance_rescale=0.7).images[0]
image = pipe(prompt, guidance_rescale=0.7).images[0]
```
For example:
```py
from diffusers import DiffusionPipeline, DDIMScheduler
import torch
pipe = DiffusionPipeline.from_pretrained("ptx0/pseudo-journey-v2", torch_dtype=torch.float16)
pipe.scheduler = DDIMScheduler.from_config(
@@ -72,7 +71,8 @@ pipe.scheduler = DDIMScheduler.from_config(
pipe.to("cuda")
prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"
image = pipeline(prompt, guidance_rescale=0.7).images[0]
image = pipe(prompt, guidance_rescale=0.7).images[0]
image
```
## DDIMScheduler
......
@@ -13,7 +13,7 @@ specific language governing permissions and limitations under the License.
# DDIMInverseScheduler
`DDIMInverseScheduler` is the inverted scheduler from [Denoising Diffusion Implicit Models](https://huggingface.co/papers/2010.02502) (DDIM) by Jiaming Song, Chenlin Meng and Stefano Ermon.
The implementation is mostly based on the DDIM inversion definition from [Null-text Inversion for Editing Real Images using Guided Diffusion Models](https://huggingface.co/papers/2211.09794.pdf).
The implementation is mostly based on the DDIM inversion definition from [Null-text Inversion for Editing Real Images using Guided Diffusion Models](https://huggingface.co/papers/2211.09794).
## DDIMInverseScheduler
[[autodoc]] DDIMInverseScheduler
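A rough usage sketch: the inverse scheduler is typically attached to a pipeline that performs latent inversion, such as the DiffEdit pipeline below (the checkpoint and pipeline choice are assumptions, not part of this page):

```py
import torch
from diffusers import DDIMInverseScheduler, DDIMScheduler, StableDiffusionDiffEditPipeline

pipe = StableDiffusionDiffEditPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
# The inverse scheduler runs DDIM backwards, mapping an image's latents to noise.
pipe.inverse_scheduler = DDIMInverseScheduler.from_config(pipe.scheduler.config)
```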
@@ -16,10 +16,10 @@ specific language governing permissions and limitations under the License.
The abstract from the paper is:
*We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN.*
*We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at [this https URL](https://github.com/hojonathanho/diffusion).*
## DDPMScheduler
[[autodoc]] DDPMScheduler
## DDPMSchedulerOutput
[[autodoc]] schedulers.scheduling_ddpm.DDPMSchedulerOutput
\ No newline at end of file
[[autodoc]] schedulers.scheduling_ddpm.DDPMSchedulerOutput
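As a sketch of how this scheduler drives the reverse process in a bare denoising loop (the checkpoint is an assumption; any unconditional UNet trained with a DDPM schedule fits):

```py
import torch
from diffusers import DDPMScheduler, UNet2DModel

repo_id = "google/ddpm-cat-256"
scheduler = DDPMScheduler.from_pretrained(repo_id)
model = UNet2DModel.from_pretrained(repo_id).to("cuda")

scheduler.set_timesteps(50)
sample = torch.randn(1, 3, 256, 256, device="cuda")  # start from pure noise

for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = model(sample, t).sample
    # step() removes the predicted noise and re-adds the scheduled stochasticity.
    sample = scheduler.step(noise_pred, t, sample).prev_sample
```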
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# DEISMultistepScheduler
Diffusion Exponential Integrator Sampler (DEIS) is proposed in [Fast Sampling of Diffusion Models with Exponential Integrator](https://huggingface.co/papers/2204.13902) by Qinsheng Zhang and Yongxin Chen. `DEISMultistepScheduler` is a fast high order solver for diffusion ordinary differential equations (ODEs).
Diffusion Exponential Integrator Sampler (DEIS) is proposed in [Fast Sampling of Diffusion Models with Exponential Integrator](https://huggingface.co/papers/2204.13902) by Qinsheng Zhang and Yongxin Chen. `DEISMultistepScheduler` is a fast high order solver for diffusion ordinary differential equations (ODEs).
This implementation modifies the polynomial fitting formula in log-rho space instead of the original linear `t` space in the DEIS paper. The modification enjoys closed-form coefficients for exponential multistep updates instead of relying on the numerical solver.
@@ -20,8 +20,6 @@ The abstract from the paper is:
*The past few years have witnessed the great success of Diffusion models~(DMs) in generating high-fidelity samples in generative modeling tasks. A major limitation of the DM is its notoriously slow sampling procedure which normally requires hundreds to thousands of time discretization steps of the learned diffusion process to reach the desired accuracy. Our goal is to develop a fast sampling method for DMs with a much less number of steps while retaining high sample quality. To this end, we systematically analyze the sampling procedure in DMs and identify key factors that affect the sample quality, among which the method of discretization is most crucial. By carefully examining the learned diffusion process, we propose Diffusion Exponential Integrator Sampler~(DEIS). It is based on the Exponential Integrator designed for discretizing ordinary differential equations (ODEs) and leverages a semilinear structure of the learned diffusion process to reduce the discretization error. The proposed method can be applied to any DMs and can generate high-fidelity samples in as few as 10 steps. In our experiments, it takes about 3 minutes on one A6000 GPU to generate 50k images from CIFAR10. Moreover, by directly using pre-trained DMs, we achieve the state-of-art sampling performance when the number of score function evaluation~(NFE) is limited, e.g., 4.17 FID with 10 NFEs, 3.37 FID, and 9.74 IS with only 15 NFEs on CIFAR10. Code is available at [this https URL](https://github.com/qsh-zh/deis).*
The original codebase can be found at [qsh-zh/deis](https://github.com/qsh-zh/deis).
## Tips
It is recommended to set `solver_order` to 2 or 3, while `solver_order=1` is equivalent to [`DDIMScheduler`].
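For instance, swapping DEIS into an existing pipeline only requires replacing the scheduler (a sketch; the checkpoint is an assumption):

```py
import torch
from diffusers import DEISMultistepScheduler, DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# solver_order=2 or 3 is recommended; solver_order=1 is equivalent to DDIM.
pipe.scheduler = DEISMultistepScheduler.from_config(pipe.scheduler.config, solver_order=2)

image = pipe("an astronaut riding a horse", num_inference_steps=20).images[0]
```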
@@ -33,4 +31,4 @@ diffusion models, you can set `thresholding=True` to use the dynamic thresholdin
[[autodoc]] DEISMultistepScheduler
## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
\ No newline at end of file
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -20,4 +20,4 @@ The original codebase can be found at [crowsonkb/k-diffusion](https://github.com
[[autodoc]] KDPM2DiscreteScheduler
## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
\ No newline at end of file
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -20,4 +20,4 @@ The original codebase can be found at [crowsonkb/k-diffusion](https://github.com
[[autodoc]] KDPM2AncestralDiscreteScheduler
## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
\ No newline at end of file
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -18,4 +18,4 @@ The `DPMSolverSDEScheduler` is inspired by the stochastic sampler from the [Eluc
[[autodoc]] DPMSolverSDEScheduler
## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
\ No newline at end of file
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -19,4 +19,4 @@ The Euler scheduler (Algorithm 2) is from the [Elucidating the Design Space of D
[[autodoc]] EulerDiscreteScheduler
## EulerDiscreteSchedulerOutput
[[autodoc]] schedulers.scheduling_euler_discrete.EulerDiscreteSchedulerOutput
\ No newline at end of file
[[autodoc]] schedulers.scheduling_euler_discrete.EulerDiscreteSchedulerOutput
@@ -18,4 +18,4 @@ A scheduler that uses ancestral sampling with Euler method steps. This is a fast
[[autodoc]] EulerAncestralDiscreteScheduler
## EulerAncestralDiscreteSchedulerOutput
[[autodoc]] schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteSchedulerOutput
\ No newline at end of file
[[autodoc]] schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteSchedulerOutput
@@ -18,4 +18,4 @@ The Heun scheduler (Algorithm 1) is from the [Elucidating the Design Space of Di
[[autodoc]] HeunDiscreteScheduler
## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
\ No newline at end of file
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -18,4 +18,4 @@ specific language governing permissions and limitations under the License.
[[autodoc]] IPNDMScheduler
## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
\ No newline at end of file
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Latent Consistency Model Multistep Scheduler
## Overview
......
@@ -18,4 +18,4 @@ specific language governing permissions and limitations under the License.
[[autodoc]] LMSDiscreteScheduler
## LMSDiscreteSchedulerOutput
[[autodoc]] schedulers.scheduling_lms_discrete.LMSDiscreteSchedulerOutput
\ No newline at end of file
[[autodoc]] schedulers.scheduling_lms_discrete.LMSDiscreteSchedulerOutput
@@ -21,7 +21,7 @@ samples, and it can generate quite good samples even in 10 steps.
It is recommended to set `solver_order` to 2 for guided sampling, and `solver_order=3` for unconditional sampling.
Dynamic thresholding from Imagen (https://huggingface.co/papers/2205.11487) is supported, and for pixel-space
Dynamic thresholding from [Imagen](https://huggingface.co/papers/2205.11487) is supported, and for pixel-space
diffusion models, you can set both `algorithm_type="dpmsolver++"` and `thresholding=True` to use the dynamic
thresholding. This thresholding method is unsuitable for latent-space diffusion models such as
Stable Diffusion.
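Put together, a typical configuration looks like this (a sketch; the checkpoint is an assumption, and `thresholding` stays off because Stable Diffusion is a latent-space model):

```py
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# solver_order=2 for guided sampling; "dpmsolver++" is the recommended algorithm.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, algorithm_type="dpmsolver++", solver_order=2
)

image = pipe("a photo of a red panda", num_inference_steps=20).images[0]
```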
@@ -32,4 +32,4 @@ The SDE variant of DPMSolver and DPM-Solver++ is also supported, but only for th
[[autodoc]] DPMSolverMultistepScheduler
## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
\ No newline at end of file
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
@@ -14,11 +14,11 @@ specific language governing permissions and limitations under the License.
`DPMSolverMultistepInverse` is the inverted scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.
The implementation is mostly based on the DDIM inversion definition of [Null-text Inversion for Editing Real Images using Guided Diffusion Models](https://huggingface.co/papers/2211.09794.pdf) and notebook implementation of the [`DiffEdit`] latent inversion from [Xiang-cd/DiffEdit-stable-diffusion](https://github.com/Xiang-cd/DiffEdit-stable-diffusion/blob/main/diffedit.ipynb).
The implementation is mostly based on the DDIM inversion definition of [Null-text Inversion for Editing Real Images using Guided Diffusion Models](https://huggingface.co/papers/2211.09794) and notebook implementation of the [`DiffEdit`] latent inversion from [Xiang-cd/DiffEdit-stable-diffusion](https://github.com/Xiang-cd/DiffEdit-stable-diffusion/blob/main/diffedit.ipynb).
## Tips
Dynamic thresholding from Imagen (https://huggingface.co/papers/2205.11487) is supported, and for pixel-space
Dynamic thresholding from [Imagen](https://huggingface.co/papers/2205.11487) is supported, and for pixel-space
diffusion models, you can set both `algorithm_type="dpmsolver++"` and `thresholding=True` to use the dynamic
thresholding. This thresholding method is unsuitable for latent-space diffusion models such as
Stable Diffusion.
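As a construction sketch, the inverse scheduler is built from the forward scheduler's config so the two stay consistent:

```py
from diffusers import DPMSolverMultistepInverseScheduler, DPMSolverMultistepScheduler

# Forward and inverse schedulers share one configuration; the inverse variant
# steps the ODE in the noising direction to recover inverted latents.
scheduler = DPMSolverMultistepScheduler(algorithm_type="dpmsolver++", solver_order=2)
inverse_scheduler = DPMSolverMultistepInverseScheduler.from_config(scheduler.config)
```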
......
@@ -61,4 +61,4 @@ The different schedulers in this class, depending on the ordinary differential e
## PushToHubMixin
[[autodoc]] utils.PushToHubMixin
\ No newline at end of file
[[autodoc]] utils.PushToHubMixin
@@ -18,4 +18,4 @@ specific language governing permissions and limitations under the License.
[[autodoc]] PNDMScheduler
## SchedulerOutput
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput
\ No newline at end of file
[[autodoc]] schedulers.scheduling_utils.SchedulerOutput