Unverified Commit cc5b31ff authored by Steven Liu, committed by GitHub

[docs] Migrate syntax (#12390)

* change syntax

* make style
parent d7a1a036
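
This commit converts the HTML-like `<Tip>` / `<Tip warning={true}>` blocks used throughout the docs into GitHub-flavored markdown alerts (`> [!TIP]` / `> [!WARNING]`). As a quick orientation before the hunks below, here is a minimal before/after sketch of the pattern; the text is a placeholder, not copied from any particular file in this commit:

```diff
-<Tip warning={true}>
-
-Some important caveat about the pipeline.
-
-</Tip>
+> [!WARNING]
+> Some important caveat about the pipeline.
```

Multi-paragraph callouts keep every line prefixed with `>` and separate paragraphs with a bare `>`, which is why most hunks below shrink by a few lines.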
@@ -18,13 +18,10 @@ specific language governing permissions and limitations under the License.
 The Stable Diffusion upscaler diffusion model was created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), and [LAION](https://laion.ai/). It is used to enhance the resolution of input images by a factor of 4.
-<Tip>
-Make sure to check out the Stable Diffusion [Tips](overview#tips) section to learn how to explore the tradeoff between scheduler speed and quality, and how to reuse pipeline components efficiently!
-If you're interested in using one of the official checkpoints for a task, explore the [CompVis](https://huggingface.co/CompVis), [Runway](https://huggingface.co/runwayml), and [Stability AI](https://huggingface.co/stabilityai) Hub organizations!
-</Tip>
+> [!TIP]
+> Make sure to check out the Stable Diffusion [Tips](overview#tips) section to learn how to explore the tradeoff between scheduler speed and quality, and how to reuse pipeline components efficiently!
+>
+> If you're interested in using one of the official checkpoints for a task, explore the [CompVis](https://huggingface.co/CompVis), [Runway](https://huggingface.co/runwayml), and [Stability AI](https://huggingface.co/stabilityai) Hub organizations!
 ## StableDiffusionUpscalePipeline
...
@@ -65,11 +65,8 @@ wave_prompt = "dramatic wave, the Oceans roar, Strong wave spiral across the oce
 image = pipe(prompt=wave_prompt).images[0]
 image
 ```
-<Tip warning={true}>
-For text-to-image we use `stabilityai/stable-diffusion-2-1-unclip-small` as it was trained on CLIP ViT-L/14 embedding, the same as the Karlo model prior. [stabilityai/stable-diffusion-2-1-unclip](https://hf.co/stabilityai/stable-diffusion-2-1-unclip) was trained on OpenCLIP ViT-H, so we don't recommend its use.
-</Tip>
+> [!WARNING]
+> For text-to-image we use `stabilityai/stable-diffusion-2-1-unclip-small` as it was trained on CLIP ViT-L/14 embedding, the same as the Karlo model prior. [stabilityai/stable-diffusion-2-1-unclip](https://hf.co/stabilityai/stable-diffusion-2-1-unclip) was trained on OpenCLIP ViT-H, so we don't recommend its use.
 ### Text guided Image-to-Image Variation
@@ -99,11 +96,8 @@ image = pipe(init_image, prompt=prompt).images[0]
 image
 ```
-<Tip>
-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
-</Tip>
+> [!TIP]
+> Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
 ## StableUnCLIPPipeline
...
@@ -174,11 +174,8 @@ Video generation is memory-intensive and one way to reduce your memory usage is
 Check out the [Text or image-to-video](text-img2vid) guide for more details about how certain parameters can affect video generation and how to optimize inference by reducing memory usage.
-<Tip>
-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
-</Tip>
+> [!TIP]
+> Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
 ## TextToVideoSDPipeline
 [[autodoc]] TextToVideoSDPipeline
...
@@ -289,11 +289,8 @@ can run with custom [DreamBooth](../../training/dreambooth) models, as shown bel
 You can filter out some available DreamBooth-trained models with [this link](https://huggingface.co/models?search=dreambooth).
-<Tip>
-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
-</Tip>
+> [!TIP]
+> Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
 ## TextToVideoZeroPipeline
 [[autodoc]] TextToVideoZeroPipeline
...
@@ -20,11 +20,8 @@ The abstract from the paper is following:
 You can find lucidrains' DALL-E 2 recreation at [lucidrains/DALLE2-pytorch](https://github.com/lucidrains/DALLE2-pytorch).
-<Tip>
-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
-</Tip>
+> [!TIP]
+> Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
 ## UnCLIPPipeline
 [[autodoc]] UnCLIPPipeline
...
@@ -27,11 +27,8 @@ The abstract from the paper is:
 You can find the original codebase at [thu-ml/unidiffuser](https://github.com/thu-ml/unidiffuser) and additional checkpoints at [thu-ml](https://huggingface.co/thu-ml).
-<Tip warning={true}>
-There is currently an issue on PyTorch 1.X where the output images are all black or the pixel values become `NaNs`. This issue can be mitigated by switching to PyTorch 2.X.
-</Tip>
+> [!WARNING]
+> There is currently an issue on PyTorch 1.X where the output images are all black or the pixel values become `NaNs`. This issue can be mitigated by switching to PyTorch 2.X.
 This pipeline was contributed by [dg845](https://github.com/dg845). ❤️
@@ -197,11 +194,8 @@ final_prompt = sample.text[0]
 print(final_prompt)
 ```
-<Tip>
-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
-</Tip>
+> [!TIP]
+> Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
 ## UniDiffuserPipeline
 [[autodoc]] UniDiffuserPipeline
...
@@ -12,11 +12,8 @@ specific language governing permissions and limitations under the License.
 # Value-guided planning
-<Tip warning={true}>
-🧪 This is an experimental pipeline for reinforcement learning!
-</Tip>
+> [!WARNING]
+> 🧪 This is an experimental pipeline for reinforcement learning!
 This pipeline is based on the [Planning with Diffusion for Flexible Behavior Synthesis](https://huggingface.co/papers/2205.09991) paper by Michael Janner, Yilun Du, Joshua B. Tenenbaum, Sergey Levine.
@@ -28,11 +25,8 @@ You can find additional information about the model on the [project page](https:
 The script to run the model is available [here](https://github.com/huggingface/diffusers/tree/main/examples/reinforcement_learning).
-<Tip>
-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
-</Tip>
+> [!TIP]
+> Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
 ## ValueGuidedRLPipeline
 [[autodoc]] diffusers.experimental.ValueGuidedRLPipeline
@@ -15,11 +15,8 @@ specific language governing permissions and limitations under the License.
 Quantization techniques reduce memory and computational costs by representing weights and activations with lower-precision data types like 8-bit integers (int8). This enables loading larger models you normally wouldn't be able to fit into memory, and speeding up inference.
-<Tip>
-Learn how to quantize models in the [Quantization](../quantization/overview) guide.
-</Tip>
+> [!TIP]
+> Learn how to quantize models in the [Quantization](../quantization/overview) guide.
 ## PipelineQuantizationConfig
...
@@ -28,11 +28,8 @@ The original codebase of this paper can be found at [ermongroup/ddim](https://gi
 The paper [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) claims that a mismatch between the training and inference settings leads to suboptimal inference generation results for Stable Diffusion. To fix this, the authors propose:
-<Tip warning={true}>
-🧪 This is an experimental feature!
-</Tip>
+> [!WARNING]
+> 🧪 This is an experimental feature!
 1. rescale the noise schedule to enforce zero terminal signal-to-noise ratio (SNR)
...
@@ -18,11 +18,8 @@ The abstract from the paper is:
 *Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (\aka, score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.*
-<Tip warning={true}>
-🚧 This scheduler is under construction!
-</Tip>
+> [!WARNING]
+> 🚧 This scheduler is under construction!
 ## ScoreSdeVpScheduler
 [[autodoc]] schedulers.deprecated.scheduling_sde_vp.ScoreSdeVpScheduler
@@ -104,13 +104,10 @@ We can also set `num_images_per_prompt` accordingly to compare different images
 Once several images are generated from all the prompts using multiple models (under evaluation), these results are presented to human evaluators for scoring. For
 more details on the DrawBench and PartiPrompts benchmarks, refer to their respective papers.
-<Tip>
-It is useful to look at some inference samples while a model is training to measure the
-training progress. In our [training scripts](https://github.com/huggingface/diffusers/tree/main/examples/), we support this utility with additional support for
-logging to TensorBoard and Weights & Biases.
-</Tip>
+> [!TIP]
+> It is useful to look at some inference samples while a model is training to measure the
+> training progress. In our [training scripts](https://github.com/huggingface/diffusers/tree/main/examples/), we support this utility with additional support for
+> logging to TensorBoard and Weights & Biases.
 ## Quantitative Evaluation
@@ -205,14 +202,11 @@ print(f"CLIP Score with v-1-5: {sd_clip_score_1_5}")
 It seems like the [v1-5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) checkpoint performs better than its predecessor. Note, however, that the number of prompts we used to compute the CLIP scores is quite low. For a more practical evaluation, this number should be way higher, and the prompts should be diverse.
-<Tip warning={true}>
-By construction, there are some limitations in this score. The captions in the training dataset
-were crawled from the web and extracted from `alt` and similar tags associated an image on the internet.
-They are not necessarily representative of what a human being would use to describe an image. Hence we
-had to "engineer" some prompts here.
-</Tip>
+> [!WARNING]
+> By construction, there are some limitations in this score. The captions in the training dataset
+> were crawled from the web and extracted from `alt` and similar tags associated an image on the internet.
+> They are not necessarily representative of what a human being would use to describe an image. Hence we
+> had to "engineer" some prompts here.
 ### Image-conditioned text-to-image generation
@@ -421,11 +415,8 @@ We can extend the idea of this metric to measure how similar the original image
 We can use these metrics for similar pipelines such as the [`StableDiffusionPix2PixZeroPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pix2pix_zero#diffusers.StableDiffusionPix2PixZeroPipeline).
-<Tip>
-Both CLIP score and CLIP direction similarity rely on the CLIP model, which can make the evaluations biased.
-</Tip>
+> [!TIP]
+> Both CLIP score and CLIP direction similarity rely on the CLIP model, which can make the evaluations biased.
 ***Extending metrics like IS, FID (discussed later), or KID can be difficult*** when the model under evaluation was pre-trained on a large image-captioning dataset (such as the [LAION-5B dataset](https://laion.ai/blog/laion-5b/)). This is because underlying these metrics is an InceptionNet (pre-trained on the ImageNet-1k dataset) used for extracting intermediate image features. The pre-training dataset of Stable Diffusion may have limited overlap with the pre-training dataset of InceptionNet, so it is not a good candidate here for feature extraction.
@@ -554,21 +545,18 @@ The lower the FID, the better it is. Several things can influence FID here:
 For the last two points, it is, therefore, a good practice to run the evaluation across different seeds and inference steps, and then report an average result.
-<Tip warning={true}>
-FID results tend to be fragile as they depend on a lot of factors:
-* The specific Inception model used during computation.
-* The implementation accuracy of the computation.
-* The image format (not the same if we start from PNGs vs JPGs).
-Keeping that in mind, FID is often most useful when comparing similar runs, but it is
-hard to reproduce paper results unless the authors carefully disclose the FID
-measurement code.
-These points apply to other related metrics too, such as KID and IS.
-</Tip>
+> [!WARNING]
+> FID results tend to be fragile as they depend on a lot of factors:
+>
+> * The specific Inception model used during computation.
+> * The implementation accuracy of the computation.
+> * The image format (not the same if we start from PNGs vs JPGs).
+>
+> Keeping that in mind, FID is often most useful when comparing similar runs, but it is
+> hard to reproduce paper results unless the authors carefully disclose the FID
+> measurement code.
+>
+> These points apply to other related metrics too, such as KID and IS.
 As a final step, let's visually inspect the `fake_images`.
...
@@ -16,11 +16,8 @@ specific language governing permissions and limitations under the License.
 Core ML models can leverage all the compute engines available in Apple devices: the CPU, the GPU, and the Apple Neural Engine (or ANE, a tensor-optimized accelerator available in Apple Silicon Macs and modern iPhones/iPads). Depending on the model and the device it's running on, Core ML can mix and match compute engines too, so some portions of the model may run on the CPU while others run on GPU, for example.
-<Tip>
-You can also run the `diffusers` Python codebase on Apple Silicon Macs using the `mps` accelerator built into PyTorch. This approach is explained in depth in [the mps guide](mps), but it is not compatible with native apps.
-</Tip>
+> [!TIP]
+> You can also run the `diffusers` Python codebase on Apple Silicon Macs using the `mps` accelerator built into PyTorch. This approach is explained in depth in [the mps guide](mps), but it is not compatible with native apps.
 ## Stable Diffusion Core ML Checkpoints
...
@@ -239,11 +239,8 @@ The `step()` function is [called](https://github.com/huggingface/diffusers/blob/
 In general, the `sigmas` should [stay on the CPU](https://github.com/huggingface/diffusers/blob/35a969d297cba69110d175ee79c59312b9f49e1e/src/diffusers/schedulers/scheduling_euler_discrete.py#L240) to avoid the communication sync and latency.
-<Tip>
-Refer to the [torch.compile and Diffusers: A Hands-On Guide to Peak Performance](https://pytorch.org/blog/torch-compile-and-diffusers-a-hands-on-guide-to-peak-performance/) blog post for maximizing performance with `torch.compile` for diffusion models.
-</Tip>
+> [!TIP]
+> Refer to the [torch.compile and Diffusers: A Hands-On Guide to Peak Performance](https://pytorch.org/blog/torch-compile-and-diffusers-a-hands-on-guide-to-peak-performance/) blog post for maximizing performance with `torch.compile` for diffusion models.
 ### Benchmarks
...
@@ -38,11 +38,8 @@ image = pipe(prompt).images[0]
 image
 ```
-<Tip warning={true}>
-The PyTorch [mps](https://pytorch.org/docs/stable/notes/mps.html) backend does not support NDArray sizes greater than `2**32`. Please open an [Issue](https://github.com/huggingface/diffusers/issues/new/choose) if you encounter this problem so we can investigate.
-</Tip>
+> [!WARNING]
+> The PyTorch [mps](https://pytorch.org/docs/stable/notes/mps.html) backend does not support NDArray sizes greater than `2**32`. Please open an [Issue](https://github.com/huggingface/diffusers/issues/new/choose) if you encounter this problem so we can investigate.
 If you're using **PyTorch 1.13**, you need to "prime" the pipeline with an additional one-time pass through it. This is a temporary workaround for an issue where the first inference pass produces slightly different results than subsequent ones. You only need to do this pass once, and after just one inference step you can discard the result.
...
@@ -20,11 +20,8 @@ Diffusers functionalities are available on [AWS Inf2 instances](https://aws.amaz
 python -m pip install --upgrade-strategy eager optimum[neuronx]
 ```
-<Tip>
-We provide pre-built [Hugging Face Neuron Deep Learning AMI](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2) (DLAMI) and Optimum Neuron containers for Amazon SageMaker. It's recommended to correctly set up your environment.
-</Tip>
+> [!TIP]
+> We provide pre-built [Hugging Face Neuron Deep Learning AMI](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2) (DLAMI) and Optimum Neuron containers for Amazon SageMaker. It's recommended to correctly set up your environment.
 The example below demonstrates how to generate images with the Stable Diffusion XL model on an inf2.8xlarge instance (you can switch to cheaper inf2.xlarge instances once the model is compiled). To generate some images, use the [`~optimum.neuron.NeuronStableDiffusionXLPipeline`] class, which is similar to the [`StableDiffusionXLPipeline`] class in Diffusers.
...
@@ -34,11 +34,8 @@ image = pipeline(prompt).images[0]
 pipeline.save_pretrained("./onnx-stable-diffusion-v1-5")
 ```
-<Tip warning={true}>
-Generating multiple prompts in a batch seems to take too much memory. While we look into it, you may need to iterate instead of batching.
-</Tip>
+> [!WARNING]
+> Generating multiple prompts in a batch seems to take too much memory. While we look into it, you may need to iterate instead of batching.
 To export the pipeline in the ONNX format offline and use it later for inference,
 use the [`optimum-cli export`](https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli) command:
...
@@ -20,16 +20,10 @@ Install xFormers from `pip`:
 pip install xformers
 ```
-<Tip>
-The xFormers `pip` package requires the latest version of PyTorch. If you need to use a previous version of PyTorch, then we recommend [installing xFormers from the source](https://github.com/facebookresearch/xformers#installing-xformers).
-</Tip>
+> [!TIP]
+> The xFormers `pip` package requires the latest version of PyTorch. If you need to use a previous version of PyTorch, then we recommend [installing xFormers from the source](https://github.com/facebookresearch/xformers#installing-xformers).
 After xFormers is installed, you can use `enable_xformers_memory_efficient_attention()` for faster inference and reduced memory consumption as shown in this [section](memory#memory-efficient-attention).
-<Tip warning={true}>
-According to this [issue](https://github.com/huggingface/diffusers/issues/2234#issuecomment-1416931212), xFormers `v0.0.16` cannot be used for training (fine-tune or DreamBooth) in some GPUs. If you observe this problem, please install a development version as indicated in the issue comments.
-</Tip>
+> [!WARNING]
+> According to this [issue](https://github.com/huggingface/diffusers/issues/2234#issuecomment-1416931212), xFormers `v0.0.16` cannot be used for training (fine-tune or DreamBooth) in some GPUs. If you observe this problem, please install a development version as indicated in the issue comments.
@@ -206,11 +206,8 @@ Once a model is quantized, you can push the model to the Hub with the [`~ModelMi
 </hfoption>
 </hfoptions>
-<Tip warning={true}>
-Training with 8-bit and 4-bit weights are only supported for training *extra* parameters.
-</Tip>
+> [!WARNING]
+> Training with 8-bit and 4-bit weights are only supported for training *extra* parameters.
 Check your memory footprint with the `get_memory_footprint` method:
@@ -234,11 +231,8 @@ model_4bit = AutoModel.from_pretrained(
 ## 8-bit (LLM.int8() algorithm)
-<Tip>
-Learn more about the details of 8-bit quantization in this [blog post](https://huggingface.co/blog/hf-bitsandbytes-integration)!
-</Tip>
+> [!TIP]
+> Learn more about the details of 8-bit quantization in this [blog post](https://huggingface.co/blog/hf-bitsandbytes-integration)!
 This section explores some of the specific features of 8-bit models, such as outlier thresholds and skipping module conversion.
@@ -283,11 +277,8 @@ model_8bit = SD3Transformer2DModel.from_pretrained(
 ## 4-bit (QLoRA algorithm)
-<Tip>
-Learn more about its details in this [blog post](https://huggingface.co/blog/4bit-transformers-bitsandbytes).
-</Tip>
+> [!TIP]
+> Learn more about its details in this [blog post](https://huggingface.co/blog/4bit-transformers-bitsandbytes).
 This section explores some of the specific features of 4-bit models, such as changing the compute data type, using the Normal Float 4 (NF4) data type, and using nested quantization.
...
@@ -33,11 +33,8 @@ cd examples/controlnet
 pip install -r requirements.txt
 ```
-<Tip>
-🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed-precision. It'll automatically configure your training setup based on your hardware and environment. Take a look at the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more.
-</Tip>
+> [!TIP]
+> 🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed-precision. It'll automatically configure your training setup based on your hardware and environment. Take a look at the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more.
 Initialize an 🤗 Accelerate environment:
@@ -61,11 +58,8 @@ write_basic_config()
 Lastly, if you want to train a model on your own dataset, take a look at the [Create a dataset for training](create_dataset) guide to learn how to create a dataset that works with the training script.
-<Tip>
-The following sections highlight parts of the training script that are important for understanding how to modify it, but it doesn't cover every aspect of the script in detail. If you're interested in learning more, feel free to read through the [script](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet.py) and let us know if you have any questions or concerns.
-</Tip>
+> [!TIP]
+> The following sections highlight parts of the training script that are important for understanding how to modify it, but it doesn't cover every aspect of the script in detail. If you're interested in learning more, feel free to read through the [script](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet.py) and let us know if you have any questions or concerns.
 ## Script parameters
@@ -100,11 +94,8 @@ As with the script parameters, a general walkthrough of the training script is p
 The training script has a [`make_train_dataset`](https://github.com/huggingface/diffusers/blob/64603389da01082055a901f2883c4810d1144edb/examples/controlnet/train_controlnet.py#L582) function for preprocessing the dataset with image transforms and caption tokenization. You'll see that in addition to the usual caption tokenization and image transforms, the script also includes transforms for the conditioning image.
-<Tip>
-If you're streaming a dataset on a TPU, performance may be bottlenecked by the 🤗 Datasets library which is not optimized for images. To ensure maximum throughput, you're encouraged to explore other dataset formats like [WebDataset](https://webdataset.github.io/webdataset/), [TorchData](https://github.com/pytorch/data), and [TensorFlow Datasets](https://www.tensorflow.org/datasets/tfless_tfds).
-</Tip>
+> [!TIP]
+> If you're streaming a dataset on a TPU, performance may be bottlenecked by the 🤗 Datasets library which is not optimized for images. To ensure maximum throughput, you're encouraged to explore other dataset formats like [WebDataset](https://webdataset.github.io/webdataset/), [TorchData](https://github.com/pytorch/data), and [TensorFlow Datasets](https://www.tensorflow.org/datasets/tfless_tfds).
 ```py
 conditioning_image_transforms = transforms.Compose(
...
@@ -7,11 +7,8 @@ This guide will show you two ways to create a dataset to finetune on:
 - provide a folder of images to the `--train_data_dir` argument
 - upload a dataset to the Hub and pass the dataset repository id to the `--dataset_name` argument
-<Tip>
-💡 Learn more about how to create an image dataset for training in the [Create an image dataset](https://huggingface.co/docs/datasets/image_dataset) guide.
-</Tip>
+> [!TIP]
+> 💡 Learn more about how to create an image dataset for training in the [Create an image dataset](https://huggingface.co/docs/datasets/image_dataset) guide.
 ## Provide a dataset as a folder
@@ -33,11 +30,8 @@ accelerate launch train_unconditional.py \
 ## Upload your data to the Hub
-<Tip>
-💡 For more details and context about creating and uploading a dataset to the Hub, take a look at the [Image search with 🤗 Datasets](https://huggingface.co/blog/image-search-datasets) post.
-</Tip>
+> [!TIP]
+> 💡 For more details and context about creating and uploading a dataset to the Hub, take a look at the [Image search with 🤗 Datasets](https://huggingface.co/blog/image-search-datasets) post.
 Start by creating a dataset with the [`ImageFolder`](https://huggingface.co/docs/datasets/image_load#imagefolder) feature, which creates an `image` column containing the PIL-encoded images.
...