[docs] SDXL (#4428)

* first draft
* reorg toctree
* note about minsdxl
* feedback
* fix
* micro-conditionings
* add tip
* fix section levels
* d'oh fix pipeline names
* feedback
* remove old section

Unverified commit a1fdfca3, authored Aug 30, 2023 by Steven Liu, committed by GitHub Aug 30, 2023
4 changed files
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -36,38 +36,42 @@
      title: Push files to the Hub
    title: Loading & Hub
  - sections:
-    - local: using-diffusers/pipeline_overview
-      title: Overview
    - local: using-diffusers/unconditional_image_generation
      title: Unconditional image generation
    - local: using-diffusers/conditional_image_generation
-      title: Text-to-image generation
+      title: Text-to-image
    - local: using-diffusers/img2img
-      title: Text-guided image-to-image
+      title: Image-to-image
    - local: using-diffusers/inpaint
-      title: Text-guided image-inpainting
+      title: Inpainting
    - local: using-diffusers/depth2img
-      title: Text-guided depth-to-image
+      title: Depth-to-image
+    title: Tasks
+  - sections:
    - local: using-diffusers/textual_inversion_inference
      title: Textual inversion
    - local: training/distributed_inference
      title: Distributed inference with multiple GPUs
-    - local: using-diffusers/distilled_sd
-      title: Distilled Stable Diffusion inference
    - local: using-diffusers/reusing_seeds
      title: Improve image quality with deterministic generation
    - local: using-diffusers/control_brightness
      title: Control image brightness
+    - local: using-diffusers/weighted_prompts
+      title: Prompt weighting
+    title: Techniques
+  - sections:
+    - local: using-diffusers/pipeline_overview
+      title: Overview
+    - local: using-diffusers/sdxl
+      title: Stable Diffusion XL
+    - local: using-diffusers/distilled_sd
+      title: Distilled Stable Diffusion inference
    - local: using-diffusers/reproducibility
      title: Create reproducible pipelines
    - local: using-diffusers/custom_pipeline_examples
      title: Community pipelines
    - local: using-diffusers/contribute_pipeline
      title: How to contribute a community pipeline
-    - local: using-diffusers/stable_diffusion_jax_how_to
-      title: Stable Diffusion in JAX/Flax
-    - local: using-diffusers/weighted_prompts
-      title: Prompt weighting
    title: Pipelines for Inference
  - sections:
    - local: training/overview
@@ -105,6 +109,8 @@
    title: Memory and Speed
  - local: optimization/torch2.0
    title: Torch2.0 support
+  - local: using-diffusers/stable_diffusion_jax_how_to
+    title: Stable Diffusion in JAX/Flax
  - local: optimization/xformers
    title: xFormers
  - local: optimization/onnx

--- a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.md
+++ b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.md
@@ -10,414 +10,29 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->

-# Stable diffusion XL
+# Stable Diffusion XL

-Stable Diffusion XL was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://arxiv.org/abs/2307.01952) by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach
+Stable Diffusion XL (SDXL) was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://huggingface.co/papers/2307.01952) by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach.

-The abstract of the paper is the following:
+The abstract from the paper is:

 *We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators.*

 ## Tips

- Stable Diffusion XL works especially well with images between 768 and 1024.
- Stable Diffusion XL can pass a different prompt for each of the text encoders it was trained on as shown below. We can even pass different parts of the same prompt to the text encoders.
- Stable Diffusion XL output image can be improved by making use of a refiner as shown below.
- One can make use of `negative_original_size`, `negative_crops_coords_top_left`, and `negative_target_size` to influence the generation process.
-
-### Available checkpoints:
-
- *Text-to-Image (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) with [`StableDiffusionXLPipeline`]
- *Image-to-Image / Refiner (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-refiner-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0) with [`StableDiffusionXLImg2ImgPipeline`]
-
-## Usage Example
-
-Before using SDXL make sure to have `transformers`, `accelerate`, `safetensors` and `invisible_watermark` installed. 
-You can install the libraries as follows:
-
-```
-pip install transformers
-pip install accelerate
-pip install safetensors
-```
-
-### Watermarker
-
-We recommend to add an invisible watermark to images generating by Stable Diffusion XL, this can help with identifying if an image is machine-synthesised for downstream applications. To do so, please install
-the [invisible-watermark library](https://pypi.org/project/invisible-watermark/) via:
-
-```
-pip install invisible-watermark>=0.2.0
-```
-
-If the `invisible-watermark` library is installed the watermarker will be used **by default**.
-
-If you have other provisions for generating or deploying images safely, you can disable the watermarker as follows:
-
-```py
-pipe = StableDiffusionXLPipeline.from_pretrained(..., add_watermarker=False)
-```
-
-### Text-to-Image
-
-You can use SDXL as follows for *text-to-image*:
-
-```py
-from diffusers import StableDiffusionXLPipeline
-import torch
-
-pipe = StableDiffusionXLPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
-)
-pipe.to("cuda")
-
-prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
-image = pipe(prompt=prompt).images[0]
-```
-
-You can additionally pass negative conditions about an image's size and position to avoid undesirable cropping behavior in the generated image, and improve image resolution. Let's take an example:
-
-```python
-from diffusers import StableDiffusionXLPipeline
-import torch
-
-pipe = StableDiffusionXLPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
-)
-pipe.to("cuda")
-
-prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
-image = pipe(
-    prompt=prompt,
-    negative_original_size=(512, 512),
-    negative_crops_coords_top_left=(0, 0),
-    negative_target_size=(1024, 1024),
-).images[0]
-```
-
-Here is a comparative example that shows the influence of using three `negative_original_size`s of
-(128, 128), (256, 256), and (512, 512) respectively:
-
-![](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/sd_xl/negative_conditions.png)
-
-<Tip>
-
-One can use these negative conditions in the other SDXL pipelines ([Image-To-Image](#image-to-image), [Inpainting](#inpainting), [ControlNet](../controlnet_sdxl.md)) too!
-
-</Tip>
-
-### Image-to-image 
-
-You can use SDXL as follows for *image-to-image*:
-
-```py 
-import torch
-from diffusers import StableDiffusionXLImg2ImgPipeline
-from diffusers.utils import load_image
-
-pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
-)
-pipe = pipe.to("cuda")
-url = "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"
-
-init_image = load_image(url).convert("RGB")
-prompt = "a photo of an astronaut riding a horse on mars"
-image = pipe(prompt, image=init_image).images[0]
-```
-
-### Inpainting
-
-You can use SDXL as follows for *inpainting*
-
-```py 
-import torch
-from diffusers import StableDiffusionXLInpaintPipeline
-from diffusers.utils import load_image
-
-pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
-)
-pipe.to("cuda")
-
-img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
-mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
-
-init_image = load_image(img_url).convert("RGB")
-mask_image = load_image(mask_url).convert("RGB")
-
-prompt = "A majestic tiger sitting on a bench"
-image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=50, strength=0.80).images[0]
-```
-
-### Refining the image output
-
-In addition to the [base model checkpoint](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), 
-StableDiffusion-XL also includes a [refiner checkpoint](huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)
-that is specialized in denoising low-noise stage images to generate images of improved high-frequency quality.
-This refiner checkpoint can be used as a "second-step" pipeline after having run the base checkpoint to improve
-image quality.
-
-When using the refiner, one can easily 
- 1.) employ the base model and refiner as an *Ensemble of Expert Denoisers* as first proposed in [eDiff-I](https://research.nvidia.com/labs/dir/eDiff-I/) or
- 2.) simply run the refiner in [SDEdit](https://arxiv.org/abs/2108.01073) fashion after the base model.
-
-**Note**: The idea of using SD-XL base & refiner as an ensemble of experts was first brought forward by 
-a couple community contributors which also helped shape the following `diffusers` implementation, namely:
- [SytanSD](https://github.com/SytanSD)
- [bghira](https://github.com/bghira)
- [Birch-san](https://github.com/Birch-san)
- [AmericanPresidentJimmyCarter](https://github.com/AmericanPresidentJimmyCarter)
-
-#### 1.) Ensemble of Expert Denoisers
-
-When using the base and refiner model as an ensemble of expert of denoisers, the base model should serve as the 
-expert for the high-noise diffusion stage and the refiner serves as the expert for the low-noise diffusion stage.
-
-The advantage of 1.) over 2.) is that it requires less overall denoising steps and therefore should be significantly
-faster. The drawback is that one cannot really inspect the output of the base model; it will still be heavily denoised.
-
-To use the base model and refiner as an ensemble of expert denoisers, make sure to define the span
-of timesteps which should be run through the high-noise denoising stage (*i.e.* the base model) and the low-noise
-denoising stage (*i.e.* the refiner model) respectively. We can set the intervals using the [`denoising_end`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLPipeline.__call__.denoising_end) of the base model 
-and [`denoising_start`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLImg2ImgPipeline.__call__.denoising_start) of the refiner model.
-
-For both `denoising_end` and `denoising_start` a float value between 0 and 1 should be passed.
-When passed, the end and start of denoising will be defined by proportions of discrete timesteps as
-defined by the model schedule.
-Note that this will override `strength` if it is also declared, since the number of denoising steps
-is determined by the discrete timesteps the model was trained on and the declared fractional cutoff.
-
-Let's look at an example.
-First, we import the two pipelines. Since the text encoders and variational autoencoder are the same
-you don't have to load those again for the refiner.
-
-```py
-from diffusers import DiffusionPipeline
-import torch
-
-base = DiffusionPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
-)
-base.to("cuda")
-
-refiner = DiffusionPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-refiner-1.0",
-    text_encoder_2=base.text_encoder_2,
-    vae=base.vae,
-    torch_dtype=torch.float16,
-    use_safetensors=True,
-    variant="fp16",
-)
-refiner.to("cuda")
-```
-
-Now we define the number of inference steps and the point at which the model shall be run through the 
-high-noise denoising stage (*i.e.* the base model).
-
-```py
-n_steps = 40
-high_noise_frac = 0.8
-```
-
-Stable Diffusion XL base is trained on timesteps 0-999 and Stable Diffusion XL refiner is finetuned
-from the base model on low noise timesteps 0-199 inclusive, so we use the base model for the first
-800 timesteps (high noise) and the refiner for the last 200 timesteps (low noise). Hence, `high_noise_frac`
-is set to 0.8, so that all steps 200-999 (the first 80% of denoising timesteps) are performed by the
-base model and steps 0-199 (the last 20% of denoising timesteps) are performed by the refiner model.
-
-Remember, the denoising process starts at **high value** (high noise) timesteps and ends at
-**low value** (low noise) timesteps.
-
-Let's run the two pipelines now. Make sure to set `denoising_end` and
-`denoising_start` to the same values and keep `num_inference_steps` constant. Also remember that
-the output of the base model should be in latent space:
-
-```py
-prompt = "A majestic lion jumping from a big stone at night"
-
-image = base(
-    prompt=prompt,
-    num_inference_steps=n_steps,
-    denoising_end=high_noise_frac,
-    output_type="latent",
-).images
-image = refiner(
-    prompt=prompt,
-    num_inference_steps=n_steps,
-    denoising_start=high_noise_frac,
-    image=image,
-).images[0]
-```
-
-Let's have a look at the images
-
-| Original Image | Ensemble of Denoisers Experts |
-|---|---|
-| ![lion_base_timesteps](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lion_base.png) | ![lion_refined_timesteps](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lion_refined.png)
-
-If we would have just run the base model on the same 40 steps, the image would have been arguably less detailed (e.g. the lion eyes and nose):
+- SDXL works especially well with images between 768 and 1024.
+- SDXL can pass a different prompt for each of the text encoders it was trained on. We can even pass different parts of the same prompt to the text encoders.
+- SDXL output images can be improved by making use of a refiner model in an image-to-image setting.
+- SDXL offers `negative_original_size`, `negative_crops_coords_top_left`, and `negative_target_size` to negatively condition the model on image resolution and cropping parameters.

 <Tip>

-The ensemble-of-experts method works well on all available schedulers!
-
-</Tip>
-
-#### 2.) Refining the image output from fully denoised base image
-
-In standard [`StableDiffusionImg2ImgPipeline`]-fashion, the fully-denoised image generated of the base model 
-can be further improved using the [refiner checkpoint](huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0).
-
-For this, you simply run the refiner as a normal image-to-image pipeline after the "base" text-to-image 
-pipeline. You can leave the outputs of the base model in latent space.
+To learn how to use SDXL for various tasks, how to optimize performance, and other usage examples, take a look at the [Stable Diffusion XL](/using-diffusers/sdxl) guide.

-```py
-from diffusers import DiffusionPipeline
-import torch
-
-pipe = DiffusionPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
-)
-pipe.to("cuda")
-
-refiner = DiffusionPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-refiner-1.0",
-    text_encoder_2=pipe.text_encoder_2,
-    vae=pipe.vae,
-    torch_dtype=torch.float16,
-    use_safetensors=True,
-    variant="fp16",
-)
-refiner.to("cuda")
-
-prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
-
-image = pipe(prompt=prompt, output_type="latent" if use_refiner else "pil").images[0]
-image = refiner(prompt=prompt, image=image[None, :]).images[0]
-```
-
-| Original Image | Refined Image |
-|---|---|
-| ![](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/sd_xl/init_image.png) | ![](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/sd_xl/refined_image.png) |
-
-<Tip>
-
-The refiner can also very well be used in an in-painting setting. To do so just make
-  sure you use the [`StableDiffusionXLInpaintPipeline`] classes as shown below
+Check out the [Stability AI](https://huggingface.co/stabilityai) Hub organization for the official base and refiner model checkpoints! 

 </Tip>

-To use the refiner for inpainting in the Ensemble of Expert Denoisers setting you can do the following:
-
-```py
-from diffusers import StableDiffusionXLInpaintPipeline
-from diffusers.utils import load_image
-
-pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
-)
-pipe.to("cuda")
-
-refiner = StableDiffusionXLInpaintPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-refiner-1.0",
-    text_encoder_2=pipe.text_encoder_2,
-    vae=pipe.vae,
-    torch_dtype=torch.float16,
-    use_safetensors=True,
-    variant="fp16",
-)
-refiner.to("cuda")
-
-img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
-mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
-
-init_image = load_image(img_url).convert("RGB")
-mask_image = load_image(mask_url).convert("RGB")
-
-prompt = "A majestic tiger sitting on a bench"
-num_inference_steps = 75
-high_noise_frac = 0.7
-
-image = pipe(
-    prompt=prompt,
-    image=init_image,
-    mask_image=mask_image,
-    num_inference_steps=num_inference_steps,
-    denoising_start=high_noise_frac,
-    output_type="latent",
-).images
-image = refiner(
-    prompt=prompt,
-    image=image,
-    mask_image=mask_image,
-    num_inference_steps=num_inference_steps,
-    denoising_start=high_noise_frac,
-).images[0]
-```
-
-To use the refiner for inpainting in the standard SDE-style setting, simply remove `denoising_end` and `denoising_start` and choose a smaller
-number of inference steps for the refiner.
-
-### Loading single file checkpoints / original file format
-
-By making use of [`~diffusers.loaders.FromSingleFileMixin.from_single_file`] you can also load the 
-original file format into `diffusers`:
-
-```py
-from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
-import torch
-
-pipe = StableDiffusionXLPipeline.from_single_file(
-    "./sd_xl_base_1.0.safetensors", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
-)
-pipe.to("cuda")
-
-refiner = StableDiffusionXLImg2ImgPipeline.from_single_file(
-    "./sd_xl_refiner_1.0.safetensors", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
-)
-refiner.to("cuda")
-```
-
-### Memory optimization via model offloading 
-
-If you are seeing out-of-memory errors, we recommend making use of [`StableDiffusionXLPipeline.enable_model_cpu_offload`].
-
-```diff
- pipe.to("cuda")
-+ pipe.enable_model_cpu_offload()
-```
-
-and 
-
-```diff
- refiner.to("cuda")
-+ refiner.enable_model_cpu_offload()
-```
-
-### Speed-up inference with `torch.compile`
-
-You can speed up inference by making use of `torch.compile`. This should give you **ca.** 20% speed-up.
-
-```diff
-+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
-+ refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True)
-```
-
-### Running with `torch < 2.0`
-
-**Note** that if you want to run Stable Diffusion XL with `torch` < 2.0, please make sure to enable xformers 
-attention:
-
-```
-pip install xformers
-```
-
-```diff
-+pipe.enable_xformers_memory_efficient_attention()
-+refiner.enable_xformers_memory_efficient_attention()
-```
-
 ## StableDiffusionXLPipeline

 [[autodoc]] StableDiffusionXLPipeline
@@ -435,25 +50,3 @@ pip install xformers
 [[autodoc]] StableDiffusionXLInpaintPipeline
 	- all
 	- __call__
-
-### Passing different prompts to each text-encoder
-
-Stable Diffusion XL was trained on two text encoders. The default behavior is to pass the same prompt to each. But it is possible to pass a different prompt for each text-encoder, as [some users](https://github.com/huggingface/diffusers/issues/4004#issuecomment-1627764201) noted that it can boost quality.
-To do so, you can pass `prompt_2` and `negative_prompt_2` in addition to `prompt` and `negative_prompt`. By doing that, you will pass the original prompts and negative prompts (as in `prompt` and `negative_prompt`) to `text_encoder` (in official SDXL 0.9/1.0 that is [OpenAI CLIP-ViT/L-14](https://huggingface.co/openai/clip-vit-large-patch14)),
-and `prompt_2` and `negative_prompt_2` to `text_encoder_2` (in official SDXL 0.9/1.0 that is [OpenCLIP-ViT/bigG-14](https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k)).
-
-```py
-from diffusers import StableDiffusionXLPipeline
-import torch
-
-pipe = StableDiffusionXLPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
-)
-pipe.to("cuda")
-
-# prompt will be passed to OAI CLIP-ViT/L-14
-prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
-# prompt_2 will be passed to OpenCLIP-ViT/bigG-14
-prompt_2 = "monet painting"
-image = pipe(prompt=prompt, prompt_2=prompt_2).images[0]
-```
--- a/docs/source/en/using-diffusers/pipeline_overview.md
+++ b/docs/source/en/using-diffusers/pipeline_overview.md
@@ -12,6 +12,6 @@ specific language governing permissions and limitations under the License.

 # Overview

-A pipeline is an end-to-end class that provides a quick and easy way to use a diffusion system for inference by bundling independently trained models and schedulers together. Certain combinations of models and schedulers define specific pipeline types, like [`StableDiffusionPipeline`] or [`StableDiffusionControlNetPipeline`], with specific capabilities. All pipeline types inherit from the base [`DiffusionPipeline`] class; pass it any checkpoint, and it'll automatically detect the pipeline type and load the necessary components.
+A pipeline is an end-to-end class that provides a quick and easy way to use a diffusion system for inference by bundling independently trained models and schedulers together. Certain combinations of models and schedulers define specific pipeline types, like [`StableDiffusionXLPipeline`] or [`StableDiffusionControlNetPipeline`], with specific capabilities. All pipeline types inherit from the base [`DiffusionPipeline`] class; pass it any checkpoint, and it'll automatically detect the pipeline type and load the necessary components.

-This section introduces you to some of the tasks supported by our pipelines such as unconditional image generation and different techniques and variations of text-to-image generation. You'll also learn how to gain more control over the generation process by setting a seed for reproducibility and weighting prompts to adjust the influence certain words in the prompt has over the output. Finally, you'll see how you can create a community pipeline for a custom task like generating images from speech.
\ No newline at end of file
+This section introduces you to some of the more complex pipelines like Stable Diffusion XL, ControlNet, and DiffEdit, which require additional inputs. You'll also learn how to use a distilled version of the Stable Diffusion model to speed up inference, how to control randomness on your hardware when generating images, and how to create a community pipeline for a custom task like generating images from speech.
\ No newline at end of file
--- a/docs/source/en/using-diffusers/sdxl.md
+++ b/docs/source/en/using-diffusers/sdxl.md