Unverified Commit 53a8439f authored by M. Tolga Cangöz, committed by GitHub

[`Docs`] Fix typos and update files at Optimization Page (#5674)



* Fix typos, update, trim trailing whitespace

* Trim trailing whitespaces

* Update docs/source/en/optimization/memory.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/memory.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update _toctree.yml

* Update adapt_a_model.md

* Reverse

* Reverse

* Reverse

* Update dreambooth.md

* Update instructpix2pix.md

* Update lora.md

* Update overview.md

* Update t2i_adapters.md

* Update text2image.md

* Update text_inversion.md

* Update create_dataset.md

* Update create_dataset.md

* Update create_dataset.md

* Update create_dataset.md

* Update coreml.md

* Delete docs/source/en/training/create_dataset.md

* Original create_dataset.md

* Update create_dataset.md

* Delete docs/source/en/training/create_dataset.md

* Add original file

* Delete docs/source/en/training/create_dataset.md

* Add original one

* Delete docs/source/en/training/text2image.md

* Delete docs/source/en/training/instructpix2pix.md

* Delete docs/source/en/training/dreambooth.md

* Add original files

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@@ -135,7 +135,7 @@
 - local: optimization/memory
   title: Reduce memory usage
 - local: optimization/torch2.0
-  title: Torch 2.0
+  title: PyTorch 2.0
 - local: optimization/xformers
   title: xFormers
 - local: optimization/tome
@@ -31,7 +31,7 @@ Thankfully, Apple engineers developed [a conversion tool](https://github.com/app
 Before you convert a model, though, take a moment to explore the Hugging Face Hub – chances are the model you're interested in is already available in Core ML format:
 - the [Apple](https://huggingface.co/apple) organization includes Stable Diffusion versions 1.4, 1.5, 2.0 base, and 2.1 base
-- [coreml](https://huggingface.co/coreml) organization includes custom DreamBoothed and finetuned models
+- [coreml community](https://huggingface.co/coreml-community) includes custom finetuned models
 - use this [filter](https://huggingface.co/models?pipeline_tag=text-to-image&library=coreml&p=2&sort=likes) to return all available Core ML checkpoints
 If you can't find the model you're interested in, we recommend you follow the instructions for [Converting Models to Core ML](https://github.com/apple/ml-stable-diffusion#-converting-models-to-core-ml) by Apple.
@@ -90,7 +90,6 @@ snapshot_download(repo_id, allow_patterns=f"{variant}/*", local_dir=model_path,
 print(f"Model downloaded at {model_path}")
 ```
 ### Inference[[python-inference]]
 Once you have downloaded a snapshot of the model, you can test it using Apple's Python script.
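Note for readers following along: the hunk above only shows the tail of the download snippet. A minimal sketch of the full step, assuming the `apple/coreml-stable-diffusion-v1-4` checkpoint and the `original/packages` variant folder (both are illustrative choices, swap in the repo and variant you actually want):

```python
from pathlib import Path
from huggingface_hub import snapshot_download

repo_id = "apple/coreml-stable-diffusion-v1-4"  # assumed checkpoint; any Core ML repo on the Hub works
variant = "original/packages"                   # assumed variant folder layout inside the repo

# download only the files for the chosen variant into a local folder
model_path = Path("./models") / (repo_id.split("/")[-1] + "_" + variant.replace("/", "_"))
snapshot_download(repo_id, allow_patterns=f"{variant}/*", local_dir=model_path)
print(f"Model downloaded at {model_path}")
```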
@@ -99,7 +98,7 @@ Once you have downloaded a snapshot of the model, you can test it using Apple's
 python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i models/coreml-stable-diffusion-v1-4_original_packages -o </path/to/output/image> --compute-unit CPU_AND_GPU --seed 93
 ```
-`<output-mlpackages-directory>` should point to the checkpoint you downloaded in the step above, and `--compute-unit` indicates the hardware you want to allow for inference. It must be one of the following options: `ALL`, `CPU_AND_GPU`, `CPU_ONLY`, `CPU_AND_NE`. You may also provide an optional output path, and a seed for reproducibility.
+Pass the path of the downloaded checkpoint with `-i` flag to the script. `--compute-unit` indicates the hardware you want to allow for inference. It must be one of the following options: `ALL`, `CPU_AND_GPU`, `CPU_ONLY`, `CPU_AND_NE`. You may also provide an optional output path, and a seed for reproducibility.
 The inference script assumes you're using the original version of the Stable Diffusion model, `CompVis/stable-diffusion-v1-4`. If you use another model, you *have* to specify its Hub id in the inference command line, using the `--model-version` option. This works for models already supported and custom models you trained or fine-tuned yourself.
@@ -109,7 +108,7 @@ For example, if you want to use [`runwayml/stable-diffusion-v1-5`](https://huggi
 python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output --seed 93 -i models/coreml-stable-diffusion-v1-5_original_packages --model-version runwayml/stable-diffusion-v1-5
 ```
 ## Core ML inference in Swift
 Running inference in Swift is slightly faster than in Python because the models are already compiled in the `mlmodelc` format. This is noticeable on app startup when the model is loaded but shouldn’t be noticeable if you run several generations afterward.
@@ -149,7 +147,6 @@ You have to specify in `--resource-path` one of the checkpoints downloaded in th
 For more details, please refer to the [instructions in Apple's repo](https://github.com/apple/ml-stable-diffusion).
 ## Supported Diffusers Features
 The Core ML models and inference code don't support many of the features, options, and flexibility of 🧨 Diffusers. These are some of the limitations to keep in mind:
@@ -160,8 +157,8 @@ The Core ML models and inference code don't support many of the features, option
 Apple's [conversion and inference repo](https://github.com/apple/ml-stable-diffusion) and our own [swift-coreml-diffusers](https://github.com/huggingface/swift-coreml-diffusers) repos are intended as technology demonstrators to enable other developers to build upon.
-If you feel strongly about any missing features, please feel free to open a feature request or, better yet, a contribution PR :)
+If you feel strongly about any missing features, please feel free to open a feature request or, better yet, a contribution PR 🙂.
 ## Native Diffusers Swift app
-One easy way to run Stable Diffusion on your own Apple hardware is to use [our open-source Swift repo](https://github.com/huggingface/swift-coreml-diffusers), based on `diffusers` and Apple's conversion and inference repo. You can study the code, compile it with [Xcode](https://developer.apple.com/xcode/) and adapt it for your own needs. For your convenience, there's also a [standalone Mac app in the App Store](https://apps.apple.com/app/diffusers/id1666309574), so you can play with it without having to deal with the code or IDE. If you are a developer and have determined that Core ML is the best solution to build your Stable Diffusion app, then you can use the rest of this guide to get started with your project. We can't wait to see what you'll build :)
+One easy way to run Stable Diffusion on your own Apple hardware is to use [our open-source Swift repo](https://github.com/huggingface/swift-coreml-diffusers), based on `diffusers` and Apple's conversion and inference repo. You can study the code, compile it with [Xcode](https://developer.apple.com/xcode/) and adapt it for your own needs. For your convenience, there's also a [standalone Mac app in the App Store](https://apps.apple.com/app/diffusers/id1666309574), so you can play with it without having to deal with the code or IDE. If you are a developer and have determined that Core ML is the best solution to build your Stable Diffusion app, then you can use the rest of this guide to get started with your project. We can't wait to see what you'll build 🙂.
@@ -55,8 +55,7 @@ outputs = pipeline(
 )
 ```
-For more information, check out 🤗 Optimum Habana's [documentation](https://huggingface.co/docs/optimum/habana/usage_guides/stable_diffusion) and the [example](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion) provided in the official Github repository.
+For more information, check out 🤗 Optimum Habana's [documentation](https://huggingface.co/docs/optimum/habana/usage_guides/stable_diffusion) and the [example](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion) provided in the official GitHub repository.
 ## Benchmark
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
 # Reduce memory usage
 A barrier to using diffusion models is the large amount of memory required. To overcome this challenge, there are several memory-reducing techniques you can use to run even some of the largest models on free-tier or consumer GPUs. Some of these techniques can even be combined to further reduce memory usage.
@@ -18,10 +30,9 @@ The results below are obtained from generating a single 512x512 image from the p
 | traced UNet | 3.21s | x2.96 |
 | memory-efficient attention | 2.63s | x3.61 |
 ## Sliced VAE
-Sliced VAE enables decoding large batches of images with limited VRAM or batches with 32 images or more by decoding the batches of latents one image at a time. You'll likely want to couple this with [`~ModelMixin.enable_xformers_memory_efficient_attention`] to further reduce memory use.
+Sliced VAE enables decoding large batches of images with limited VRAM or batches with 32 images or more by decoding the batches of latents one image at a time. You'll likely want to couple this with [`~ModelMixin.enable_xformers_memory_efficient_attention`] to reduce memory use further if you have xFormers installed.
 To use sliced VAE, call [`~StableDiffusionPipeline.enable_vae_slicing`] on your pipeline before inference:
@@ -38,6 +49,7 @@ pipe = pipe.to("cuda")
 prompt = "a photo of an astronaut riding a horse on mars"
 pipe.enable_vae_slicing()
+#pipe.enable_xformers_memory_efficient_attention()
 images = pipe([prompt] * 32).images
 ```
@@ -45,7 +57,7 @@ You may see a small performance boost in VAE decoding on multi-image batches, an
 ## Tiled VAE
-Tiled VAE processing also enables working with large images on limited VRAM (for example, generating 4k images on 8GB of VRAM) by splitting the image into overlapping tiles, decoding the tiles, and then blending the outputs together to compose the final image. You should also used tiled VAE with [`~ModelMixin.enable_xformers_memory_efficient_attention`] to further reduce memory use.
+Tiled VAE processing also enables working with large images on limited VRAM (for example, generating 4k images on 8GB of VRAM) by splitting the image into overlapping tiles, decoding the tiles, and then blending the outputs together to compose the final image. You should also use tiled VAE with [`~ModelMixin.enable_xformers_memory_efficient_attention`] to reduce memory use further if you have xFormers installed.
 To use tiled VAE processing, call [`~StableDiffusionPipeline.enable_vae_tiling`] on your pipeline before inference:
@@ -62,7 +74,7 @@ pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
 pipe = pipe.to("cuda")
 prompt = "a beautiful landscape photograph"
 pipe.enable_vae_tiling()
-pipe.enable_xformers_memory_efficient_attention()
+#pipe.enable_xformers_memory_efficient_attention()
 image = pipe([prompt], width=3840, height=2224, num_inference_steps=20).images[0]
 ```
@@ -98,24 +110,6 @@ Consider using [model offloading](#model-offloading) if you want to optimize for
 </Tip>
-CPU offloading can also be chained with attention slicing to reduce memory consumption to less than 2GB.
-```Python
-import torch
-from diffusers import StableDiffusionPipeline
-pipe = StableDiffusionPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5",
-    torch_dtype=torch.float16,
-    use_safetensors=True,
-)
-prompt = "a photo of an astronaut riding a horse on mars"
-pipe.enable_sequential_cpu_offload()
-image = pipe(prompt).images[0]
-```
 <Tip warning={true}>
 When using [`~StableDiffusionPipeline.enable_sequential_cpu_offload`], don't move the pipeline to CUDA beforehand or else the gain in memory consumption will only be minimal (see this [issue](https://github.com/huggingface/diffusers/issues/1934) for more information).
@@ -156,28 +150,9 @@ pipe.enable_model_cpu_offload()
 image = pipe(prompt).images[0]
 ```
-Model offloading can also be combined with attention slicing for additional memory savings.
-```Python
-import torch
-from diffusers import StableDiffusionPipeline
-pipe = StableDiffusionPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5",
-    torch_dtype=torch.float16,
-    use_safetensors=True,
-)
-prompt = "a photo of an astronaut riding a horse on mars"
-pipe.enable_model_cpu_offload()
-image = pipe(prompt).images[0]
-```
 <Tip warning={true}>
-In order to properly offload models after they're called, it is required to run the entire pipeline and models are called in the pipeline's expected order. Exercise caution if models are reused outside the context of the pipeline after hooks have been installed. See [Removing Hooks](https://huggingface.co/docs/accelerate/en/package_reference/big_modeling#accelerate.hooks.remove_hook_from_module)
-for more information.
+In order to properly offload models after they're called, it is required to run the entire pipeline and models are called in the pipeline's expected order. Exercise caution if models are reused outside the context of the pipeline after hooks have been installed. See [Removing Hooks](https://huggingface.co/docs/accelerate/en/package_reference/big_modeling#accelerate.hooks.remove_hook_from_module) for more information.
 [`~StableDiffusionPipeline.enable_model_cpu_offload`] is a stateful operation that installs hooks on the models and state on the pipeline.
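Editor's note: the removed snippets above advertised chaining offloading with attention slicing but never actually called `enable_attention_slicing`. For reference only (not part of this commit), a minimal sketch of what that combination looks like, assuming 🤗 Accelerate is installed so `enable_model_cpu_offload` is available:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
)

pipe.enable_model_cpu_offload()  # move each submodel to the GPU only while it is needed
pipe.enable_attention_slicing()  # compute attention in slices to lower peak memory

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```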
@@ -303,7 +278,7 @@ unet_traced = torch.jit.load("unet_traced.pt")
 class TracedUNet(torch.nn.Module):
     def __init__(self):
         super().__init__()
-        self.in_channels = pipe.unet.in_channels
+        self.in_channels = pipe.unet.config.in_channels
         self.device = pipe.unet.device
     def forward(self, latent_model_input, t, encoder_hidden_states):
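The hunk above shows only a fragment of the traced-UNet wrapper. For context, a sketch of how the full wrapper fits together, assuming `unet_traced.pt` was saved earlier with `torch.jit.trace` and that `pipe` is the fp16 Stable Diffusion pipeline the UNet was traced from:

```python
import torch
from dataclasses import dataclass
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")

# the traced UNet produced beforehand with torch.jit.trace(...).save("unet_traced.pt")
unet_traced = torch.jit.load("unet_traced.pt")

@dataclass
class UNet2DConditionOutput:
    sample: torch.Tensor

class TracedUNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # the pipeline reads these attributes from whatever it uses as a UNet
        self.in_channels = pipe.unet.config.in_channels
        self.device = pipe.unet.device

    def forward(self, latent_model_input, t, encoder_hidden_states):
        sample = unet_traced(latent_model_input, t, encoder_hidden_states)[0]
        return UNet2DConditionOutput(sample=sample)

# swap the original UNet for the traced one
pipe.unet = TracedUNet()
```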
@@ -319,7 +294,7 @@ with torch.inference_mode():
 ## Memory-efficient attention
-Recent work on optimizing bandwidth in the attention block has generated huge speed-ups and reductions in GPU memory usage. The most recent type of memory-efficient attention is [Flash Attention](https://arxiv.org/pdf/2205.14135.pdf) (you can check out the original code at [HazyResearch/flash-attention](https://github.com/HazyResearch/flash-attention)).
+Recent work on optimizing bandwidth in the attention block has generated huge speed-ups and reductions in GPU memory usage. The most recent type of memory-efficient attention is [Flash Attention](https://arxiv.org/abs/2205.14135) (you can check out the original code at [HazyResearch/flash-attention](https://github.com/HazyResearch/flash-attention)).
 <Tip>
@@ -354,4 +329,4 @@ with torch.inference_mode():
 # pipe.disable_xformers_memory_efficient_attention()
 ```
-The iteration speed when using `xformers` should match the iteration speed of Torch 2.0 as described [here](torch2.0).
+The iteration speed when using `xformers` should match the iteration speed of PyTorch 2.0 as described [here](torch2.0).
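Since several hunks in this file toggle `enable_xformers_memory_efficient_attention`, a minimal usage sketch for reference (it assumes `pip install xformers` and a CUDA GPU):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
# pipe.disable_xformers_memory_efficient_attention()  # switch back to the default attention processor
```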
@@ -31,6 +31,8 @@ pipe = pipe.to("mps")
 pipe.enable_attention_slicing()
 prompt = "a photo of an astronaut riding a horse on mars"
+image = pipe(prompt).images[0]
+image
 ```
 <Tip warning={true}>
@@ -48,10 +50,10 @@ If you're using **PyTorch 1.13**, you need to "prime" the pipeline with an addit
 pipe.enable_attention_slicing()
 prompt = "a photo of an astronaut riding a horse on mars"
 # First-time "warmup" pass if PyTorch version is 1.13
 + _ = pipe(prompt, num_inference_steps=1)
 # Results match those from the CPU device after the warmup pass.
 image = pipe(prompt).images[0]
 ```
@@ -63,6 +65,7 @@ To prevent this from happening, we recommend *attention slicing* to reduce memor
 ```py
 from diffusers import DiffusionPipeline
+import torch
 pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True).to("mps")
 pipeline.enable_attention_slicing()
@@ -10,13 +10,12 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
 # ONNX Runtime
 🤗 [Optimum](https://github.com/huggingface/optimum) provides a Stable Diffusion pipeline compatible with ONNX Runtime. You'll need to install 🤗 Optimum with the following command for ONNX Runtime support:
 ```bash
-pip install optimum["onnxruntime"]
+pip install -q optimum["onnxruntime"]
 ```
 This guide will show you how to use the Stable Diffusion and Stable Diffusion XL (SDXL) pipelines with ONNX Runtime.
@@ -10,14 +10,13 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
 # OpenVINO
-🤗 [Optimum](https://github.com/huggingface/optimum-intel) provides Stable Diffusion pipelines compatible with OpenVINO to perform inference on a variety of Intel processors (see the [full list]((https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_Supported_Devices.html)) of supported devices).
+🤗 [Optimum](https://github.com/huggingface/optimum-intel) provides Stable Diffusion pipelines compatible with OpenVINO to perform inference on a variety of Intel processors (see the [full list](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_Supported_Devices.html) of supported devices).
 You'll need to install 🤗 Optimum Intel with the `--upgrade-strategy eager` option to ensure [`optimum-intel`](https://github.com/huggingface/optimum-intel) is using the latest version:
-```
+```bash
 pip install --upgrade-strategy eager optimum["openvino"]
 ```
@@ -14,18 +14,25 @@ specific language governing permissions and limitations under the License.
 [Token merging](https://huggingface.co/papers/2303.17604) (ToMe) merges redundant tokens/patches progressively in the forward pass of a Transformer-based network which can speed-up the inference latency of [`StableDiffusionPipeline`].
+Install ToMe from `pip`:
+```bash
+pip install tomesd
+```
 You can use ToMe from the [`tomesd`](https://github.com/dbolya/tomesd) library with the [`apply_patch`](https://github.com/dbolya/tomesd?tab=readme-ov-file#usage) function:
 ```diff
 from diffusers import StableDiffusionPipeline
-import tomesd
+import torch
+import tomesd
 pipeline = StableDiffusionPipeline.from_pretrained(
     "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True,
 ).to("cuda")
 + tomesd.apply_patch(pipeline, ratio=0.5)
 image = pipeline("a photo of an astronaut riding a horse on mars").images[0]
 ```
 The `apply_patch` function exposes a number of [arguments](https://github.com/dbolya/tomesd#usage) to help strike a balance between pipeline inference speed and the quality of the generated tokens. The most important argument is `ratio` which controls the number of tokens that are merged during the forward pass.
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
-# Torch 2.0
+# PyTorch 2.0
 🤗 Diffusers supports the latest optimizations from [PyTorch 2.0](https://pytorch.org/get-started/pytorch-2.0/) which include:
@@ -48,7 +48,6 @@ In some cases - such as making the pipeline more deterministic or converting it
 ```diff
 import torch
 from diffusers import DiffusionPipeline
-from diffusers.models.attention_processor import AttnProcessor
 pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
 + pipe.unet.set_default_attn_processor()
@@ -112,15 +111,12 @@ for _ in range(3):
 ```python
 from diffusers import StableDiffusionImg2ImgPipeline
-import requests
+from diffusers.utils import load_image
 import torch
-from PIL import Image
-from io import BytesIO
 url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
-response = requests.get(url)
-init_image = Image.open(BytesIO(response.content)).convert("RGB")
+init_image = load_image(url)
 init_image = init_image.resize((512, 512))
 path = "runwayml/stable-diffusion-v1-5"
@@ -145,23 +141,14 @@ for _ in range(3):
 ```python
 from diffusers import StableDiffusionInpaintPipeline
-import requests
+from diffusers.utils import load_image
 import torch
-from PIL import Image
-from io import BytesIO
-url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
-def download_image(url):
-    response = requests.get(url)
-    return Image.open(BytesIO(response.content)).convert("RGB")
 img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
 mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
-init_image = download_image(img_url).resize((512, 512))
-mask_image = download_image(mask_url).resize((512, 512))
+init_image = load_image(img_url).resize((512, 512))
+mask_image = load_image(mask_url).resize((512, 512))
 path = "runwayml/stable-diffusion-inpainting"
@@ -185,15 +172,12 @@ for _ in range(3):
 ```python
 from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
-import requests
+from diffusers.utils import load_image
 import torch
-from PIL import Image
-from io import BytesIO
 url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
-response = requests.get(url)
-init_image = Image.open(BytesIO(response.content)).convert("RGB")
+init_image = load_image(url)
 init_image = init_image.resize((512, 512))
 path = "runwayml/stable-diffusion-v1-5"
@@ -227,20 +211,20 @@ import torch
 run_compile = True  # Set True / False
-pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-M-v1.0", variant="fp16", text_encoder=None, torch_dtype=torch.float16, use_safetensors=True)
-pipe.to("cuda")
+pipe_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-M-v1.0", variant="fp16", text_encoder=None, torch_dtype=torch.float16, use_safetensors=True)
+pipe_1.to("cuda")
 pipe_2 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-II-M-v1.0", variant="fp16", text_encoder=None, torch_dtype=torch.float16, use_safetensors=True)
 pipe_2.to("cuda")
 pipe_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16, use_safetensors=True)
 pipe_3.to("cuda")
-pipe.unet.to(memory_format=torch.channels_last)
+pipe_1.unet.to(memory_format=torch.channels_last)
 pipe_2.unet.to(memory_format=torch.channels_last)
 pipe_3.unet.to(memory_format=torch.channels_last)
 if run_compile:
-    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+    pipe_1.unet = torch.compile(pipe_1.unet, mode="reduce-overhead", fullgraph=True)
     pipe_2.unet = torch.compile(pipe_2.unet, mode="reduce-overhead", fullgraph=True)
     pipe_3.unet = torch.compile(pipe_3.unet, mode="reduce-overhead", fullgraph=True)
@@ -250,9 +234,9 @@ prompt_embeds = torch.randn((1, 2, 4096), dtype=torch.float16)
 neg_prompt_embeds = torch.randn((1, 2, 4096), dtype=torch.float16)
 for _ in range(3):
-    image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=neg_prompt_embeds, output_type="pt").images
-    image_2 = pipe_2(image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=neg_prompt_embeds, output_type="pt").images
-    image_3 = pipe_3(prompt=prompt, image=image, noise_level=100).images
+    image_1 = pipe_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=neg_prompt_embeds, output_type="pt").images
+    image_2 = pipe_2(image=image_1, prompt_embeds=prompt_embeds, negative_prompt_embeds=neg_prompt_embeds, output_type="pt").images
+    image_3 = pipe_3(prompt=prompt, image=image_1, noise_level=100).images
 ```
 </details>
@@ -14,14 +14,14 @@ specific language governing permissions and limitations under the License.
 [[open-in-colab]]
-Most 🤗 Diffusers pipeline now accept a `callback_on_step_end` argument that allows you to change the default behavior of denoising loop with custom defined functions. Here is an example of a callback function we can write to disable classifier free guidance after 40% of inference steps to save compute with minimum tradeoff in performance.
+Most 🤗 Diffusers pipelines now accept a `callback_on_step_end` argument that allows you to change the default behavior of the denoising loop with custom-defined functions. Here is an example of a callback function we can write to disable classifier-free guidance after 40% of the inference steps to save compute with a minimum tradeoff in performance.
 ```python
 def callback_dynamic_cfg(pipe, step_index, timestep, callback_kwargs):
     # adjust the batch_size of prompt_embeds according to guidance_scale
     if step_index == int(pipe.num_timestep * 0.4):
         prompt_embeds = callback_kwargs["prompt_embeds"]
-        prompt_embeds =prompt_embeds.chunk(2)[-1]
+        prompt_embeds = prompt_embeds.chunk(2)[-1]
         # update guidance_scale and prompt_embeds
         pipe._guidance_scale = 0.0
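The hunk above truncates the callback body. For context, a sketch of the complete dynamic-CFG callback as it might look: the callback writes the trimmed embeddings back into `callback_kwargs` and returns the dict, which is how `callback_on_step_end` callbacks are consumed (the `pipe.num_timestep` attribute name is taken from the hunk itself, not independently verified):

```python
def callback_dynamic_cfg(pipe, step_index, timestep, callback_kwargs):
    # adjust the batch_size of prompt_embeds according to guidance_scale
    if step_index == int(pipe.num_timestep * 0.4):
        prompt_embeds = callback_kwargs["prompt_embeds"]
        prompt_embeds = prompt_embeds.chunk(2)[-1]

        # update guidance_scale and prompt_embeds
        pipe._guidance_scale = 0.0
        callback_kwargs["prompt_embeds"] = prompt_embeds
    return callback_kwargs
```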
@@ -36,7 +36,7 @@ Your callback function has below arguments:
 You can pass the callback function as `callback_on_step_end` argument to the pipeline along with `callback_on_step_end_tensor_inputs`.
-```
+```python
 import torch
 from diffusers import StableDiffusionPipeline
@@ -46,7 +46,7 @@ pipe = pipe.to("cuda")
 prompt = "a photo of an astronaut riding a horse on mars"
 generator = torch.Generator(device="cuda").manual_seed(1)
-out= pipe(prompt, generator=generator, callback_on_step_end = callback_custom_cfg, callback_on_step_end_tensor_inputs=['prompt_embeds'])
+out = pipe(prompt, generator=generator, callback_on_step_end=callback_custom_cfg, callback_on_step_end_tensor_inputs=['prompt_embeds'])
 out.images[0].save("out_custom_cfg.png")
 ```
@@ -55,6 +55,6 @@ Your callback function will be executed at the end of each denoising step and mo
 <Tip>
-Currently we only support `callback_on_step_end`. If you have a solid use case and require a callback function with a different execution point, please open an [feature request](https://github.com/huggingface/diffusers/issues/new/choose) so we can add it!
+Currently we only support `callback_on_step_end`. If you have a solid use case and require a callback function with a different execution point, please open a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&projects=&template=feature_request.md&title=) so we can add it!
 </Tip>
@@ -39,24 +39,19 @@ device_type = jax.devices()[0].device_kind
 print(f"Found {num_devices} JAX devices of type {device_type}.")
 assert (
     "TPU" in device_type,
-    "Available device is not a TPU, please select TPU from Edit > Notebook settings > Hardware accelerator"
+    "Available device is not a TPU, please select TPU from Runtime > Change runtime type > Hardware accelerator"
 )
-"Found 8 JAX devices of type Cloud TPU."
+# Found 8 JAX devices of type Cloud TPU.
 ```
 Great, now you can import the rest of the dependencies you'll need:
 ```python
-import numpy as np
 import jax.numpy as jnp
-from pathlib import Path
 from jax import pmap
 from flax.jax_utils import replicate
 from flax.training.common_utils import shard
-from PIL import Image
-from huggingface_hub import notebook_login
 from diffusers import FlaxStableDiffusionPipeline
 ```
@@ -90,7 +85,7 @@ prompt = "A cinematic film still of Morgan Freeman starring as Jimi Hendrix, por
 prompt = [prompt] * jax.device_count()
 prompt_ids = pipeline.prepare_inputs(prompt)
 prompt_ids.shape
-"(8, 77)"
+# (8, 77)
 ```
 Model parameters and inputs have to be replicated across the 8 parallel devices. The parameters dictionary is replicated with [`flax.jax_utils.replicate`](https://flax.readthedocs.io/en/latest/api_reference/flax.jax_utils.html#flax.jax_utils.replicate) which traverses the dictionary and changes the shape of the weights so they are repeated 8 times. Arrays are replicated using `shard`.
@@ -102,7 +97,7 @@ p_params = replicate(params)
 # arrays
 prompt_ids = shard(prompt_ids)
 prompt_ids.shape
-"(8, 1, 77)"
+# (8, 1, 77)
 ```
 This shape means each one of the 8 devices receives as an input a `jnp` array with shape `(1, 77)`, where `1` is the batch size per device. On TPUs with sufficient memory, you could have a batch size larger than `1` if you want to generate multiple images (per chip) at once.
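The calls in the next hunks pass a replicated `rng` that this diff never shows being created. For context, a minimal sketch of the usual per-device PRNG setup (the seed value is arbitrary):

```python
import jax

# one PRNG key per device so every device samples different latents
seed = 0
rng = jax.random.PRNGKey(seed)
rng = jax.random.split(rng, jax.device_count())
```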
@@ -127,7 +122,7 @@ To take advantage of JAX's optimized speed on a TPU, pass `jit=True` to the pipe
 <Tip warning={true}>
-You need to ensure all your inputs have the same shape in subsequent calls, other JAX will need to recompile the code which is slower.
+You need to ensure all your inputs have the same shape in subsequent calls, otherwise JAX will need to recompile the code which is slower.
 </Tip>
@@ -137,18 +132,18 @@ The first inference run takes more time because it needs to compile the code, bu
 %%time
 images = pipeline(prompt_ids, p_params, rng, jit=True)[0]
-"CPU times: user 56.2 s, sys: 42.5 s, total: 1min 38s"
-"Wall time: 1min 29s"
+# CPU times: user 56.2 s, sys: 42.5 s, total: 1min 38s
+# Wall time: 1min 29s
 ```
 The returned array has shape `(8, 1, 512, 512, 3)` which should be reshaped to remove the second dimension and get 8 images of `512 × 512 × 3`. Then you can use the [`~utils.numpy_to_pil`] function to convert the arrays into images.
 ```python
-from diffusers import make_image_grid
+from diffusers.utils import make_image_grid
 images = images.reshape((images.shape[0] * images.shape[1],) + images.shape[-3:])
 images = pipeline.numpy_to_pil(images)
-make_image_grid(images, 2, 4)
+make_image_grid(images, rows=2, cols=4)
 ```
 ![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/stable_diffusion_jax_how_to_cell_38_output_0.jpeg)
@@ -181,7 +176,6 @@ make_image_grid(images, 2, 4)
 ![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/stable_diffusion_jax_how_to_cell_43_output_0.jpeg)
 ## How does parallelization work?
 The Flax pipeline in 🤗 Diffusers automatically compiles the model and runs it in parallel on all available devices. Let's take a closer look at how that process works.
@@ -202,7 +196,7 @@ p_generate = pmap(pipeline._generate)
 After calling `pmap`, the prepared function `p_generate` will:
 1. Make a copy of the underlying function, `pipeline._generate`, on each device.
-2. Send each device a different portion of the input arguments (this is why its necessary to call the *shard* function). In this case, `prompt_ids` has shape `(8, 1, 77, 768)` so the array is split into 8 and each copy of `_generate` receives an input with shape `(1, 77, 768)`.
+2. Send each device a different portion of the input arguments (this is why it's necessary to call the *shard* function). In this case, `prompt_ids` has shape `(8, 1, 77, 768)` so the array is split into 8 and each copy of `_generate` receives an input with shape `(1, 77, 768)`.
 The most important thing to pay attention to here is the batch size (1 in this example), and the input dimensions that make sense for your code. You don't have to change anything else to make the code work in parallel.
@@ -212,13 +206,14 @@ The first time you call the pipeline takes more time, but the calls afterward ar
 %%time
 images = p_generate(prompt_ids, p_params, rng)
 images = images.block_until_ready()
-"CPU times: user 1min 15s, sys: 18.2 s, total: 1min 34s"
-"Wall time: 1min 15s"
+# CPU times: user 1min 15s, sys: 18.2 s, total: 1min 34s
+# Wall time: 1min 15s
 ```
 Check your image dimensions to see if they're correct:
 ```python
 images.shape
-"(8, 1, 512, 512, 3)"
+# (8, 1, 512, 512, 3)
 ```