[docs] Loader APIs (#5813)

* first draft * remove old loader doc * start adding lora code examples * finish * add link to loralinearlayer * feedback * fix

[docs] Loader APIs (#5813)
* first draft * remove old loader doc * start adding lora code examples * finish * add link to loralinearlayer * feedback * fix
7457aa67 · Steven Liu · GitHub · c72a1739 · 7457aa67 · 7457aa67
Unverified Commit 7457aa67 authored Nov 20, 2023 by Steven Liu Committed by GitHub Nov 20, 2023
9 changed files
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -186,13 +186,21 @@
  - sections:
    - local: api/configuration
      title: Configuration
-    - local: api/loaders
-      title: Loaders
    - local: api/logging
      title: Logging
    - local: api/outputs
      title: Outputs
    title: Main Classes
+  - sections:
+    - local: api/loaders/lora
+      title: LoRA
+    - local: api/loaders/single_file
+      title: Single files
+    - local: api/loaders/textual_inversion
+      title: Textual Inversion
+    - local: api/loaders/unet
+      title: UNet
+    title: Loaders
  - sections:
    - local: api/models/overview
      title: Overview

--- a/docs/source/en/api/loaders/lora.md
+++ b/docs/source/en/api/loaders/lora.md
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# LoRA
+
+LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. This produces a smaller file (~100 MBs) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the UNet, text encoder or both. There are two classes for loading LoRA weights:
+
+- [`LoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
+- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`LoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
+
+<Tip>
+
+To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
+
+</Tip>
+
+## LoraLoaderMixin
+
+[[autodoc]] loaders.lora.LoraLoaderMixin
+
+## StableDiffusionXLLoraLoaderMixin
+
+[[autodoc]] loaders.lora.StableDiffusionXLLoraLoaderMixin
\ No newline at end of file
--- a/docs/source/en/api/loaders.md
+++ b/docs/source/en/api/loaders.md
@@ -10,40 +10,28 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->

-# Loaders
+# Single files

-Adapters (textual inversion, LoRA, hypernetworks) allow you to modify a diffusion model to generate images in a specific style without training or finetuning the entire model. The adapter weights are very portable because they're typically only a tiny fraction of the pretrained model weights. 🤗 Diffusers provides an easy-to-use `LoaderMixin` API to load adapter weights.
+Diffusers supports loading pretrained pipeline (or model) weights stored in a single file, such as a `ckpt` or `safetensors` file. These single file types are typically produced from community trained models. There are three classes for loading single file weights:

-<Tip warning={true}>
+- [`FromSingleFileMixin`] supports loading pretrained pipeline weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
+- [`FromOriginalVAEMixin`] supports loading a pretrained [`AutoencoderKL`] from pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
+- [`FromOriginalControlnetMixin`] supports loading pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file.

-🧪 The `LoaderMixin`s are highly experimental and prone to future changes. To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`.
+<Tip>

-</Tip>
-
-## UNet2DConditionLoadersMixin
-
-[[autodoc]] loaders.UNet2DConditionLoadersMixin
-
-## TextualInversionLoaderMixin
-
-[[autodoc]] loaders.TextualInversionLoaderMixin
-
-## StableDiffusionXLLoraLoaderMixin
-
-[[autodoc]] loaders.StableDiffusionXLLoraLoaderMixin
+To learn more about how to load single file weights, see the [Load different Stable Diffusion formats](../../using-diffusers/other-formats) loading guide.

-## LoraLoaderMixin
-
-[[autodoc]] loaders.LoraLoaderMixin
+</Tip>

 ## FromSingleFileMixin

-[[autodoc]] loaders.FromSingleFileMixin
+[[autodoc]] loaders.single_file.FromSingleFileMixin

-## FromOriginalControlnetMixin
+## FromOriginalVAEMixin

-[[autodoc]] loaders.FromOriginalControlnetMixin
+[[autodoc]] loaders.single_file.FromOriginalVAEMixin

-## FromOriginalVAEMixin
+## FromOriginalControlnetMixin

-[[autodoc]] loaders.FromOriginalVAEMixin
+[[autodoc]] loaders.single_file.FromOriginalControlnetMixin
\ No newline at end of file
--- a/docs/source/en/api/loaders/textual_inversion.md
+++ b/docs/source/en/api/loaders/textual_inversion.md
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Textual Inversion
+
+Textual Inversion is a training method for personalizing models by learning new text embeddings from a few example images. The file produced from training is extremely small (a few KBs) and the new embeddings can be loaded into the text encoder.
+
+[`TextualInversionLoaderMixin`] provides a function for loading Textual Inversion embeddings from Diffusers and Automatic1111 into the text encoder and loading a special token to activate the embeddings.
+
+<Tip>
+
+To learn more about how to load Textual Inversion embeddings, see the [Textual Inversion](../../using-diffusers/loading_adapters#textual-inversion) loading guide.
+
+</Tip>
+
+## TextualInversionLoaderMixin
+
+[[autodoc]] loaders.textual_inversion.TextualInversionLoaderMixin
\ No newline at end of file
--- a/docs/source/en/api/loaders/unet.md
+++ b/docs/source/en/api/loaders/unet.md
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# UNet
+
+Some training methods - like LoRA and Custom Diffusion - typically target the UNet's attention layers, but these training methods can also target other non-attention layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.LoraLoaderMixin.load_lora_weights`] function instead.
+
+The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters.
+
+<Tip>
+
+To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
+
+</Tip>
+
+## UNet2DConditionLoadersMixin
+
+[[autodoc]] loaders.unet.UNet2DConditionLoadersMixin
\ No newline at end of file
--- a/src/diffusers/loaders/lora.py
+++ b/src/diffusers/loaders/lora.py
--- a/src/diffusers/loaders/single_file.py
+++ b/src/diffusers/loaders/single_file.py
@@ -288,12 +288,15 @@ class FromSingleFileMixin:


 class FromOriginalVAEMixin:
+    """
+    Load pretrained ControlNet weights saved in the `.ckpt` or `.safetensors` format into an [`AutoencoderKL`].
+    """
+
    @classmethod
    def from_single_file(cls, pretrained_model_link_or_path, **kwargs):
        r"""
-        Instantiate a [`AutoencoderKL`] from pretrained controlnet weights saved in the original `.ckpt` or
-        `.safetensors` format. The pipeline is format. The pipeline is set in evaluation mode (`model.eval()`) by
-        default.
+        Instantiate a [`AutoencoderKL`] from pretrained ControlNet weights saved in the original `.ckpt` or
+        `.safetensors` format. The pipeline is set in evaluation mode (`model.eval()`) by default.

        Parameters:
            pretrained_model_link_or_path (`str` or `os.PathLike`, *optional*):
@@ -348,8 +351,8 @@ class FromOriginalVAEMixin:

        <Tip warning={true}>

-            Make sure to pass both `image_size` and `scaling_factor` to `from_single_file()` if you want to load
-            a VAE that does accompany a stable diffusion model of v2 or higher or SDXL.
+            Make sure to pass both `image_size` and `scaling_factor` to `from_single_file()` if you're loading
+            a VAE from SDXL or a Stable Diffusion v2 model or higher.

        </Tip>

@@ -482,10 +485,14 @@ class FromOriginalVAEMixin:


 class FromOriginalControlnetMixin:
+    """
+    Load pretrained ControlNet weights saved in the `.ckpt` or `.safetensors` format into a [`ControlNetModel`].
+    """
+
    @classmethod
    def from_single_file(cls, pretrained_model_link_or_path, **kwargs):
        r"""
-        Instantiate a [`ControlNetModel`] from pretrained controlnet weights saved in the original `.ckpt` or
+        Instantiate a [`ControlNetModel`] from pretrained ControlNet weights saved in the original `.ckpt` or
        `.safetensors` format. The pipeline is set in evaluation mode (`model.eval()`) by default.

        Parameters:

--- a/src/diffusers/loaders/textual_inversion.py
+++ b/src/diffusers/loaders/textual_inversion.py
@@ -116,7 +116,7 @@ def load_textual_inversion_state_dicts(pretrained_model_name_or_paths, **kwargs)

 class TextualInversionLoaderMixin:
    r"""
-    Load textual inversion tokens and embeddings to the tokenizer and text encoder.
+    Load Textual Inversion tokens and embeddings to the tokenizer and text encoder.
    """

    def maybe_convert_prompt(self, prompt: Union[str, List[str]], tokenizer: "PreTrainedTokenizer"):  # noqa: F821
@@ -276,7 +276,7 @@ class TextualInversionLoaderMixin:
        **kwargs,
    ):
        r"""
-        Load textual inversion embeddings into the text encoder of [`StableDiffusionPipeline`] (both 🤗 Diffusers and
+        Load Textual Inversion embeddings into the text encoder of [`StableDiffusionPipeline`] (both 🤗 Diffusers and
        Automatic1111 formats are supported).

        Parameters:
@@ -335,7 +335,7 @@ class TextualInversionLoaderMixin:

        Example:

-        To load a textual inversion embedding vector in 🤗 Diffusers format:
+        To load a Textual Inversion embedding vector in 🤗 Diffusers format:

        ```py
        from diffusers import StableDiffusionPipeline
@@ -352,7 +352,7 @@ class TextualInversionLoaderMixin:
        image.save("cat-backpack.png")
        ```

-        To load a textual inversion embedding vector in Automatic1111 format, make sure to download the vector first
+        To load a Textual Inversion embedding vector in Automatic1111 format, make sure to download the vector first
        (for example from [civitAI](https://civitai.com/models/3036?modelVersionId=9857)) and then load the vector
        locally:


--- a/src/diffusers/loaders/unet.py
+++ b/src/diffusers/loaders/unet.py
@@ -53,6 +53,10 @@ CUSTOM_DIFFUSION_WEIGHT_NAME_SAFE = "pytorch_custom_diffusion_weights.safetensor


 class UNet2DConditionLoadersMixin:
+    """
+    Load LoRA layers into a [`UNet2DCondtionModel`].
+    """
+
    text_encoder_name = TEXT_ENCODER_NAME
    unet_name = UNET_NAME

@@ -107,6 +111,19 @@ class UNet2DConditionLoadersMixin:
                guarantee the timeliness or safety of the source, and you should refer to the mirror site for more
                information.

+        Example:
+
+        ```py
+        from diffusers import AutoPipelineForText2Image
+        import torch
+
+        pipeline = AutoPipelineForText2Image.from_pretrained(
+            "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
+        ).to("cuda")
+        pipeline.unet.load_attn_procs(
+            "jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", adapter_name="cinematic"
+        )
+        ```
        """
        from ..models.attention_processor import CustomDiffusionAttnProcessor
        from ..models.lora import LoRACompatibleConv, LoRACompatibleLinear, LoRAConv2dLayer, LoRALinearLayer
@@ -393,12 +410,12 @@ class UNet2DConditionLoadersMixin:
        **kwargs,
    ):
        r"""
-        Save an attention processor to a directory so that it can be reloaded using the
+        Save attention processor layers to a directory so that it can be reloaded with the
        [`~loaders.UNet2DConditionLoadersMixin.load_attn_procs`] method.

        Arguments:
            save_directory (`str` or `os.PathLike`):
-                Directory to save an attention processor to. Will be created if it doesn't exist.
+                Directory to save an attention processor to (will be created if it doesn't exist).
            is_main_process (`bool`, *optional*, defaults to `True`):
                Whether the process calling this is the main process or not. Useful during distributed training and you
                need to call this function on all processes. In this case, set `is_main_process=True` only on the main
@@ -408,7 +425,21 @@ class UNet2DConditionLoadersMixin:
                replace `torch.save` with another method. Can be configured with the environment variable
                `DIFFUSERS_SAVE_MODE`.
            safe_serialization (`bool`, *optional*, defaults to `True`):
-                Whether to save the model using `safetensors` or the traditional PyTorch way with `pickle`.
+                Whether to save the model using `safetensors` or with `pickle`.
+
+        Example:
+
+        ```py
+        import torch
+        from diffusers import DiffusionPipeline
+
+        pipeline = DiffusionPipeline.from_pretrained(
+            "CompVis/stable-diffusion-v1-4",
+            torch_dtype=torch.float16,
+        ).to("cuda")
+        pipeline.unet.load_attn_procs("path-to-save-model", weight_name="pytorch_custom_diffusion_weights.bin")
+        pipeline.unet.save_attn_procs("path-to-save-model", weight_name="pytorch_custom_diffusion_weights.bin")
+        ```
        """
        from ..models.attention_processor import (
            CustomDiffusionAttnProcessor,
@@ -507,14 +538,30 @@ class UNet2DConditionLoadersMixin:
        weights: Optional[Union[List[float], float]] = None,
    ):
        """
-        Sets the adapter layers for the unet.
+        Set the currently active adapters for use in the UNet.

        Args:
            adapter_names (`List[str]` or `str`):
                The names of the adapters to use.
-            weights (`Union[List[float], float]`, *optional*):
+            adapter_weights (`Union[List[float], float]`, *optional*):
                The adapter(s) weights to use with the UNet. If `None`, the weights are set to `1.0` for all the
                adapters.
+
+        Example:
+
+        ```py
+        from diffusers import AutoPipelineForText2Image
+        import torch
+
+        pipeline = AutoPipelineForText2Image.from_pretrained(
+            "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
+        ).to("cuda")
+        pipeline.load_lora_weights(
+            "jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", adapter_name="cinematic"
+        )
+        pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")
+        pipeline.set_adapters(["cinematic", "pixel"], adapter_weights=[0.5, 0.5])
+        ```
        """
        if not USE_PEFT_BACKEND:
            raise ValueError("PEFT backend is required for `set_adapters()`.")
@@ -535,7 +582,22 @@ class UNet2DConditionLoadersMixin:

    def disable_lora(self):
        """
-        Disables the active LoRA layers for the unet.
+        Disable the UNet's active LoRA layers.
+
+        Example:
+
+        ```py
+        from diffusers import AutoPipelineForText2Image
+        import torch
+
+        pipeline = AutoPipelineForText2Image.from_pretrained(
+            "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
+        ).to("cuda")
+        pipeline.load_lora_weights(
+            "jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", adapter_name="cinematic"
+        )
+        pipeline.disable_lora()
+        ```
        """
        if not USE_PEFT_BACKEND:
            raise ValueError("PEFT backend is required for this method.")
@@ -543,7 +605,22 @@ class UNet2DConditionLoadersMixin:

    def enable_lora(self):
        """
-        Enables the active LoRA layers for the unet.
+        Enable the UNet's active LoRA layers.
+
+        Example:
+
+        ```py
+        from diffusers import AutoPipelineForText2Image
+        import torch
+
+        pipeline = AutoPipelineForText2Image.from_pretrained(
+            "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
+        ).to("cuda")
+        pipeline.load_lora_weights(
+            "jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", adapter_name="cinematic"
+        )
+        pipeline.enable_lora()
+        ```
        """
        if not USE_PEFT_BACKEND:
            raise ValueError("PEFT backend is required for this method.")
@@ -551,10 +628,26 @@ class UNet2DConditionLoadersMixin:

    def delete_adapters(self, adapter_names: Union[List[str], str]):
        """
+        Delete an adapter's LoRA layers from the UNet.
+
        Args:
-        Deletes the LoRA layers of `adapter_name` for the unet.
            adapter_names (`Union[List[str], str]`):
-                The names of the adapter to delete. Can be a single string or a list of strings
+                The names (single string or list of strings) of the adapter to delete.
+
+        Example:
+
+        ```py
+        from diffusers import AutoPipelineForText2Image
+        import torch
+
+        pipeline = AutoPipelineForText2Image.from_pretrained(
+            "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
+        ).to("cuda")
+        pipeline.load_lora_weights(
+            "jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", adapter_names="cinematic"
+        )
+        pipeline.delete_adapters("cinematic")
+        ```
        """
        if not USE_PEFT_BACKEND:
            raise ValueError("PEFT backend is required for this method.")