Commit 1f020876 authored by Steven Liu, committed by GitHub

[docs] More API stuff (#3835)

* clean up loaders

* clean up rest of main class apis

* apply feedback
parent 95ea538c
...@@ -149,7 +149,7 @@ ...@@ -149,7 +149,7 @@
- local: api/utilities - local: api/utilities
title: Utilities title: Utilities
- local: api/image_processor - local: api/image_processor
title: Vae Image Processor title: VAE Image Processor
title: Main Classes title: Main Classes
- sections: - sections:
- local: api/pipelines/overview - local: api/pipelines/overview
......
...@@ -12,8 +12,13 @@ specific language governing permissions and limitations under the License. ...@@ -12,8 +12,13 @@ specific language governing permissions and limitations under the License.
# Configuration # Configuration
Schedulers from [`~schedulers.scheduling_utils.SchedulerMixin`] and models from [`ModelMixin`] inherit from [`ConfigMixin`] which conveniently takes care of storing all the parameters that are Schedulers from [`~schedulers.scheduling_utils.SchedulerMixin`] and models from [`ModelMixin`] inherit from [`ConfigMixin`] which stores all the parameters that are passed to their respective `__init__` methods in a JSON-configuration file.
passed to their respective `__init__` methods in a JSON-configuration file.
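To illustrate the idea of recording `__init__` arguments and serializing them to JSON, here is a minimal standard-library sketch. `TinyConfigMixin`, `_store_config`, and `ToyScheduler` are hypothetical names used only for illustration; they are not part of Diffusers, and the real [`ConfigMixin`] does considerably more (for example, loading configurations from the Hub).

```python
import json


class TinyConfigMixin:
    """Hypothetical stand-in for the idea behind ConfigMixin (not the real class)."""

    def _store_config(self, **kwargs):
        # Keep the constructor arguments in a plain dict on the instance.
        self._internal_dict = dict(kwargs)

    def to_json_string(self) -> str:
        # Copy first so that serializing does not mutate the stored config.
        config_dict = dict(getattr(self, "_internal_dict", {}))
        config_dict["_class_name"] = self.__class__.__name__
        return json.dumps(config_dict, indent=2, sort_keys=True) + "\n"


class ToyScheduler(TinyConfigMixin):
    def __init__(self, num_train_timesteps: int = 1000, beta_start: float = 0.0001):
        self._store_config(num_train_timesteps=num_train_timesteps, beta_start=beta_start)


config_json = ToyScheduler(num_train_timesteps=50).to_json_string()
print(config_json)
```

The class name is written next to the parameters so that a loader can tell which class a saved configuration belongs to.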
<Tip>
To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`.
</Tip>
## ConfigMixin ## ConfigMixin
......
...@@ -12,12 +12,12 @@ specific language governing permissions and limitations under the License. ...@@ -12,12 +12,12 @@ specific language governing permissions and limitations under the License.
# Pipelines # Pipelines
The [`DiffusionPipeline`] is the easiest way to load any pretrained diffusion pipeline from the [Hub](https://huggingface.co/models?library=diffusers) and use it for inference. The [`DiffusionPipeline`] is the quickest way to load any pretrained diffusion pipeline from the [Hub](https://huggingface.co/models?library=diffusers) for inference.
<Tip> <Tip>
You shouldn't use the [`DiffusionPipeline`] class for training or finetuning a diffusion model. Individual You shouldn't use the [`DiffusionPipeline`] class for training or finetuning a diffusion model. Individual
components (for example, [`UNetModel`] and [`UNetConditionModel`]) of diffusion pipelines are usually trained individually, so we suggest directly working with instead. components (for example, [`UNet2DModel`] and [`UNet2DConditionModel`]) of diffusion pipelines are usually trained individually, so we suggest directly working with them instead.
</Tip> </Tip>
......
...@@ -10,24 +10,18 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o ...@@ -10,24 +10,18 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License. specific language governing permissions and limitations under the License.
--> -->
# Image Processor for VAE # VAE Image Processor
Image processor provides a unified API for Stable Diffusion pipelines to prepare their image inputs for VAE encoding, as well as post-processing their outputs once decoded. This includes transformations such as resizing, normalization, and conversion between PIL Image, PyTorch, and Numpy arrays.
All pipelines with VAE image processor will accept image inputs in the format of PIL Image, PyTorch tensor, or Numpy array, and will able to return outputs in the format of PIL Image, Pytorch tensor, and Numpy array based on the `output_type` argument from the user. Additionally, the User can pass encoded image latents directly to the pipeline, or ask the pipeline to return latents as output with `output_type = 'pt'` argument. This allows you to take the generated latents from one pipeline and pass it to another pipeline as input, without ever having to leave the latent space. It also makes it much easier to use multiple pipelines together, by passing PyTorch tensors directly between different pipelines.
# Image Processor for VAE adapted to LDM3D
LDM3D Image processor does the same as the Image processor for VAE but accepts both RGB and depth inputs and will return RGB and depth outputs.
The [`VaeImageProcessor`] provides a unified API for [`StableDiffusionPipeline`]s to prepare image inputs for VAE encoding and to post-process outputs once they're decoded. This includes transformations such as resizing, normalization, and conversion between PIL Image, PyTorch, and NumPy arrays.
All pipelines with [`VaeImageProcessor`] accept PIL Image, PyTorch tensor, or NumPy arrays as image inputs and return outputs based on the `output_type` argument set by the user. You can pass encoded image latents directly to the pipeline and return latents from the pipeline as a specific output with the `output_type` argument (for example `output_type="pt"`). This allows you to take the generated latents from one pipeline and pass them to another pipeline as input without leaving the latent space. It also makes it much easier to use multiple pipelines together by passing PyTorch tensors directly between different pipelines.
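The normalization and resizing arithmetic can be sketched in plain Python. The formulas mirror the `normalize` and `denormalize` static methods that appear in this commit's `image_processor.py` hunks; the list-based helpers and `round_down_to_multiple` are hypothetical names, and the round-down rule is an assumption based on the docstring's "downscaled to a multiple of `vae_scale_factor`" wording.

```python
def normalize(values):
    """Map values from [0, 1] to [-1, 1], mirroring `2.0 * images - 1.0`."""
    return [2.0 * v - 1.0 for v in values]


def denormalize(values):
    """Map values from [-1, 1] back to [0, 1] with clamping,
    mirroring `(images / 2 + 0.5).clamp(0, 1)`."""
    return [min(1.0, max(0.0, v / 2 + 0.5)) for v in values]


def round_down_to_multiple(size: int, vae_scale_factor: int = 8) -> int:
    """Largest multiple of `vae_scale_factor` not exceeding `size` (assumed rule)."""
    return size - size % vae_scale_factor


pixels = [0.0, 0.25, 1.0]
normalized = normalize(pixels)
restored = denormalize(normalized)
height, width = round_down_to_multiple(517), round_down_to_multiple(770)
print(normalized, restored, (height, width))
```

Denormalizing a normalized value returns the original pixel value, which is why the two methods can bracket the VAE encode and decode steps.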
## VaeImageProcessor ## VaeImageProcessor
[[autodoc]] image_processor.VaeImageProcessor [[autodoc]] image_processor.VaeImageProcessor
## VaeImageProcessorLDM3D ## VaeImageProcessorLDM3D
The [`VaeImageProcessorLDM3D`] accepts RGB and depth inputs and returns RGB and depth outputs.
[[autodoc]] image_processor.VaeImageProcessorLDM3D [[autodoc]] image_processor.VaeImageProcessorLDM3D
\ No newline at end of file
...@@ -12,31 +12,26 @@ specific language governing permissions and limitations under the License. ...@@ -12,31 +12,26 @@ specific language governing permissions and limitations under the License.
# Loaders # Loaders
There are many ways to train adapter neural networks for diffusion models, such as Adapters (textual inversion, LoRA, hypernetworks) allow you to modify a diffusion model to generate images in a specific style without training or finetuning the entire model. The adapter weights are typically only a tiny fraction of the pretrained model's, which makes them very portable. 🤗 Diffusers provides an easy-to-use `LoaderMixin` API to load adapter weights.
- [Textual Inversion](./training/text_inversion.mdx)
- [LoRA](https://github.com/cloneofsimo/lora)
- [Hypernetworks](https://arxiv.org/abs/1609.09106)
Such adapter neural networks often only consist of a fraction of the number of weights compared <Tip warning={true}>
to the pretrained model and as such are very portable. The Diffusers library offers an easy-to-use
API to load such adapter neural networks via the [`loaders.py` module](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders.py).
**Note**: This module is still highly experimental and prone to future changes. 🧪 The `LoaderMixins` are highly experimental and prone to future changes. To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`.
## LoaderMixins </Tip>
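As a rough illustration of why adapter weights are small, the sketch below counts the parameters of a LoRA-style low-rank update for a single weight matrix. The layer sizes and rank are illustrative values, not taken from any specific model, and `lora_parameter_count` is a hypothetical helper.

```python
def lora_parameter_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 768, 768, 4  # illustrative sizes only
full = d_out * d_in              # parameters in the full weight matrix
adapter = lora_parameter_count(d_out, d_in, rank)
fraction = adapter / full
print(f"full: {full}, adapter: {adapter}, fraction: {fraction:.4f}")
```

With these numbers the adapter holds about 1% of the parameters of the matrix it modifies.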
### UNet2DConditionLoadersMixin ## UNet2DConditionLoadersMixin
[[autodoc]] loaders.UNet2DConditionLoadersMixin [[autodoc]] loaders.UNet2DConditionLoadersMixin
### TextualInversionLoaderMixin ## TextualInversionLoaderMixin
[[autodoc]] loaders.TextualInversionLoaderMixin [[autodoc]] loaders.TextualInversionLoaderMixin
### LoraLoaderMixin ## LoraLoaderMixin
[[autodoc]] loaders.LoraLoaderMixin [[autodoc]] loaders.LoraLoaderMixin
### FromCkptMixin ## FromCkptMixin
[[autodoc]] loaders.FromCkptMixin [[autodoc]] loaders.FromCkptMixin
...@@ -12,12 +12,9 @@ specific language governing permissions and limitations under the License. ...@@ -12,12 +12,9 @@ specific language governing permissions and limitations under the License.
# Logging # Logging
🧨 Diffusers has a centralized logging system, so that you can setup the verbosity of the library easily. 🤗 Diffusers has a centralized logging system to easily manage the verbosity of the library. The default verbosity is set to `WARNING`.
Currently the default verbosity of the library is `WARNING`. To change the verbosity level, use one of the direct setters. For instance, to change the verbosity to the `INFO` level.
To change the level of verbosity, just use one of the direct setters. For instance, here is how to change the verbosity
to the INFO level.
```python ```python
import diffusers import diffusers
...@@ -33,7 +30,7 @@ DIFFUSERS_VERBOSITY=error ./myprogram.py ...@@ -33,7 +30,7 @@ DIFFUSERS_VERBOSITY=error ./myprogram.py
``` ```
Additionally, some `warnings` can be disabled by setting the environment variable Additionally, some `warnings` can be disabled by setting the environment variable
`DIFFUSERS_NO_ADVISORY_WARNINGS` to a true value, like *1*. This will disable any warning that is logged using `DIFFUSERS_NO_ADVISORY_WARNINGS` to a true value, like `1`. This disables any warning logged by
[`logger.warning_advice`]. For example: [`logger.warning_advice`]. For example:
```bash ```bash
...@@ -52,20 +49,21 @@ logger.warning("WARN") ...@@ -52,20 +49,21 @@ logger.warning("WARN")
``` ```
All the methods of this logging module are documented below, the main ones are All methods of the logging module are documented below. The main methods are
[`logging.get_verbosity`] to get the current level of verbosity in the logger and [`logging.get_verbosity`] to get the current level of verbosity in the logger and
[`logging.set_verbosity`] to set the verbosity to the level of your choice. In order (from the least [`logging.set_verbosity`] to set the verbosity to the level of your choice.
verbose to the most verbose), those levels (with their corresponding int values in parenthesis) are:
In order from the least verbose to the most verbose:
- `diffusers.logging.CRITICAL` or `diffusers.logging.FATAL` (int value, 50): only report the most
critical errors. | Method | Integer value | Description |
- `diffusers.logging.ERROR` (int value, 40): only report errors. |----------------------------------------------------------:|--------------:|----------------------------------------------------:|
- `diffusers.logging.WARNING` or `diffusers.logging.WARN` (int value, 30): only reports error and | `diffusers.logging.CRITICAL` or `diffusers.logging.FATAL` | 50 | only report the most critical errors |
warnings. This is the default level used by the library. | `diffusers.logging.ERROR` | 40 | only report errors |
- `diffusers.logging.INFO` (int value, 20): reports error, warnings and basic information. | `diffusers.logging.WARNING` or `diffusers.logging.WARN` | 30 | only report errors and warnings (default) |
- `diffusers.logging.DEBUG` (int value, 10): report all information. | `diffusers.logging.INFO` | 20 | only report errors, warnings, and basic information |
| `diffusers.logging.DEBUG` | 10 | report all information |
By default, `tqdm` progress bars will be displayed during model download. [`logging.disable_progress_bar`] and [`logging.enable_progress_bar`] can be used to suppress or unsuppress this behavior.
By default, `tqdm` progress bars are displayed during model download. [`logging.disable_progress_bar`] and [`logging.enable_progress_bar`] are used to disable or enable this behavior.
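The integer values listed above coincide with the level constants in Python's standard `logging` module. The sketch below uses only the standard library to show how a verbosity threshold filters messages. It does not call the Diffusers logging API, and the logger name `demo` is arbitrary.

```python
import io
import logging

# Capture log output in memory so the effect of each level is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.propagate = False  # keep output confined to the in-memory stream

logger.setLevel(logging.WARNING)  # 30: errors and warnings only
logger.info("hidden at WARNING")
logger.warning("shown at WARNING")

logger.setLevel(logging.INFO)  # 20: basic information is reported as well
logger.info("shown at INFO")

captured = stream.getvalue().splitlines()
print(captured)
```

Lowering the threshold from `WARNING` to `INFO` is what lets the second `info` call through.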
## Base setters ## Base setters
......
...@@ -10,11 +10,9 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o ...@@ -10,11 +10,9 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License. specific language governing permissions and limitations under the License.
--> -->
# BaseOutputs # Outputs
All models have outputs that are subclasses of [`~utils.BaseOutput`]. Those are All model outputs are subclasses of [`~utils.BaseOutput`], data structures containing all the information returned by the model. The outputs can also be used as tuples or dictionaries.
data structures containing all the information returned by the model, but they can also be used as tuples or
dictionaries.
For example: For example:
......
...@@ -81,10 +81,9 @@ class FrozenDict(OrderedDict): ...@@ -81,10 +81,9 @@ class FrozenDict(OrderedDict):
class ConfigMixin: class ConfigMixin:
r""" r"""
Base class for all configuration classes. Stores all configuration parameters under `self.config` Also handles all Base class for all configuration classes. All configuration parameters are stored under `self.config`. Also
methods for loading/downloading/saving classes inheriting from [`ConfigMixin`] with provides the [`~ConfigMixin.from_config`] and [`~ConfigMixin.save_config`] methods for loading, downloading, and
- [`~ConfigMixin.from_config`] saving classes that inherit from [`ConfigMixin`].
- [`~ConfigMixin.save_config`]
Class attributes: Class attributes:
- **config_name** (`str`) -- A filename under which the config should stored when calling - **config_name** (`str`) -- A filename under which the config should stored when calling
...@@ -92,7 +91,7 @@ class ConfigMixin: ...@@ -92,7 +91,7 @@ class ConfigMixin:
- **ignore_for_config** (`List[str]`) -- A list of attributes that should not be saved in the config (should be - **ignore_for_config** (`List[str]`) -- A list of attributes that should not be saved in the config (should be
overridden by subclass). overridden by subclass).
- **has_compatibles** (`bool`) -- Whether the class has compatible classes (should be overridden by subclass). - **has_compatibles** (`bool`) -- Whether the class has compatible classes (should be overridden by subclass).
- **_deprecated_kwargs** (`List[str]`) -- Keyword arguments that are deprecated. Note that the init function - **_deprecated_kwargs** (`List[str]`) -- Keyword arguments that are deprecated. Note that the `init` function
should only have a `kwargs` argument if at least one argument is deprecated (should be overridden by should only have a `kwargs` argument if at least one argument is deprecated (should be overridden by
subclass). subclass).
""" """
...@@ -139,12 +138,12 @@ class ConfigMixin: ...@@ -139,12 +138,12 @@ class ConfigMixin:
def save_config(self, save_directory: Union[str, os.PathLike], push_to_hub: bool = False, **kwargs): def save_config(self, save_directory: Union[str, os.PathLike], push_to_hub: bool = False, **kwargs):
""" """
Save a configuration object to the directory `save_directory`, so that it can be re-loaded using the Save a configuration object to the directory specified in `save_directory` so that it can be reloaded using the
[`~ConfigMixin.from_config`] class method. [`~ConfigMixin.from_config`] class method.
Args: Args:
save_directory (`str` or `os.PathLike`): save_directory (`str` or `os.PathLike`):
Directory where the configuration JSON file will be saved (will be created if it does not exist). Directory where the configuration JSON file is saved (will be created if it does not exist).
""" """
if os.path.isfile(save_directory): if os.path.isfile(save_directory):
raise AssertionError(f"Provided path ({save_directory}) should be a directory, not a file") raise AssertionError(f"Provided path ({save_directory}) should be a directory, not a file")
...@@ -164,15 +163,14 @@ class ConfigMixin: ...@@ -164,15 +163,14 @@ class ConfigMixin:
Parameters: Parameters:
config (`Dict[str, Any]`): config (`Dict[str, Any]`):
A config dictionary from which the Python class will be instantiated. Make sure to only load A config dictionary from which the Python class is instantiated. Make sure to only load configuration
configuration files of compatible classes. files of compatible classes.
return_unused_kwargs (`bool`, *optional*, defaults to `False`): return_unused_kwargs (`bool`, *optional*, defaults to `False`):
Whether kwargs that are not consumed by the Python class should be returned or not. Whether kwargs that are not consumed by the Python class should be returned or not.
kwargs (remaining dictionary of keyword arguments, *optional*): kwargs (remaining dictionary of keyword arguments, *optional*):
Can be used to update the configuration object (after it is loaded) and initiate the Python class. Can be used to update the configuration object (after it is loaded) and initiate the Python class.
`**kwargs` are directly passed to the underlying scheduler/model's `__init__` method and eventually `**kwargs` are passed directly to the underlying scheduler/model's `__init__` method and eventually
overwrite same named arguments in `config`. overwrite the same named arguments in `config`.
Returns: Returns:
[`ModelMixin`] or [`SchedulerMixin`]: [`ModelMixin`] or [`SchedulerMixin`]:
...@@ -280,16 +278,16 @@ class ConfigMixin: ...@@ -280,16 +278,16 @@ class ConfigMixin:
Whether or not to force the (re-)download of the model weights and configuration files, overriding the Whether or not to force the (re-)download of the model weights and configuration files, overriding the
cached versions if they exist. cached versions if they exist.
resume_download (`bool`, *optional*, defaults to `False`): resume_download (`bool`, *optional*, defaults to `False`):
Whether or not to resume downloading the model weights and configuration files. If set to False, any Whether or not to resume downloading the model weights and configuration files. If set to `False`, any
incompletely downloaded files are deleted. incompletely downloaded files are deleted.
proxies (`Dict[str, str]`, *optional*): proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, for example, `{'http': 'foo.bar:3128', A dictionary of proxy servers to use by protocol or endpoint, for example, `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request. 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
output_loading_info(`bool`, *optional*, defaults to `False`): output_loading_info(`bool`, *optional*, defaults to `False`):
Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
local_files_only(`bool`, *optional*, defaults to `False`): local_files_only (`bool`, *optional*, defaults to `False`):
Whether to only load local model weights and configuration files or not. If set to True, the model Whether to only load local model weights and configuration files or not. If set to `True`, the model
wont be downloaded from the Hub. won't be downloaded from the Hub.
use_auth_token (`str` or *bool*, *optional*): use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, the token generated from The token to use as HTTP bearer authorization for remote files. If `True`, the token generated from
`diffusers-cli login` (stored in `~/.huggingface`) is used. `diffusers-cli login` (stored in `~/.huggingface`) is used.
...@@ -307,14 +305,6 @@ class ConfigMixin: ...@@ -307,14 +305,6 @@ class ConfigMixin:
`dict`: `dict`:
A dictionary of all the parameters stored in a JSON configuration file. A dictionary of all the parameters stored in a JSON configuration file.
<Tip>
To use private or [gated models](https://huggingface.co/docs/hub/models-gated#gated-models), log-in with
`huggingface-cli login`. You can also activate the special
["offline-mode"](https://huggingface.co/transformers/installation.html#offline-mode) to use this method in a
firewalled environment.
</Tip>
""" """
cache_dir = kwargs.pop("cache_dir", DIFFUSERS_CACHE) cache_dir = kwargs.pop("cache_dir", DIFFUSERS_CACHE)
force_download = kwargs.pop("force_download", False) force_download = kwargs.pop("force_download", False)
...@@ -536,10 +526,11 @@ class ConfigMixin: ...@@ -536,10 +526,11 @@ class ConfigMixin:
def to_json_string(self) -> str: def to_json_string(self) -> str:
""" """
Serializes this instance to a JSON string. Serializes the configuration instance to a JSON string.
Returns: Returns:
`str`: String containing all the attributes that make up this configuration instance in JSON format. `str`:
String containing all the attributes that make up the configuration instance in JSON format.
""" """
config_dict = self._internal_dict if hasattr(self, "_internal_dict") else {} config_dict = self._internal_dict if hasattr(self, "_internal_dict") else {}
config_dict["_class_name"] = self.__class__.__name__ config_dict["_class_name"] = self.__class__.__name__
...@@ -560,11 +551,11 @@ class ConfigMixin: ...@@ -560,11 +551,11 @@ class ConfigMixin:
def to_json_file(self, json_file_path: Union[str, os.PathLike]): def to_json_file(self, json_file_path: Union[str, os.PathLike]):
""" """
Save this instance to a JSON file. Save the configuration instance's parameters to a JSON file.
Args: Args:
json_file_path (`str` or `os.PathLike`): json_file_path (`str` or `os.PathLike`):
Path to the JSON file in which this configuration instance's parameters will be saved. Path to the JSON file to save a configuration instance's parameters.
""" """
with open(json_file_path, "w", encoding="utf-8") as writer: with open(json_file_path, "w", encoding="utf-8") as writer:
writer.write(self.to_json_string()) writer.write(self.to_json_string())
......
...@@ -26,19 +26,18 @@ from .utils import CONFIG_NAME, PIL_INTERPOLATION, deprecate ...@@ -26,19 +26,18 @@ from .utils import CONFIG_NAME, PIL_INTERPOLATION, deprecate
class VaeImageProcessor(ConfigMixin): class VaeImageProcessor(ConfigMixin):
""" """
Image Processor for VAE Image processor for VAE.
Args: Args:
do_resize (`bool`, *optional*, defaults to `True`): do_resize (`bool`, *optional*, defaults to `True`):
Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`. Can accept Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`. Can accept
`height` and `width` arguments from `preprocess` method `height` and `width` arguments from [`image_processor.VaeImageProcessor.preprocess`] method.
vae_scale_factor (`int`, *optional*, defaults to `8`): vae_scale_factor (`int`, *optional*, defaults to `8`):
VAE scale factor. If `do_resize` is True, the image will be automatically resized to multiples of this VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor.
factor.
resample (`str`, *optional*, defaults to `lanczos`): resample (`str`, *optional*, defaults to `lanczos`):
Resampling filter to use when resizing the image. Resampling filter to use when resizing the image.
do_normalize (`bool`, *optional*, defaults to `True`): do_normalize (`bool`, *optional*, defaults to `True`):
Whether to normalize the image to [-1,1] Whether to normalize the image to [-1,1].
do_convert_rgb (`bool`, *optional*, defaults to be `False`): do_convert_rgb (`bool`, *optional*, defaults to be `False`):
Whether to convert the images to RGB format. Whether to convert the images to RGB format.
""" """
...@@ -75,7 +74,7 @@ class VaeImageProcessor(ConfigMixin): ...@@ -75,7 +74,7 @@ class VaeImageProcessor(ConfigMixin):
@staticmethod @staticmethod
def pil_to_numpy(images: Union[List[PIL.Image.Image], PIL.Image.Image]) -> np.ndarray: def pil_to_numpy(images: Union[List[PIL.Image.Image], PIL.Image.Image]) -> np.ndarray:
""" """
Convert a PIL image or a list of PIL images to numpy arrays. Convert a PIL image or a list of PIL images to NumPy arrays.
""" """
if not isinstance(images, list): if not isinstance(images, list):
images = [images] images = [images]
...@@ -87,7 +86,7 @@ class VaeImageProcessor(ConfigMixin): ...@@ -87,7 +86,7 @@ class VaeImageProcessor(ConfigMixin):
@staticmethod @staticmethod
def numpy_to_pt(images: np.ndarray) -> torch.FloatTensor: def numpy_to_pt(images: np.ndarray) -> torch.FloatTensor:
""" """
Convert a numpy image to a pytorch tensor Convert a NumPy image to a PyTorch tensor.
""" """
if images.ndim == 3: if images.ndim == 3:
images = images[..., None] images = images[..., None]
...@@ -98,7 +97,7 @@ class VaeImageProcessor(ConfigMixin): ...@@ -98,7 +97,7 @@ class VaeImageProcessor(ConfigMixin):
@staticmethod @staticmethod
def pt_to_numpy(images: torch.FloatTensor) -> np.ndarray: def pt_to_numpy(images: torch.FloatTensor) -> np.ndarray:
""" """
Convert a pytorch tensor to a numpy image Convert a PyTorch tensor to a NumPy image.
""" """
images = images.cpu().permute(0, 2, 3, 1).float().numpy() images = images.cpu().permute(0, 2, 3, 1).float().numpy()
return images return images
...@@ -106,14 +105,14 @@ class VaeImageProcessor(ConfigMixin): ...@@ -106,14 +105,14 @@ class VaeImageProcessor(ConfigMixin):
@staticmethod @staticmethod
def normalize(images): def normalize(images):
""" """
Normalize an image array to [-1,1] Normalize an image array to [-1,1].
""" """
return 2.0 * images - 1.0 return 2.0 * images - 1.0
@staticmethod @staticmethod
def denormalize(images): def denormalize(images):
""" """
Denormalize an image array to [0,1] Denormalize an image array to [0,1].
""" """
return (images / 2 + 0.5).clamp(0, 1) return (images / 2 + 0.5).clamp(0, 1)
...@@ -132,7 +131,7 @@ class VaeImageProcessor(ConfigMixin): ...@@ -132,7 +131,7 @@ class VaeImageProcessor(ConfigMixin):
width: Optional[int] = None, width: Optional[int] = None,
) -> PIL.Image.Image: ) -> PIL.Image.Image:
""" """
Resize a PIL image. Both height and width will be downscaled to the next integer multiple of `vae_scale_factor` Resize a PIL image. Both height and width are downscaled to the next integer multiple of `vae_scale_factor`.
""" """
if height is None: if height is None:
height = image.height height = image.height
...@@ -152,7 +151,7 @@ class VaeImageProcessor(ConfigMixin): ...@@ -152,7 +151,7 @@ class VaeImageProcessor(ConfigMixin):
width: Optional[int] = None, width: Optional[int] = None,
) -> torch.Tensor: ) -> torch.Tensor:
""" """
Preprocess the image input, accepted formats are PIL images, numpy arrays or pytorch tensors" Preprocess the image input. Accepted formats are PIL images, NumPy arrays or PyTorch tensors.
""" """
supported_formats = (PIL.Image.Image, np.ndarray, torch.Tensor) supported_formats = (PIL.Image.Image, np.ndarray, torch.Tensor)
if isinstance(image, supported_formats): if isinstance(image, supported_formats):
...@@ -255,18 +254,17 @@ class VaeImageProcessor(ConfigMixin): ...@@ -255,18 +254,17 @@ class VaeImageProcessor(ConfigMixin):
class VaeImageProcessorLDM3D(VaeImageProcessor): class VaeImageProcessorLDM3D(VaeImageProcessor):
""" """
Image Processor for VAE LDM3D. Image processor for VAE LDM3D.
Args: Args:
do_resize (`bool`, *optional*, defaults to `True`): do_resize (`bool`, *optional*, defaults to `True`):
Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`. Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`.
vae_scale_factor (`int`, *optional*, defaults to `8`): vae_scale_factor (`int`, *optional*, defaults to `8`):
VAE scale factor. If `do_resize` is True, the image will be automatically resized to multiples of this VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor.
factor.
resample (`str`, *optional*, defaults to `lanczos`): resample (`str`, *optional*, defaults to `lanczos`):
Resampling filter to use when resizing the image. Resampling filter to use when resizing the image.
do_normalize (`bool`, *optional*, defaults to `True`): do_normalize (`bool`, *optional*, defaults to `True`):
Whether to normalize the image to [-1,1] Whether to normalize the image to [-1,1].
""" """
config_name = CONFIG_NAME config_name = CONFIG_NAME
...@@ -284,7 +282,7 @@ class VaeImageProcessorLDM3D(VaeImageProcessor): ...@@ -284,7 +282,7 @@ class VaeImageProcessorLDM3D(VaeImageProcessor):
@staticmethod @staticmethod
def numpy_to_pil(images): def numpy_to_pil(images):
""" """
Convert a numpy image or a batch of images to a PIL image. Convert a NumPy image or a batch of images to a PIL image.
""" """
if images.ndim == 3: if images.ndim == 3:
images = images[None, ...] images = images[None, ...]
...@@ -310,7 +308,7 @@ class VaeImageProcessorLDM3D(VaeImageProcessor): ...@@ -310,7 +308,7 @@ class VaeImageProcessorLDM3D(VaeImageProcessor):
def numpy_to_depth(self, images): def numpy_to_depth(self, images):
""" """
Convert a numpy depth image or a batch of images to a PIL image. Convert a NumPy depth image or a batch of images to a PIL image.
""" """
if images.ndim == 3: if images.ndim == 3:
images = images[None, ...] images = images[None, ...]
......
...@@ -115,63 +115,50 @@ class UNet2DConditionLoadersMixin: ...@@ -115,63 +115,50 @@ class UNet2DConditionLoadersMixin:
def load_attn_procs(self, pretrained_model_name_or_path_or_dict: Union[str, Dict[str, torch.Tensor]], **kwargs): def load_attn_procs(self, pretrained_model_name_or_path_or_dict: Union[str, Dict[str, torch.Tensor]], **kwargs):
r""" r"""
Load pretrained attention processor layers into `UNet2DConditionModel`. Attention processor layers have to be Load pretrained attention processor layers into [`UNet2DConditionModel`]. Attention processor layers have to be
defined in defined in
[`cross_attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/cross_attention.py) [`cross_attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/cross_attention.py)
and be a `torch.nn.Module` class. and be a `torch.nn.Module` class.
<Tip warning={true}>
This function is experimental and might change in the future.
</Tip>
Parameters: Parameters:
pretrained_model_name_or_path_or_dict (`str` or `os.PathLike` or `dict`): pretrained_model_name_or_path_or_dict (`str` or `os.PathLike` or `dict`):
Can be either: Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co. - A string, the model id (for example `google/ddpm-celebahq-256`) of a pretrained model hosted on
Valid model ids should have an organization name, like `google/ddpm-celebahq-256`. the Hub.
- A path to a *directory* containing model weights saved using [`~ModelMixin.save_config`], e.g., - A path to a directory (for example `./my_model_directory`) containing the model weights saved
`./my_model_directory/`. with [`ModelMixin.save_pretrained`].
- A [torch state - A [torch state
dict](https://pytorch.org/tutorials/beginner/saving_loading_models.html#what-is-a-state-dict). dict](https://pytorch.org/tutorials/beginner/saving_loading_models.html#what-is-a-state-dict).
cache_dir (`Union[str, os.PathLike]`, *optional*): cache_dir (`Union[str, os.PathLike]`, *optional*):
Path to a directory in which a downloaded pretrained model configuration should be cached if the Path to a directory where a downloaded pretrained model configuration is cached if the standard cache
standard cache should not be used. is not used.
force_download (`bool`, *optional*, defaults to `False`): force_download (`bool`, *optional*, defaults to `False`):
Whether or not to force the (re-)download of the model weights and configuration files, overriding the Whether or not to force the (re-)download of the model weights and configuration files, overriding the
cached versions if they exist. cached versions if they exist.
resume_download (`bool`, *optional*, defaults to `False`): resume_download (`bool`, *optional*, defaults to `False`):
Whether or not to delete incompletely received files. Will attempt to resume the download if such a Whether or not to resume downloading the model weights and configuration files. If set to `False`, any
file exists. incompletely downloaded files are deleted.
proxies (`Dict[str, str]`, *optional*): proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', A dictionary of proxy servers to use by protocol or endpoint, for example, `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request. 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
local_files_only(`bool`, *optional*, defaults to `False`): local_files_only (`bool`, *optional*, defaults to `False`):
Whether or not to only look at local files (i.e., do not try to download the model). Whether to only load local model weights and configuration files or not. If set to `True`, the model
won't be downloaded from the Hub.
use_auth_token (`str` or *bool*, *optional*): use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated The token to use as HTTP bearer authorization for remote files. If `True`, the token generated from
when running `diffusers-cli login` (stored in `~/.huggingface`). `diffusers-cli login` (stored in `~/.huggingface`) is used.
revision (`str`, *optional*, defaults to `"main"`): revision (`str`, *optional*, defaults to `"main"`):
The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier
git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any allowed by Git.
identifier allowed by git.
subfolder (`str`, *optional*, defaults to `""`): subfolder (`str`, *optional*, defaults to `""`):
In case the relevant files are located inside a subfolder of the model repo (either remote in The subfolder location of a model file within a larger model repository on the Hub or locally.
huggingface.co or downloaded locally), you can specify the folder name here.
mirror (`str`, *optional*): mirror (`str`, *optional*):
Mirror source to accelerate downloads in China. If you are from China and have an accessibility Mirror source to resolve accessibility issues if you're downloading a model in China. We do not
problem, you can set this option to resolve it. Note that we do not guarantee the timeliness or safety. guarantee the timeliness or safety of the source, and you should refer to the mirror site for more
Please refer to the mirror site for more information. information.
<Tip>
It is required to be logged in (`huggingface-cli login`) when you want to use private or [gated
models](https://huggingface.co/docs/hub/models-gated#gated-models).
</Tip>
""" """
cache_dir = kwargs.pop("cache_dir", DIFFUSERS_CACHE) cache_dir = kwargs.pop("cache_dir", DIFFUSERS_CACHE)
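The `kwargs.pop` call above reads one of the optional loading arguments documented in the docstring. A minimal sketch of that pattern follows, using the defaults stated in the docstring where it gives them. `DEFAULT_CACHE` and `read_loading_kwargs` are hypothetical names for illustration, not library code.

```python
# Hypothetical stand-in for the library's DIFFUSERS_CACHE constant.
DEFAULT_CACHE = "~/.cache/example"


def read_loading_kwargs(**kwargs):
    """Pop the documented options out of kwargs, falling back to defaults."""
    options = {
        "cache_dir": kwargs.pop("cache_dir", DEFAULT_CACHE),
        "force_download": kwargs.pop("force_download", False),
        "resume_download": kwargs.pop("resume_download", False),
        "proxies": kwargs.pop("proxies", None),
        "local_files_only": kwargs.pop("local_files_only", False),
        "revision": kwargs.pop("revision", "main"),
        "subfolder": kwargs.pop("subfolder", ""),
    }
    # Whatever is left was not consumed here and can be forwarded elsewhere.
    return options, kwargs


options, remaining = read_loading_kwargs(force_download=True, weight_name="x.bin")
```

Because `pop` removes each key it reads, `remaining` holds only the arguments this helper does not know about.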
...@@ -349,20 +336,21 @@ class UNet2DConditionLoadersMixin: ...@@ -349,20 +336,21 @@ class UNet2DConditionLoadersMixin:
**kwargs, **kwargs,
): ):
r""" r"""
Save an attention processor to a directory, so that it can be re-loaded using the Save an attention processor to a directory so that it can be reloaded using the
[`~loaders.UNet2DConditionLoadersMixin.load_attn_procs`] method. [`~loaders.UNet2DConditionLoadersMixin.load_attn_procs`] method.
Arguments: Arguments:
save_directory (`str` or `os.PathLike`): save_directory (`str` or `os.PathLike`):
Directory to which to save. Will be created if it doesn't exist. Directory to save an attention processor to. Will be created if it doesn't exist.
is_main_process (`bool`, *optional*, defaults to `True`): is_main_process (`bool`, *optional*, defaults to `True`):
Whether the process calling this is the main process or not. Useful when in distributed training like Whether the process calling this is the main process or not. Useful during distributed training when you
TPUs and need to call this function on all processes. In this case, set `is_main_process=True` only on need to call this function on all processes. In this case, set `is_main_process=True` only on the main
the main process to avoid race conditions. process to avoid race conditions.
save_function (`Callable`): save_function (`Callable`):
The function to use to save the state dictionary. Useful on distributed training like TPUs when one The function to use to save the state dictionary. Useful during distributed training when you need to
need to replace `torch.save` by another method. Can be configured with the environment variable replace `torch.save` with another method. Can be configured with the environment variable
`DIFFUSERS_SAVE_MODE`. `DIFFUSERS_SAVE_MODE`.
""" """
weight_name = weight_name or deprecate( weight_name = weight_name or deprecate(
"weights_name", "weights_name",
...@@ -418,15 +406,14 @@ class UNet2DConditionLoadersMixin: ...@@ -418,15 +406,14 @@ class UNet2DConditionLoadersMixin:
class TextualInversionLoaderMixin: class TextualInversionLoaderMixin:
r""" r"""
Mixin class for loading textual inversion tokens and embeddings to the tokenizer and text encoder. Load textual inversion tokens and embeddings to the tokenizer and text encoder.
""" """
def maybe_convert_prompt(self, prompt: Union[str, List[str]], tokenizer: "PreTrainedTokenizer"): def maybe_convert_prompt(self, prompt: Union[str, List[str]], tokenizer: "PreTrainedTokenizer"):
r""" r"""
Maybe convert a prompt into a "multi vector"-compatible prompt. If the prompt includes a token that corresponds Processes prompts that include a special token corresponding to a multi-vector textual inversion embedding so
to a multi-vector textual inversion embedding, this function will process the prompt so that the special token that the token is replaced with multiple special tokens, each corresponding to one of the vectors. If the prompt has no textual
is replaced with multiple special tokens each corresponding to one of the vectors. If the prompt has no textual inversion token or if the textual inversion token is a single vector, the input prompt is returned.
inversion token or a textual inversion token that is a single vector, the input prompt is simply returned.
Parameters: Parameters:
prompt (`str` or list of `str`): prompt (`str` or list of `str`):
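The replacement described in the `maybe_convert_prompt` docstring can be sketched without a real tokenizer. This sketch assumes the extra vectors of a multi-vector embedding are registered as `<token>_1`, `<token>_2`, and so on. That naming convention and the `expand_multi_vector_tokens` helper are illustrative assumptions, not the library's implementation.

```python
def expand_multi_vector_tokens(prompt, added_tokens):
    """Replace each multi-vector token in the prompt with all of its sub-tokens.

    `added_tokens` is a plain set of strings standing in for the tokenizer's
    added vocabulary (an assumption made for this sketch).
    """
    for token in sorted(added_tokens):
        # A token is treated as multi-vector only if "<token>_1" also exists.
        if token in prompt and f"{token}_1" in added_tokens:
            parts = [token]
            index = 1
            while f"{token}_{index}" in added_tokens:
                parts.append(f"{token}_{index}")
                index += 1
            prompt = prompt.replace(token, " ".join(parts))
    return prompt


multi = expand_multi_vector_tokens("a photo of <cat>", {"<cat>", "<cat>_1", "<cat>_2"})
single = expand_multi_vector_tokens("a photo of <dog>", {"<dog>"})
```

A single-vector token such as `<dog>` has no `_1` companion, so the prompt comes back unchanged, matching the last sentence of the docstring.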
...@@ -486,78 +473,61 @@ class TextualInversionLoaderMixin: ...@@ -486,78 +473,61 @@ class TextualInversionLoaderMixin:
**kwargs, **kwargs,
): ):
r""" r"""
Load textual inversion embeddings into the text encoder of stable diffusion pipelines. Both `diffusers` and Load textual inversion embeddings into the text encoder of [`StableDiffusionPipeline`] (both 🤗 Diffusers and
`Automatic1111` formats are supported (see example below). Automatic1111 formats are supported).
<Tip warning={true}>
This function is experimental and might change in the future.
</Tip>
Parameters: Parameters:
pretrained_model_name_or_path (`str` or `os.PathLike` or `List[str or os.PathLike]` or `Dict` or `List[Dict]`): pretrained_model_name_or_path (`str` or `os.PathLike` or `List[str or os.PathLike]` or `Dict` or `List[Dict]`):
Can be either: Can be either one of the following or a list of them:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co. - A string, the *model id* (for example `sd-concepts-library/low-poly-hd-logos-icons`) of a
Valid model ids should have an organization name, like pretrained model hosted on the Hub.
`"sd-concepts-library/low-poly-hd-logos-icons"`. - A path to a *directory* (for example `./my_text_inversion_directory/`) containing the textual
- A path to a *directory* containing textual inversion weights, e.g. inversion weights.
`./my_text_inversion_directory/`. - A path to a *file* (for example `./my_text_inversions.pt`) containing textual inversion weights.
- A path to a *file* containing textual inversion weights, e.g. `./my_text_inversions.pt`.
- A [torch state - A [torch state
dict](https://pytorch.org/tutorials/beginner/saving_loading_models.html#what-is-a-state-dict). dict](https://pytorch.org/tutorials/beginner/saving_loading_models.html#what-is-a-state-dict).
Or a list of those elements.
token (`str` or `List[str]`, *optional*): token (`str` or `List[str]`, *optional*):
Override the token to use for the textual inversion weights. If `pretrained_model_name_or_path` is a Override the token to use for the textual inversion weights. If `pretrained_model_name_or_path` is a
list, then `token` must also be a list of equal length. list, then `token` must also be a list of equal length.
weight_name (`str`, *optional*): weight_name (`str`, *optional*):
Name of a custom weight file. This should be used in two cases: Name of a custom weight file. This should be used when:
- The saved textual inversion file is in `diffusers` format, but was saved under a specific weight - The saved textual inversion file is in 🤗 Diffusers format, but was saved under a specific weight
name, such as `text_inv.bin`. name such as `text_inv.bin`.
- The saved textual inversion file is in the "Automatic1111" form. - The saved textual inversion file is in the Automatic1111 format.
cache_dir (`Union[str, os.PathLike]`, *optional*): cache_dir (`Union[str, os.PathLike]`, *optional*):
Path to a directory in which a downloaded pretrained model configuration should be cached if the Path to a directory where a downloaded pretrained model configuration is cached if the standard cache
standard cache should not be used. is not used.
force_download (`bool`, *optional*, defaults to `False`): force_download (`bool`, *optional*, defaults to `False`):
Whether or not to force the (re-)download of the model weights and configuration files, overriding the Whether or not to force the (re-)download of the model weights and configuration files, overriding the
cached versions if they exist. cached versions if they exist.
resume_download (`bool`, *optional*, defaults to `False`): resume_download (`bool`, *optional*, defaults to `False`):
Whether or not to delete incompletely received files. Will attempt to resume the download if such a Whether or not to resume downloading the model weights and configuration files. If set to `False`, any
file exists. incompletely downloaded files are deleted.
proxies (`Dict[str, str]`, *optional*): proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', A dictionary of proxy servers to use by protocol or endpoint, for example, `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request. 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
local_files_only(`bool`, *optional*, defaults to `False`): local_files_only (`bool`, *optional*, defaults to `False`):
Whether or not to only look at local files (i.e., do not try to download the model). Whether to only load local model weights and configuration files or not. If set to `True`, the model
won't be downloaded from the Hub.
use_auth_token (`str` or *bool*, *optional*): use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated The token to use as HTTP bearer authorization for remote files. If `True`, the token generated from
when running `diffusers-cli login` (stored in `~/.huggingface`). `diffusers-cli login` (stored in `~/.huggingface`) is used.
revision (`str`, *optional*, defaults to `"main"`): revision (`str`, *optional*, defaults to `"main"`):
The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier
git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any allowed by Git.
identifier allowed by git.
subfolder (`str`, *optional*, defaults to `""`): subfolder (`str`, *optional*, defaults to `""`):
In case the relevant files are located inside a subfolder of the model repo (either remote in The subfolder location of a model file within a larger model repository on the Hub or locally.
huggingface.co or downloaded locally), you can specify the folder name here.
mirror (`str`, *optional*): mirror (`str`, *optional*):
Mirror source to accelerate downloads in China. If you are from China and have an accessibility Mirror source to resolve accessibility issues if you're downloading a model in China. We do not
problem, you can set this option to resolve it. Note that we do not guarantee the timeliness or safety. guarantee the timeliness or safety of the source, and you should refer to the mirror site for more
Please refer to the mirror site for more information. information.
<Tip>
It is required to be logged in (`huggingface-cli login`) when you want to use private or [gated
models](https://huggingface.co/docs/hub/models-gated#gated-models).
</Tip>
Example: Example:
To load a textual inversion embedding vector in `diffusers` format: To load a textual inversion embedding vector in 🤗 Diffusers format:
```py ```py
from diffusers import StableDiffusionPipeline from diffusers import StableDiffusionPipeline
...@@ -574,8 +544,9 @@ class TextualInversionLoaderMixin: ...@@ -574,8 +544,9 @@ class TextualInversionLoaderMixin:
image.save("cat-backpack.png") image.save("cat-backpack.png")
``` ```
To load a textual inversion embedding vector in Automatic1111 format, make sure to first download the vector, To load a textual inversion embedding vector in Automatic1111 format, make sure to download the vector first
e.g. from [civitAI](https://civitai.com/models/3036?modelVersionId=9857) and then load the vector locally: (for example from [civitAI](https://civitai.com/models/3036?modelVersionId=9857)) and then load the vector
locally:
```py ```py
from diffusers import StableDiffusionPipeline from diffusers import StableDiffusionPipeline
...@@ -766,78 +737,56 @@ class TextualInversionLoaderMixin: ...@@ -766,78 +737,56 @@ class TextualInversionLoaderMixin:
class LoraLoaderMixin: class LoraLoaderMixin:
r""" r"""
Utility class for handling the loading LoRA layers into UNet (of class [`UNet2DConditionModel`]) and Text Encoder Load LoRA layers into [`UNet2DConditionModel`] and
(of class [`CLIPTextModel`](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModel)). [`CLIPTextModel`](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModel).
<Tip warning={true}>
This function is experimental and might change in the future.
</Tip>
""" """
text_encoder_name = TEXT_ENCODER_NAME text_encoder_name = TEXT_ENCODER_NAME
unet_name = UNET_NAME unet_name = UNET_NAME
def load_lora_weights(self, pretrained_model_name_or_path_or_dict: Union[str, Dict[str, torch.Tensor]], **kwargs): def load_lora_weights(self, pretrained_model_name_or_path_or_dict: Union[str, Dict[str, torch.Tensor]], **kwargs):
r""" r"""
Load pretrained attention processor layers (such as LoRA) into [`UNet2DConditionModel`] and Load pretrained LoRA attention processor layers into [`UNet2DConditionModel`] and
[`CLIPTextModel`](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModel)). [`CLIPTextModel`](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModel).
<Tip warning={true}>
We support loading A1111 formatted LoRA checkpoints in a limited capacity.
This function is experimental and might change in the future.
</Tip>
Parameters: Parameters:
pretrained_model_name_or_path_or_dict (`str` or `os.PathLike` or `dict`): pretrained_model_name_or_path_or_dict (`str` or `os.PathLike` or `dict`):
Can be either: Can be either:
- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co. - A string, the *model id* (for example `google/ddpm-celebahq-256`) of a pretrained model hosted on
Valid model ids should have an organization name, like `google/ddpm-celebahq-256`. the Hub.
- A path to a *directory* containing model weights saved using [`~ModelMixin.save_config`], e.g., - A path to a *directory* (for example `./my_model_directory`) containing the model weights saved
`./my_model_directory/`. with [`ModelMixin.save_pretrained`].
- A [torch state - A [torch state
dict](https://pytorch.org/tutorials/beginner/saving_loading_models.html#what-is-a-state-dict). dict](https://pytorch.org/tutorials/beginner/saving_loading_models.html#what-is-a-state-dict).
cache_dir (`Union[str, os.PathLike]`, *optional*): cache_dir (`Union[str, os.PathLike]`, *optional*):
Path to a directory in which a downloaded pretrained model configuration should be cached if the Path to a directory where a downloaded pretrained model configuration is cached if the standard cache
standard cache should not be used. is not used.
force_download (`bool`, *optional*, defaults to `False`): force_download (`bool`, *optional*, defaults to `False`):
Whether or not to force the (re-)download of the model weights and configuration files, overriding the Whether or not to force the (re-)download of the model weights and configuration files, overriding the
cached versions if they exist. cached versions if they exist.
resume_download (`bool`, *optional*, defaults to `False`): resume_download (`bool`, *optional*, defaults to `False`):
Whether or not to delete incompletely received files. Will attempt to resume the download if such a Whether or not to resume downloading the model weights and configuration files. If set to `False`, any
file exists. incompletely downloaded files are deleted.
proxies (`Dict[str, str]`, *optional*): proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', A dictionary of proxy servers to use by protocol or endpoint, for example, `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request. 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
local_files_only(`bool`, *optional*, defaults to `False`): local_files_only (`bool`, *optional*, defaults to `False`):
Whether or not to only look at local files (i.e., do not try to download the model). Whether to only load local model weights and configuration files or not. If set to `True`, the model
won't be downloaded from the Hub.
use_auth_token (`str` or *bool*, *optional*): use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated The token to use as HTTP bearer authorization for remote files. If `True`, the token generated from
when running `diffusers-cli login` (stored in `~/.huggingface`). `diffusers-cli login` (stored in `~/.huggingface`) is used.
revision (`str`, *optional*, defaults to `"main"`): revision (`str`, *optional*, defaults to `"main"`):
The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier
git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any allowed by Git.
identifier allowed by git.
subfolder (`str`, *optional*, defaults to `""`): subfolder (`str`, *optional*, defaults to `""`):
In case the relevant files are located inside a subfolder of the model repo (either remote in The subfolder location of a model file within a larger model repository on the Hub or locally.
huggingface.co or downloaded locally), you can specify the folder name here.
mirror (`str`, *optional*): mirror (`str`, *optional*):
Mirror source to accelerate downloads in China. If you are from China and have an accessibility Mirror source to resolve accessibility issues if you're downloading a model in China. We do not
problem, you can set this option to resolve it. Note that we do not guarantee the timeliness or safety. guarantee the timeliness or safety of the source, and you should refer to the mirror site for more
Please refer to the mirror site for more information. information.
<Tip>
It is required to be logged in (`huggingface-cli login`) when you want to use private or [gated
models](https://huggingface.co/docs/hub/models-gated#gated-models).
</Tip>
""" """
# Load the main state dict first which has the LoRA layers for either of # Load the main state dict first which has the LoRA layers for either of
# UNet and text encoder or both. # UNet and text encoder or both.
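The comment above says a single state dict can carry LoRA layers for the UNet, the text encoder, or both. One way to separate them is by key prefix. This sketch assumes the prefixes are `unet` and `text_encoder`, because the values of `UNET_NAME` and `TEXT_ENCODER_NAME` are not shown in this diff, and uses plain floats in place of tensors. `split_lora_state_dict` is a hypothetical helper.

```python
def split_lora_state_dict(state_dict, unet_name="unet", text_encoder_name="text_encoder"):
    """Split a combined state dict into UNet and text encoder parts by key prefix."""
    unet_state, text_encoder_state = {}, {}
    for key, value in state_dict.items():
        prefix, _, rest = key.partition(".")
        if prefix == unet_name:
            unet_state[rest] = value
        elif prefix == text_encoder_name:
            text_encoder_state[rest] = value
        # Keys with any other prefix are ignored in this sketch.
    return unet_state, text_encoder_state


combined = {
    "unet.down.0.lora.up.weight": 0.1,
    "text_encoder.layers.0.lora.up.weight": 0.2,
}
unet_part, text_part = split_lora_state_dict(combined)
```

Either part may come back empty, which covers checkpoints that hold LoRA layers for only one of the two models.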
...@@ -1062,7 +1011,7 @@ class LoraLoaderMixin: ...@@ -1062,7 +1011,7 @@ class LoraLoaderMixin:
proxies (`Dict[str, str]`, *optional*): proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request. 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
local_files_only(`bool`, *optional*, defaults to `False`): local_files_only (`bool`, *optional*, defaults to `False`):
Whether or not to only look at local files (i.e., do not try to download the model). Whether or not to only look at local files (i.e., do not try to download the model).
use_auth_token (`str` or *bool*, *optional*): use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
...@@ -1210,26 +1159,23 @@ class LoraLoaderMixin: ...@@ -1210,26 +1159,23 @@ class LoraLoaderMixin:
safe_serialization: bool = False, safe_serialization: bool = False,
): ):
r""" r"""
Save the LoRA parameters corresponding to the UNet and the text encoder. Save the LoRA parameters corresponding to the UNet and text encoder.
Arguments: Arguments:
save_directory (`str` or `os.PathLike`): save_directory (`str` or `os.PathLike`):
Directory to which to save. Will be created if it doesn't exist. Directory to save LoRA parameters to. Will be created if it doesn't exist.
unet_lora_layers (`Dict[str, torch.nn.Module]` or `Dict[str, torch.Tensor]`): unet_lora_layers (`Dict[str, torch.nn.Module]` or `Dict[str, torch.Tensor]`):
State dict of the LoRA layers corresponding to the UNet. Specifying this helps to make the State dict of the LoRA layers corresponding to the UNet.
serialization process easier and cleaner. Values can be both LoRA torch.nn.Modules layers or torch
weights.
text_encoder_lora_layers (`Dict[str, torch.nn.Module]` or `Dict[str, torch.Tensor]`): text_encoder_lora_layers (`Dict[str, torch.nn.Module]` or `Dict[str, torch.Tensor]`):
State dict of the LoRA layers corresponding to the `text_encoder`. Since the `text_encoder` comes from State dict of the LoRA layers corresponding to the `text_encoder`. Must explicitly pass the text
`transformers`, we cannot rejig it. That is why we have to explicitly pass the text encoder LoRA state encoder LoRA state dict because it comes from 🤗 Transformers.
dict. Values can be both LoRA torch.nn.Modules layers or torch weights.
is_main_process (`bool`, *optional*, defaults to `True`): is_main_process (`bool`, *optional*, defaults to `True`):
Whether the process calling this is the main process or not. Useful when in distributed training like Whether the process calling this is the main process or not. Useful during distributed training when you
TPUs and need to call this function on all processes. In this case, set `is_main_process=True` only on need to call this function on all processes. In this case, set `is_main_process=True` only on the main
the main process to avoid race conditions. process to avoid race conditions.
save_function (`Callable`): save_function (`Callable`):
The function to use to save the state dictionary. Useful on distributed training like TPUs when one The function to use to save the state dictionary. Useful during distributed training when you need to
need to replace `torch.save` by another method. Can be configured with the environment variable replace `torch.save` with another method. Can be configured with the environment variable
`DIFFUSERS_SAVE_MODE`. `DIFFUSERS_SAVE_MODE`.
""" """
if os.path.isfile(save_directory): if os.path.isfile(save_directory):
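The `is_main_process` and `save_function` arguments documented above can be illustrated with a standard-library sketch. It assumes that only the main process writes the file and that `save_function` takes `(object, path)`. The excerpt does not show what the library does when `save_directory` is a file, so raising here is this sketch's own choice. `save_state` and `json_save` are hypothetical names.

```python
import json
import os
import tempfile


def json_save(obj, path):
    """Default writer for this sketch, standing in for `torch.save`."""
    with open(path, "w") as handle:
        json.dump(obj, handle)


def save_state(save_directory, state, is_main_process=True, save_function=None,
               weight_name="weights.json"):
    if os.path.isfile(save_directory):
        # Sketch-only behavior: refuse a path that points at an existing file.
        raise ValueError(f"{save_directory} should be a directory, not a file")
    save_function = save_function or json_save
    os.makedirs(save_directory, exist_ok=True)  # created if it doesn't exist
    path = os.path.join(save_directory, weight_name)
    if is_main_process:
        # Only one process writes, which avoids concurrent writes to one file.
        save_function(state, path)
    return path


with tempfile.TemporaryDirectory() as tmp:
    main_path = save_state(os.path.join(tmp, "main"), {"w": [0.1, 0.2]})
    worker_path = save_state(os.path.join(tmp, "worker"), {"w": [0.1, 0.2]},
                             is_main_process=False)
    main_written = os.path.exists(main_path)
    worker_written = os.path.exists(worker_path)
```

Every process can call `save_state`, but only the call made with `is_main_process=True` produces a file.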
...@@ -1331,73 +1277,72 @@ class LoraLoaderMixin: ...@@ -1331,73 +1277,72 @@ class LoraLoaderMixin:
class FromCkptMixin: class FromCkptMixin:
"""This helper class allows to directly load .ckpt stable diffusion file_extension """
into the respective classes.""" Load model weights saved in the `.ckpt` format into a [`DiffusionPipeline`].
"""
@classmethod @classmethod
def from_ckpt(cls, pretrained_model_link_or_path, **kwargs): def from_ckpt(cls, pretrained_model_link_or_path, **kwargs):
r""" r"""
Instantiate a PyTorch diffusion pipeline from pre-trained pipeline weights saved in the original .ckpt format. Instantiate a [`DiffusionPipeline`] from pretrained pipeline weights saved in the `.ckpt` format. The pipeline
is set in evaluation mode (`model.eval()`) by default.
The pipeline is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated).
Parameters: Parameters:
pretrained_model_link_or_path (`str` or `os.PathLike`, *optional*): pretrained_model_link_or_path (`str` or `os.PathLike`, *optional*):
Can be either: Can be either:
- A link to the .ckpt file on the Hub. Should be in the format - A link to the `.ckpt` file (for example
`"https://huggingface.co/<repo_id>/blob/main/<path_to_file>"` `"https://huggingface.co/<repo_id>/blob/main/<path_to_file>.ckpt"`) on the Hub.
- A path to a *file* containing all pipeline weights. - A path to a *file* containing all pipeline weights.
torch_dtype (`str` or `torch.dtype`, *optional*): torch_dtype (`str` or `torch.dtype`, *optional*):
Override the default `torch.dtype` and load the model under this dtype. If `"auto"` is passed the dtype Override the default `torch.dtype` and load the model with another dtype. If `"auto"` is passed, the
will be automatically derived from the model's weights. dtype is automatically derived from the model's weights.
force_download (`bool`, *optional*, defaults to `False`): force_download (`bool`, *optional*, defaults to `False`):
Whether or not to force the (re-)download of the model weights and configuration files, overriding the Whether or not to force the (re-)download of the model weights and configuration files, overriding the
cached versions if they exist. cached versions if they exist.
cache_dir (`Union[str, os.PathLike]`, *optional*): cache_dir (`Union[str, os.PathLike]`, *optional*):
Path to a directory in which a downloaded pretrained model configuration should be cached if the Path to a directory where a downloaded pretrained model configuration is cached if the standard cache
standard cache should not be used. is not used.
resume_download (`bool`, *optional*, defaults to `False`): resume_download (`bool`, *optional*, defaults to `False`):
Whether or not to delete incompletely received files. Will attempt to resume the download if such a Whether or not to resume downloading the model weights and configuration files. If set to `False`, any
file exists. incompletely downloaded files are deleted.
proxies (`Dict[str, str]`, *optional*): proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', A dictionary of proxy servers to use by protocol or endpoint, for example, `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request. 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
local_files_only (`bool`, *optional*, defaults to `False`): local_files_only (`bool`, *optional*, defaults to `False`):
Whether or not to only look at local files (i.e., do not try to download the model). Whether to only load local model weights and configuration files or not. If set to `True`, the model
won't be downloaded from the Hub.
use_auth_token (`str` or *bool*, *optional*): use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated The token to use as HTTP bearer authorization for remote files. If `True`, the token generated from
when running `huggingface-cli login` (stored in `~/.huggingface`). `diffusers-cli login` (stored in `~/.huggingface`) is used.
revision (`str`, *optional*, defaults to `"main"`): revision (`str`, *optional*, defaults to `"main"`):
The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier
git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any allowed by Git.
identifier allowed by git.
use_safetensors (`bool`, *optional*, defaults to `None`): use_safetensors (`bool`, *optional*, defaults to `None`):
If set to `None`, the pipeline will load the `safetensors` weights if they're available **and** if the If set to `None`, the safetensors weights are downloaded if they're available **and** if the
`safetensors` library is installed. If set to `True`, the pipeline will forcibly load the models from safetensors library is installed. If set to `True`, the model is forcibly loaded from safetensors
`safetensors` weights. If set to `False` the pipeline will *not* use `safetensors`. weights. If set to `False`, safetensors weights are not loaded.
extract_ema (`bool`, *optional*, defaults to `False`): Only relevant for extract_ema (`bool`, *optional*, defaults to `False`):
checkpoints that have both EMA and non-EMA weights. Whether to extract the EMA weights or not. Defaults Whether to extract the EMA weights or not. Pass `True` to extract the EMA weights which usually yield
to `False`. Pass `True` to extract the EMA weights. EMA weights usually yield higher quality images for higher quality images for inference. Non-EMA weights are usually better to continue finetuning.
inference. Non-EMA weights are usually better to continue fine-tuning.
upcast_attention (`bool`, *optional*, defaults to `None`): upcast_attention (`bool`, *optional*, defaults to `None`):
Whether the attention computation should always be upcasted. This is necessary when running stable Whether the attention computation should always be upcasted.
image_size (`int`, *optional*, defaults to 512): image_size (`int`, *optional*, defaults to 512):
The image size that the model was trained on. Use 512 for Stable Diffusion v1.X and Stable Diffusion v2 The image size the model was trained on. Use 512 for all Stable Diffusion v1 models and the Stable
Base. Use 768 for Stable Diffusion v2. Diffusion v2 base model. Use 768 for Stable Diffusion v2.
prediction_type (`str`, *optional*): prediction_type (`str`, *optional*):
The prediction type that the model was trained on. Use `'epsilon'` for Stable Diffusion v1.X and Stable The prediction type the model was trained on. Use `'epsilon'` for all Stable Diffusion v1 models and
Diffusion v2 Base. Use `'v_prediction'` for Stable Diffusion v2. the Stable Diffusion v2 base model. Use `'v_prediction'` for Stable Diffusion v2.
num_in_channels (`int`, *optional*, defaults to None): num_in_channels (`int`, *optional*, defaults to `None`):
The number of input channels. If `None`, it will be automatically inferred. The number of input channels. If `None`, it will be automatically inferred.
scheduler_type (`str`, *optional*, defaults to 'pndm'): scheduler_type (`str`, *optional*, defaults to `"pndm"`):
Type of scheduler to use. Should be one of `["pndm", "lms", "heun", "euler", "euler-ancestral", "dpm", Type of scheduler to use. Should be one of `["pndm", "lms", "heun", "euler", "euler-ancestral", "dpm",
"ddim"]`. "ddim"]`.
load_safety_checker (`bool`, *optional*, defaults to `True`): load_safety_checker (`bool`, *optional*, defaults to `True`):
Whether to load the safety checker or not. Defaults to `True`. Whether to load the safety checker or not.
kwargs (remaining dictionary of keyword arguments, *optional*): kwargs (remaining dictionary of keyword arguments, *optional*):
Can be used to overwrite load - and saveable variables - *i.e.* the pipeline components - of the Can be used to overwrite load and saveable variables (for example the pipeline components of the
specific pipeline class. The overwritten components are then directly passed to the pipelines specific pipeline class). The overwritten components are directly passed to the pipeline's `__init__`
`__init__` method. See example below for more information. method. See example below for more information.
Examples: Examples:
......
...@@ -83,8 +83,8 @@ class FlaxImagePipelineOutput(BaseOutput): ...@@ -83,8 +83,8 @@ class FlaxImagePipelineOutput(BaseOutput):
Args: Args:
images (`List[PIL.Image.Image]` or `np.ndarray`) images (`List[PIL.Image.Image]` or `np.ndarray`)
List of denoised PIL images of length `batch_size` or numpy array of shape `(batch_size, height, width, List of denoised PIL images of length `batch_size` or NumPy array of shape `(batch_size, height, width,
num_channels)`. PIL images or numpy array present the denoised images of the diffusion pipeline. num_channels)`.
""" """
images: Union[List[PIL.Image.Image], np.ndarray] images: Union[List[PIL.Image.Image], np.ndarray]
......
...@@ -115,8 +115,8 @@ class ImagePipelineOutput(BaseOutput): ...@@ -115,8 +115,8 @@ class ImagePipelineOutput(BaseOutput):
Args: Args:
images (`List[PIL.Image.Image]` or `np.ndarray`) images (`List[PIL.Image.Image]` or `np.ndarray`)
List of denoised PIL images of length `batch_size` or numpy array of shape `(batch_size, height, width, List of denoised PIL images of length `batch_size` or NumPy array of shape `(batch_size, height, width,
num_channels)`. PIL images or numpy array present the denoised images of the diffusion pipeline. num_channels)`.
""" """
images: Union[List[PIL.Image.Image], np.ndarray] images: Union[List[PIL.Image.Image], np.ndarray]
...@@ -129,8 +129,7 @@ class AudioPipelineOutput(BaseOutput): ...@@ -129,8 +129,7 @@ class AudioPipelineOutput(BaseOutput):
Args: Args:
audios (`np.ndarray`) audios (`np.ndarray`)
List of denoised samples of shape `(batch_size, num_channels, sample_rate)`. Numpy array present the Denoised audio samples as a NumPy array of shape `(batch_size, num_channels, sample_rate)`.
denoised audio samples of the diffusion pipeline.
""" """
audios: np.ndarray audios: np.ndarray
...@@ -458,20 +457,20 @@ def load_sub_model( ...@@ -458,20 +457,20 @@ def load_sub_model(
class DiffusionPipeline(ConfigMixin): class DiffusionPipeline(ConfigMixin):
r""" r"""
Base class for all models. Base class for all pipelines.
[`DiffusionPipeline`] takes care of storing all components (models, schedulers, processors) for diffusion pipelines [`DiffusionPipeline`] stores all components (models, schedulers, and processors) for diffusion pipelines and
and handles methods for loading, downloading and saving models as well as a few methods common to all pipelines to: provides methods for loading, downloading and saving models. It also includes methods to:
- move all PyTorch modules to the device of your choice - move all PyTorch modules to the device of your choice
- enabling/disabling the progress bar for the denoising iteration - enabling/disabling the progress bar for the denoising iteration
Class attributes: Class attributes:
- **config_name** (`str`) -- name of the config file that will store the class and module names of all - **config_name** (`str`) -- The configuration filename that stores the class and module names of all the
components of the diffusion pipeline. diffusion pipeline's components.
- **_optional_components** (List[`str`]) -- list of all components that are optional so they don't have to be - **_optional_components** (List[`str`]) -- List of all optional components that don't have to be passed to the
passed for the pipeline to function (should be overridden by subclasses). pipeline to function (should be overridden by subclasses).
""" """
config_name = "model_index.json" config_name = "model_index.json"
_optional_components = [] _optional_components = []
...@@ -541,17 +540,17 @@ class DiffusionPipeline(ConfigMixin): ...@@ -541,17 +540,17 @@ class DiffusionPipeline(ConfigMixin):
variant: Optional[str] = None, variant: Optional[str] = None,
): ):
""" """
Save all variables of the pipeline that can be saved and loaded as well as the pipelines configuration file to Save all saveable variables of the pipeline to a directory. A pipeline variable can be saved and loaded if its
a directory. A pipeline variable can be saved and loaded if its class implements both a save and loading class implements both a save and loading method. The pipeline is easily reloaded using the
method. The pipeline can easily be re-loaded using the [`~DiffusionPipeline.from_pretrained`] class method. [`~DiffusionPipeline.from_pretrained`] class method.
Arguments: Arguments:
save_directory (`str` or `os.PathLike`): save_directory (`str` or `os.PathLike`):
Directory to which to save. Will be created if it doesn't exist. Directory to save a pipeline to. Will be created if it doesn't exist.
safe_serialization (`bool`, *optional*, defaults to `False`): safe_serialization (`bool`, *optional*, defaults to `False`):
Whether to save the model using `safetensors` or the traditional PyTorch way (that uses `pickle`). Whether to save the model using `safetensors` or the traditional PyTorch way with `pickle`.
variant (`str`, *optional*): variant (`str`, *optional*):
If specified, weights are saved in the format pytorch_model.<variant>.bin. If specified, weights are saved in the format `pytorch_model.<variant>.bin`.
""" """
model_index_dict = dict(self.config) model_index_dict = dict(self.config)
model_index_dict.pop("_class_name", None) model_index_dict.pop("_class_name", None)
...@@ -714,69 +713,51 @@ class DiffusionPipeline(ConfigMixin): ...@@ -714,69 +713,51 @@ class DiffusionPipeline(ConfigMixin):
@classmethod @classmethod
def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.PathLike]], **kwargs): def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.PathLike]], **kwargs):
r""" r"""
Instantiate a PyTorch diffusion pipeline from pre-trained pipeline weights. Instantiate a PyTorch diffusion pipeline from pretrained pipeline weights.
The pipeline is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated). The pipeline is set in evaluation mode (`model.eval()`) by default.
The warning *Weights from XXX not initialized from pretrained model* means that the weights of XXX do not come If you get the error message below, you need to finetune the weights for your downstream task:
pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning
task.
The warning *Weights from XXX not used in YYY* means that the layer XXX is not used by YYY, therefore those ```
weights are discarded. Some weights of UNet2DConditionModel were not initialized from the model checkpoint at runwayml/stable-diffusion-v1-5 and are newly initialized because the shapes did not match:
- conv_in.weight: found shape torch.Size([320, 4, 3, 3]) in the checkpoint and torch.Size([320, 9, 3, 3]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
Parameters: Parameters:
pretrained_model_name_or_path (`str` or `os.PathLike`, *optional*): pretrained_model_name_or_path (`str` or `os.PathLike`, *optional*):
Can be either: Can be either:
- A string, the *repo id* of a pretrained pipeline hosted inside a model repo on - A string, the *repo id* (for example `CompVis/ldm-text2im-large-256`) of a pretrained pipeline
https://huggingface.co/ Valid repo ids have to be located under a user or organization name, like hosted on the Hub.
`CompVis/ldm-text2im-large-256`. - A path to a *directory* (for example `./my_pipeline_directory/`) containing pipeline weights
- A path to a *directory* containing pipeline weights saved using saved using
[`~DiffusionPipeline.save_pretrained`], e.g., `./my_pipeline_directory/`. [`~DiffusionPipeline.save_pretrained`].
torch_dtype (`str` or `torch.dtype`, *optional*): torch_dtype (`str` or `torch.dtype`, *optional*):
Override the default `torch.dtype` and load the model under this dtype. If `"auto"` is passed the dtype Override the default `torch.dtype` and load the model with another dtype. If "auto" is passed, the
will be automatically derived from the model's weights. dtype is automatically derived from the model's weights.
custom_pipeline (`str`, *optional*): custom_pipeline (`str`, *optional*):
<Tip warning={true}> <Tip warning={true}>
This is an experimental feature and is likely to change in the future. 🧪 This is an experimental feature and may change in the future.
</Tip> </Tip>
Can be either: Can be either:
- A string, the *repo id* of a custom pipeline hosted inside a model repo on - A string, the *repo id* (for example `hf-internal-testing/diffusers-dummy-pipeline`) of a custom
https://huggingface.co/. Valid repo ids have to be located under a user or organization name, pipeline hosted on the Hub. The repository must contain a file called pipeline.py that defines
like `hf-internal-testing/diffusers-dummy-pipeline`. the custom pipeline.
<Tip>
It is required that the model repo has a file, called `pipeline.py` that defines the custom
pipeline.
</Tip>
- A string, the *file name* of a community pipeline hosted on GitHub under - A string, the *file name* of a community pipeline hosted on GitHub under
https://github.com/huggingface/diffusers/tree/main/examples/community. Valid file names have to [Community](https://github.com/huggingface/diffusers/tree/main/examples/community). Valid file
match exactly the file name without `.py` located under the above link, *e.g.* names must match the file name and not the pipeline script (`clip_guided_stable_diffusion`
`clip_guided_stable_diffusion`. instead of `clip_guided_stable_diffusion.py`). Community pipelines are always loaded from the
current main branch of GitHub.
<Tip> - A path to a directory (`./my_pipeline_directory/`) containing a custom pipeline. The directory
must contain a file called `pipeline.py` that defines the custom pipeline.
Community pipelines are always loaded from the current `main` branch of GitHub.
</Tip>
- A path to a *directory* containing a custom pipeline, e.g., `./my_pipeline_directory/`.
<Tip>
It is required that the directory has a file, called `pipeline.py` that defines the custom
pipeline.
</Tip>
For more information on how to load and create custom pipelines, please have a look at [Loading and For more information on how to load and create custom pipelines, please have a look at [Loading and
Adding Custom Adding Custom
...@@ -786,78 +767,71 @@ class DiffusionPipeline(ConfigMixin): ...@@ -786,78 +767,71 @@ class DiffusionPipeline(ConfigMixin):
Whether or not to force the (re-)download of the model weights and configuration files, overriding the Whether or not to force the (re-)download of the model weights and configuration files, overriding the
cached versions if they exist. cached versions if they exist.
cache_dir (`Union[str, os.PathLike]`, *optional*): cache_dir (`Union[str, os.PathLike]`, *optional*):
Path to a directory in which a downloaded pretrained model configuration should be cached if the Path to a directory where a downloaded pretrained model configuration is cached if the standard cache
standard cache should not be used. is not used.
resume_download (`bool`, *optional*, defaults to `False`): resume_download (`bool`, *optional*, defaults to `False`):
Whether or not to delete incompletely received files. Will attempt to resume the download if such a Whether or not to resume downloading the model weights and configuration files. If set to `False`, any
file exists. incompletely downloaded files are deleted.
proxies (`Dict[str, str]`, *optional*): proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', A dictionary of proxy servers to use by protocol or endpoint, for example, `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request. 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
output_loading_info (`bool`, *optional*, defaults to `False`): output_loading_info (`bool`, *optional*, defaults to `False`):
Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
local_files_only(`bool`, *optional*, defaults to `False`): local_files_only (`bool`, *optional*, defaults to `False`):
Whether or not to only look at local files (i.e., do not try to download the model). Whether to only load local model weights and configuration files or not. If set to `True`, the model
won't be downloaded from the Hub.
use_auth_token (`str` or *bool*, *optional*): use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated The token to use as HTTP bearer authorization for remote files. If `True`, the token generated from
when running `huggingface-cli login` (stored in `~/.huggingface`). `diffusers-cli login` (stored in `~/.huggingface`) is used.
revision (`str`, *optional*, defaults to `"main"`): revision (`str`, *optional*, defaults to `"main"`):
The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier
git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any allowed by Git.
identifier allowed by git. custom_revision (`str`, *optional*, defaults to `"main"`):
custom_revision (`str`, *optional*, defaults to `"main"` when loading from the Hub and to local version of `diffusers` when loading from GitHub):
The specific model version to use. It can be a branch name, a tag name, or a commit id similar to The specific model version to use. It can be a branch name, a tag name, or a commit id similar to
`revision` when loading a custom pipeline from the Hub. It can be a diffusers version when loading a `revision` when loading a custom pipeline from the Hub. It can be a 🤗 Diffusers version when loading a
custom pipeline from GitHub. custom pipeline from GitHub, otherwise it defaults to `"main"` when loading from the Hub.
mirror (`str`, *optional*): mirror (`str`, *optional*):
Mirror source to accelerate downloads in China. If you are from China and have an accessibility Mirror source to resolve accessibility issues if you’re downloading a model in China. We do not
problem, you can set this option to resolve it. Note that we do not guarantee the timeliness or safety. guarantee the timeliness or safety of the source, and you should refer to the mirror site for more
Please refer to the mirror site for more information. specify the folder name here. information.
device_map (`str` or `Dict[str, Union[int, str, torch.device]]`, *optional*): device_map (`str` or `Dict[str, Union[int, str, torch.device]]`, *optional*):
A map that specifies where each submodule should go. It doesn't need to be refined to each A map that specifies where each submodule should go. It doesn't need to be defined for each
parameter/buffer name, once a given module name is inside, every submodule of it will be sent to the parameter/buffer name; once a given module name is inside, every submodule of it will be sent to the
same device. same device.
To have Accelerate compute the most optimized `device_map` automatically, set `device_map="auto"`. For Set `device_map="auto"` to have 🤗 Accelerate automatically compute the most optimized `device_map`. For
more information about each option see [designing a device more information about each option see [designing a device
map](https://hf.co/docs/accelerate/main/en/usage_guides/big_modeling#designing-a-device-map). map](https://hf.co/docs/accelerate/main/en/usage_guides/big_modeling#designing-a-device-map).
max_memory (`Dict`, *optional*): max_memory (`Dict`, *optional*):
A dictionary device identifier to maximum memory. Will default to the maximum memory available for each A dictionary mapping device identifiers to maximum memory. Will default to the maximum memory available for
GPU and the available CPU RAM if unset. each GPU and the available CPU RAM if unset.
offload_folder (`str` or `os.PathLike`, *optional*): offload_folder (`str` or `os.PathLike`, *optional*):
If the `device_map` contains any value `"disk"`, the folder where we will offload weights. The path to offload weights if device_map contains the value `"disk"`.
offload_state_dict (`bool`, *optional*): offload_state_dict (`bool`, *optional*):
If `True`, will temporarily offload the CPU state dict to the hard drive to avoid getting out of CPU If `True`, temporarily offloads the CPU state dict to the hard drive to avoid running out of CPU RAM if
RAM if the weight of the CPU state dict + the biggest shard of the checkpoint does not fit. Defaults to the weight of the CPU state dict + the biggest shard of the checkpoint does not fit. Defaults to `True`
`True` when there is some disk offload. when there is some disk offload.
low_cpu_mem_usage (`bool`, *optional*, defaults to `True` if torch version >= 1.9.0 else `False`): low_cpu_mem_usage (`bool`, *optional*, defaults to `True` if torch version >= 1.9.0 else `False`):
Speed up model loading by not initializing the weights and only loading the pre-trained weights. This Speed up model loading by only loading the pretrained weights and not initializing the weights. This also
also tries to not use more than 1x model size in CPU memory (including peak memory) while loading the tries to not use more than 1x model size in CPU memory (including peak memory) while loading the model.
model. This is only supported when torch version >= 1.9.0. If you are using an older version of torch, Only supported for PyTorch >= 1.9.0. If you are using an older version of PyTorch, setting this
setting this argument to `True` will raise an error. argument to `True` will raise an error.
use_safetensors (`bool`, *optional*, defaults to `None`): use_safetensors (`bool`, *optional*, defaults to `None`):
If set to `None`, the pipeline will load the `safetensors` weights if they're available **and** if the If set to `None`, the safetensors weights are downloaded if they're available **and** if the
`safetensors` library is installed. If set to `True`, the pipeline will forcibly load the models from safetensors library is installed. If set to `True`, the model is forcibly loaded from safetensors
`safetensors` weights. If set to `False` the pipeline will *not* use `safetensors`. weights. If set to `False`, safetensors weights are not loaded.
kwargs (remaining dictionary of keyword arguments, *optional*): kwargs (remaining dictionary of keyword arguments, *optional*):
Can be used to overwrite load - and saveable variables - *i.e.* the pipeline components - of the Can be used to overwrite load and saveable variables (the pipeline components of the specific pipeline
specific pipeline class. The overwritten components are then directly passed to the pipelines class). The overwritten components are passed directly to the pipelines `__init__` method. See example
`__init__` method. See example below for more information. below for more information.
variant (`str`, *optional*): variant (`str`, *optional*):
If specified load weights from `variant` filename, *e.g.* pytorch_model.<variant>.bin. `variant` is Load weights from a specified variant filename such as `"fp16"` or `"ema"`. This is ignored when
ignored when using `from_flax`. loading `from_flax`.
<Tip>
It is required to be logged in (`huggingface-cli login`) when you want to use private or [gated
models](https://huggingface.co/docs/hub/models-gated#gated-models), *e.g.* `"runwayml/stable-diffusion-v1-5"`
</Tip>
<Tip> <Tip>
Activate the special ["offline-mode"](https://huggingface.co/diffusers/installation.html#offline-mode) to use To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with
this method in a firewalled environment. `huggingface-cli login`.
</Tip> </Tip>
...@@ -1108,12 +1082,12 @@ class DiffusionPipeline(ConfigMixin): ...@@ -1108,12 +1082,12 @@ class DiffusionPipeline(ConfigMixin):
Parameters: Parameters:
pretrained_model_name (`str` or `os.PathLike`, *optional*): pretrained_model_name (`str` or `os.PathLike`, *optional*):
A string, the repository id (for example `CompVis/ldm-text2im-large-256`) of a pretrained pipeline A string, the *repository id* (for example `CompVis/ldm-text2im-large-256`) of a pretrained pipeline
hosted on the Hub. hosted on the Hub.
custom_pipeline (`str`, *optional*): custom_pipeline (`str`, *optional*):
Can be either: Can be either:
- A string, the repository id (for example `CompVis/ldm-text2im-large-256`) of a pretrained - A string, the *repository id* (for example `CompVis/ldm-text2im-large-256`) of a pretrained
pipeline hosted on the Hub. The repository must contain a file called `pipeline.py` that defines pipeline hosted on the Hub. The repository must contain a file called `pipeline.py` that defines
the custom pipeline. the custom pipeline.
...@@ -1139,27 +1113,26 @@ class DiffusionPipeline(ConfigMixin): ...@@ -1139,27 +1113,26 @@ class DiffusionPipeline(ConfigMixin):
Whether or not to force the (re-)download of the model weights and configuration files, overriding the Whether or not to force the (re-)download of the model weights and configuration files, overriding the
cached versions if they exist. cached versions if they exist.
resume_download (`bool`, *optional*, defaults to `False`): resume_download (`bool`, *optional*, defaults to `False`):
Whether or not to resume downloading the model weights and configuration files. If set to False, any Whether or not to resume downloading the model weights and configuration files. If set to `False`, any
incompletely downloaded files are deleted. incompletely downloaded files are deleted.
proxies (`Dict[str, str]`, *optional*): proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, for example, `{'http': 'foo.bar:3128', A dictionary of proxy servers to use by protocol or endpoint, for example, `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request. 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
output_loading_info (`bool`, *optional*, defaults to `False`): output_loading_info (`bool`, *optional*, defaults to `False`):
Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
local_files_only(`bool`, *optional*, defaults to `False`): local_files_only (`bool`, *optional*, defaults to `False`):
Whether to only load local model weights and configuration files or not. If set to True, the model Whether to only load local model weights and configuration files or not. If set to `True`, the model
wont be downloaded from the Hub. won't be downloaded from the Hub.
use_auth_token (`str` or *bool*, *optional*): use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, the token generated from The token to use as HTTP bearer authorization for remote files. If `True`, the token generated from
`diffusers-cli login` (stored in `~/.huggingface`) is used. `diffusers-cli login` (stored in `~/.huggingface`) is used.
revision (`str`, *optional*, defaults to `"main"`): revision (`str`, *optional*, defaults to `"main"`):
The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier
allowed by Git. allowed by Git.
custom_revision (`str`, *optional*, defaults to `"main"` when loading from the Hub and to local version of custom_revision (`str`, *optional*, defaults to `"main"`):
`diffusers` when loading from GitHub):
The specific model version to use. It can be a branch name, a tag name, or a commit id similar to The specific model version to use. It can be a branch name, a tag name, or a commit id similar to
`revision` when loading a custom pipeline from the Hub. It can be a diffusers version when loading a `revision` when loading a custom pipeline from the Hub. It can be a 🤗 Diffusers version when loading a
custom pipeline from GitHub. custom pipeline from GitHub, otherwise it defaults to `"main"` when loading from the Hub.
mirror (`str`, *optional*): mirror (`str`, *optional*):
Mirror source to resolve accessibility issues if you're downloading a model in China. We do not Mirror source to resolve accessibility issues if you're downloading a model in China. We do not
guarantee the timeliness or safety of the source, and you should refer to the mirror site for more guarantee the timeliness or safety of the source, and you should refer to the mirror site for more
...@@ -1365,9 +1338,11 @@ class DiffusionPipeline(ConfigMixin): ...@@ -1365,9 +1338,11 @@ class DiffusionPipeline(ConfigMixin):
@property @property
def components(self) -> Dict[str, Any]: def components(self) -> Dict[str, Any]:
r""" r"""
The `self.components` property can be useful to run different pipelines with the same weights and The `self.components` property can be useful to run different pipelines with the same weights and
configurations to not have to re-allocate memory. configurations without reallocating additional memory.
Returns (`dict`):
A dictionary containing all the modules needed to initialize the pipeline.
Examples: Examples:
...@@ -1382,9 +1357,6 @@ class DiffusionPipeline(ConfigMixin): ...@@ -1382,9 +1357,6 @@ class DiffusionPipeline(ConfigMixin):
>>> img2img = StableDiffusionImg2ImgPipeline(**text2img.components) >>> img2img = StableDiffusionImg2ImgPipeline(**text2img.components)
>>> inpaint = StableDiffusionInpaintPipeline(**text2img.components) >>> inpaint = StableDiffusionInpaintPipeline(**text2img.components)
``` ```
Returns:
A dictionary containing all the modules needed to initialize the pipeline.
""" """
expected_modules, optional_parameters = self._get_signature_keys(self) expected_modules, optional_parameters = self._get_signature_keys(self)
components = { components = {
...@@ -1402,7 +1374,7 @@ class DiffusionPipeline(ConfigMixin): ...@@ -1402,7 +1374,7 @@ class DiffusionPipeline(ConfigMixin):
@staticmethod @staticmethod
def numpy_to_pil(images): def numpy_to_pil(images):
""" """
Convert a numpy image or a batch of images to a PIL image. Convert a NumPy image or a batch of images to a PIL image.
""" """
return numpy_to_pil(images) return numpy_to_pil(images)
...@@ -1426,13 +1398,17 @@ class DiffusionPipeline(ConfigMixin): ...@@ -1426,13 +1398,17 @@ class DiffusionPipeline(ConfigMixin):
def enable_xformers_memory_efficient_attention(self, attention_op: Optional[Callable] = None): def enable_xformers_memory_efficient_attention(self, attention_op: Optional[Callable] = None):
r""" r"""
Enable memory efficient attention as implemented in xformers. Enable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).
When this option is enabled, you should observe lower GPU memory usage and a potential speed up during
inference. Speed up during training is not guaranteed.
When this option is enabled, you should observe lower GPU memory usage and a potential speed up at inference <Tip warning={true}>
time. Speed up at training time is not guaranteed.
Warning: When Memory Efficient Attention and Sliced attention are both enabled, the Memory Efficient Attention ⚠️ When memory efficient attention and sliced attention are both enabled, memory efficient attention takes
is used. precedence.
</Tip>
Parameters: Parameters:
attention_op (`Callable`, *optional*): attention_op (`Callable`, *optional*):
...@@ -1458,7 +1434,7 @@ class DiffusionPipeline(ConfigMixin): ...@@ -1458,7 +1434,7 @@ class DiffusionPipeline(ConfigMixin):
def disable_xformers_memory_efficient_attention(self): def disable_xformers_memory_efficient_attention(self):
r""" r"""
Disable memory efficient attention as implemented in xformers. Disable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).
""" """
self.set_use_memory_efficient_attention_xformers(False) self.set_use_memory_efficient_attention_xformers(False)
...@@ -1486,8 +1462,8 @@ class DiffusionPipeline(ConfigMixin): ...@@ -1486,8 +1462,8 @@ class DiffusionPipeline(ConfigMixin):
r""" r"""
Enable sliced attention computation. Enable sliced attention computation.
When this option is enabled, the attention module will split the input tensor in slices, to compute attention When this option is enabled, the attention module splits the input tensor in slices to compute attention in
in several steps. This is useful to save some memory in exchange for a small speed decrease. several steps. This is useful to save some memory in exchange for a small speed decrease.
Args: Args:
slice_size (`str` or `int`, *optional*, defaults to `"auto"`): slice_size (`str` or `int`, *optional*, defaults to `"auto"`):
...@@ -1500,8 +1476,8 @@ class DiffusionPipeline(ConfigMixin): ...@@ -1500,8 +1476,8 @@ class DiffusionPipeline(ConfigMixin):
def disable_attention_slicing(self): def disable_attention_slicing(self):
r""" r"""
Disable sliced attention computation. If `enable_attention_slicing` was previously invoked, this method will go Disable sliced attention computation. If `enable_attention_slicing` was previously called, attention is
back to computing attention in one step. computed in one step.
""" """
# set slice_size = `None` to disable `attention slicing` # set slice_size = `None` to disable `attention slicing`
self.enable_attention_slicing(None) self.enable_attention_slicing(None)
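
The memory-for-speed trade-off described in `enable_attention_slicing` can be illustrated with a small standard-library sketch. It is not the library's implementation: it assumes slicing along the query rows (the docstring does not say which axis is sliced), and the function names `attention` and `sliced_attention` are hypothetical. Because softmax is applied per query row, computing the rows in slices gives the same result while holding a smaller score matrix at any one time.

```python
import math

def attention(q, k, v):
    """Plain softmax attention; q, k, v are lists of equal-width row vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    # The full score matrix has len(q) * len(k) entries held in memory at once.
    scores = [[scale * sum(a * b for a, b in zip(qr, kr)) for kr in k] for qr in q]
    out = []
    for row in scores:
        m = max(row)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([sum(e / total * vr[j] for e, vr in zip(exps, v)) for j in range(len(v[0]))])
    return out

def sliced_attention(q, k, v, slice_size):
    """Same result, but each step only builds slice_size * len(k) scores."""
    out = []
    for start in range(0, len(q), slice_size):
        out.extend(attention(q[start:start + slice_size], k, v))
    return out

# Tiny hand-written inputs (assumed values, for illustration only).
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

full = attention(q, k, v)
sliced = sliced_attention(q, k, v, slice_size=2)
same = all(math.isclose(a, b) for fr, sr in zip(full, sliced) for a, b in zip(fr, sr))
print(same)
```

The sliced version makes more, smaller passes, which is where the small speed decrease mentioned in the docstring would come from.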
......
...@@ -68,11 +68,11 @@ class ImageTextPipelineOutput(BaseOutput): ...@@ -68,11 +68,11 @@ class ImageTextPipelineOutput(BaseOutput):
Args: Args:
images (`List[PIL.Image.Image]` or `np.ndarray`) images (`List[PIL.Image.Image]` or `np.ndarray`)
List of denoised PIL images of length `batch_size` or numpy array of shape `(batch_size, height, width, List of denoised PIL images of length `batch_size` or NumPy array of shape `(batch_size, height, width,
num_channels)`. PIL images or numpy array present the denoised images of the diffusion pipeline. num_channels)`.
text (`List[str]` or `List[List[str]]`) text (`List[str]` or `List[List[str]]`)
List of generated text strings of length `batch_size` or a list of lists of strings whose outer list has List of generated text strings of length `batch_size` or a list of lists of strings whose outer list has
length `batch_size`. Text generated by the diffusion pipeline. length `batch_size`.
""" """
images: Optional[Union[List[PIL.Image.Image], np.ndarray]] images: Optional[Union[List[PIL.Image.Image], np.ndarray]]
......
...@@ -124,22 +124,19 @@ def get_logger(name: Optional[str] = None) -> logging.Logger: ...@@ -124,22 +124,19 @@ def get_logger(name: Optional[str] = None) -> logging.Logger:
def get_verbosity() -> int: def get_verbosity() -> int:
""" """
Return the current level for the 🤗 Diffusers' root logger as an int. Return the current level for the 🤗 Diffusers' root logger as an `int`.
Returns: Returns:
`int`: The logging level. `int`:
Logging level integers which can be one of:
<Tip> - `50`: `diffusers.logging.CRITICAL` or `diffusers.logging.FATAL`
- `40`: `diffusers.logging.ERROR`
- `30`: `diffusers.logging.WARNING` or `diffusers.logging.WARN`
- `20`: `diffusers.logging.INFO`
- `10`: `diffusers.logging.DEBUG`
🤗 Diffusers has following logging levels: """
- 50: `diffusers.logging.CRITICAL` or `diffusers.logging.FATAL`
- 40: `diffusers.logging.ERROR`
- 30: `diffusers.logging.WARNING` or `diffusers.logging.WARN`
- 20: `diffusers.logging.INFO`
- 10: `diffusers.logging.DEBUG`
</Tip>"""
_configure_library_root_logger() _configure_library_root_logger()
return _get_library_root_logger().getEffectiveLevel() return _get_library_root_logger().getEffectiveLevel()
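
The integer levels listed in the `get_verbosity` docstring have the same values as the constants in Python's standard `logging` module. The sketch below uses only the standard library, with a hypothetical logger name (`my_library`), to show how `getEffectiveLevel()` resolves to one of those integers. It illustrates the mechanism and is not the 🤗 Diffusers code.

```python
import logging

# Hypothetical library root logger; the name is an assumption for illustration.
root = logging.getLogger("my_library")
root.setLevel(logging.WARNING)

# A child logger with no level of its own inherits the effective level of its parent.
child = logging.getLogger("my_library.pipelines")

level_names = {
    logging.CRITICAL: "CRITICAL",  # 50, same value as logging.FATAL
    logging.ERROR: "ERROR",        # 40
    logging.WARNING: "WARNING",    # 30
    logging.INFO: "INFO",          # 20
    logging.DEBUG: "DEBUG",        # 10
}

effective = child.getEffectiveLevel()
print(effective, level_names[effective])
```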
...@@ -151,7 +148,7 @@ def set_verbosity(verbosity: int) -> None: ...@@ -151,7 +148,7 @@ def set_verbosity(verbosity: int) -> None:
Args: Args:
verbosity (`int`): verbosity (`int`):
Logging level, e.g., one of: Logging level which can be one of:
- `diffusers.logging.CRITICAL` or `diffusers.logging.FATAL` - `diffusers.logging.CRITICAL` or `diffusers.logging.FATAL`
- `diffusers.logging.ERROR` - `diffusers.logging.ERROR`
...@@ -185,7 +182,7 @@ def set_verbosity_error(): ...@@ -185,7 +182,7 @@ def set_verbosity_error():
def disable_default_handler() -> None: def disable_default_handler() -> None:
"""Disable the default handler of the HuggingFace Diffusers' root logger.""" """Disable the default handler of the 🤗 Diffusers' root logger."""
_configure_library_root_logger() _configure_library_root_logger()
...@@ -194,7 +191,7 @@ def disable_default_handler() -> None: ...@@ -194,7 +191,7 @@ def disable_default_handler() -> None:
def enable_default_handler() -> None: def enable_default_handler() -> None:
"""Enable the default handler of the HuggingFace Diffusers' root logger.""" """Enable the default handler of the 🤗 Diffusers' root logger."""
_configure_library_root_logger() _configure_library_root_logger()
...@@ -241,9 +238,9 @@ def enable_propagation() -> None: ...@@ -241,9 +238,9 @@ def enable_propagation() -> None:
def enable_explicit_format() -> None: def enable_explicit_format() -> None:
""" """
Enable explicit formatting for every HuggingFace Diffusers' logger. The explicit formatter is as follows: Enable explicit formatting for every 🤗 Diffusers' logger. The explicit formatter is as follows:
``` ```
[LEVELNAME|FILENAME|LINE NUMBER] TIME >> MESSAGE [LEVELNAME|FILENAME|LINE NUMBER] TIME >> MESSAGE
``` ```
All handlers currently bound to the root logger are affected by this method. All handlers currently bound to the root logger are affected by this method.
""" """
...@@ -256,7 +253,7 @@ def enable_explicit_format() -> None: ...@@ -256,7 +253,7 @@ def enable_explicit_format() -> None:
def reset_format() -> None: def reset_format() -> None:
""" """
Resets the formatting for HuggingFace Diffusers' loggers. Resets the formatting for 🤗 Diffusers' loggers.
All handlers currently bound to the root logger are affected by this method. All handlers currently bound to the root logger are affected by this method.
""" """
......
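Reviewer note on the logging docstrings above: the integers listed for `get_verbosity` (`50`, `40`, `30`, `20`, `10`) are the same values the Python standard library uses for its logging levels. The sketch below uses only the standard `logging` module, not `diffusers.logging`, to show how those integers and an explicit `[LEVELNAME|FILENAME|LINE NUMBER] TIME >> MESSAGE` style format behave. The logger name `verbosity_sketch` and the exact format string are assumptions made for illustration; the format the library really uses may differ.

```python
import io
import logging

# Numeric values of the standard-library levels; the docstring lists the same integers.
LEVELS = {
    "CRITICAL": logging.CRITICAL,
    "ERROR": logging.ERROR,
    "WARNING": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
}

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("verbosity_sketch")
logger.propagate = False  # keep the sketch's records away from the root logger

# Capture records in memory so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)

# Assumed format string approximating [LEVELNAME|FILENAME|LINE NUMBER] TIME >> MESSAGE.
handler.setFormatter(
    logging.Formatter("[%(levelname)s|%(filename)s:%(lineno)s] %(asctime)s >> %(message)s")
)
logger.addHandler(handler)

# With the level set to WARNING, INFO records are dropped and WARNING records pass.
logger.setLevel(logging.WARNING)
logger.info("hidden")
logger.warning("shown")

effective_level = logger.getEffectiveLevel()
output = stream.getvalue()
```

Here `effective_level` is the integer for `WARNING`, and `output` holds only the formatted warning record.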
...@@ -41,12 +41,12 @@ class BaseOutput(OrderedDict): ...@@ -41,12 +41,12 @@ class BaseOutput(OrderedDict):
""" """
Base class for all model outputs as dataclass. Has a `__getitem__` that allows indexing by integer or slice (like a Base class for all model outputs as dataclass. Has a `__getitem__` that allows indexing by integer or slice (like a
tuple) or strings (like a dictionary) that will ignore the `None` attributes. Otherwise behaves like a regular tuple) or strings (like a dictionary) that will ignore the `None` attributes. Otherwise behaves like a regular
python dictionary. Python dictionary.
<Tip warning={true}> <Tip warning={true}>
You can't unpack a `BaseOutput` directly. Use the [`~utils.BaseOutput.to_tuple`] method to convert it to a tuple You can't unpack a [`BaseOutput`] directly. Use the [`~utils.BaseOutput.to_tuple`] method to convert it to a tuple
before. first.
</Tip> </Tip>
""" """
......
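Reviewer note on the `BaseOutput` docstring above: the warning about unpacking is easier to see with a toy example. The class below, `SketchOutput`, is a hypothetical stand-in written for this note and is not the diffusers implementation. It assumes only what the docstring states: an `OrderedDict`-based container that skips `None` attributes, supports integer, slice, and string indexing, and offers a `to_tuple` method.

```python
from collections import OrderedDict
from typing import Any, Optional, Tuple


class SketchOutput(OrderedDict):
    """Hypothetical stand-in for BaseOutput; not the diffusers implementation."""

    def __init__(self, **fields: Optional[Any]) -> None:
        super().__init__()
        # Fields whose value is None are left out, as the docstring describes.
        for name, value in fields.items():
            if value is not None:
                self[name] = value

    def __getitem__(self, key):
        # Strings behave like dictionary keys; integers and slices index the tuple view.
        if isinstance(key, str):
            return super().__getitem__(key)
        return self.to_tuple()[key]

    def to_tuple(self) -> Tuple[Any, ...]:
        # Values in insertion order, without the None fields.
        return tuple(self[name] for name in self.keys())


out = SketchOutput(images=["img0", "img1"], text=None)

# Unpacking the object directly iterates over the dictionary keys, not the values.
(unpacked_directly,) = out

# Converting to a tuple first gives the values.
(images,) = out.to_tuple()

first_by_index = out[0]
first_by_name = out["images"]
```

In this sketch, direct unpacking yields the field name `"images"` because iterating a dictionary yields its keys, which is why converting with `to_tuple` first is the safer pattern.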