Unverified commit 98730c5d authored by Tolga Cangöz, committed by GitHub

Errata (#8322)

* Fix typos

* Trim trailing whitespaces

* Remove a trailing whitespace

* chore: Update MarigoldDepthPipeline checkpoint to prs-eth/marigold-lcm-v1-0

* Revert "chore: Update MarigoldDepthPipeline checkpoint to prs-eth/marigold-lcm-v1-0"

This reverts commit fd742b30b4258106008a6af4d0dd4664904f8595.

* pokemon -> naruto

* `DPMSolverMultistep` -> `DPMSolverMultistepScheduler`

* Improve Markdown stylization

* Improve style

* Improve style

* Refactor pipeline variable names for consistency

* up style
parent 7ebd3594
@@ -16,7 +16,7 @@ aMUSEd was introduced in [aMUSEd: An Open MUSE Reproduction](https://huggingface
Amused is a lightweight text-to-image model based on the [MUSE](https://arxiv.org/abs/2301.00704) architecture. Amused is particularly useful in applications that require a lightweight and fast model, such as generating many images quickly at once.
Amused is a VQ-VAE token-based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with MUSE, it uses the smaller text encoder CLIP-L/14 instead of T5-XXL. Due to its small parameter count and few-forward-pass generation process, Amused can generate many images quickly. This benefit is seen particularly at larger batch sizes.
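A minimal text-to-image sketch, assuming the `amused/amused-256` checkpoint and a CUDA GPU; batching several prompts is where the speed advantage is most visible:

```python
import torch
from diffusers import AmusedPipeline

pipe = AmusedPipeline.from_pretrained("amused/amused-256", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# a token-based transformer needs only a handful of forward passes per image,
# so a small batch of prompts still generates quickly
prompts = 4 * ["a mecha robot walking through a rainy city"]
images = pipe(prompts, num_inference_steps=12).images
```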
The abstract from the paper is:
...
@@ -11,12 +11,12 @@ specific language governing permissions and limitations under the License.
Kandinsky 3 is created by [Vladimir Arkhipkin](https://github.com/oriBetelgeuse), [Anastasia Maltseva](https://github.com/NastyaMittseva), [Igor Pavlov](https://github.com/boomb0om), [Andrei Filatov](https://github.com/anvilarth), [Arseniy Shakhmatov](https://github.com/cene555), [Andrey Kuznetsov](https://github.com/kuznetsoffandrey), [Denis Dimitrov](https://github.com/denndimitrov), and [Zein Shaheen](https://github.com/zeinsh).
The description from its GitHub page:
*Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
Its architecture includes 3 main components:
1. [FLAN-UL2](https://huggingface.co/google/flan-ul2), which is an encoder-decoder model based on the T5 architecture.
2. A new U-Net architecture featuring BigGAN-deep blocks, which doubles the depth while maintaining the same number of parameters.
3. Sber-MoVQGAN, a decoder proven to have superior results in image restoration.
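For orientation, a minimal text-to-image sketch, assuming the `kandinsky-community/kandinsky-3` checkpoint and the `AutoPipelineForText2Image` entry point:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16
).to("cuda")

prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats."
image = pipe(prompt, num_inference_steps=25).images[0]
```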
...
@@ -25,11 +25,11 @@ You can find additional information about LEDITS++ on the [project page](https:/
</Tip>
<Tip warning={true}>
Due to some backward compatibility issues with the current diffusers implementation of [`~schedulers.DPMSolverMultistepScheduler`], this implementation of LEDITS++ can no longer guarantee perfect inversion.
This issue is unlikely to have any noticeable effects on applied use cases. However, we provide an alternative implementation that guarantees perfect inversion in a dedicated [GitHub repo](https://github.com/ml-research/ledits_pp).
</Tip>
We provide two distinct pipelines based on different pre-trained models.
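The editing workflow is invert-then-edit. A minimal sketch, assuming the `runwayml/stable-diffusion-v1-5` checkpoint and a local input image; the exact editing arguments below are illustrative:

```python
import torch
from diffusers import LEditsPPPipelineStableDiffusion
from diffusers.utils import load_image

pipe = LEditsPPPipelineStableDiffusion.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = load_image("path/to/image.png")  # replace with your own image

# invert the input image first, then apply the semantic edit
_ = pipe.invert(image=image, num_inversion_steps=50, skip=0.1)
edited = pipe(
    editing_prompt=["cherry blossom"],
    edit_guidance_scale=10.0,
    edit_threshold=0.75,
).images[0]
```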
## LEditsPPPipelineStableDiffusion
[[autodoc]] pipelines.ledits_pp.LEditsPPPipelineStableDiffusion
...
@@ -14,10 +14,10 @@ specific language governing permissions and limitations under the License.
![marigold](https://marigoldmonodepth.github.io/images/teaser_collage_compressed.jpg)
Marigold was proposed in [Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation](https://huggingface.co/papers/2312.02145), a CVPR 2024 Oral paper by [Bingxin Ke](http://www.kebingxin.com/), [Anton Obukhov](https://www.obukhov.ai/), [Shengyu Huang](https://shengyuh.github.io/), [Nando Metzger](https://nandometzger.github.io/), [Rodrigo Caye Daudt](https://rcdaudt.github.io/), and [Konrad Schindler](https://scholar.google.com/citations?user=FZuNgqIAAAAJ&hl=en).
The idea is to repurpose the rich generative prior of Text-to-Image Latent Diffusion Models (LDMs) for traditional computer vision tasks.
Initially, this idea was explored to fine-tune Stable Diffusion for Monocular Depth Estimation, as shown in the teaser above.
Later,
- [Tianfu Wang](https://tianfwang.github.io/) trained the first Latent Consistency Model (LCM) of Marigold, which unlocked fast single-step inference;
- [Kevin Qu](https://www.linkedin.com/in/kevin-qu-b3417621b/?locale=en_US) extended the approach to Surface Normals Estimation;
- [Anton Obukhov](https://www.obukhov.ai/) contributed the pipelines and documentation into diffusers (enabled and supported by [YiYi Xu](https://yiyixuxu.github.io/) and [Sayak Paul](https://sayak.dev/)).
@@ -28,7 +28,7 @@ The abstract from the paper is:
## Available Pipelines
Each pipeline supports one Computer Vision task, which takes an RGB image as input and produces a *prediction* of the modality of interest, such as a depth map of the input image.
Currently, the following tasks are implemented:
| Pipeline | Predicted Modalities | Demos |
@@ -39,7 +39,7 @@ Currently, the following tasks are implemented:
## Available Checkpoints
The original checkpoints can be found under the [PRS-ETH](https://huggingface.co/prs-eth/) Hugging Face organization.
<Tip>
@@ -49,11 +49,11 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
<Tip warning={true}>
Marigold pipelines were designed and tested only with `DDIMScheduler` and `LCMScheduler`.
Depending on the scheduler, the number of inference steps required to get reliable predictions varies, and there is no universal value that works best across schedulers.
Because of that, the default value of `num_inference_steps` in the `__call__` method of the pipeline is set to `None` (see the API reference).
Unless set explicitly, its value will be taken from the checkpoint configuration `model_index.json`.
This is done to ensure high-quality predictions when calling the pipeline with just the `image` argument.
</Tip>
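For illustration, a minimal sketch assuming the `prs-eth/marigold-depth-lcm-v1-0` checkpoint, showing the default behavior next to an explicit override:

```python
import diffusers
import torch

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")

# num_inference_steps defaults to None and is then resolved from model_index.json
depth = pipe(image)

# the value can still be overridden explicitly
depth = pipe(image, num_inference_steps=4)
```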
...
@@ -37,7 +37,7 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.m
## Inference with under 8GB GPU VRAM
Run the [`PixArtAlphaPipeline`] with under 8GB GPU VRAM by loading the text encoder in 8-bit precision. Let's walk through a full-fledged example.
First, install the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) library:
@@ -75,10 +75,10 @@ with torch.no_grad():
    prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt)
```
Since text embeddings have been computed, remove the `text_encoder` and `pipe` from memory, and free up some GPU VRAM:
```python
import gc

def flush():
    gc.collect()
@@ -99,7 +99,7 @@ pipe = PixArtAlphaPipeline.from_pretrained(
).to("cuda")
latents = pipe(
    negative_prompt=None,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    prompt_attention_mask=prompt_attention_mask,
@@ -146,4 +146,3 @@ While loading the `text_encoder`, you set `load_in_8bit` to `True`. You could al
[[autodoc]] PixArtAlphaPipeline
- all
- __call__
@@ -39,7 +39,7 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
## Inference with under 8GB GPU VRAM
Run the [`PixArtSigmaPipeline`] with under 8GB GPU VRAM by loading the text encoder in 8-bit precision. Let's walk through a full-fledged example.
First, install the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) library:
@@ -59,7 +59,6 @@ text_encoder = T5EncoderModel.from_pretrained(
    subfolder="text_encoder",
    load_in_8bit=True,
    device_map="auto",
)
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
@@ -77,10 +76,10 @@ with torch.no_grad():
    prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt)
```
Since text embeddings have been computed, remove the `text_encoder` and `pipe` from memory, and free up some GPU VRAM:
```python
import gc

def flush():
    gc.collect()
@@ -101,7 +100,7 @@ pipe = PixArtSigmaPipeline.from_pretrained(
).to("cuda")
latents = pipe(
    negative_prompt=None,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    prompt_attention_mask=prompt_attention_mask,
@@ -148,4 +147,3 @@ While loading the `text_encoder`, you set `load_in_8bit` to `True`. You could al
[[autodoc]] PixArtSigmaPipeline
- all
- __call__
@@ -177,7 +177,7 @@ inpaint = StableDiffusionInpaintPipeline(**text2img.components)
The Stable Diffusion pipelines are automatically supported in [Gradio](https://github.com/gradio-app/gradio/), a library that makes creating beautiful and user-friendly machine learning apps on the web a breeze. First, make sure you have Gradio installed:
```sh
pip install -U gradio
```
@@ -209,4 +209,4 @@ gr.Interface.from_pipeline(pipe).launch()
```
By default, the web demo runs on a local server. If you'd like to share it with others, you can generate a temporary public
link by setting `share=True` in `launch()`. Or, you can host your demo on [Hugging Face Spaces](https://huggingface.co/spaces) for a permanent link.
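Putting the two steps together, a minimal demo sketch, assuming the `runwayml/stable-diffusion-v1-5` checkpoint:

```python
import gradio as gr
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# share=True additionally prints a temporary public URL for the demo
gr.Interface.from_pipeline(pipe).launch(share=True)
```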
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# EDMDPMSolverMultistepScheduler
`EDMDPMSolverMultistepScheduler` is a [Karras formulation](https://huggingface.co/papers/2206.00364) of `DPMSolverMultistepScheduler`, a multistep scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.
DPMSolver (and the improved version DPMSolver++) is a fast dedicated high-order solver for diffusion ODEs with a convergence order guarantee. Empirically, DPMSolver sampling with only 20 steps can generate high-quality
samples, and it can generate quite good samples even in 10 steps.
...
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# DPMSolverMultistepScheduler
`DPMSolverMultistepScheduler` is a multistep scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.
DPMSolver (and the improved version DPMSolver++) is a fast dedicated high-order solver for diffusion ODEs with a convergence order guarantee. Empirically, DPMSolver sampling with only 20 steps can generate high-quality
samples, and it can generate quite good samples even in 10 steps.
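A minimal sketch of swapping this scheduler into an existing pipeline, assuming the `runwayml/stable-diffusion-v1-5` checkpoint; 20 steps is the regime highlighted above:

```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# reuse the existing scheduler config when swapping schedulers
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe("an astronaut riding a horse on mars", num_inference_steps=20).images[0]
```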
...
@@ -36,7 +36,7 @@ Then load and enable the [`DeepCacheSDHelper`](https://github.com/horseee/DeepCa
image = pipe("a photo of an astronaut on a moon").images[0]
```
The `set_params` method accepts two arguments: `cache_interval` and `cache_branch_id`. `cache_interval` sets the frequency of feature caching, specified as the number of steps between each cache operation. `cache_branch_id` identifies which branch of the network (ordered from the shallowest to the deepest layer) is responsible for executing the caching processes.
Opting for a lower `cache_branch_id` or a larger `cache_interval` can lead to faster inference speed at the expense of reduced image quality (ablation experiments of these two hyperparameters can be found in the [paper](https://arxiv.org/abs/2312.00858)). Once those arguments are set, use the `enable` or `disable` methods to activate or deactivate the `DeepCacheSDHelper`.
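A sketch of the full flow, assuming a Stable Diffusion checkpoint such as `runwayml/stable-diffusion-v1-5` and the `DeepCache` package installed:

```python
import torch
from diffusers import StableDiffusionPipeline
from DeepCache import DeepCacheSDHelper

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

helper = DeepCacheSDHelper(pipe=pipe)
helper.set_params(cache_interval=3, cache_branch_id=0)  # cache every 3 steps on the shallowest branch
helper.enable()

image = pipe("a photo of an astronaut on a moon").images[0]

helper.disable()  # restore the original per-step computation
```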
<div class="flex justify-center">
...
@@ -188,7 +188,7 @@ def latents_to_rgb(latents):
```py
def decode_tensors(pipe, step, timestep, callback_kwargs):
    latents = callback_kwargs["latents"]
    image = latents_to_rgb(latents)
    image.save(f"{step}.png")
...
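A sketch of registering the callback above on a pipeline call, assuming `pipe` is an already loaded pipeline and that `decode_tensors` returns `callback_kwargs` so the denoising loop can continue:

```python
image = pipe(
    prompt="a photo of an astronaut on the moon",
    callback_on_step_end=decode_tensors,
    callback_on_step_end_tensor_inputs=["latents"],  # tensors made available to the callback
).images[0]
```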
@@ -138,15 +138,15 @@ Because Marigold's latent space is compatible with the base Stable Diffusion, it
```diff
import diffusers
import torch
pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")
+ pipe.vae = diffusers.AutoencoderTiny.from_pretrained(
+     "madebyollin/taesd", torch_dtype=torch.float16
+ ).cuda()
image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)
```
@@ -156,13 +156,13 @@ As suggested in [Optimizations](../optimization/torch2.0#torch.compile), adding
```diff
import diffusers
import torch
pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)
```
@@ -208,7 +208,7 @@ model_paper_kwargs = {
    diffusers.schedulers.LCMScheduler: {
        "num_inference_steps": 4,
        "ensemble_size": 5,
    },
}
image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
@@ -261,7 +261,7 @@ model_paper_kwargs = {
    diffusers.schedulers.LCMScheduler: {
        "num_inference_steps": 4,
        "ensemble_size": 10,
    },
}
image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
@@ -415,7 +415,7 @@ image = diffusers.utils.load_image(
pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", torch_dtype=torch.float16, variant="fp16"
).to(device)
depth_image = pipe(image, generator=generator).prediction
depth_image = pipe.image_processor.visualize_depth(depth_image, color_map="binary")
@@ -423,10 +423,10 @@ depth_image[0].save("motorcycle_controlnet_depth.png")
controlnet = diffusers.ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to(device)
pipe = diffusers.StableDiffusionXLControlNetPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", controlnet=controlnet
).to(device)
pipe.scheduler = diffusers.DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)
controlnet_out = pipe(
...
@@ -134,7 +134,7 @@ sigmas = [14.615, 6.315, 3.771, 2.181, 1.342, 0.862, 0.555, 0.380, 0.234, 0.113,
prompt = "anthropomorphic capybara wearing a suit and working with a computer"
generator = torch.Generator(device='cuda').manual_seed(123)
image = pipeline(
    prompt=prompt,
    num_inference_steps=10,
    sigmas=sigmas,
    generator=generator
...
@@ -34,7 +34,7 @@ Stable Diffusion XL was proposed by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattman
Before using SDXL, install `transformers`, `accelerate`, `safetensors`, and `invisible_watermark`.
You can install the libraries as follows:
```sh
pip install transformers
pip install accelerate
pip install safetensors
@@ -46,7 +46,7 @@ pip install invisible-watermark>=0.2.0
When generating images with Stable Diffusion XL, we recommend adding an invisible watermark, which can help downstream applications identify whether an image was machine-generated. To do so, install it via the [invisible_watermark library](https://pypi.org/project/invisible-watermark/):
```sh
pip install invisible-watermark>=0.2.0
```
@@ -75,11 +75,11 @@ prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
```
### Image-to-image
You can use SDXL for *image-to-image* as follows:
```py
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image
@@ -99,7 +99,7 @@ image = pipe(prompt, image=init_image).images[0]
You can use SDXL for *inpainting* as follows:
```py
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image
@@ -352,7 +352,7 @@ If you run into an out-of-memory error, [`StableDiffusionXLPipeline.enable_model_cpu_
**Note** If you want to run Stable Diffusion XL with a `torch` version below 2.0, use xformers attention:
```sh
pip install xformers
```
...
@@ -93,13 +93,13 @@ cd diffusers
**For PyTorch**
```sh
pip install -e ".[torch]"
```
**For Flax**
```sh
pip install -e ".[flax]"
```
...
@@ -19,13 +19,13 @@ specific language governing permissions and limitations under the License.
Install 🤗 Optimum with ONNX Runtime support using the following command:
```sh
pip install optimum["onnxruntime"]
```
## Stable Diffusion Inference
The code below shows how to use ONNX Runtime. You need to use `OnnxStableDiffusionPipeline` instead of `StableDiffusionPipeline`.
If you want to load a PyTorch model and convert it to the ONNX format on the fly, set `export=True`.
```python
@@ -38,7 +38,7 @@ images = pipe(prompt).images[0]
pipe.save_pretrained("./onnx-stable-diffusion-v1-5")
```
If you want to export the pipeline offline in the ONNX format and use it later for inference,
you can use the [`optimum-cli export`](https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli) command:
```bash
@@ -47,7 +47,7 @@ optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/
Then run inference:
```python
from optimum.onnxruntime import ORTStableDiffusionPipeline
model_id = "sd_v15_onnx"
...
@@ -19,7 +19,7 @@ specific language governing permissions and limitations under the License.
Install 🤗 Optimum with the following command:
```sh
pip install optimum["openvino"]
```
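A minimal usage sketch, assuming the `runwayml/stable-diffusion-v1-5` checkpoint; `export=True` converts the PyTorch weights to the OpenVINO format on the fly:

```python
from optimum.intel import OVStableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipeline = OVStableDiffusionPipeline.from_pretrained(model_id, export=True)
image = pipeline("sailing ship in storm by Rembrandt").images[0]
```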
...
@@ -59,7 +59,7 @@ image
First, you need to install the `compel` library:
```sh
pip install compel
```
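A minimal prompt-weighting sketch, assuming the `runwayml/stable-diffusion-v1-5` checkpoint; `++` upweights the preceding token and the resulting embeddings are passed to the pipeline via `prompt_embeds`:

```python
import torch
from diffusers import StableDiffusionPipeline
from compel import Compel

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Compel builds weighted conditioning tensors from the pipeline's tokenizer and text encoder
compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
prompt_embeds = compel_proc("a red cat++ playing with a ball")

image = pipe(prompt_embeds=prompt_embeds).images[0]
```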
...
@@ -95,13 +95,13 @@ cd diffusers
**PyTorch**
```sh
pip install -e ".[torch]"
```
**Flax**
```sh
pip install -e ".[flax]"
```
...
@@ -25,7 +25,7 @@ from diffusers.utils.torch_utils import randn_tensor
EXAMPLE_DOC_STRING = """
    Examples:
        ```py
        from io import BytesIO
        import requests
...