Unverified Commit f040c27d authored by Tolga Cangöz, committed by GitHub

Errata - Fix typos and improve style (#8571)



* Fix typos

* Fix typos & up style

* chore: Update numbers

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
parent 138fac70
@@ -63,14 +63,14 @@ Let's walk through more detailed design decisions for each class.
Pipelines are designed to be easy to use (therefore do not follow [*Simple over easy*](#simple-over-easy) 100%), are not feature complete, and should loosely be seen as examples of how to use [models](#models) and [schedulers](#schedulers) for inference.
The following design principles are followed:
- Pipelines follow the single-file policy. All pipelines can be found in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as it’s done for [`src/diffusers/pipelines/stable-diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [# Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251).
- Pipelines all inherit from [`DiffusionPipeline`].
- Every pipeline consists of different model and scheduler components that are documented in the [`model_index.json` file](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline, and can be shared between pipelines with the [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function (see the sketch after this list).
- Every pipeline should be loadable via the [`DiffusionPipeline.from_pretrained`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained) function.
- Pipelines should be used **only** for inference.
- Pipelines should be very readable, self-explanatory, and easy to tweak.
- Pipelines should be designed to build on top of each other and be easy to integrate into higher-level APIs.
- Pipelines are **not** intended to be feature-complete user interfaces. For feature-complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
- Every pipeline should have one and only one way to run it via a `__call__` method. The naming of the `__call__` arguments should be shared across all pipelines.
- Pipelines should be named after the task they are intended to solve.
- In almost all cases, novel diffusion pipelines shall be implemented in a new pipeline folder/file.
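To make the component-sharing and loading principles above concrete, here is a minimal sketch; the checkpoint and the second pipeline class are illustrative assumptions rather than requirements:

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

# Load one pipeline, then reuse its components to build another one
# without loading the weights a second time.
text2img = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
img2img = StableDiffusionImg2ImgPipeline(**text2img.components)
```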
@@ -81,7 +81,7 @@ Models are designed as configurable toolboxes that are natural extensions of [Py
The following design principles are followed:
- Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context.
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unets/unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_condition.py), [`transformers/transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_2d.py), etc...
- Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc... **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy.
- Models intend to expose complexity, just like PyTorch's `Module` class, and give clear error messages.
- Models all inherit from `ModelMixin` and `ConfigMixin` (a loading sketch follows this list).
@@ -90,7 +90,7 @@ The following design principles are followed:
- To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
- Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
- The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and
readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
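As a small illustration of the `ModelMixin`/`ConfigMixin` points above, a model class can be loaded and inspected on its own, independently of any pipeline; the checkpoint and subfolder below are assumptions for the sake of the example:

```python
from diffusers import UNet2DConditionModel

# Models inherit from ModelMixin and ConfigMixin, so they expose
# from_pretrained and a config object just like pipelines do.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
print(unet.config.sample_size, unet.config.cross_attention_dim)
```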
### Schedulers
@@ -100,7 +100,7 @@ The following design principles are followed:
- All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
- Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained.
- One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper).
- If schedulers share similar functionalities, we can make use of the `# Copied from` mechanism.
- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](./docs/source/en/using-diffusers/schedulers.md) (see the sketch after this list).
- Every scheduler has to have a `set_num_inference_steps` and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called.
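A minimal sketch of the scheduler swap mentioned in the list above; the checkpoint and the replacement scheduler are illustrative assumptions:

```python
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# Re-create a compatible scheduler from the existing scheduler's config and swap it in.
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
```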
...
@@ -67,7 +67,7 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi
## Quickstart
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 27.000+ checkpoints):
```python
from diffusers import DiffusionPipeline
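# Illustrative continuation of the quickstart; the checkpoint name and prompt
# below are assumptions, not part of this diff hunk.
import torch

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.to("cuda")
image = pipeline("An image of a squirrel in Picasso style").images[0]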
@@ -209,7 +209,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
- https://github.com/deep-floyd/IF
- https://github.com/bentoml/BentoML
- https://github.com/bmaltais/kohya_ss
- +12.000 other amazing GitHub repositories 💪
Thank you for using us ❤️.
...
@@ -11,7 +11,7 @@ specific language governing permissions and limitations under the License.
Kandinsky 3 is created by [Vladimir Arkhipkin](https://github.com/oriBetelgeuse), [Anastasia Maltseva](https://github.com/NastyaMittseva), [Igor Pavlov](https://github.com/boomb0om), [Andrei Filatov](https://github.com/anvilarth), [Arseniy Shakhmatov](https://github.com/cene555), [Andrey Kuznetsov](https://github.com/kuznetsoffandrey), [Denis Dimitrov](https://github.com/denndimitrov), [Zein Shaheen](https://github.com/zeinsh)
The description from its GitHub page:
*Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
...
@@ -63,7 +63,7 @@ Let's walk through more in-detail design decisions for each class.
Pipelines are designed to be easy to use (therefore do not follow [*Simple over easy*](#simple-over-easy) 100%), are not feature complete, and should loosely be seen as examples of how to use [models](#models) and [schedulers](#schedulers) for inference.
The following design principles are followed:
- Pipelines follow the single-file policy. All pipelines can be found in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as it’s done for [`src/diffusers/pipelines/stable-diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [# Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251).
- Pipelines all inherit from [`DiffusionPipeline`].
- Every pipeline consists of different model and scheduler components that are documented in the [`model_index.json` file](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline, and can be shared between pipelines with the [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function.
- Every pipeline should be loadable via the [`DiffusionPipeline.from_pretrained`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained) function.
@@ -81,7 +81,7 @@ Models are designed as configurable toolboxes that are natural extensions of [Py
The following design principles are followed:
- Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context.
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unets/unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_condition.py), [`transformers/transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_2d.py), etc...
- Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc... **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy.
- Models intend to expose complexity, just like PyTorch's `Module` class, and give clear error messages.
- Models all inherit from `ModelMixin` and `ConfigMixin`.
@@ -90,7 +90,7 @@ The following design principles are followed:
- To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
- Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
- The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and
readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
### Schedulers
@@ -100,9 +100,9 @@ The following design principles are followed:
- All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
- Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained.
- One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper).
- If schedulers share similar functionalities, we can make use of the `# Copied from` mechanism.
- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](../using-diffusers/schedulers).
- Every scheduler has to have a `set_num_inference_steps` and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called.
- Every scheduler exposes the timesteps to be "looped over" via a `timesteps` attribute, which is an array of timesteps the model will be called upon.
- The `step(...)` function takes a predicted model output and the "current" sample (x_t) and returns the "previous", slightly more denoised sample (x_t-1) (see the loop sketch below).
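To make the `timesteps`/`step(...)` contract above concrete, a minimal denoising loop could look like the sketch below; the checkpoint is an illustrative assumption, and concrete schedulers in the library expose the step-count setter as `set_timesteps`:

```python
import torch
from diffusers import DDPMScheduler, UNet2DModel

# Placeholder model and scheduler purely for illustration.
model = UNet2DModel.from_pretrained("google/ddpm-cat-256")
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")

scheduler.set_timesteps(50)
sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)

for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = model(sample, t).sample
    # step(...) returns the slightly more denoised sample x_{t-1}.
    sample = scheduler.step(noise_pred, t, sample).prev_sample
```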
...
@@ -57,7 +57,7 @@ In Diffusers, this philosophy is applied to both pipelines and schedulers
Pipelines are designed to be easy to use (and therefore do not follow [*Simple over easy*](#쉬움보다는-간단함을) 100%), are not feature-complete, and should loosely be seen as examples of how to use [models](#모델) and [schedulers](#스케줄러) for inference.
The following design principles are followed:
- Pipelines follow the single-file policy. All pipelines live in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as is done for [`src/diffusers/pipelines/stable-diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, the [# Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251) can be used.
- Pipelines all inherit from [`DiffusionPipeline`].
- Every pipeline consists of different model and scheduler components that are documented in the [`model_index.json` file](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same names as attributes of the pipeline, and can be shared between pipelines with the [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function.
- Every pipeline should be loadable via the [`DiffusionPipeline.from_pretrained`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained) function.
@@ -93,7 +93,7 @@ In Diffusers, this philosophy is applied to both pipelines and schedulers
- All schedulers can be found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
- Schedulers must **not** import from large utility files and should be kept very self-contained.
- One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper).
- If schedulers share similar functionality, the `# Copied from` mechanism can be used.
- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method, as explained in detail [here](../using-diffusers/schedulers.md).
- Every scheduler has to have a `set_num_inference_steps` and a `step` function. `set_num_inference_steps(...)` has to be called before each denoising process, i.e. before `step(...)` is called.
...
@@ -58,7 +58,7 @@ outputs = pipeline(
)
```
For more information, check out Optimum Habana's [documentation](https://huggingface.co/docs/optimum/habana/usage_guides/stable_diffusion) and the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion) provided in the official GitHub repository.
## Benchmark
...
@@ -27,7 +27,7 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif
| Multilingual Stable Diffusion | Stable Diffusion Pipeline that supports prompts in 50 different languages. | [Multilingual Stable Diffusion](#multilingual-stable-diffusion-pipeline) | - | [Juan Carlos Piñeros](https://github.com/juancopi81) |
| GlueGen Stable Diffusion | Stable Diffusion Pipeline that supports prompts in different languages using GlueGen adapter. | [GlueGen Stable Diffusion](#gluegen-stable-diffusion-pipeline) | - | [Phạm Hồng Vinh](https://github.com/rootonchair) |
| Image to Image Inpainting Stable Diffusion | Stable Diffusion Pipeline that enables the overlaying of two images and subsequent inpainting | [Image to Image Inpainting Stable Diffusion](#image-to-image-inpainting-stable-diffusion) | - | [Alex McKinney](https://github.com/vvvm23) |
| Text Based Inpainting Stable Diffusion | Stable Diffusion Inpainting Pipeline that enables passing a text prompt to generate the mask for inpainting | [Text Based Inpainting Stable Diffusion](#text-based-inpainting-stable-diffusion) | - | [Dhruv Karan](https://github.com/unography) |
| Bit Diffusion | Diffusion on discrete data | [Bit Diffusion](#bit-diffusion) | - | [Stuti R.](https://github.com/kingstut) |
| K-Diffusion Stable Diffusion | Run Stable Diffusion with any of [K-Diffusion's samplers](https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/sampling.py) | [Stable Diffusion with K Diffusion](#stable-diffusion-with-k-diffusion) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) |
| Checkpoint Merger Pipeline | Diffusion Pipeline that enables merging of saved model checkpoints | [Checkpoint Merger Pipeline](#checkpoint-merger-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) |
@@ -40,7 +40,7 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif
| CLIP Guided Img2Img Stable Diffusion Pipeline | Doing CLIP guidance for image to image generation with Stable Diffusion | [CLIP Guided Img2Img Stable Diffusion](#clip-guided-img2img-stable-diffusion) | - | [Nipun Jindal](https://github.com/nipunjindal/) |
| TensorRT Stable Diffusion Text to Image Pipeline | Accelerates the Stable Diffusion Text2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Text to Image Pipeline](#tensorrt-text2image-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) |
| EDICT Image Editing Pipeline | Diffusion pipeline for text-guided image editing | [EDICT Image Editing Pipeline](#edict-image-editing-pipeline) | - | [Joqsan Azocar](https://github.com/Joqsan) |
| Stable Diffusion RePaint | Stable Diffusion pipeline using [RePaint](https://arxiv.org/abs/2201.09865) for inpainting. | [Stable Diffusion RePaint](#stable-diffusion-repaint) | - | [Markus Pobitzer](https://github.com/Markus-Pobitzer) |
| TensorRT Stable Diffusion Image to Image Pipeline | Accelerates the Stable Diffusion Image2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Image to Image Pipeline](#tensorrt-image2image-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) |
| Stable Diffusion IPEX Pipeline | Accelerate Stable Diffusion inference pipeline with BF16/FP32 precision on Intel Xeon CPUs with [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [Stable Diffusion on IPEX](#stable-diffusion-on-ipex) | - | [Yingjie Han](https://github.com/yingjie-han/) |
| CLIP Guided Images Mixing Stable Diffusion Pipeline | Combine images using usual diffusion models. | [CLIP Guided Images Mixing Using Stable Diffusion](#clip-guided-images-mixing-with-stable-diffusion) | - | [Karachev Denis](https://github.com/TheDenk) |
@@ -192,10 +192,9 @@ prompt = "wooden boat"
init_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/__assets__/samples/images/2.jpg")
mask_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/__assets__/samples/masks/2.png")
image = pipe(prompt, init_image, mask_image, use_rasg=True, use_painta=True, generator=torch.manual_seed(12345)).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```
### Marigold Depth Estimation
@@ -223,7 +222,7 @@ pipe = DiffusionPipeline.from_pretrained(
# (New) LCM version (faster speed)
pipe = DiffusionPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0",
    custom_pipeline="marigold_depth_estimation"
    # torch_dtype=torch.float16,  # (optional) Run with half-precision (16-bit float).
    # variant="fp16",  # (optional) Use with `torch_dtype=torch.float16`, to directly load fp16 checkpoint
@@ -366,7 +365,6 @@ guided_pipeline = DiffusionPipeline.from_pretrained(
    custom_pipeline="clip_guided_stable_diffusion",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
    torch_dtype=torch.float16,
)
guided_pipeline.enable_attention_slicing()
@@ -394,7 +392,7 @@ for i, img in enumerate(images):
```
The `images` list contains PIL images that can be saved locally or displayed directly in a Google Colab notebook.
Generated images tend to be of higher quality than when using Stable Diffusion natively. E.g. the above script generates the following images:
![clip_guidance](https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/clip_guidance/merged_clip_guidance.jpg).
@@ -468,11 +466,9 @@ pipe.enable_attention_slicing()
### Text-to-Image
images = pipe.text2img("An astronaut riding a horse").images
### Image-to-Image
init_image = download_image("https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg")
prompt = "A fantasy landscape, trending on artstation"
@@ -480,7 +476,6 @@ prompt = "A fantasy landscape, trending on artstation"
images = pipe.img2img(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images
### Inpainting
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
@@ -497,7 +492,7 @@ As shown above this one pipeline can run all both "text-to-image", "image-to-ima
Features of this custom pipeline:
- Input a prompt without the 77 token length limit.
- Includes txt2img, img2img, and inpainting pipelines.
- Emphasize/weigh part of your prompt with parentheses like so: `a baby deer with (big eyes)`
- De-emphasize part of your prompt like so: `a [baby] deer with big eyes`
- Precisely weigh part of your prompt like so: `a baby deer with (big eyes:1.3)`
@@ -511,7 +506,7 @@ Prompt weighting equivalents:
You can run this custom pipeline like so:
#### PyTorch
```python
from diffusers import DiffusionPipeline
@@ -520,16 +515,14 @@ import torch
pipe = DiffusionPipeline.from_pretrained(
    'hakurei/waifu-diffusion',
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
prompt = "best_quality (1girl:1.3) bow bride brown_hair closed_mouth frilled_bow frilled_hair_tubes frills (full_body:1.3) fox_ear hair_bow hair_tubes happy hood japanese_clothes kimono long_sleeves red_bow smile solo tabi uchikake white_kimono wide_sleeves cherry_blossoms"
neg_prompt = "lowres, bad_anatomy, error_body, error_hair, error_arm, error_hands, bad_hands, error_fingers, bad_fingers, missing_fingers, error_legs, bad_legs, multiple_legs, missing_legs, error_lighting, error_shadow, error_reflection, text, error, extra_digit, fewer_digits, cropped, worst_quality, low_quality, normal_quality, jpeg_artifacts, signature, watermark, username, blurry"
pipe.text2img(prompt, negative_prompt=neg_prompt, width=512, height=512, max_embeddings_multiples=3).images[0]
```
#### onnxruntime
@@ -548,11 +541,10 @@ pipe = DiffusionPipeline.from_pretrained(
prompt = "a photo of an astronaut riding a horse on mars, best quality"
neg_prompt = "lowres, bad anatomy, error body, error hair, error arm, error hands, bad hands, error fingers, bad fingers, missing fingers, error legs, bad legs, multiple legs, missing legs, error lighting, error shadow, error reflection, text, error, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"
pipe.text2img(prompt, negative_prompt=neg_prompt, width=512, height=512, max_embeddings_multiples=3).images[0]
```
If you see `Token indices sequence length is longer than the specified maximum sequence length for this model ( *** > 77 ) . Running this sequence through the model will result in indexing errors`, do not worry, it is normal.
### Speech to Image
@@ -587,7 +579,6 @@ diffuser_pipeline = DiffusionPipeline.from_pretrained(
    custom_pipeline="speech_to_image_diffusion",
    speech_model=model,
    speech_processor=processor,
    torch_dtype=torch.float16,
)
@@ -647,7 +638,6 @@ import torch
pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="wildcard_stable_diffusion",
    torch_dtype=torch.float16,
)
prompt = "__animal__ sitting on a __object__ wearing a __clothing__"
@@ -707,7 +697,6 @@ for i in range(args.num_images):
    images.append(th.from_numpy(np.array(image)).permute(2, 0, 1) / 255.)
grid = tvu.make_grid(th.stack(images, dim=0), nrow=4, padding=0)
tvu.save_image(grid, f'{prompt}_{args.weights}' + '.png')
```
### Imagic Stable Diffusion
@@ -721,13 +710,14 @@ from io import BytesIO
import torch
import os
from diffusers import DiffusionPipeline, DDIMScheduler
has_cuda = torch.cuda.is_available()
device = torch.device('cpu' if not has_cuda else 'cuda')
pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    safety_checker=None,
    custom_pipeline="imagic_stable_diffusion",
    scheduler=DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
).to(device)
generator = torch.Generator("cuda").manual_seed(0)
seed = 0
@@ -837,7 +827,7 @@ image.save('./seed_resize/seed_resize_{w}_{h}_image_compare.png'.format(w=width,
### Multilingual Stable Diffusion Pipeline
The following code can generate images from texts in different languages using the pre-trained [mBART-50 many-to-one multilingual machine translation model](https://huggingface.co/facebook/mbart-large-50-many-to-one-mmt) and Stable Diffusion.
```python
from PIL import Image
@@ -881,7 +871,6 @@ diffuser_pipeline = DiffusionPipeline.from_pretrained(
    detection_pipeline=language_detection_pipeline,
    translation_model=trans_model,
    translation_tokenizer=trans_tokenizer,
    torch_dtype=torch.float16,
)
@@ -905,9 +894,9 @@ This example produces the following images:
### GlueGen Stable Diffusion Pipeline
GlueGen is a minimal adapter that allows alignment between any encoder (a text encoder for a different language, multilingual RoBERTa, AudioCLIP) and the CLIP text encoder used in the standard Stable Diffusion model. This method allows easy language adaptation of available English Stable Diffusion checkpoints without the need for an image captioning dataset or long training hours.
Make sure you have downloaded `gluenet_French_clip_overnorm_over3_noln.ckpt` for French from [GlueGen's official repo](https://github.com/salesforce/GlueGen/tree/main) (there are also pre-trained weights for Chinese, Italian, Japanese, and Spanish, or you can train your own).
```python
from PIL import Image
@@ -974,7 +963,6 @@ mask_image = PIL.Image.open(mask_path).convert("RGB").resize((512, 512))
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    custom_pipeline="img2img_inpainting",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
@@ -1019,13 +1007,13 @@ image = pipe(image=image, text=text, prompt=prompt).images[0]
### Bit Diffusion
Based on <https://arxiv.org/abs/2208.04202>, this pipeline is used for diffusion on discrete data - e.g., discrete image data or DNA sequence data. An unconditional discrete image can be generated like this:
```python ```python
from diffusers import DiffusionPipeline from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", custom_pipeline="bit_diffusion") pipe = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", custom_pipeline="bit_diffusion")
image = pipe().images[0] image = pipe().images[0]
``` ```
### Stable Diffusion with K Diffusion ### Stable Diffusion with K Diffusion
...@@ -1091,37 +1079,36 @@ image = pipe(prompt, generator=generator, num_inference_steps=50).images[0] ...@@ -1091,37 +1079,36 @@ image = pipe(prompt, generator=generator, num_inference_steps=50).images[0]
### Checkpoint Merger Pipeline ### Checkpoint Merger Pipeline
Based on the AUTOMATIC1111/webui for checkpoint merging. This is a custom pipeline that merges upto 3 pretrained model checkpoints as long as they are in the HuggingFace model_index.json format. Based on the AUTOMATIC1111/webui for checkpoint merging. This is a custom pipeline that merges up to 3 pretrained model checkpoints as long as they are in the HuggingFace model_index.json format.
The checkpoint merging is currently memory intensive as it modifies the weights of a DiffusionPipeline object in place. Expect at least 13GB RAM usage on Kaggle GPU kernels; on Colab you might run out of the 12GB memory even while merging two checkpoints.
Usage:
```python ```python
from diffusers import DiffusionPipeline from diffusers import DiffusionPipeline
#Return a CheckpointMergerPipeline class that allows you to merge checkpoints. # Return a CheckpointMergerPipeline class that allows you to merge checkpoints.
#The checkpoint passed here is ignored. But still pass one of the checkpoints you plan to # The checkpoint passed here is ignored. But still pass one of the checkpoints you plan to
#merge for convenience # merge for convenience
pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", custom_pipeline="checkpoint_merger") pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", custom_pipeline="checkpoint_merger")
#There are multiple possible scenarios: # There are multiple possible scenarios:
#The pipeline with the merged checkpoints is returned in all the scenarios # The pipeline with the merged checkpoints is returned in all the scenarios
# Compatible checkpoints a.k.a. matched model_index.json files. Ignores the meta attributes in model_index.json during comparison (attrs with _ as prefix).
merged_pipe = pipe.merge(["CompVis/stable-diffusion-v1-4", "CompVis/stable-diffusion-v1-2"], interp="sigmoid", alpha=0.4)
#Incompatible checkpoints in model_index.json but merge might be possible. Use force = True to ignore model_index.json compatibility # Incompatible checkpoints in model_index.json but merge might be possible. Use force=True to ignore model_index.json compatibility
merged_pipe_1 = pipe.merge(["CompVis/stable-diffusion-v1-4","hakurei/waifu-diffusion"], force = True, interp = "sigmoid", alpha = 0.4) merged_pipe_1 = pipe.merge(["CompVis/stable-diffusion-v1-4", "hakurei/waifu-diffusion"], force=True, interp="sigmoid", alpha=0.4)
#Three checkpoint merging. Only "add_difference" method actually works on all three checkpoints. Using any other options will ignore the 3rd checkpoint. # Three checkpoint merging. Only "add_difference" method actually works on all three checkpoints. Using any other options will ignore the 3rd checkpoint.
merged_pipe_2 = pipe.merge(["CompVis/stable-diffusion-v1-4","hakurei/waifu-diffusion","prompthero/openjourney"], force = True, interp = "add_difference", alpha = 0.4) merged_pipe_2 = pipe.merge(["CompVis/stable-diffusion-v1-4", "hakurei/waifu-diffusion", "prompthero/openjourney"], force=True, interp="add_difference", alpha=0.4)
prompt = "An astronaut riding a horse on Mars" prompt = "An astronaut riding a horse on Mars"
image = merged_pipe(prompt).images[0] image = merged_pipe(prompt).images[0]
``` ```
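If you want to keep the result, the merged pipeline behaves like a regular DiffusionPipeline, so it can be saved and reloaded. A short sketch (assuming the object returned by `pipe.merge()` exposes the standard `save_pretrained()`/`from_pretrained()` API; the local path is illustrative):

```python
# Persist the merged weights and reload them later like any other pipeline.
# Assumption: the merged pipeline supports the usual save_pretrained() API.
merged_pipe.save_pretrained("./stable-diffusion-merged")
reloaded_pipe = DiffusionPipeline.from_pretrained("./stable-diffusion-merged").to("cuda")
image = reloaded_pipe("An astronaut riding a horse on Mars").images[0]
```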
Some examples along with the merge details: Some examples along with the merge details:
...@@ -1132,7 +1119,7 @@ Some examples along with the merge details: ...@@ -1132,7 +1119,7 @@ Some examples along with the merge details:
2. "hakurei/waifu-diffusion" + "prompthero/openjourney" ; Inverse Sigmoid interpolation; alpha = 0.8 2. "hakurei/waifu-diffusion" + "prompthero/openjourney" ; Inverse Sigmoid interpolation; alpha = 0.8
![Stable plus Waifu Sigmoid 0.8](https://huggingface.co/datasets/NagaSaiAbhinay/CheckpointMergerSamples/resolve/main/waifu_openjourney_inv_sig_0.8.png) ![Waifu plus openjourney Sigmoid 0.8](https://huggingface.co/datasets/NagaSaiAbhinay/CheckpointMergerSamples/resolve/main/waifu_openjourney_inv_sig_0.8.png)
3. "CompVis/stable-diffusion-v1-4" + "hakurei/waifu-diffusion" + "prompthero/openjourney"; Add Difference interpolation; alpha = 0.5 3. "CompVis/stable-diffusion-v1-4" + "hakurei/waifu-diffusion" + "prompthero/openjourney"; Add Difference interpolation; alpha = 0.5
...@@ -1197,16 +1184,16 @@ from PIL import Image ...@@ -1197,16 +1184,16 @@ from PIL import Image
pipe = DiffusionPipeline.from_pretrained( pipe = DiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4", "CompVis/stable-diffusion-v1-4",
custom_pipeline="magic_mix", custom_pipeline="magic_mix",
scheduler = DDIMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler"), scheduler=DDIMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler"),
).to('cuda') ).to('cuda')
img = Image.open('phone.jpg') img = Image.open('phone.jpg')
mix_img = pipe( mix_img = pipe(
img, img,
prompt = 'bed', prompt='bed',
kmin = 0.3, kmin=0.3,
kmax = 0.5, kmax=0.5,
mix_factor = 0.5, mix_factor=0.5,
) )
mix_img.save('phone_bed_mix.jpg') mix_img.save('phone_bed_mix.jpg')
``` ```
...@@ -1227,8 +1214,8 @@ For more example generations check out this [demo notebook](https://github.com/d ...@@ -1227,8 +1214,8 @@ For more example generations check out this [demo notebook](https://github.com/d
### Stable UnCLIP ### Stable UnCLIP
UnCLIPPipeline("kakaobrain/karlo-v1-alpha") provides a prior model that can generate a CLIP image embedding from text.
StableDiffusionImageVariationPipeline("lambdalabs/sd-image-variations-diffusers") provides a decoder model that can generate images from a CLIP image embedding.
```python ```python
import torch import torch
...@@ -1269,7 +1256,7 @@ image.save("./shiba-inu.jpg") ...@@ -1269,7 +1256,7 @@ image.save("./shiba-inu.jpg")
print(pipeline.decoder_pipe.__class__) print(pipeline.decoder_pipe.__class__)
# <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_image_variation.StableDiffusionImageVariationPipeline'> # <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_image_variation.StableDiffusionImageVariationPipeline'>
# this pipeline only use prior module in "kakaobrain/karlo-v1-alpha" # this pipeline only uses prior module in "kakaobrain/karlo-v1-alpha"
# It is used to convert clip text embedding to clip image embedding. # It is used to convert clip text embedding to clip image embedding.
print(pipeline) print(pipeline)
# StableUnCLIPPipeline { # StableUnCLIPPipeline {
...@@ -1329,10 +1316,10 @@ pipe.to(device) ...@@ -1329,10 +1316,10 @@ pipe.to(device)
start_prompt = "A photograph of an adult lion" start_prompt = "A photograph of an adult lion"
end_prompt = "A photograph of a lion cub" end_prompt = "A photograph of a lion cub"
#For best results keep the prompts close in length to each other. Of course, feel free to try out with differing lengths. # For best results keep the prompts close in length to each other. Of course, feel free to try out with differing lengths.
generator = torch.Generator(device=device).manual_seed(42) generator = torch.Generator(device=device).manual_seed(42)
output = pipe(start_prompt, end_prompt, steps = 6, generator = generator, enable_sequential_cpu_offload=False) output = pipe(start_prompt, end_prompt, steps=6, generator=generator, enable_sequential_cpu_offload=False)
for i, image in enumerate(output.images):
    image.save('result%s.jpg' % i)
...@@ -1367,10 +1354,10 @@ pipe = DiffusionPipeline.from_pretrained( ...@@ -1367,10 +1354,10 @@ pipe = DiffusionPipeline.from_pretrained(
pipe.to(device) pipe.to(device)
images = [Image.open('./starry_night.jpg'), Image.open('./flowers.jpg')] images = [Image.open('./starry_night.jpg'), Image.open('./flowers.jpg')]
#For best results keep the prompts close in length to each other. Of course, feel free to try out with differing lengths. # For best results keep the prompts close in length to each other. Of course, feel free to try out with differing lengths.
generator = torch.Generator(device=device).manual_seed(42) generator = torch.Generator(device=device).manual_seed(42)
output = pipe(image = images ,steps = 6, generator = generator) output = pipe(image=images, steps=6, generator=generator)
for i, image in enumerate(output.images):
image.save('starry_to_flowers_%s.jpg' % i) image.save('starry_to_flowers_%s.jpg' % i)
...@@ -1392,7 +1379,7 @@ The resulting images in order:- ...@@ -1392,7 +1379,7 @@ The resulting images in order:-
### DDIM Noise Comparative Analysis Pipeline ### DDIM Noise Comparative Analysis Pipeline
#### **Research question: What visual concepts do the diffusion models learn from each noise level during training?** #### **Research question: What visual concepts do the diffusion models learn from each noise level during training?**
The [P2 weighting (CVPR 2022)](https://arxiv.org/abs/2204.00227) paper proposed an approach to answer the above question, which is their second contribution. The [P2 weighting (CVPR 2022)](https://arxiv.org/abs/2204.00227) paper proposed an approach to answer the above question, which is their second contribution.
The approach consists of the following steps: The approach consists of the following steps:
...@@ -1448,6 +1435,7 @@ import torch ...@@ -1448,6 +1435,7 @@ import torch
from diffusers import DiffusionPipeline from diffusers import DiffusionPipeline
from PIL import Image from PIL import Image
from transformers import CLIPFeatureExtractor, CLIPModel from transformers import CLIPFeatureExtractor, CLIPModel
feature_extractor = CLIPFeatureExtractor.from_pretrained( feature_extractor = CLIPFeatureExtractor.from_pretrained(
"laion/CLIP-ViT-B-32-laion2B-s34B-b79K" "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
) )
...@@ -1622,6 +1610,7 @@ import requests ...@@ -1622,6 +1610,7 @@ import requests
import torch import torch
from io import BytesIO from io import BytesIO
from diffusers import StableDiffusionPipeline, RePaintScheduler from diffusers import StableDiffusionPipeline, RePaintScheduler
def download_image(url): def download_image(url):
response = requests.get(url) response = requests.get(url)
return PIL.Image.open(BytesIO(response.content)).convert("RGB") return PIL.Image.open(BytesIO(response.content)).convert("RGB")
...@@ -1679,7 +1668,7 @@ image.save('tensorrt_img2img_new_zealand_hills.png') ...@@ -1679,7 +1668,7 @@ image.save('tensorrt_img2img_new_zealand_hills.png')
``` ```
### Stable Diffusion BoxDiff ### Stable Diffusion BoxDiff
BoxDiff is a training-free method for controlled generation with bounding box coordinates. It should work with any Stable Diffusion model. Below is an example with `stable-diffusion-2-1-base`.
```py ```py
import torch import torch
from PIL import Image, ImageDraw from PIL import Image, ImageDraw
...@@ -1839,13 +1828,13 @@ Output Image ...@@ -1839,13 +1828,13 @@ Output Image
### Stable Diffusion on IPEX ### Stable Diffusion on IPEX
This diffusion pipeline aims to accelarate the inference of Stable-Diffusion on Intel Xeon CPUs with BF16/FP32 precision using [IPEX](https://github.com/intel/intel-extension-for-pytorch). This diffusion pipeline aims to accelerate the inference of Stable-Diffusion on Intel Xeon CPUs with BF16/FP32 precision using [IPEX](https://github.com/intel/intel-extension-for-pytorch).
To use this pipeline, you need to: To use this pipeline, you need to:
1. Install [IPEX](https://github.com/intel/intel-extension-for-pytorch) 1. Install [IPEX](https://github.com/intel/intel-extension-for-pytorch)
**Note:** For each PyTorch release, there is a corresponding release of IPEX. Here is the mapping relationship. It is recommended to install PyTorch/IPEX 2.0 to get the best performance.
|PyTorch Version|IPEX Version| |PyTorch Version|IPEX Version|
|--|--| |--|--|
...@@ -1864,26 +1853,26 @@ python -m pip install intel_extension_for_pytorch ...@@ -1864,26 +1853,26 @@ python -m pip install intel_extension_for_pytorch
python -m pip install intel_extension_for_pytorch==<version_name> -f https://developer.intel.com/ipex-whl-stable-cpu python -m pip install intel_extension_for_pytorch==<version_name> -f https://developer.intel.com/ipex-whl-stable-cpu
``` ```
2. After pipeline initialization, `prepare_for_ipex()` should be called to enable IPEX accelaration. Supported inference datatypes are Float32 and BFloat16. 2. After pipeline initialization, `prepare_for_ipex()` should be called to enable IPEX acceleration. Supported inference datatypes are Float32 and BFloat16.
**Note:** The setting of the generated image height/width for `prepare_for_ipex()` should be the same as the setting used for pipeline inference.
```python ```python
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", custom_pipeline="stable_diffusion_ipex") pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", custom_pipeline="stable_diffusion_ipex")
# For Float32 # For Float32
pipe.prepare_for_ipex(prompt, dtype=torch.float32, height=512, width=512) #value of image height/width should be consistent with the pipeline inference pipe.prepare_for_ipex(prompt, dtype=torch.float32, height=512, width=512) # value of image height/width should be consistent with the pipeline inference
# For BFloat16 # For BFloat16
pipe.prepare_for_ipex(prompt, dtype=torch.bfloat16, height=512, width=512) #value of image height/width should be consistent with the pipeline inference pipe.prepare_for_ipex(prompt, dtype=torch.bfloat16, height=512, width=512) # value of image height/width should be consistent with the pipeline inference
``` ```
Then you can use the IPEX pipeline in a similar way to the default Stable Diffusion pipeline.
```python ```python
# For Float32 # For Float32
image = pipe(prompt, num_inference_steps=20, height=512, width=512).images[0] #value of image height/width should be consistent with 'prepare_for_ipex()' image = pipe(prompt, num_inference_steps=20, height=512, width=512).images[0] # value of image height/width should be consistent with 'prepare_for_ipex()'
# For BFloat16 # For BFloat16
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16): with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
image = pipe(prompt, num_inference_steps=20, height=512, width=512).images[0] #value of image height/width should be consistent with 'prepare_for_ipex()' image = pipe(prompt, num_inference_steps=20, height=512, width=512).images[0] # value of image height/width should be consistent with 'prepare_for_ipex()'
``` ```
The following code compares the performance of the original Stable Diffusion pipeline with the IPEX-optimized pipeline.
...@@ -1901,7 +1890,7 @@ def elapsed_time(pipeline, nb_pass=3, num_inference_steps=20): ...@@ -1901,7 +1890,7 @@ def elapsed_time(pipeline, nb_pass=3, num_inference_steps=20):
# warmup # warmup
for _ in range(2): for _ in range(2):
images = pipeline(prompt, num_inference_steps=num_inference_steps, height=512, width=512).images images = pipeline(prompt, num_inference_steps=num_inference_steps, height=512, width=512).images
#time evaluation # time evaluation
start = time.time() start = time.time()
for _ in range(nb_pass): for _ in range(nb_pass):
pipeline(prompt, num_inference_steps=num_inference_steps, height=512, width=512) pipeline(prompt, num_inference_steps=num_inference_steps, height=512, width=512)
...@@ -1922,7 +1911,7 @@ with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16): ...@@ -1922,7 +1911,7 @@ with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
latency = elapsed_time(pipe) latency = elapsed_time(pipe)
print("Latency of StableDiffusionIPEXPipeline--bf16", latency) print("Latency of StableDiffusionIPEXPipeline--bf16", latency)
latency = elapsed_time(pipe2) latency = elapsed_time(pipe2)
print("Latency of StableDiffusionPipeline--bf16",latency) print("Latency of StableDiffusionPipeline--bf16", latency)
############## fp32 inference performance ############### ############## fp32 inference performance ###############
...@@ -1937,13 +1926,12 @@ pipe4 = StableDiffusionPipeline.from_pretrained(model_id) ...@@ -1937,13 +1926,12 @@ pipe4 = StableDiffusionPipeline.from_pretrained(model_id)
latency = elapsed_time(pipe3) latency = elapsed_time(pipe3)
print("Latency of StableDiffusionIPEXPipeline--fp32", latency) print("Latency of StableDiffusionIPEXPipeline--fp32", latency)
latency = elapsed_time(pipe4) latency = elapsed_time(pipe4)
print("Latency of StableDiffusionPipeline--fp32",latency) print("Latency of StableDiffusionPipeline--fp32", latency)
``` ```
### Stable Diffusion XL on IPEX ### Stable Diffusion XL on IPEX
This diffusion pipeline aims to accelarate the inference of Stable-Diffusion XL on Intel Xeon CPUs with BF16/FP32 precision using [IPEX](https://github.com/intel/intel-extension-for-pytorch). This diffusion pipeline aims to accelerate the inference of Stable-Diffusion XL on Intel Xeon CPUs with BF16/FP32 precision using [IPEX](https://github.com/intel/intel-extension-for-pytorch).
To use this pipeline, you need to: To use this pipeline, you need to:
...@@ -1968,7 +1956,7 @@ python -m pip install intel_extension_for_pytorch ...@@ -1968,7 +1956,7 @@ python -m pip install intel_extension_for_pytorch
python -m pip install intel_extension_for_pytorch==<version_name> -f https://developer.intel.com/ipex-whl-stable-cpu python -m pip install intel_extension_for_pytorch==<version_name> -f https://developer.intel.com/ipex-whl-stable-cpu
``` ```
2. After pipeline initialization, `prepare_for_ipex()` should be called to enable IPEX accelaration. Supported inference datatypes are Float32 and BFloat16. 2. After pipeline initialization, `prepare_for_ipex()` should be called to enable IPEX acceleration. Supported inference datatypes are Float32 and BFloat16.
**Note:** The values of `height` and `width` used during preparation with `prepare_for_ipex()` should be the same when running inference with the prepared pipeline. **Note:** The values of `height` and `width` used during preparation with `prepare_for_ipex()` should be the same when running inference with the prepared pipeline.
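For orientation, a minimal sketch mirroring the Stable Diffusion IPEX example above (assumptions: the pipeline is registered as the `stable_diffusion_xl_ipex` community pipeline and `prepare_for_ipex()` takes the same arguments as in the non-XL example; the checkpoint and prompt are illustrative, so check the pipeline source for the exact signature):

```python
import torch
from diffusers import DiffusionPipeline

prompt = "sailing ship in storm by Leonardo da Vinci"
# Assumed community pipeline name; adjust if it differs in your diffusers version.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", custom_pipeline="stable_diffusion_xl_ipex"
)
# For Float32 (height/width must match the later pipeline calls)
pipe.prepare_for_ipex(prompt, dtype=torch.float32, height=512, width=512)
# For BFloat16
pipe.prepare_for_ipex(prompt, dtype=torch.bfloat16, height=512, width=512)
```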
...@@ -2011,7 +1999,7 @@ def elapsed_time(pipeline, nb_pass=3, num_inference_steps=1): ...@@ -2011,7 +1999,7 @@ def elapsed_time(pipeline, nb_pass=3, num_inference_steps=1):
# warmup # warmup
for _ in range(2): for _ in range(2):
images = pipeline(prompt, num_inference_steps=num_inference_steps, height=512, width=512, guidance_scale=0.0).images images = pipeline(prompt, num_inference_steps=num_inference_steps, height=512, width=512, guidance_scale=0.0).images
#time evaluation # time evaluation
start = time.time() start = time.time()
for _ in range(nb_pass): for _ in range(nb_pass):
pipeline(prompt, num_inference_steps=num_inference_steps, height=512, width=512, guidance_scale=0.0) pipeline(prompt, num_inference_steps=num_inference_steps, height=512, width=512, guidance_scale=0.0)
...@@ -2047,8 +2035,7 @@ pipe4 = StableDiffusionXLPipeline.from_pretrained(model_id, low_cpu_mem_usage=Tr ...@@ -2047,8 +2035,7 @@ pipe4 = StableDiffusionXLPipeline.from_pretrained(model_id, low_cpu_mem_usage=Tr
latency = elapsed_time(pipe3, num_inference_steps=steps) latency = elapsed_time(pipe3, num_inference_steps=steps)
print("Latency of StableDiffusionXLPipelineIpex--fp32", latency, "s for total", steps, "steps") print("Latency of StableDiffusionXLPipelineIpex--fp32", latency, "s for total", steps, "steps")
latency = elapsed_time(pipe4, num_inference_steps=steps) latency = elapsed_time(pipe4, num_inference_steps=steps)
print("Latency of StableDiffusionXLPipeline--fp32",latency, "s for total", steps, "steps") print("Latency of StableDiffusionXLPipeline--fp32", latency, "s for total", steps, "steps")
``` ```
### CLIP Guided Images Mixing With Stable Diffusion ### CLIP Guided Images Mixing With Stable Diffusion
...@@ -2061,7 +2048,7 @@ This approach is using (optional) CoCa model to avoid writing image description. ...@@ -2061,7 +2048,7 @@ This approach is using (optional) CoCa model to avoid writing image description.
### Stable Diffusion XL Long Weighted Prompt Pipeline ### Stable Diffusion XL Long Weighted Prompt Pipeline
This SDXL pipeline supports unlimited-length prompts and negative prompts, compatible with the A1111 prompt weighting style.
You can provide both `prompt` and `prompt_2`. If only one prompt is provided, `prompt_2` will be a copy of the provided `prompt`. Here is sample code showing how to use this pipeline.
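A minimal loading sketch (assuming the pipeline is registered as the `lpw_stable_diffusion_xl` community pipeline; the checkpoint, prompts, and weights below are only illustrative of the A1111-style syntax):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    custom_pipeline="lpw_stable_diffusion_xl",
    torch_dtype=torch.float16,
).to("cuda")

# A1111-style weighting, with a prompt longer than the 77-token CLIP limit
prompt = "photo of a (cute:1.5) corgi wearing a tiny wizard hat, highly detailed, " * 8
negative_prompt = "(blurry:1.2), lowres, bad anatomy"

image = pipe(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=30).images[0]
image.save("lwp_sdxl_sample.png")
```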
...@@ -2153,9 +2140,9 @@ coca_model = open_clip.create_model('coca_ViT-L-14', pretrained='laion2B-s13B-b9 ...@@ -2153,9 +2140,9 @@ coca_model = open_clip.create_model('coca_ViT-L-14', pretrained='laion2B-s13B-b9
coca_model.dtype = torch.float16 coca_model.dtype = torch.float16
coca_transform = open_clip.image_transform( coca_transform = open_clip.image_transform(
coca_model.visual.image_size, coca_model.visual.image_size,
is_train = False, is_train=False,
mean = getattr(coca_model.visual, 'image_mean', None), mean=getattr(coca_model.visual, 'image_mean', None),
std = getattr(coca_model.visual, 'image_std', None), std=getattr(coca_model.visual, 'image_std', None),
) )
coca_tokenizer = SimpleTokenizer() coca_tokenizer = SimpleTokenizer()
...@@ -2207,7 +2194,7 @@ This pipeline uses the Mixture. Refer to the [Mixture](https://arxiv.org/abs/230 ...@@ -2207,7 +2194,7 @@ This pipeline uses the Mixture. Refer to the [Mixture](https://arxiv.org/abs/230
```python ```python
from diffusers import LMSDiscreteScheduler, DiffusionPipeline from diffusers import LMSDiscreteScheduler, DiffusionPipeline
# Creater scheduler and model (similar to StableDiffusionPipeline) # Create scheduler and model (similar to StableDiffusionPipeline)
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000) scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
pipeline = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", scheduler=scheduler, custom_pipeline="mixture_tiling") pipeline = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", scheduler=scheduler, custom_pipeline="mixture_tiling")
pipeline.to("cuda") pipeline.to("cuda")
...@@ -2248,7 +2235,6 @@ from diffusers.pipelines.stable_diffusion import StableDiffusionInpaintPipeline ...@@ -2248,7 +2235,6 @@ from diffusers.pipelines.stable_diffusion import StableDiffusionInpaintPipeline
# Use the PNDMScheduler scheduler here instead # Use the PNDMScheduler scheduler here instead
scheduler = PNDMScheduler.from_pretrained("stabilityai/stable-diffusion-2-inpainting", subfolder="scheduler") scheduler = PNDMScheduler.from_pretrained("stabilityai/stable-diffusion-2-inpainting", subfolder="scheduler")
pipe = StableDiffusionInpaintPipeline.from_pretrained("stabilityai/stable-diffusion-2-inpainting", pipe = StableDiffusionInpaintPipeline.from_pretrained("stabilityai/stable-diffusion-2-inpainting",
custom_pipeline="stable_diffusion_tensorrt_inpaint", custom_pipeline="stable_diffusion_tensorrt_inpaint",
variant='fp16', variant='fp16',
...@@ -2287,7 +2273,7 @@ from diffusers.pipelines.pipeline_utils import Image2ImageRegion, Text2ImageRegi ...@@ -2287,7 +2273,7 @@ from diffusers.pipelines.pipeline_utils import Image2ImageRegion, Text2ImageRegi
# Load and preprocess guide image # Load and preprocess guide image
iic_image = preprocess_image(Image.open("input_image.png").convert("RGB")) iic_image = preprocess_image(Image.open("input_image.png").convert("RGB"))
# Creater scheduler and model (similar to StableDiffusionPipeline) # Create scheduler and model (similar to StableDiffusionPipeline)
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000) scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
pipeline = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", scheduler=scheduler, custom_pipeline="mixture_canvas")
pipeline.to("cuda") pipeline.to("cuda")
...@@ -2298,7 +2284,7 @@ output = pipeline( ...@@ -2298,7 +2284,7 @@ output = pipeline(
canvas_width=352, canvas_width=352,
regions=[ regions=[
Text2ImageRegion(0, 800, 0, 352, guidance_scale=8, Text2ImageRegion(0, 800, 0, 352, guidance_scale=8,
prompt=f"best quality, masterpiece, WLOP, sakimichan, art contest winner on pixiv, 8K, intricate details, wet effects, rain drops, ethereal, mysterious, futuristic, UHD, HDR, cinematic lighting, in a beautiful forest, rainy day, award winning, trending on artstation, beautiful confident cheerful young woman, wearing a futuristic sleeveless dress, ultra beautiful detailed eyes, hyper-detailed face, complex, perfect, model,  textured, chiaroscuro, professional make-up, realistic, figure in frame, "), prompt=f"best quality, masterpiece, WLOP, sakimichan, art contest winner on pixiv, 8K, intricate details, wet effects, rain drops, ethereal, mysterious, futuristic, UHD, HDR, cinematic lighting, in a beautiful forest, rainy day, award winning, trending on artstation, beautiful confident cheerful young woman, wearing a futuristic sleeveless dress, ultra beautiful detailed eyes, hyper-detailed face, complex, perfect, model, textured, chiaroscuro, professional make-up, realistic, figure in frame, "),
Image2ImageRegion(352-800, 352, 0, 352, reference_image=iic_image, strength=1.0), Image2ImageRegion(352-800, 352, 0, 352, reference_image=iic_image, strength=1.0),
], ],
num_inference_steps=100, num_inference_steps=100,
...@@ -2317,22 +2303,19 @@ It is a simple and minimalist diffusion model. ...@@ -2317,22 +2303,19 @@ It is a simple and minimalist diffusion model.
The following code shows how to use the IADB pipeline to generate images using a pretrained celebahq-256 model. The following code shows how to use the IADB pipeline to generate images using a pretrained celebahq-256 model.
```python ```python
pipeline_iadb = DiffusionPipeline.from_pretrained("thomasc4/iadb-celebahq-256", custom_pipeline='iadb') pipeline_iadb = DiffusionPipeline.from_pretrained("thomasc4/iadb-celebahq-256", custom_pipeline='iadb')
pipeline_iadb = pipeline_iadb.to('cuda') pipeline_iadb = pipeline_iadb.to('cuda')
output = pipeline_iadb(batch_size=4,num_inference_steps=128) output = pipeline_iadb(batch_size=4, num_inference_steps=128)
for i in range(len(output[0])): for i in range(len(output[0])):
plt.imshow(output[0][i]) plt.imshow(output[0][i])
plt.show() plt.show()
``` ```
Sampling with the IADB formulation is easy, and can be done in a few lines (the pipeline already implements it): Sampling with the IADB formulation is easy, and can be done in a few lines (the pipeline already implements it):
```python ```python
def sample_iadb(model, x0, nb_step): def sample_iadb(model, x0, nb_step):
x_alpha = x0 x_alpha = x0
for t in range(nb_step): for t in range(nb_step):
...@@ -2343,13 +2326,11 @@ def sample_iadb(model, x0, nb_step): ...@@ -2343,13 +2326,11 @@ def sample_iadb(model, x0, nb_step):
x_alpha = x_alpha + (alpha_next-alpha)*d x_alpha = x_alpha + (alpha_next-alpha)*d
return x_alpha return x_alpha
``` ```
The training loop is also straightforward: The training loop is also straightforward:
```python ```python
# Training loop # Training loop
while True: while True:
x0 = sample_noise() x0 = sample_noise()
...@@ -2401,9 +2382,9 @@ query_pose3 = [-55.0, 90.0, 0.0] ...@@ -2401,9 +2382,9 @@ query_pose3 = [-55.0, 90.0, 0.0]
# H, W = (256, 256) # H, W = (512, 512) # zero123 training is 256,256 # H, W = (256, 256) # H, W = (512, 512) # zero123 training is 256,256
# for batch input # for batch input
input_image1 = load_image("./demo/4_blackarm.png") #load_image("https://cvlab-zero123-live.hf.space/file=/home/user/app/configs/4_blackarm.png") input_image1 = load_image("./demo/4_blackarm.png") # load_image("https://cvlab-zero123-live.hf.space/file=/home/user/app/configs/4_blackarm.png")
input_image2 = load_image("./demo/8_motor.png") #load_image("https://cvlab-zero123-live.hf.space/file=/home/user/app/configs/8_motor.png") input_image2 = load_image("./demo/8_motor.png") # load_image("https://cvlab-zero123-live.hf.space/file=/home/user/app/configs/8_motor.png")
input_image3 = load_image("./demo/7_london.png") #load_image("https://cvlab-zero123-live.hf.space/file=/home/user/app/configs/7_london.png") input_image3 = load_image("./demo/7_london.png") # load_image("https://cvlab-zero123-live.hf.space/file=/home/user/app/configs/7_london.png")
input_images = [input_image1, input_image2, input_image3] input_images = [input_image1, input_image2, input_image3]
query_poses = [query_pose1, query_pose2, query_pose3] query_poses = [query_pose1, query_pose2, query_pose3]
...@@ -2434,7 +2415,6 @@ input_images = pre_images ...@@ -2434,7 +2415,6 @@ input_images = pre_images
images = pipe(input_imgs=input_images, prompt_imgs=input_images, poses=query_poses, height=H, width=W, images = pipe(input_imgs=input_images, prompt_imgs=input_images, poses=query_poses, height=H, width=W,
guidance_scale=3.0, num_images_per_prompt=num_images_per_prompt, num_inference_steps=50).images guidance_scale=3.0, num_images_per_prompt=num_images_per_prompt, num_inference_steps=50).images
# save imgs # save imgs
log_dir = "logs" log_dir = "logs"
os.makedirs(log_dir, exist_ok=True) os.makedirs(log_dir, exist_ok=True)
...@@ -2444,12 +2424,11 @@ for obj in range(bs): ...@@ -2444,12 +2424,11 @@ for obj in range(bs):
for idx in range(num_images_per_prompt): for idx in range(num_images_per_prompt):
images[i].save(os.path.join(log_dir,f"obj{obj}_{idx}.jpg")) images[i].save(os.path.join(log_dir,f"obj{obj}_{idx}.jpg"))
i += 1 i += 1
``` ```
### Stable Diffusion XL Reference ### Stable Diffusion XL Reference
This pipeline applies Reference control to Stable Diffusion XL. Refer to the [stable_diffusion_reference](https://github.com/huggingface/diffusers/blob/main/examples/community/README.md#stable-diffusion-reference) section for details.
```py ```py
import torch import torch
...@@ -2457,6 +2436,7 @@ from PIL import Image ...@@ -2457,6 +2436,7 @@ from PIL import Image
from diffusers.utils import load_image from diffusers.utils import load_image
from diffusers import DiffusionPipeline from diffusers import DiffusionPipeline
from diffusers.schedulers import UniPCMultistepScheduler from diffusers.schedulers import UniPCMultistepScheduler
input_image = load_image("https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png") input_image = load_image("https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png")
# pipe = DiffusionPipeline.from_pretrained( # pipe = DiffusionPipeline.from_pretrained(
...@@ -2529,7 +2509,7 @@ from diffusers import DiffusionPipeline ...@@ -2529,7 +2509,7 @@ from diffusers import DiffusionPipeline
# load the pipeline # load the pipeline
# make sure you're logged in with `huggingface-cli login` # make sure you're logged in with `huggingface-cli login`
model_id_or_path = "runwayml/stable-diffusion-v1-5" model_id_or_path = "runwayml/stable-diffusion-v1-5"
#can also be used with dreamlike-art/dreamlike-photoreal-2.0 # can also be used with dreamlike-art/dreamlike-photoreal-2.0
pipe = DiffusionPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16, custom_pipeline="pipeline_fabric").to("cuda") pipe = DiffusionPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16, custom_pipeline="pipeline_fabric").to("cuda")
# let's specify a prompt # let's specify a prompt
...@@ -2560,7 +2540,7 @@ torch.manual_seed(0) ...@@ -2560,7 +2540,7 @@ torch.manual_seed(0)
image = pipe( image = pipe(
prompt=prompt, prompt=prompt,
negative_prompt=negative_prompt, negative_prompt=negative_prompt,
liked = liked, liked=liked,
num_inference_steps=20, num_inference_steps=20,
).images[0] ).images[0]
...@@ -2730,7 +2710,7 @@ pipe.to(torch_device="cuda", torch_dtype=torch.float32) ...@@ -2730,7 +2710,7 @@ pipe.to(torch_device="cuda", torch_dtype=torch.float32)
```py ```py
prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k" prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"
# Can be set to 1~50 steps. LCM support fast inference even <= 4 steps. Recommend: 1~8 steps. # Can be set to 1~50 steps. LCM supports fast inference even <= 4 steps. Recommend: 1~8 steps.
num_inference_steps = 4 num_inference_steps = 4
images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images
...@@ -2762,9 +2742,9 @@ prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k" ...@@ -2762,9 +2742,9 @@ prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"
input_image = Image.open("myimg.png")
strength = 0.5  # strength = 0 (no change), strength = 1 (completely overwrite image)
# Can be set to 1~50 steps. LCM support fast inference even <= 4 steps. Recommend: 1~8 steps. # Can be set to 1~50 steps. LCM supports fast inference even <= 4 steps. Recommend: 1~8 steps.
num_inference_steps = 4 num_inference_steps = 4
images = pipe(prompt=prompt, image=input_image, strength=strength, num_inference_steps=num_inference_steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images images = pipe(prompt=prompt, image=input_image, strength=strength, num_inference_steps=num_inference_steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images
...@@ -2827,7 +2807,7 @@ Two checkpoints are available for use: ...@@ -2827,7 +2807,7 @@ Two checkpoints are available for use:
- [ldm3d-pano](https://huggingface.co/Intel/ldm3d-pano). This checkpoint enables the generation of panoramic images and requires the StableDiffusionLDM3DPipeline pipeline to be used. - [ldm3d-pano](https://huggingface.co/Intel/ldm3d-pano). This checkpoint enables the generation of panoramic images and requires the StableDiffusionLDM3DPipeline pipeline to be used.
- [ldm3d-sr](https://huggingface.co/Intel/ldm3d-sr). This checkpoint enables the upscaling of RGB and depth images. It can be used in cascade after the original LDM3D pipeline using the StableDiffusionUpscaleLDM3DPipeline pipeline.
'''py ```py
from PIL import Image from PIL import Image
import os import os
import torch import torch
...@@ -2838,11 +2818,11 @@ from diffusers import StableDiffusionLDM3DPipeline, DiffusionPipeline ...@@ -2838,11 +2818,11 @@ from diffusers import StableDiffusionLDM3DPipeline, DiffusionPipeline
pipe_ldm3d = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d-4c") pipe_ldm3d = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d-4c")
pipe_ldm3d.to("cuda") pipe_ldm3d.to("cuda")
prompt =f"A picture of some lemons on a table" prompt = "A picture of some lemons on a table"
output = pipe_ldm3d(prompt) output = pipe_ldm3d(prompt)
rgb_image, depth_image = output.rgb, output.depth rgb_image, depth_image = output.rgb, output.depth
rgb_image[0].save(f"lemons_ldm3d_rgb.jpg") rgb_image[0].save("lemons_ldm3d_rgb.jpg")
depth_image[0].save(f"lemons_ldm3d_depth.png") depth_image[0].save("lemons_ldm3d_depth.png")
# Upscale the previous output to a resolution of (1024, 1024) # Upscale the previous output to a resolution of (1024, 1024)
...@@ -2850,19 +2830,19 @@ pipe_ldm3d_upscale = DiffusionPipeline.from_pretrained("Intel/ldm3d-sr", custom_ ...@@ -2850,19 +2830,19 @@ pipe_ldm3d_upscale = DiffusionPipeline.from_pretrained("Intel/ldm3d-sr", custom_
pipe_ldm3d_upscale.to("cuda") pipe_ldm3d_upscale.to("cuda")
low_res_img = Image.open(f"lemons_ldm3d_rgb.jpg").convert("RGB") low_res_img = Image.open("lemons_ldm3d_rgb.jpg").convert("RGB")
low_res_depth = Image.open(f"lemons_ldm3d_depth.png").convert("L") low_res_depth = Image.open("lemons_ldm3d_depth.png").convert("L")
outputs = pipe_ldm3d_upscale(prompt="high quality high resolution uhd 4k image", rgb=low_res_img, depth=low_res_depth, num_inference_steps=50, target_res=[1024, 1024]) outputs = pipe_ldm3d_upscale(prompt="high quality high resolution uhd 4k image", rgb=low_res_img, depth=low_res_depth, num_inference_steps=50, target_res=[1024, 1024])
upscaled_rgb, upscaled_depth =outputs.rgb[0], outputs.depth[0] upscaled_rgb, upscaled_depth = outputs.rgb[0], outputs.depth[0]
upscaled_rgb.save(f"upscaled_lemons_rgb.png") upscaled_rgb.save("upscaled_lemons_rgb.png")
upscaled_depth.save(f"upscaled_lemons_depth.png") upscaled_depth.save("upscaled_lemons_depth.png")
''' ```
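The panoramic checkpoint mentioned above can be used the same way. A short, untested sketch (assuming `Intel/ldm3d-pano` loads with the same StableDiffusionLDM3DPipeline API as `ldm3d-4c`; the prompt and resolution are illustrative):

```py
# Sketch for the ldm3d-pano checkpoint (assumes the same API as the ldm3d-4c example above).
pipe_ldm3d_pano = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d-pano")
pipe_ldm3d_pano.to("cuda")

output_pano = pipe_ldm3d_pano("360 view of a tropical beach at sunset", width=1024, height=512)
output_pano.rgb[0].save("beach_pano_ldm3d_rgb.jpg")
output_pano.depth[0].save("beach_pano_ldm3d_depth.png")
```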
### ControlNet + T2I Adapter Pipeline ### ControlNet + T2I Adapter Pipeline
This pipelines combines both ControlNet and T2IAdapter into a single pipeline, where the forward pass is executed once. This pipeline combines both ControlNet and T2IAdapter into a single pipeline, where the forward pass is executed once.
It receives `control_image` and `adapter_image`, as well as `controlnet_conditioning_scale` and `adapter_conditioning_scale`, for the ControlNet and Adapter modules, respectively. Whenever `adapter_conditioning_scale = 0` or `controlnet_conditioning_scale = 0`, it will act as a full ControlNet module or as a full T2IAdapter module, respectively. It receives `control_image` and `adapter_image`, as well as `controlnet_conditioning_scale` and `adapter_conditioning_scale`, for the ControlNet and Adapter modules, respectively. Whenever `adapter_conditioning_scale=0` or `controlnet_conditioning_scale=0`, it will act as a full ControlNet module or as a full T2IAdapter module, respectively.
```py ```py
import cv2 import cv2
...@@ -2925,7 +2905,6 @@ images = pipe( ...@@ -2925,7 +2905,6 @@ images = pipe(
adapter_conditioning_scale=strength, adapter_conditioning_scale=strength,
).images ).images
images[0].save("controlnet_and_adapter.png") images[0].save("controlnet_and_adapter.png")
``` ```
### ControlNet + T2I Adapter + Inpainting Pipeline ### ControlNet + T2I Adapter + Inpainting Pipeline
...@@ -2996,12 +2975,11 @@ images = pipe( ...@@ -2996,12 +2975,11 @@ images = pipe(
strength=0.7, strength=0.7,
).images ).images
images[0].save("controlnet_and_adapter_inpaint.png") images[0].save("controlnet_and_adapter_inpaint.png")
``` ```
### Regional Prompting Pipeline ### Regional Prompting Pipeline
This pipeline is a port of the [Regional Prompter extension](https://github.com/hako-mikan/sd-webui-regional-prompter) for [Stable Diffusion web UI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) to diffusers. This pipeline is a port of the [Regional Prompter extension](https://github.com/hako-mikan/sd-webui-regional-prompter) for [Stable Diffusion web UI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) to `diffusers`.
This code implements a pipeline for the Stable Diffusion model, enabling the division of the canvas into multiple regions, with different prompts applicable to each region. Users can specify regions in two ways: using `Cols` and `Rows` modes for grid-like divisions, or the `Prompt` mode for regions calculated based on prompts. This code implements a pipeline for the Stable Diffusion model, enabling the division of the canvas into multiple regions, with different prompts applicable to each region. Users can specify regions in two ways: using `Cols` and `Rows` modes for grid-like divisions, or the `Prompt` mode for regions calculated based on prompts.
![sample](https://github.com/hako-mikan/sd-webui-regional-prompter/blob/imgs/rp_pipeline1.png) ![sample](https://github.com/hako-mikan/sd-webui-regional-prompter/blob/imgs/rp_pipeline1.png)
...@@ -3012,6 +2990,7 @@ This code implements a pipeline for the Stable Diffusion model, enabling the div ...@@ -3012,6 +2990,7 @@ This code implements a pipeline for the Stable Diffusion model, enabling the div
```py ```py
from examples.community.regional_prompting_stable_diffusion import RegionalPromptingStableDiffusionPipeline from examples.community.regional_prompting_stable_diffusion import RegionalPromptingStableDiffusionPipeline
pipe = RegionalPromptingStableDiffusionPipeline.from_single_file(model_path, vae=vae) pipe = RegionalPromptingStableDiffusionPipeline.from_single_file(model_path, vae=vae)
rp_args = { rp_args = {
...@@ -3019,7 +2998,7 @@ rp_args = { ...@@ -3019,7 +2998,7 @@ rp_args = {
"div": "1;1;1" "div": "1;1;1"
} }
prompt =""" prompt = """
green hair twintail BREAK green hair twintail BREAK
red blouse BREAK red blouse BREAK
blue skirt blue skirt
...@@ -3029,11 +3008,11 @@ images = pipe( ...@@ -3029,11 +3008,11 @@ images = pipe(
prompt=prompt, prompt=prompt,
negative_prompt=negative_prompt, negative_prompt=negative_prompt,
guidance_scale=7.5, guidance_scale=7.5,
height = 768, height=768,
width = 512, width=512,
num_inference_steps =20, num_inference_steps=20,
num_images_per_prompt = 1, num_images_per_prompt=1,
rp_args = rp_args rp_args=rp_args
).images ).images
time = time.strftime(r"%Y%m%d%H%M%S") time = time.strftime(r"%Y%m%d%H%M%S")
...@@ -3059,19 +3038,19 @@ blue skirt ...@@ -3059,19 +3038,19 @@ blue skirt
### 2-Dimensional division
The prompt consists of instructions separated by the term `BREAK` and is assigned to different regions of a two-dimensional space. The image is initially split in the main splitting direction, which in this case is rows, due to the presence of a single semicolon`;`, dividing the space into an upper and a lower section. Additional sub-splitting is then applied, indicated by commas. The upper row is split into ratios of `2:1:1`, while the lower row is split into a ratio of `4:6`. Rows themselves are split in a `1:2` ratio. According to the reference image, the blue sky is designated as the first region, green hair as the second, the bookshelf as the third, and so on, in a sequence based on their position from the top left. The terrarium is placed on the desk in the fourth region, and the orange dress and sofa are in the fifth region, conforming to their respective splits. The prompt consists of instructions separated by the term `BREAK` and is assigned to different regions of a two-dimensional space. The image is initially split in the main splitting direction, which in this case is rows, due to the presence of a single semicolon `;`, dividing the space into an upper and a lower section. Additional sub-splitting is then applied, indicated by commas. The upper row is split into ratios of `2:1:1`, while the lower row is split into a ratio of `4:6`. Rows themselves are split in a `1:2` ratio. According to the reference image, the blue sky is designated as the first region, green hair as the second, the bookshelf as the third, and so on, in a sequence based on their position from the top left. The terrarium is placed on the desk in the fourth region, and the orange dress and sofa are in the fifth region, conforming to their respective splits.
``` ```py
rp_args = { rp_args = {
"mode":"rows", "mode":"rows",
"div": "1,2,1,1;2,4,6" "div": "1,2,1,1;2,4,6"
} }
prompt =""" prompt = """
blue sky BREAK blue sky BREAK
green hair BREAK green hair BREAK
book shelf BREAK book shelf BREAK
terrarium on desk BREAK terrarium on the desk BREAK
orange dress and sofa orange dress and sofa
""" """
``` ```
...@@ -3080,10 +3059,10 @@ orange dress and sofa ...@@ -3080,10 +3059,10 @@ orange dress and sofa
### Prompt Mode ### Prompt Mode
There are limitations to methods of specifying regions in advance. This is because specifying regions can be a hindrance when designating complex shapes or dynamic compositions. In the region specified by the prompt, the regions is determined after the image generation has begun. This allows us to accommodate compositions and complex regions. There are limitations to methods of specifying regions in advance. This is because specifying regions can be a hindrance when designating complex shapes or dynamic compositions. In the region specified by the prompt, the region is determined after the image generation has begun. This allows us to accommodate compositions and complex regions.
For further information, see [here](https://github.com/hako-mikan/sd-webui-regional-prompter/blob/main/prompt_en.md).
### syntax ### Syntax
``` ```
baseprompt target1 target2 BREAK baseprompt target1 target2 BREAK
...@@ -3105,14 +3084,14 @@ is also effective. ...@@ -3105,14 +3084,14 @@ is also effective.
In this example, masks are calculated for the shirt, tie, and skirt, and color prompts are specified only for those regions.
``` ```py
rp_args = { rp_args = {
"mode":"prompt-ex", "mode": "prompt-ex",
"save_mask":True, "save_mask": True,
"th": "0.4,0.6,0.6", "th": "0.4,0.6,0.6",
} }
prompt =""" prompt = """
a girl in street with shirt, tie, skirt BREAK a girl in street with shirt, tie, skirt BREAK
red, shirt BREAK red, shirt BREAK
green, tie BREAK green, tie BREAK
...@@ -3122,7 +3101,7 @@ blue , skirt ...@@ -3122,7 +3101,7 @@ blue , skirt
![sample](https://github.com/hako-mikan/sd-webui-regional-prompter/blob/imgs/rp_pipeline3.png) ![sample](https://github.com/hako-mikan/sd-webui-regional-prompter/blob/imgs/rp_pipeline3.png)
### threshold ### Threshold
This is the threshold used to determine the mask created by the prompt. It can be set as many times as there are masks, as the range varies widely depending on the target prompt. If multiple regions are used, enter the values separated by commas. For example, hair tends to be ambiguous and requires a small value, while the face tends to be large and requires a small value. The values should be ordered to match the BREAK-separated prompts.
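For illustration, a hypothetical configuration with one threshold per BREAK-separated region (the values are examples only, not recommendations):

```py
# Hypothetical rp_args: one threshold per BREAK-separated region prompt,
# ordered to match the BREAKs (tune the values for your own prompts).
rp_args = {
    "mode": "prompt",
    "th": "0.4,0.6,0.6",
}
```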
...@@ -3141,7 +3120,7 @@ The difference is that in Prompt, duplicate regions are added, whereas in Prompt ...@@ -3141,7 +3120,7 @@ The difference is that in Prompt, duplicate regions are added, whereas in Prompt
### Accuracy ### Accuracy
In the case of a 512x512 image, Attention mode reduces the size of the region to about 8x8 pixels deep in the U-Net, so small regions can get mixed up; Latent mode operates at 64x64, so the region is exact.
``` ```
girl hair twintail frills,ribbons, dress, face BREAK girl hair twintail frills,ribbons, dress, face BREAK
...@@ -3154,7 +3133,7 @@ When an image is generated, the generated mask is displayed. It is generated at ...@@ -3154,7 +3133,7 @@ When an image is generated, the generated mask is displayed. It is generated at
### Use common prompt ### Use common prompt
You can attach the prompt up to ADDCOMM to all prompts by separating it first with ADDCOMM. This is useful when you want to include elements common to all regions. For example, when generating pictures of three people with different appearances, it's necessary to include the instruction of 'three people' in all regions. It's also useful when inserting quality tags and other things. For example, if you write as follows:
``` ```
best quality, 3persons in garden, ADDCOMM best quality, 3persons in garden, ADDCOMM
...@@ -3177,24 +3156,24 @@ Negative prompts are equally effective across all regions, but it is possible to ...@@ -3177,24 +3156,24 @@ Negative prompts are equally effective across all regions, but it is possible to
### Parameters ### Parameters
To activate Regional Prompter, it is necessary to enter settings in rp_args. The items that can be set are as follows. rp_args is a dictionary type. To activate Regional Prompter, it is necessary to enter settings in `rp_args`. The items that can be set are as follows. `rp_args` is a dictionary type.
### Input Parameters ### Input Parameters
Parameters are specified through `rp_args` (a dictionary).
``` ```py
rp_args = { rp_args = {
"mode":"rows", "mode":"rows",
"div": "1;1;1" "div": "1;1;1"
} }
pipe(prompt =prompt, rp_args = rp_args) pipe(prompt=prompt, rp_args=rp_args)
``` ```
### Required Parameters ### Required Parameters
- `mode`: Specifies the method for defining regions. Choose from `Cols`, `Rows`, `Prompt` or `Prompt-Ex`. This parameter is case-insensitive. - `mode`: Specifies the method for defining regions. Choose from `Cols`, `Rows`, `Prompt`, or `Prompt-Ex`. This parameter is case-insensitive.
- `divide`: Used in `Cols` and `Rows` modes. Details on how to specify this are provided under the respective `Cols` and `Rows` sections. - `divide`: Used in `Cols` and `Rows` modes. Details on how to specify this are provided under the respective `Cols` and `Rows` sections.
- `th`: Used in `Prompt` mode. The method of specification is detailed under the `Prompt` section. - `th`: Used in `Prompt` mode. The method of specification is detailed under the `Prompt` section.
...@@ -3208,7 +3187,7 @@ The Pipeline supports `compel` syntax. Input prompts using the `compel` structur ...@@ -3208,7 +3187,7 @@ The Pipeline supports `compel` syntax. Input prompts using the `compel` structur
- Reference paper - Reference paper
``` ```bibtex
@article{chung2022diffusion, @article{chung2022diffusion,
title={Diffusion posterior sampling for general noisy inverse problems}, title={Diffusion posterior sampling for general noisy inverse problems},
author={Chung, Hyungjin and Kim, Jeongsol and Mccann, Michael T and Klasky, Marc L and Ye, Jong Chul}, author={Chung, Hyungjin and Kim, Jeongsol and Mccann, Michael T and Klasky, Marc L and Ye, Jong Chul},
...@@ -3220,7 +3199,7 @@ The Pipeline supports `compel` syntax. Input prompts using the `compel` structur ...@@ -3220,7 +3199,7 @@ The Pipeline supports `compel` syntax. Input prompts using the `compel` structur
- This pipeline allows zero-shot conditional sampling from the posterior distribution $p(x|y)$, given an observation $y$, an unconditional generative model $p(x)$, and a differentiable operator $y=f(x)$.
- For example, $f(.)$ can be a downsampling operator, in which case $y$ is a downsampled image and the pipeline becomes a super-resolution pipeline.
- To use this pipeline, you need to know your operator $f(.)$ and the corrupted image $y$, and pass them during the call. For example, as in the main function of `dps_pipeline.py`, you first need to define the Gaussian blurring operator $f(.)$. The operator should be a callable `nn.Module`, with all parameter gradients disabled:
```python ```python
import torch.nn.functional as F import torch.nn.functional as F
@@ -3250,7 +3229,7 @@ The Pipeline supports `compel` syntax. Input prompts using the `compel` structur
    def weights_init(self):
        if self.blur_type == "gaussian":
            n = np.zeros((self.kernel_size, self.kernel_size))
            n[self.kernel_size // 2, self.kernel_size // 2] = 1
            k = scipy.ndimage.gaussian_filter(n, sigma=self.std)
            k = torch.from_numpy(k)
            self.k = k
@@ -3280,7 +3259,7 @@ The Pipeline supports `compel` syntax. Input prompts using the `compel` structur
        self.conv.update_weights(self.kernel.type(torch.float32))

        for param in self.parameters():
            param.requires_grad = False

    def forward(self, data, **kwargs):
        return self.conv(data)
@@ -3317,7 +3296,7 @@ The Pipeline supports `compel` syntax. Input prompts using the `compel` structur
- ![sample](https://github.com/tongdaxu/Images/assets/22267548/4d2a1216-08d1-4aeb-9ce3-7a2d87561d65)
- Gaussian blurred image:
- ![ddpm_generated_image](https://github.com/tongdaxu/Images/assets/22267548/65076258-344b-4ed8-b704-a04edaade8ae)
- You can download those images to run the example on your own.
- Next, we need to define a loss function for diffusion posterior sampling. For most cases, the RMSE is fine:
@@ -3326,7 +3305,7 @@ The Pipeline supports `compel` syntax. Input prompts using the `compel` structur
    return torch.sqrt(torch.sum((yhat-y)**2))
```
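For reference, a complete version of that loss could look like the sketch below. Only the `return` line is visible in this excerpt, so the signature is assumed; the name `RMSELoss` matches the `loss_fn=RMSELoss` argument used further down.

```python
import torch

def RMSELoss(yhat, y):
    # Root of the summed squared error between the operator output and the measurement
    return torch.sqrt(torch.sum((yhat - y) ** 2))
```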
- Next, as with any other diffusion model, we need the score estimator and scheduler. As we are working with $256 \times 256$ face images, we use ddpm-celebahq-256:
```python
# set up scheduler
@@ -3343,20 +3322,20 @@ The Pipeline supports `compel` syntax. Input prompts using the `compel` structur
# finally, the pipeline
dpspipe = DPSPipeline(model, scheduler)
image = dpspipe(
    measurement=measurement,
    operator=operator,
    loss_fn=RMSELoss,
    zeta=1.0,
).images[0]
image.save("dps_generated_image.png")
```
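The construction of `model` and `scheduler` is collapsed in the diff above; one way to obtain them is to reuse the components of a pretrained DDPM pipeline, as in this sketch (the checkpoint name is an assumption, and the actual example may build them differently).

```python
from diffusers import DDPMPipeline

# Reuse the UNet and scheduler from a pretrained 256x256 face DDPM (assumed checkpoint).
base = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")
model = base.unet.to("cuda")
scheduler = base.scheduler
```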
- `zeta` is a hyperparameter in the range $[0,1]$. It needs to be tuned for the best effect. With `zeta=1`, you should be able to obtain the reconstructed result:
- Reconstructed image:
- ![sample](https://github.com/tongdaxu/Images/assets/22267548/0ceb5575-d42e-4f0b-99c0-50e69c982209)
- The reconstruction is perceptually similar to the source image, but differs in the details.
- In `dps_pipeline.py`, we also provide a super-resolution example, which should produce:
- Downsampled image:
- ![dps_mea](https://github.com/tongdaxu/Images/assets/22267548/ff6a33d6-26f0-42aa-88ce-f8a76ba45a13)
- Reconstructed image:
@@ -3368,9 +3347,8 @@ This pipeline combines AnimateDiff and ControlNet. Enjoy precise motion control
```py
import torch
from diffusers import AutoencoderKL, ControlNetModel, MotionAdapter, DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_gif
from PIL import Image

motion_id = "guoyww/animatediff-motion-adapter-v1-5-2"
@@ -3385,7 +3363,8 @@ pipe = DiffusionPipeline.from_pretrained(
    controlnet=controlnet,
    vae=vae,
    custom_pipeline="pipeline_animatediff_controlnet",
    torch_dtype=torch.float16,
).to(device="cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained(
    model_id, subfolder="scheduler", beta_schedule="linear", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
@@ -3406,7 +3385,6 @@ result = pipe(
    num_inference_steps=20,
).frames[0]

# `result` already holds the list of frames, so pass it directly
export_to_gif(result, "result.gif")
```
@@ -3431,9 +3409,8 @@ You can also use multiple controlnets at once!
```python
import torch
from diffusers import AutoencoderKL, ControlNetModel, MotionAdapter, DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_gif
from PIL import Image

motion_id = "guoyww/animatediff-motion-adapter-v1-5-2"
@@ -3449,7 +3426,8 @@ pipe = DiffusionPipeline.from_pretrained(
    controlnet=[controlnet1, controlnet2],
    vae=vae,
    custom_pipeline="pipeline_animatediff_controlnet",
    torch_dtype=torch.float16,
).to(device="cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1, beta_schedule="linear",
)
@@ -3496,7 +3474,6 @@ result = pipe(
    num_inference_steps=20,
)
export_to_gif(result.frames[0], "result.gif")
```
@@ -3625,7 +3602,6 @@ pipe.train_lora(prompt, image)
output = pipe(prompt, image, mask_image, source_points, target_points)
output_image = PIL.Image.fromarray(output)
output_image.save("./output.png")
```

### Instaflow Pipeline
@@ -3674,7 +3650,8 @@ This pipeline provides null-text inversion for editing real images. It enables n
- Reference paper
```bibtex
@article{hertz2022prompt,
  title={Prompt-to-prompt image editing with cross attention control},
  author={Hertz, Amir and Mokady, Ron and Tenenbaum, Jay and Aberman, Kfir and Pritch, Yael and Cohen-Or, Daniel},
  booktitle={arXiv preprint arXiv:2208.01626},
@@ -3682,11 +3659,10 @@ This pipeline provides null-text inversion for editing real images. It enables n
}
```
```py
from diffusers import DDIMScheduler
from examples.community.pipeline_null_text_inversion import NullTextPipeline
import torch

device = "cuda"
# Provide invert_prompt and the image for null-text optimization.
invert_prompt = "A lying cat"
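# The collapsed lines also define `input_image` and `steps`, which are used below.
# Hypothetical stand-ins (adjust to your own data), e.g.:
# input_image = "cat.png"  # path to the real image to invert
# steps = 50               # number of DDIM inversion / sampling steps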
@@ -3698,13 +3674,13 @@ prompt = "A lying cat"
# or different if editing.
prompt = "A lying dog"

# Float32 is essential for a good optimization
model_path = "runwayml/stable-diffusion-v1-5"
scheduler = DDIMScheduler(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.0120, beta_schedule="scaled_linear")
pipeline = NullTextPipeline.from_pretrained(model_path, scheduler=scheduler, torch_dtype=torch.float32).to(device)

# Save the inverted_latent to save time
inverted_latent, uncond = pipeline.invert(input_image, invert_prompt, num_inner_steps=10, early_stop_epsilon=1e-5, num_inference_steps=steps)
pipeline(prompt, uncond, inverted_latent, guidance_scale=7.5, num_inference_steps=steps).images[0].save(input_image + ".output.jpg")
```
@@ -3761,7 +3737,7 @@ for frame in frames:
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny").to('cuda')

# You can use any finetuned SD here
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, custom_pipeline='rerender_a_video').to('cuda')
@@ -3803,7 +3779,7 @@ This pipeline is the implementation of [Style Aligned Image Generation via Share
from typing import List
import torch
from diffusers import DiffusionPipeline
from PIL import Image

model_id = "a-r-r-o-w/dreamshaper-xl-turbo"
@@ -3882,11 +3858,10 @@ export_to_gif(frames, "animation.gif")
IP Adapter FaceID is an experimental IP Adapter model that uses image embeddings generated by `insightface`, so no image encoder needs to be loaded.
You need to install `insightface` and all its requirements to use this model.
You must pass the image embedding tensor as `image_embeds` to the `DiffusionPipeline` instead of `ip_adapter_image`.
You can find more results [here](https://github.com/huggingface/diffusers/pull/6276).
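As a sketch of what passing the embedding yourself can look like, the snippet below assumes `insightface`'s `buffalo_l` model and a `pipeline` already prepared as in the example that follows; the exact embedding shape expected by the experimental FaceID adapter is an assumption, so see the PR linked above.

```python
import cv2
import torch
from insightface.app import FaceAnalysis

# Detect the face and compute its FaceID embedding with insightface.
app = FaceAnalysis(name="buffalo_l", providers=["CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))
faces = app.get(cv2.imread("face.png"))  # hypothetical input image

# Add a batch dimension; the expected shape may differ, see the PR linked above.
image_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)

# `pipeline` is assumed to be the FaceID-enabled pipeline from the snippet below:
# images = pipeline(prompt="a photo of a person", image_embeds=image_embeds).images
```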
```py
import torch
from diffusers.utils import load_image
import cv2
@@ -3916,7 +3891,7 @@ pipeline.load_ip_adapter_face_id("h94/IP-Adapter-FaceID", "ip-adapter-faceid_sd1
pipeline.to("cuda")

generator = torch.Generator(device="cpu").manual_seed(42)
num_images = 2

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ai_face2.png")
@@ -3939,13 +3914,13 @@ for i in range(num_images):
### InstantID Pipeline

InstantID is a new state-of-the-art tuning-free method to achieve ID-Preserving generation with only a single image, supporting various downstream tasks. For any usage questions, please refer to the [official implementation](https://github.com/InstantID/InstantID).
```py
# !pip install diffusers opencv-python transformers accelerate insightface
import diffusers
from diffusers.utils import load_image
from diffusers import ControlNetModel
import cv2
import torch
@@ -3963,12 +3938,13 @@ app.prepare(ctx_id=0, det_size=(640, 640))
# prepare models under ./checkpoints
# https://huggingface.co/InstantX/InstantID
from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="InstantX/InstantID", filename="ControlNetModel/config.json", local_dir="./checkpoints")
hf_hub_download(repo_id="InstantX/InstantID", filename="ControlNetModel/diffusion_pytorch_model.safetensors", local_dir="./checkpoints")
hf_hub_download(repo_id="InstantX/InstantID", filename="ip-adapter.bin", local_dir="./checkpoints")

face_adapter = './checkpoints/ip-adapter.bin'
controlnet_path = './checkpoints/ControlNetModel'
# load IdentityNet
controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
@@ -3979,7 +3955,7 @@ pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    controlnet=controlnet,
    torch_dtype=torch.float16
)
pipe.to("cuda")

# load adapter
pipe.load_ip_adapter_instantid(face_adapter)
@@ -4046,8 +4022,9 @@ import cv2
import torch
import numpy as np
from diffusers import ControlNetModel, DDIMScheduler, DiffusionPipeline
import sys

gmflow_dir = "/path/to/gmflow"
sys.path.insert(0, gmflow_dir)
@@ -4075,7 +4052,7 @@ def video_to_frame(video_path: str, interval: int):
input_video_path = 'https://github.com/williamyang1991/FRESCO/raw/main/data/car-turn.mp4'
output_video_path = 'car.gif'

# You can use any finetuned SD here
model_path = 'SG161222/Realistic_Vision_V2.0'
prompt = 'a red car turns in the winter'
@@ -4120,14 +4097,13 @@ output_frames = pipe(
output_frames[0].save(output_video_path, save_all=True,
                      append_images=output_frames[1:], duration=100, loop=0)
```
# Perturbed-Attention Guidance

[Project](https://ku-cvlab.github.io/Perturbed-Attention-Guidance/) / [arXiv](https://arxiv.org/abs/2403.17377) / [GitHub](https://github.com/KU-CVLAB/Perturbed-Attention-Guidance)

This implementation is based on [Diffusers](https://huggingface.co/docs/diffusers/index). `StableDiffusionPAGPipeline` is a modification of `StableDiffusionPipeline` to support Perturbed-Attention Guidance (PAG).

## Example Usage
@@ -4147,14 +4123,14 @@ pipe = StableDiffusionPipeline.from_pretrained(
    torch_dtype=torch.float16
)

device = "cuda"
pipe = pipe.to(device)

pag_scale = 5.0
pag_applied_layers_index = ['m0']

batch_size = 4
seed = 10

base_dir = "./results/"
grid_dir = base_dir + "/pag" + str(pag_scale) + "/"
@@ -4164,7 +4140,7 @@ if not os.path.exists(grid_dir):
set_seed(seed)

latent_input = randn_tensor(shape=(batch_size, 4, 64, 64), generator=None, device=device, dtype=torch.float16)

output_baseline = pipe(
    "",
@@ -4196,6 +4172,6 @@ grid_image.save(grid_dir + "sample.png")
## PAG Parameters

`pag_scale`: guidance scale of PAG (e.g., 5.0)

`pag_applied_layers_index`: indices of the layers to apply the perturbation to (e.g., `['m0']`)
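For reference, here is a call sketch that uses these two parameters with the variables defined in the snippet above; the surrounding arguments are illustrative, and the full signature lives in the community pipeline file.

```python
# Illustrative only: `pag_scale` and `pag_applied_layers_index` follow the README above;
# the other arguments are assumptions, not the pipeline's exact defaults.
output = pipe(
    "",                      # unconditional sampling, as in the baseline above
    num_inference_steps=50,
    pag_scale=pag_scale,
    pag_applied_layers_index=pag_applied_layers_index,
).images
```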
@@ -2,7 +2,7 @@
# An SDXL pipeline can take unlimited weighted prompts
#
# Author: Andrew Zhu
# GitHub: https://github.com/xhinker
# Medium: https://medium.com/@xhinker
## -----------------------------------------------------------
...
@@ -261,7 +261,7 @@ The authors found that by using DoRA, both the learning capacity and training st
**Usage**
1. To use DoRA, you need to upgrade your installation of `peft`:
```bash
pip install -U peft
```
2. Enable DoRA training by adding this flag:
```bash
...
@@ -30,7 +30,7 @@ accelerate launch finetune_instruct_pix2pix.py \
## Inference

After training, the LoRA weights of the model are stored in `$OUTPUT_DIR`.

```py
# load the base model pipeline
pipe_lora = StableDiffusionInstructPix2PixPipeline.from_pretrained("timbrooks/instruct-pix2pix")
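# The collapsed lines attach the LoRA weights trained above; with current diffusers
# one way to do that is (the path is a placeholder, not the file's exact code):
# pipe_lora.load_lora_weights("path/to/output_dir")
# pipe_lora.to("cuda")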
...
@@ -6,7 +6,7 @@ This aims to provide diffusers examples with Intel optimizations such as Bfloat1
## Accelerating the fine-tuning for textual inversion

We accelerate the fine-tuning for textual inversion with Intel Extension for PyTorch. The [examples](textual_inversion) enable both single-node and multi-node distributed training with Bfloat16 support on Intel Xeon Scalable Processors.

## Accelerating the inference for Stable Diffusion using Bfloat16
...
@@ -323,7 +323,7 @@ accelerate launch train_dreambooth.py \
### Using DreamBooth for pipelines other than Stable Diffusion

AltDiffusion also supports DreamBooth now; the running command is basically the same as above, all you need to do is replace the `MODEL_NAME` like this:
One can now simply change the `pretrained_model_name_or_path` to another architecture such as [`AltDiffusion`](https://huggingface.co/docs/diffusers/api/pipelines/alt_diffusion).
```
...
@@ -45,7 +45,7 @@ accelerate launch train_vqgan.py \
```

An example training run is [here](https://wandb.ai/sayakpaul/vqgan-training/runs/0m5kzdfp) by @sayakpaul and a lower-scale one [here](https://wandb.ai/dsbuddy27/vqgan-training/runs/eqd6xi4n?nw=nwuserisamu). The validation images can be obtained from [here](https://huggingface.co/datasets/diffusers/docs-images/tree/main/vqgan_validation_images).

The simplest way to improve the quality of a VQGAN model is to maximize the amount of information present in the bottleneck. The easiest way to do this is to increase the image resolution. However, other ways include, but are not limited to, lowering compression by downsampling fewer times or increasing the vocabulary size, which at most can be around 16384. How to do this is shown below.

# Modifying the architecture
@@ -118,10 +118,10 @@ To lower the amount of layers in a VQGan, you can remove layers by modifying the
  "vq_embed_dim": 4
}
```

To increase the size of the vocabulary, you can increase `num_vq_embeddings`. However, [some research](https://magvit.cs.cmu.edu/v2/) shows that the representations of VQGANs start degrading after 2^14 (16384) VQ embeddings, so it's not recommended to go past that.
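As a sketch, bumping the codebook size when instantiating the model directly might look like this (the remaining constructor defaults are assumed to suit your data):

```python
from diffusers import VQModel

# 2**14 = 16384 is the practical ceiling mentioned above; vq_embed_dim mirrors the config shown.
model = VQModel(num_vq_embeddings=16384, vq_embed_dim=4)
```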
## Extra training tips/ideas

During logging, take care to make sure `data_time` is low. `data_time` is the amount of time spent loading the data, during which the GPU is not active, so it's essentially wasted time. The easiest way to lower it is to increase `--dataloader_num_workers` to a higher number like 4. Due to a bug in PyTorch, this only works on Linux-based systems. For more details, check [here](https://github.com/huggingface/diffusers/issues/7646).

Secondly, training can be considered done when both the discriminator and the generator losses converge.

Thirdly, another low-hanging fruit is using EMA via the `--use_ema` parameter. This tends to make the output images smoother. The downside is that you have to lower your batch size by 1, but it may be worth it.

Another, more experimental, low-hanging fruit is switching from vgg19 to a different model for the LPIPS loss using `--timm_model_backend`. If you do this, I recommend also changing the `timm_model_layers` parameter to the layer in your model that you think best represents the features. However, be careful with the feature map norms, since they can easily dominate the loss.
@@ -52,7 +52,7 @@ EXAMPLE_DOC_STRING = """
        >>> image.save("cd_imagenet64_l2_onestep_sample_penguin.png")

        >>> # Multistep sampling, class-conditional image generation
        >>> # Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
        >>> # https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
        >>> image = pipe(num_inference_steps=None, timesteps=[22, 0], class_labels=145).images[0]
        >>> image.save("cd_imagenet64_l2_multistep_sample_penguin.png")
...
@@ -80,7 +80,7 @@ class EDMEulerSchedulerTest(SchedulerCommonTest):
        assert abs(result_sum.item() - 34.1855) < 1e-3
        assert abs(result_mean.item() - 0.044) < 1e-3

    # Override test_from_save_pretrained to use EDMEulerScheduler-specific logic
    def test_from_save_pretrained(self):
        kwargs = dict(self.forward_default_kwargs)
        num_inference_steps = kwargs.pop("num_inference_steps", None)
@@ -118,7 +118,7 @@ class EDMEulerSchedulerTest(SchedulerCommonTest):
        assert torch.sum(torch.abs(output - new_output)) < 1e-5, "Scheduler outputs are not identical"

    # Override test_step_shape to use EDMEulerScheduler-specific logic
    def test_step_shape(self):
        num_inference_steps = 10
...