Unverified Commit 0a401b95 authored by M. Tolga Cangöz's avatar M. Tolga Cangöz Committed by GitHub
Browse files

[`Docs`] Fix typos (#6122)

Fix typos and trim trailing whitespaces
parent 664e931b
...@@ -77,7 +77,7 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi ...@@ -77,7 +77,7 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi
## Quickstart ## Quickstart
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 15000+ checkpoints): Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 16000+ checkpoints):
```python ```python
from diffusers import DiffusionPipeline from diffusers import DiffusionPipeline
...@@ -219,7 +219,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9 ...@@ -219,7 +219,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
- https://github.com/deep-floyd/IF - https://github.com/deep-floyd/IF
- https://github.com/bentoml/BentoML - https://github.com/bentoml/BentoML
- https://github.com/bmaltais/kohya_ss - https://github.com/bmaltais/kohya_ss
- +6000 other amazing GitHub repositories 💪 - +7000 other amazing GitHub repositories 💪
Thank you for using us ❤️. Thank you for using us ❤️.
......
...@@ -18,8 +18,7 @@ limitations under the License. ...@@ -18,8 +18,7 @@ limitations under the License.
Diffusers examples are a collection of scripts to demonstrate how to effectively use the `diffusers` library Diffusers examples are a collection of scripts to demonstrate how to effectively use the `diffusers` library
for a variety of use cases involving training or fine-tuning. for a variety of use cases involving training or fine-tuning.
**Note**: If you are looking for **official** examples on how to use `diffusers` for inference, **Note**: If you are looking for **official** examples on how to use `diffusers` for inference, please have a look at [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines).
please have a look at [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines).
Our examples aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**. Our examples aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**.
More specifically, this means: More specifically, this means:
...@@ -27,11 +26,10 @@ More specifically, this means: ...@@ -27,11 +26,10 @@ More specifically, this means:
- **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/requirements.txt) and execute the example script. - **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/requirements.txt) and execute the example script.
- **Easy-to-tweak**: While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data and the training loop to allow you to tweak and edit them as required. - **Easy-to-tweak**: While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data and the training loop to allow you to tweak and edit them as required.
- **Beginner-friendly**: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand diffusion models and how to use them with the `diffusers` library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners. - **Beginner-friendly**: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand diffusion models and how to use them with the `diffusers` library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners.
- **One-purpose-only**: Examples should show one task and one task only. Even if a task is from a modeling - **One-purpose-only**: Examples should show one task and one task only. Even if a task is from a modeling point of view very similar, *e.g.* image super-resolution and image modification tend to use the same model and training method, we want examples to showcase only one task to keep them as readable and easy-to-understand as possible.
point of view very similar, *e.g.* image super-resolution and image modification tend to use the same model and training method, we want examples to showcase only one task to keep them as readable and easy-to-understand as possible.
We provide **official** examples that cover the most popular tasks of diffusion models. We provide **official** examples that cover the most popular tasks of diffusion models.
*Official* examples are **actively** maintained by the `diffusers` maintainers and we try to rigorously follow our example philosophy as defined above. *Official* examples are **actively** maintained by the `diffusers` maintainers and we try to rigorously follow our example philosophy as defined above.
If you feel like another important example should exist, we are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you! If you feel like another important example should exist, we are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you!
Training examples show how to pretrain or fine-tune diffusion models for a variety of tasks. Currently we support: Training examples show how to pretrain or fine-tune diffusion models for a variety of tasks. Currently we support:
...@@ -39,7 +37,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie ...@@ -39,7 +37,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie
| Task | 🤗 Accelerate | 🤗 Datasets | Colab | Task | 🤗 Accelerate | 🤗 Datasets | Colab
|---|---|:---:|:---:| |---|---|:---:|:---:|
| [**Unconditional Image Generation**](./unconditional_image_generation) | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) | [**Unconditional Image Generation**](./unconditional_image_generation) | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb)
| [**Text-to-Image fine-tuning**](./text_to_image) | ✅ | ✅ | | [**Text-to-Image fine-tuning**](./text_to_image) | ✅ | ✅ |
| [**Textual Inversion**](./textual_inversion) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) | [**Textual Inversion**](./textual_inversion) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)
| [**Dreambooth**](./dreambooth) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb) | [**Dreambooth**](./dreambooth) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb)
| [**ControlNet**](./controlnet) | ✅ | ✅ | - | [**ControlNet**](./controlnet) | ✅ | ✅ | -
......
This diff is collapsed.
# Research projects # Research projects
This folder contains various research projects using 🧨 Diffusers. This folder contains various research projects using 🧨 Diffusers.
They are not really maintained by the core maintainers of this library and often require a specific version of Diffusers that is indicated in the requirements file of each folder. They are not really maintained by the core maintainers of this library and often require a specific version of Diffusers that is indicated in the requirements file of each folder.
Updating them to the most recent version of the library will require some work. Updating them to the most recent version of the library will require some work.
To use any of them, just run the command To use any of them, just run the command
......
## [Deprecated] Multi Token Textual Inversion ## [Deprecated] Multi Token Textual Inversion
**IMPORTART: This research project is deprecated. Multi Token Textual Inversion is now supported natively in [the officail textual inversion example](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion#running-locally-with-pytorch).** **IMPORTART: This research project is deprecated. Multi Token Textual Inversion is now supported natively in [the official textual inversion example](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion#running-locally-with-pytorch).**
The author of this project is [Isamu Isozaki](https://github.com/isamu-isozaki) - please make sure to tag the author for issue and PRs as well as @patrickvonplaten. The author of this project is [Isamu Isozaki](https://github.com/isamu-isozaki) - please make sure to tag the author for issue and PRs as well as @patrickvonplaten.
...@@ -17,9 +17,9 @@ Feel free to add these options to your training! In practice num_vec_per_token a ...@@ -17,9 +17,9 @@ Feel free to add these options to your training! In practice num_vec_per_token a
[Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples. [Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples.
The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion. The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion.
## Running on Colab ## Running on Colab
Colab for training Colab for training
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)
Colab for inference Colab for inference
...@@ -53,7 +53,7 @@ accelerate config ...@@ -53,7 +53,7 @@ accelerate config
### Cat toy example ### Cat toy example
You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-5`, so you'll need to visit [its card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license and tick the checkbox if you agree. You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-5`, so you'll need to visit [its card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license and tick the checkbox if you agree.
You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens). You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
...@@ -63,7 +63,7 @@ Run the following command to authenticate your token ...@@ -63,7 +63,7 @@ Run the following command to authenticate your token
huggingface-cli login huggingface-cli login
``` ```
If you have already cloned the repo, then you won't need to go through these steps. If you have already cloned the repo, then you won't need to go through these steps.
<br> <br>
......
...@@ -2,4 +2,4 @@ ...@@ -2,4 +2,4 @@
**This research project is not actively maintained by the diffusers team. For any questions or comments, please contact Prathik Rao (prathikr), Sunghoon Choi (hanbitmyths), Ashwini Khade (askhade), or Peng Wang (pengwa) on github with any questions.** **This research project is not actively maintained by the diffusers team. For any questions or comments, please contact Prathik Rao (prathikr), Sunghoon Choi (hanbitmyths), Ashwini Khade (askhade), or Peng Wang (pengwa) on github with any questions.**
This aims to provide diffusers examples with ONNXRuntime optimizations for training/fine-tuning unconditional image generation, text to image, and textual inversion. Please see individual directories for more details on how to run each task using ONNXRuntime. This aims to provide diffusers examples with ONNXRuntime optimizations for training/fine-tuning unconditional image generation, text to image, and textual inversion. Please see individual directories for more details on how to run each task using ONNXRuntime.
\ No newline at end of file
...@@ -34,7 +34,7 @@ accelerate config ...@@ -34,7 +34,7 @@ accelerate config
### Pokemon example ### Pokemon example
You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.
You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens). You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
...@@ -68,7 +68,7 @@ accelerate launch --mixed_precision="fp16" train_text_to_image.py \ ...@@ -68,7 +68,7 @@ accelerate launch --mixed_precision="fp16" train_text_to_image.py \
--learning_rate=1e-05 \ --learning_rate=1e-05 \
--max_grad_norm=1 \ --max_grad_norm=1 \
--lr_scheduler="constant" --lr_warmup_steps=0 \ --lr_scheduler="constant" --lr_warmup_steps=0 \
--output_dir="sd-pokemon-model" --output_dir="sd-pokemon-model"
``` ```
Please contact Prathik Rao (prathikr), Sunghoon Choi (hanbitmyths), Ashwini Khade (askhade), or Peng Wang (pengwa) on github with any questions. Please contact Prathik Rao (prathikr), Sunghoon Choi (hanbitmyths), Ashwini Khade (askhade), or Peng Wang (pengwa) on github with any questions.
\ No newline at end of file
...@@ -3,9 +3,9 @@ ...@@ -3,9 +3,9 @@
[Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples. [Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples.
The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion. The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion.
## Running on Colab ## Running on Colab
Colab for training Colab for training
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)
Colab for inference Colab for inference
...@@ -39,7 +39,7 @@ accelerate config ...@@ -39,7 +39,7 @@ accelerate config
### Cat toy example ### Cat toy example
You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-5`, so you'll need to visit [its card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license and tick the checkbox if you agree. You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-5`, so you'll need to visit [its card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license and tick the checkbox if you agree.
You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens). You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
...@@ -49,7 +49,7 @@ Run the following command to authenticate your token ...@@ -49,7 +49,7 @@ Run the following command to authenticate your token
huggingface-cli login huggingface-cli login
``` ```
If you have already cloned the repo, then you won't need to go through these steps. If you have already cloned the repo, then you won't need to go through these steps.
<br> <br>
......
...@@ -35,7 +35,7 @@ from accelerate.utils import write_basic_config ...@@ -35,7 +35,7 @@ from accelerate.utils import write_basic_config
write_basic_config() write_basic_config()
``` ```
When running `accelerate config`, if we specify torch compile mode to True there can be dramatic speedups. When running `accelerate config`, if we specify torch compile mode to True there can be dramatic speedups.
### Toy example ### Toy example
......
...@@ -72,8 +72,8 @@ params = jax.tree_util.tree_map(lambda x: x.astype(jnp.bfloat16), params) ...@@ -72,8 +72,8 @@ params = jax.tree_util.tree_map(lambda x: x.astype(jnp.bfloat16), params)
params["scheduler"] = scheduler_state params["scheduler"] = scheduler_state
``` ```
This section adjusts the data types of the model parameters. This section adjusts the data types of the model parameters.
We convert all parameters to `bfloat16` to speed-up the computation with model weights. We convert all parameters to `bfloat16` to speed-up the computation with model weights.
**Note** that the scheduler parameters are **not** converted to `blfoat16` as the loss **Note** that the scheduler parameters are **not** converted to `blfoat16` as the loss
in precision is degrading the pipeline's performance too significantly. in precision is degrading the pipeline's performance too significantly.
**3. Define Inputs to Pipeline** **3. Define Inputs to Pipeline**
...@@ -146,12 +146,12 @@ For this we will be using a JAX feature called [Ahead of Time](https://jax.readt ...@@ -146,12 +146,12 @@ For this we will be using a JAX feature called [Ahead of Time](https://jax.readt
In [sdxl_single_aot.py](./sdxl_single_aot.py) we give a simple example of how to write our own parallelization logic for text-to-image generation pipeline in JAX using [StabilityAI's Stable Diffusion XL](stabilityai/stable-diffusion-xl-base-1.0) In [sdxl_single_aot.py](./sdxl_single_aot.py) we give a simple example of how to write our own parallelization logic for text-to-image generation pipeline in JAX using [StabilityAI's Stable Diffusion XL](stabilityai/stable-diffusion-xl-base-1.0)
We add a `aot_compile` function that compiles the `pipeline._generate` function We add a `aot_compile` function that compiles the `pipeline._generate` function
telling JAX which input arguments are static, that is, arguments that telling JAX which input arguments are static, that is, arguments that
are known at compile time and won't change. In our case, it is num_inference_steps, are known at compile time and won't change. In our case, it is num_inference_steps,
height, width and return_latents. height, width and return_latents.
Once the function is compiled, these parameters are omitted from future calls and Once the function is compiled, these parameters are omitted from future calls and
cannot be changed without modifying the code and recompiling. cannot be changed without modifying the code and recompiling.
```python ```python
...@@ -205,9 +205,9 @@ def generate( ...@@ -205,9 +205,9 @@ def generate(
g = jnp.array([guidance_scale] * prompt_ids.shape[0], dtype=jnp.float32) g = jnp.array([guidance_scale] * prompt_ids.shape[0], dtype=jnp.float32)
g = g[:, None] g = g[:, None]
images = p_generate( images = p_generate(
prompt_ids, prompt_ids,
p_params, p_params,
rng, rng,
g, g,
None, None,
neg_prompt_ids) neg_prompt_ids)
...@@ -220,7 +220,7 @@ def generate( ...@@ -220,7 +220,7 @@ def generate(
The first forward pass after AOT compilation still takes a while longer than The first forward pass after AOT compilation still takes a while longer than
subsequent passes, this is because on the first pass, JAX uses Python dispatch, which subsequent passes, this is because on the first pass, JAX uses Python dispatch, which
Fills the C++ dispatch cache. Fills the C++ dispatch cache.
When using jit, this extra step is done automatically, but when using AOT compilation, When using jit, this extra step is done automatically, but when using AOT compilation,
it doesn't happen until the function call is made. it doesn't happen until the function call is made.
```python ```python
......
...@@ -42,7 +42,7 @@ from accelerate.utils import write_basic_config ...@@ -42,7 +42,7 @@ from accelerate.utils import write_basic_config
write_basic_config() write_basic_config()
``` ```
When running `accelerate config`, if we specify torch compile mode to True there can be dramatic speedups. When running `accelerate config`, if we specify torch compile mode to True there can be dramatic speedups.
## Circle filling dataset ## Circle filling dataset
...@@ -85,7 +85,7 @@ accelerate launch train_t2i_adapter_sdxl.py \ ...@@ -85,7 +85,7 @@ accelerate launch train_t2i_adapter_sdxl.py \
To better track our training experiments, we're using the following flags in the command above: To better track our training experiments, we're using the following flags in the command above:
* `report_to="wandb` will ensure the training runs are tracked on Weights and Biases. To use it, be sure to install `wandb` with `pip install wandb`. * `report_to="wandb` will ensure the training runs are tracked on Weights and Biases. To use it, be sure to install `wandb` with `pip install wandb`.
* `validation_image`, `validation_prompt`, and `validation_steps` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected. * `validation_image`, `validation_prompt`, and `validation_steps` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected.
Our experiments were conducted on a single 40GB A100 GPU. Our experiments were conducted on a single 40GB A100 GPU.
......
...@@ -36,7 +36,7 @@ Note also that we use PEFT library as backend for LoRA training, make sure to ha ...@@ -36,7 +36,7 @@ Note also that we use PEFT library as backend for LoRA training, make sure to ha
### Pokemon example ### Pokemon example
You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.
You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens). You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
...@@ -71,7 +71,7 @@ accelerate launch --mixed_precision="fp16" train_text_to_image.py \ ...@@ -71,7 +71,7 @@ accelerate launch --mixed_precision="fp16" train_text_to_image.py \
--learning_rate=1e-05 \ --learning_rate=1e-05 \
--max_grad_norm=1 \ --max_grad_norm=1 \
--lr_scheduler="constant" --lr_warmup_steps=0 \ --lr_scheduler="constant" --lr_warmup_steps=0 \
--output_dir="sd-pokemon-model" --output_dir="sd-pokemon-model"
``` ```
<!-- accelerate_snippet_end --> <!-- accelerate_snippet_end -->
...@@ -145,11 +145,11 @@ accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \ ...@@ -145,11 +145,11 @@ accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \
--train_batch_size=1 \ --train_batch_size=1 \
--gradient_accumulation_steps=4 \ --gradient_accumulation_steps=4 \
--gradient_checkpointing \ --gradient_checkpointing \
--max_train_steps=15000 \ --max_train_steps=15000 \
--learning_rate=1e-05 \ --learning_rate=1e-05 \
--max_grad_norm=1 \ --max_grad_norm=1 \
--lr_scheduler="constant" --lr_warmup_steps=0 \ --lr_scheduler="constant" --lr_warmup_steps=0 \
--output_dir="sd-pokemon-model" --output_dir="sd-pokemon-model"
``` ```
...@@ -157,7 +157,7 @@ accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \ ...@@ -157,7 +157,7 @@ accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \
We support training with the Min-SNR weighting strategy proposed in [Efficient Diffusion Training via Min-SNR Weighting Strategy](https://arxiv.org/abs/2303.09556) which helps to achieve faster convergence We support training with the Min-SNR weighting strategy proposed in [Efficient Diffusion Training via Min-SNR Weighting Strategy](https://arxiv.org/abs/2303.09556) which helps to achieve faster convergence
by rebalancing the loss. In order to use it, one needs to set the `--snr_gamma` argument. The recommended by rebalancing the loss. In order to use it, one needs to set the `--snr_gamma` argument. The recommended
value when using it is 5.0. value when using it is 5.0.
You can find [this project on Weights and Biases](https://wandb.ai/sayakpaul/text2image-finetune-minsnr) that compares the loss surfaces of the following setups: You can find [this project on Weights and Biases](https://wandb.ai/sayakpaul/text2image-finetune-minsnr) that compares the loss surfaces of the following setups:
...@@ -167,7 +167,7 @@ You can find [this project on Weights and Biases](https://wandb.ai/sayakpaul/tex ...@@ -167,7 +167,7 @@ You can find [this project on Weights and Biases](https://wandb.ai/sayakpaul/tex
For our small Pokemons dataset, the effects of Min-SNR weighting strategy might not appear to be pronounced, but for larger datasets, we believe the effects will be more pronounced. For our small Pokemons dataset, the effects of Min-SNR weighting strategy might not appear to be pronounced, but for larger datasets, we believe the effects will be more pronounced.
Also, note that in this example, we either predict `epsilon` (i.e., the noise) or the `v_prediction`. For both of these cases, the formulation of the Min-SNR weighting strategy that we have used holds. Also, note that in this example, we either predict `epsilon` (i.e., the noise) or the `v_prediction`. For both of these cases, the formulation of the Min-SNR weighting strategy that we have used holds.
## Training with LoRA ## Training with LoRA
...@@ -186,7 +186,7 @@ on consumer GPUs like Tesla T4, Tesla V100. ...@@ -186,7 +186,7 @@ on consumer GPUs like Tesla T4, Tesla V100.
### Training ### Training
First, you need to set up your development environment as is explained in the [installation section](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Stable Diffusion v1-4](https://hf.co/CompVis/stable-diffusion-v1-4) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions). First, you need to set up your development environment as is explained in the [installation section](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Stable Diffusion v1-4](https://hf.co/CompVis/stable-diffusion-v1-4) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions).
**___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___** **___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**
...@@ -197,7 +197,7 @@ export MODEL_NAME="CompVis/stable-diffusion-v1-4" ...@@ -197,7 +197,7 @@ export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/pokemon-blip-captions"
``` ```
For this example we want to directly store the trained LoRA embeddings on the Hub, so For this example we want to directly store the trained LoRA embeddings on the Hub, so
we need to be logged in and add the `--push_to_hub` flag. we need to be logged in and add the `--push_to_hub` flag.
```bash ```bash
...@@ -225,11 +225,11 @@ The above command will also run inference as fine-tuning progresses and log the ...@@ -225,11 +225,11 @@ The above command will also run inference as fine-tuning progresses and log the
The final LoRA embedding weights have been uploaded to [sayakpaul/sd-model-finetuned-lora-t4](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4). **___Note: [The final weights](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/pytorch_lora_weights.bin) are only 3 MB in size, which is orders of magnitudes smaller than the original model.___** The final LoRA embedding weights have been uploaded to [sayakpaul/sd-model-finetuned-lora-t4](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4). **___Note: [The final weights](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/pytorch_lora_weights.bin) are only 3 MB in size, which is orders of magnitudes smaller than the original model.___**
You can check some inference samples that were logged during the course of the fine-tuning process [here](https://wandb.ai/sayakpaul/text2image-fine-tune/runs/q4lc0xsw). You can check some inference samples that were logged during the course of the fine-tuning process [here](https://wandb.ai/sayakpaul/text2image-fine-tune/runs/q4lc0xsw).
### Inference ### Inference
Once you have trained a model using above command, the inference can be done simply using the `StableDiffusionPipeline` after loading the trained LoRA weights. You Once you have trained a model using above command, the inference can be done simply using the `StableDiffusionPipeline` after loading the trained LoRA weights. You
need to pass the `output_dir` for loading the LoRA weights which, in this case, is `sd-pokemon-model-lora`. need to pass the `output_dir` for loading the LoRA weights which, in this case, is `sd-pokemon-model-lora`.
```python ```python
...@@ -248,9 +248,9 @@ image.save("pokemon.png") ...@@ -248,9 +248,9 @@ image.save("pokemon.png")
If you are loading the LoRA parameters from the Hub and if the Hub repository has If you are loading the LoRA parameters from the Hub and if the Hub repository has
a `base_model` tag (such as [this](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/README.md?code=true#L4)), then a `base_model` tag (such as [this](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/README.md?code=true#L4)), then
you can do: you can do:
```py ```py
from huggingface_hub.repocard import RepoCard from huggingface_hub.repocard import RepoCard
lora_model_id = "sayakpaul/sd-model-finetuned-lora-t4" lora_model_id = "sayakpaul/sd-model-finetuned-lora-t4"
...@@ -287,7 +287,7 @@ python train_text_to_image_flax.py \ ...@@ -287,7 +287,7 @@ python train_text_to_image_flax.py \
--max_train_steps=15000 \ --max_train_steps=15000 \
--learning_rate=1e-05 \ --learning_rate=1e-05 \
--max_grad_norm=1 \ --max_grad_norm=1 \
--output_dir="sd-pokemon-model" --output_dir="sd-pokemon-model"
``` ```
To run on your own training files prepare the dataset according to the format required by `datasets`, you can find the instructions for how to do that in this [document](https://huggingface.co/docs/datasets/v2.4.0/en/image_load#imagefolder-with-metadata). To run on your own training files prepare the dataset according to the format required by `datasets`, you can find the instructions for how to do that in this [document](https://huggingface.co/docs/datasets/v2.4.0/en/image_load#imagefolder-with-metadata).
...@@ -321,5 +321,5 @@ According to [this issue](https://github.com/huggingface/diffusers/issues/2234#i ...@@ -321,5 +321,5 @@ According to [this issue](https://github.com/huggingface/diffusers/issues/2234#i
## Stable Diffusion XL ## Stable Diffusion XL
* We support fine-tuning the UNet shipped in [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) via the `train_text_to_image_sdxl.py` script. Please refer to the docs [here](./README_sdxl.md). * We support fine-tuning the UNet shipped in [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) via the `train_text_to_image_sdxl.py` script. Please refer to the docs [here](./README_sdxl.md).
* We also support fine-tuning of the UNet and Text Encoder shipped in [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) with LoRA via the `train_text_to_image_lora_sdxl.py` script. Please refer to the docs [here](./README_sdxl.md). * We also support fine-tuning of the UNet and Text Encoder shipped in [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) with LoRA via the `train_text_to_image_lora_sdxl.py` script. Please refer to the docs [here](./README_sdxl.md).
...@@ -3,9 +3,9 @@ ...@@ -3,9 +3,9 @@
[Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples. [Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples.
The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion. The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion.
## Running on Colab ## Running on Colab
Colab for training Colab for training
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)
Colab for inference Colab for inference
...@@ -84,11 +84,11 @@ accelerate launch textual_inversion.py \ ...@@ -84,11 +84,11 @@ accelerate launch textual_inversion.py \
A full training run takes ~1 hour on one V100 GPU. A full training run takes ~1 hour on one V100 GPU.
**Note**: As described in [the official paper](https://arxiv.org/abs/2208.01618) **Note**: As described in [the official paper](https://arxiv.org/abs/2208.01618)
only one embedding vector is used for the placeholder token, *e.g.* `"<cat-toy>"`. only one embedding vector is used for the placeholder token, *e.g.* `"<cat-toy>"`.
However, one can also add multiple embedding vectors for the placeholder token However, one can also add multiple embedding vectors for the placeholder token
to increase the number of fine-tuneable parameters. This can help the model to learn to increase the number of fine-tuneable parameters. This can help the model to learn
more complex details. To use multiple embedding vectors, you should define `--num_vectors` more complex details. To use multiple embedding vectors, you should define `--num_vectors`
to a number larger than one, *e.g.*: to a number larger than one, *e.g.*:
```bash ```bash
--num_vectors 5 --num_vectors 5
......
...@@ -27,7 +27,7 @@ And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) e ...@@ -27,7 +27,7 @@ And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) e
accelerate config accelerate config
``` ```
### Unconditional Flowers ### Unconditional Flowers
The command to train a DDPM UNet model on the Oxford Flowers dataset: The command to train a DDPM UNet model on the Oxford Flowers dataset:
...@@ -52,7 +52,7 @@ A full training run takes 2 hours on 4xV100 GPUs. ...@@ -52,7 +52,7 @@ A full training run takes 2 hours on 4xV100 GPUs.
<img src="https://user-images.githubusercontent.com/26864830/180248660-a0b143d0-b89a-42c5-8656-2ebf6ece7e52.png" width="700" /> <img src="https://user-images.githubusercontent.com/26864830/180248660-a0b143d0-b89a-42c5-8656-2ebf6ece7e52.png" width="700" />
### Unconditional Pokemon ### Unconditional Pokemon
The command to train a DDPM UNet model on the Pokemon dataset: The command to train a DDPM UNet model on the Pokemon dataset:
...@@ -96,7 +96,7 @@ accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \ ...@@ -96,7 +96,7 @@ accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \
--logger="wandb" --logger="wandb"
``` ```
To be able to use Weights and Biases (`wandb`) as a logger you need to install the library: `pip install wandb`. To be able to use Weights and Biases (`wandb`) as a logger you need to install the library: `pip install wandb`.
### Using your own data ### Using your own data
......
...@@ -72,7 +72,7 @@ In a nutshell, LoRA allows adapting pretrained models by adding pairs of rank-de ...@@ -72,7 +72,7 @@ In a nutshell, LoRA allows adapting pretrained models by adding pairs of rank-de
### Prior Training ### Prior Training
First, you need to set up your development environment as explained in the [installation](#Running-locally-with-PyTorch) section. Make sure to set the `DATASET_NAME` environment variable. Here, we will use the [Pokemon captions dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions). First, you need to set up your development environment as explained in the [installation](#Running-locally-with-PyTorch) section. Make sure to set the `DATASET_NAME` environment variable. Here, we will use the [Pokemon captions dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions).
```bash ```bash
export DATASET_NAME="lambdalabs/pokemon-blip-captions" export DATASET_NAME="lambdalabs/pokemon-blip-captions"
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment