We recommend installing 🤗 Diffusers in a virtual environment from PyPI or Conda. For more details about installing [PyTorch](https://pytorch.org/get-started/locally/) and [Flax](https://flax.readthedocs.io/en/latest/#installation), please refer to their official documentation.
### PyTorch
...
With `conda` (maintained by the community):

```bash
conda install -c conda-forge diffusers
```
### Flax
With `pip` (official package):
```bash
pip install --upgrade diffusers[flax]
```
### Apple Silicon (M1/M2) support
Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggingface.co/docs/diffusers/optimization/mps) guide.
...
# Installation
Diffusers is tested on Python 3.8+, PyTorch 1.4+, and Flax 0.4.1+. Follow the installation instructions for the deep learning library you're using, [PyTorch](https://pytorch.org/get-started/locally/) or [Flax](https://flax.readthedocs.io/en/latest/).
Create a [virtual environment](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/) for easier management of separate projects and to avoid compatibility issues between dependencies. Use [uv](https://docs.astral.sh/uv/), a Rust-based Python package and project manager, to create a virtual environment and install Diffusers.
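As a minimal sketch (the environment name `my-env` is an arbitrary placeholder), creating and activating the environment with uv could look like:

```bash
# create a virtual environment in the my-env folder
uv venv my-env
# activate it (Linux/macOS shells)
source my-env/bin/activate
```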
...
PyTorch only supports Python 3.8 - 3.11 on Windows.
```bash
uv pip install diffusers["torch"] transformers
```
Use the command below for Flax.
```bash
uv pip install diffusers["flax"] transformers
```
</hfoption>
<hfoptionid="conda">
<hfoptionid="conda">
...
An editable install is recommended for development workflows or if you're using the `main` version of the source code.
Clone the repository and install Diffusers with the following commands.
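A rough sketch of those commands (the `[torch]` extra is an assumption; pick the extra that matches your framework):

```bash
git clone https://github.com/huggingface/diffusers.git
cd diffusers
# editable install so local changes are picked up immediately
pip install -e ".[torch]"
```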
...
[ControlNet](https://hf.co/papers/2302.05543) models are adapters trained on top of another pretrained model. They allow for a greater degree of control over image generation by conditioning the model with an additional input image. The input image can be a canny edge, depth map, human pose, and many more.
If you're training on a GPU with limited vRAM, you should try enabling the `gradient_checkpointing`, `gradient_accumulation_steps`, and `mixed_precision` parameters in the training command. You can also reduce your memory footprint by using memory-efficient attention with [xFormers](../optimization/xformers). JAX/Flax training is also supported for efficient training on TPUs and GPUs, but it doesn't support gradient checkpointing or xFormers. You should have a GPU with >30GB of memory if you want to train faster with Flax.
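As an illustrative sketch only (required arguments such as the pretrained model, dataset, and conditioning images are omitted, and the accumulation step count is an arbitrary example):

```bash
accelerate launch train_controlnet.py \
  --gradient_checkpointing \
  --gradient_accumulation_steps=4 \
  --mixed_precision="fp16" \
  --enable_xformers_memory_efficient_attention  # requires xFormers to be installed
```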
This guide will explore the [train_controlnet.py](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet.py) training script to help you become familiar with it, and how you can adapt it for your own use-case.
...
Then navigate to the example folder containing the training script and install the required dependencies for the script you're using:
<hfoptions id="installation">
<hfoptionid="PyTorch">
```bash
cd examples/controlnet
pip install -r requirements.txt
```
</hfoption>
<hfoptionid="Flax">
If you have access to a TPU, the Flax training script runs even faster! Let's run the training script on the [Google Cloud TPU VM](https://cloud.google.com/tpu/docs/run-calculation-jax). Create a single TPU v4-8 VM and connect to it:
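A rough sketch of creating and connecting to the VM (the VM name, zone, and runtime version are placeholder assumptions; check the Cloud TPU documentation for current values):

```bash
# create a single TPU v4-8 VM (name and zone are placeholders)
gcloud compute tpus tpu-vm create my-tpu-vm \
  --zone=us-central2-b \
  --accelerator-type=v4-8 \
  --version=tpu-vm-v4-base

# connect to the VM over SSH
gcloud compute tpus tpu-vm ssh my-tpu-vm --zone=us-central2-b
```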
Then install the required dependencies for the Flax script:
```bash
cd examples/controlnet
pip install -r requirements_flax.txt
```
</hfoption>
</hfoptions>
<Tip>
...
### Min-SNR weighting
The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch and is unavailable in the Flax training script.
Add the `--snr_gamma` parameter and set it to the recommended value of 5.0:
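For example, a minimal sketch with every other required argument omitted:

```bash
accelerate launch train_controlnet.py \
  --snr_gamma=5.0
```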
...
With Flax, you can [profile your code](https://jax.readthedocs.io/en/latest/profiling.html) by adding the `--profile_steps=5` parameter to your training command. Install the Tensorboard profile plugin:
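For instance (assuming the standard profile plugin package name and a `runs` log directory):

```bash
pip install tensorflow tensorboard-plugin-profile
# then serve the logs for inspection
tensorboard --logdir runs
```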
Then you can inspect the profile at [http://localhost:6006/#profile](http://localhost:6006/#profile).
<Tip warning={true}>
If you run into version conflicts with the plugin, try uninstalling and reinstalling all versions of TensorFlow and Tensorboard. The debugging functionality of the profile plugin is still experimental, and not all views are fully functional. The `trace_viewer` cuts off events after 1M, which can result in all your device traces getting lost if, for example, you profile the compilation step by accident.
</Tip>
...
[DreamBooth](https://huggingface.co/papers/2208.12242) is a training technique that updates the entire diffusion model by training on just a few images of a subject or style. It works by associating a special word in the prompt with the example images.
If you're training on a GPU with limited vRAM, you should try enabling the `gradient_checkpointing` and `mixed_precision` parameters in the training command. You can also reduce your memory footprint by using memory-efficient attention with [xFormers](../optimization/xformers). JAX/Flax training is also supported for efficient training on TPUs and GPUs, but it doesn't support gradient checkpointing or xFormers. You should have a GPU with >30GB of memory if you want to train faster with Flax.
This guide will explore the [train_dreambooth.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) script to help you become more familiar with it, and how you can adapt it for your own use-case.
...
Navigate to the example folder with the training script and install the required dependencies for the script you're using:
<hfoptions id="installation">
<hfoptionid="PyTorch">
```bash
cd examples/dreambooth
pip install -r requirements.txt
```
</hfoption>
<hfoptionid="Flax">
```bash
cd examples/dreambooth
pip install -r requirements_flax.txt
```
</hfoption>
</hfoptions>
<Tip>
🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed-precision. It'll automatically configure your training setup based on your hardware and environment. Take a look at the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more.
</Tip>
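In practice, the first step after installing 🤗 Accelerate is to initialize an environment, sketched below:

```bash
accelerate config
```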
...
### Min-SNR weighting
The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch and is unavailable in the Flax training script.
Add the `--snr_gamma` parameter and set it to the recommended value of 5.0:
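Again as a minimal sketch, with all other required arguments omitted:

```bash
accelerate launch train_dreambooth.py \
  --snr_gamma=5.0
```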
...
LoRA is a training technique for significantly reducing the number of trainable parameters. As a result, training is faster and it is easier to store the resulting weights because they are a lot smaller (~100MBs). Use the [train_dreambooth_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py) script to train with LoRA.
...
### Min-SNR weighting
The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch and is unavailable in the Flax training script.
Add the `--snr_gamma` parameter and set it to the recommended value of 5.0:
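A minimal sketch, assuming the flag is passed to the LoRA script named above:

```bash
accelerate launch train_dreambooth_lora.py \
  --snr_gamma=5.0
```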
Navigate to the example folder with the training script and install the required dependencies for the script you're using:
<hfoptions id="installation">
<hfoptionid="PyTorch">
```bash
cd examples/text_to_image
pip install -r requirements.txt
```
</hfoption>
<hfoptionid="Flax">
```bash
cd examples/text_to_image
pip install -r requirements_flax.txt
```
</hfoption>
</hfoptions>
<Tip>
🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed-precision. It'll automatically configure your training setup based on your hardware and environment. Take a look at the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more.
</Tip>
These examples are **actively** maintained, so please feel free to open an issue if they aren't working as expected. If you feel like another training example should be included, you're more than welcome to start a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) to discuss your feature idea with us and whether it meets our criteria of being self-contained, easy-to-tweak, beginner-friendly, and single-purpose.
...
```bash
cd diffusers
pip install .
```
Then navigate to the folder of the training script (for example, [DreamBooth](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth)) and install the `requirements.txt` file. Some training scripts have a specific requirement file for SDXL, LoRA or Flax. If you're using one of these scripts, make sure you install its corresponding requirements file.
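For example, to install a script-specific requirements file (the SDXL file name here is an assumption; check the script's folder for the exact name):

```bash
cd examples/dreambooth
pip install -r requirements_sdxl.txt
```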
...
### Min-SNR weighting
The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting either `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch and is unavailable in the Flax training script.
Add the `--snr_gamma` parameter and set it to the recommended value of 5.0:
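A minimal sketch (substitute the training script this guide covers):

```bash
accelerate launch <training_script>.py \
  --snr_gamma=5.0
```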
...
Text-to-image models like Stable Diffusion are conditioned to generate images given a text prompt.
Training a model can be taxing on your hardware, but if you enable `gradient_checkpointing` and `mixed_precision`, it is possible to train a model on a single 24GB GPU. If you're training with larger batch sizes or want to train faster, it's better to use GPUs with more than 30GB of memory. You can reduce your memory footprint by enabling memory-efficient attention with [xFormers](../optimization/xformers). JAX/Flax training is also supported for efficient training on TPUs and GPUs, but it doesn't support gradient checkpointing, gradient accumulation or xFormers. A GPU with at least 30GB of memory or a TPU v3 is recommended for training with Flax.
This guide will explore the [train_text_to_image.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) training script to help you become familiar with it, and how you can adapt it for your own use-case.
...
Then navigate to the example folder containing the training script and install the required dependencies for the script you're using:
<hfoptions id="installation">
<hfoptionid="PyTorch">
```bash
cd examples/text_to_image
pip install -r requirements.txt
```
</hfoption>
<hfoptionid="Flax">
```bash
cd examples/text_to_image
pip install -r requirements_flax.txt
```
</hfoption>
</hfoptions>
<Tip>
...
### Min-SNR weighting
The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch and is unavailable in the Flax training script.
Add the `--snr_gamma` parameter and set it to the recommended value of 5.0:
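A minimal sketch with the other required arguments omitted:

```bash
accelerate launch train_text_to_image.py \
  --snr_gamma=5.0
```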
...
Once you've made all your changes or you're okay with the default configuration, you're ready to launch the training script! 🚀
<hfoptionsid="training-inference">
<hfoptionid="PyTorch">
Let's train on the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset to generate your own Naruto characters. Set the environment variables `MODEL_NAME` and `dataset_name` to the model and the dataset (either from the Hub or a local path). If you're training on more than one GPU, add the `--multi_gpu` parameter to the `accelerate launch` command.
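A rough sketch of a launch command (the model ID and output path are illustrative choices; trim or extend the flags to match your setup):

```bash
export MODEL_NAME="stable-diffusion-v1-5/stable-diffusion-v1-5"
export dataset_name="lambdalabs/naruto-blip-captions"

accelerate launch --mixed_precision="fp16" train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$dataset_name \
  --output_dir="sd-naruto-model"
```

</hfoption>
<hfoption id="Flax">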
Training with Flax can be faster on TPUs and GPUs thanks to [@duongna21](https://github.com/duongna21). Flax is more efficient on a TPU, but GPU performance is also great.
Set the environment variables `MODEL_NAME` and `dataset_name` to the model and the dataset (either from the Hub or a local path).
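A comparable sketch for the Flax launch (assuming the Flax variant of the script in the same example folder; other arguments omitted):

```bash
python train_text_to_image_flax.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$dataset_name \
  --output_dir="sd-naruto-model-flax"
```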
<Tip>
To train on a local dataset, set the `TRAIN_DIR` and `OUTPUT_DIR` environment variables to the path of the dataset and where to save the model to.
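For instance (both paths are placeholders):

```bash
export TRAIN_DIR="path/to/your/dataset"
export OUTPUT_DIR="path/to/save/model"
```

</Tip>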