Update the diffusers folder

Commit 3d1b9667 authored Apr 17, 2025 by wangwei990215
20 changed files
--- a/diffusers-0.27.0/examples/text_to_image/output/sd2.1/pytorch_lora_weights.safetensors
+++ b/diffusers-0.27.0/examples/text_to_image/output/sd2.1/pytorch_lora_weights.safetensors
--- a/diffusers-0.27.0/examples/text_to_image/run-4.sh
+++ b/diffusers-0.27.0/examples/text_to_image/run-4.sh
-export MODEL_NAME="/path/to/stable-diffusion-2-1-base/"
+# source /public/home/chenxi/hopt/dtk24042-miopen.sh 
+export MODEL_NAME="/path/to/stable-diffusion-2-1-base"
 export OUTPUT_DIR="./output/sd2.1"
-export DATASET_NAME="/path/to/data/train-00000-of-00001-566cc9b19d7203f8.parquet"
+export DATASET_NAME="/path/to/train-00000-of-00001-566cc9b19d7203f8.parquet"
 #export DATASET_NAME="./pokemon-blip-captions"
-#export HIP_VISIBLE_DEVICES=7
+export HIP_VISIBLE_DEVICES=4,5,6,7
 #export LD_LIBRARY_PATH=/opt/rocblas-install/lib/:$LD_LIBRARY_PATH
 #export PYTORCH_HIP_ALLOC_CONF=max_split_size_mb:25314
-export HIP_VISIBLE_DEVICES=4,5,6,7
 torchrun --nproc_per_node=4 train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
@@ -14,7 +14,7 @@ torchrun --nproc_per_node=4 train_text_to_image_lora.py \
  --resolution=960 \
  --center_crop \
  --random_flip \
-  --train_batch_size=8 \
+  --train_batch_size=2 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=101 \
  --learning_rate=1e-04 \

--- a/diffusers-0.27.0/examples/text_to_image/run.sh
+++ b/diffusers-0.27.0/examples/text_to_image/run.sh
 # source /public/home/chenxi/hopt/dtk24042-miopen.sh 
-export MODEL_NAME="/path/to/stable-diffusion-2-1-base/"
+export MODEL_NAME="/path/to/stable-diffusion-2-1-base"
 export OUTPUT_DIR="./output/sd2.1"
-export DATASET_NAME="/path/to/data/train-00000-of-00001-566cc9b19d7203f8.parquet"
+export DATASET_NAME="path/to/train-00000-of-00001-566cc9b19d7203f8.parquet"
 #export DATASET_NAME="./pokemon-blip-captions"
-export HIP_VISIBLE_DEVICES=4,5,6,7
+export HIP_VISIBLE_DEVICES=7
 #export LD_LIBRARY_PATH=/opt/rocblas-install/lib/:$LD_LIBRARY_PATH
 #export PYTORCH_HIP_ALLOC_CONF=max_split_size_mb:25314
-python train_text_to_image_lora.py \
+python  train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --dataloader_num_workers=8 \
  --resolution=960 \
  --center_crop \
  --random_flip \
-  --train_batch_size=2 \
+  --train_batch_size=8 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=101 \
  --learning_rate=1e-04 \
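
Review note: the two launch scripts trade per-device batch size against process count. A minimal sketch of the arithmetic, assuming the usual data-parallel convention that one optimizer step sees processes × per-device batch × accumulation steps samples. The function name `effective_batch_size` is hypothetical and is not part of the training script.

```python
def effective_batch_size(num_processes: int, train_batch_size: int, gradient_accumulation_steps: int) -> int:
    """Samples contributing to one optimizer step under plain data parallelism (assumed convention)."""
    return num_processes * train_batch_size * gradient_accumulation_steps


# run-4.sh after this commit: torchrun --nproc_per_node=4, --train_batch_size=2
multi_gpu = effective_batch_size(4, 2, 4)
# run.sh after this commit: a single python process, --train_batch_size=8
single_gpu = effective_batch_size(1, 8, 4)
print(multi_gpu, single_gpu)  # 32 32
```

Under that assumption both scripts end up at 32 samples per optimizer step after this commit, so their loss curves should be roughly comparable at the same `--max_train_steps`.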

--- a/diffusers-0.27.0/examples/text_to_image/train_text_to_image_lora.py
+++ b/diffusers-0.27.0/examples/text_to_image/train_text_to_image_lora.py
@@ -766,7 +766,7 @@ def main():
    for epoch in range(first_epoch, args.num_train_epochs):
        unet.train()
        train_loss = 0.0
-        # from layer_check_pt import acc_check_hook, register_hook 
+        from layer_check_pt import acc_check_hook, register_hook 
        for step, batch in enumerate(train_dataloader):
            with accelerator.accumulate(unet):
                # Convert images to latent space

--- a/diffusers-0.27.0/examples/textual_inversion/README.md
+++ b/diffusers-0.27.0/examples/textual_inversion/README.md
+## Textual Inversion fine-tuning example
+[Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like stable diffusion on your own images using just 3-5 examples.
+The `textual_inversion.py` script shows how to implement the training procedure and adapt it for stable diffusion.
+## Running on Colab
+Colab for training
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)
+Colab for inference
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb)
+## Running locally with PyTorch
+### Installing the dependencies
+Before running the scripts, make sure to install the library's training dependencies:
+**Important**
+To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:
+```bash
+git clone https://github.com/huggingface/diffusers
+cd diffusers
+pip install .
+```
+Then `cd` into the example folder and run:
+```bash
+pip install -r requirements.txt
+```
+And initialize an [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
+```bash
+accelerate config
+```
+### Cat toy example
+First, let's log in so that we can upload the checkpoint to the Hub during training:
+```bash
+huggingface-cli login
+```
+Now let's get our dataset. For this example we will use some cat images: https://huggingface.co/datasets/diffusers/cat_toy_example .
+Let's first download it locally:
+```py
+from huggingface_hub import snapshot_download
+local_dir = "./cat"
+snapshot_download("diffusers/cat_toy_example", local_dir=local_dir, repo_type="dataset", ignore_patterns=".gitattributes")
+```
+This will be our training data.
+Now we can launch the training using:
+**___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**
+**___Note: Please follow the [README_sdxl.md](./README_sdxl.md) if you are using the [stable-diffusion-xl](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).___**
+```bash
+export MODEL_NAME="runwayml/stable-diffusion-v1-5"
+export DATA_DIR="./cat"
+accelerate launch textual_inversion.py \
+  --pretrained_model_name_or_path=$MODEL_NAME \
+  --train_data_dir=$DATA_DIR \
+  --learnable_property="object" \
+  --placeholder_token="<cat-toy>" \
+  --initializer_token="toy" \
+  --resolution=512 \
+  --train_batch_size=1 \
+  --gradient_accumulation_steps=4 \
+  --max_train_steps=3000 \
+  --learning_rate=5.0e-04 \
+  --scale_lr \
+  --lr_scheduler="constant" \
+  --lr_warmup_steps=0 \
+  --push_to_hub \
+  --output_dir="textual_inversion_cat"
+```
+A full training run takes ~1 hour on one V100 GPU.
+**Note**: As described in [the official paper](https://arxiv.org/abs/2208.01618)
+only one embedding vector is used for the placeholder token, *e.g.* `"<cat-toy>"`.
+However, one can also add multiple embedding vectors for the placeholder token
+to increase the number of fine-tuneable parameters. This can help the model to learn
+more complex details. To use multiple embedding vectors, you should define `--num_vectors`
+to a number larger than one, *e.g.*:
+```bash
+--num_vectors 5
+```
+The saved textual inversion vectors will then be larger in size compared to the default case.
+### Inference
+Once you have trained a model using the command above, inference can be run with the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.
+```python
+from diffusers import StableDiffusionPipeline
+import torch
+model_id = "path-to-your-trained-model"
+pipe = StableDiffusionPipeline.from_pretrained(model_id,torch_dtype=torch.float16).to("cuda")
+prompt = "A <cat-toy> backpack"
+image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
+image.save("cat-backpack.png")
+```
+## Training with Flax/JAX
+For faster training on TPUs and GPUs you can leverage the flax training example. Follow the instructions above to get the model and dataset before running the script.
+Before running the scripts, make sure to install the library's training dependencies:
+```bash
+pip install -U -r requirements_flax.txt
+```
+```bash
+export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
+export DATA_DIR="path-to-dir-containing-images"
+python textual_inversion_flax.py \
+  --pretrained_model_name_or_path=$MODEL_NAME \
+  --train_data_dir=$DATA_DIR \
+  --learnable_property="object" \
+  --placeholder_token="<cat-toy>" \
+  --initializer_token="toy" \
+  --resolution=512 \
+  --train_batch_size=1 \
+  --max_train_steps=3000 \
+  --learning_rate=5.0e-04 \
+  --scale_lr \
+  --output_dir="textual_inversion_cat"
+```
+It should be at least 70% faster than the PyTorch script with the same configuration.
+### Training with xformers:
+You can enable memory efficient attention by [installing xFormers](https://github.com/facebookresearch/xformers#installing-xformers) and passing the `--enable_xformers_memory_efficient_attention` argument to the script. This is not available with the Flax/JAX implementation.
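
Review note: the cat toy command in the README above passes `--scale_lr`. The sketch below assumes the flag multiplies the base learning rate by the gradient accumulation steps, the per-device batch size, and the number of processes; that rule should be checked against `textual_inversion.py` before relying on it. The helper name `scaled_learning_rate` is hypothetical.

```python
def scaled_learning_rate(base_lr: float, gradient_accumulation_steps: int,
                         train_batch_size: int, num_processes: int) -> float:
    """Learning rate after --scale_lr, under the assumed multiplicative rule."""
    return base_lr * gradient_accumulation_steps * train_batch_size * num_processes


# Values from the cat toy command, assuming a single process (one GPU).
lr = scaled_learning_rate(5.0e-04, gradient_accumulation_steps=4, train_batch_size=1, num_processes=1)
print(lr)  # about 2e-3
```

If the assumption holds, launching the same command on more GPUs raises the learning rate proportionally, which is worth keeping in mind when comparing runs.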
--- a/diffusers-0.27.0/examples/textual_inversion/README_sdxl.md
+++ b/diffusers-0.27.0/examples/textual_inversion/README_sdxl.md
+## Textual Inversion fine-tuning example for SDXL
+```bash
+export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
+export DATA_DIR="./cat"
+accelerate launch textual_inversion_sdxl.py \
+  --pretrained_model_name_or_path=$MODEL_NAME \
+  --train_data_dir=$DATA_DIR \
+  --learnable_property="object" \
+  --placeholder_token="<cat-toy>" \
+  --initializer_token="toy" \
+  --mixed_precision="bf16" \
+  --resolution=768 \
+  --train_batch_size=1 \
+  --gradient_accumulation_steps=4 \
+  --max_train_steps=500 \
+  --learning_rate=5.0e-04 \
+  --scale_lr \
+  --lr_scheduler="constant" \
+  --lr_warmup_steps=0 \
+  --save_as_full_pipeline \
+  --output_dir="./textual_inversion_cat_sdxl"
+```
+For now, only training of the first text encoder is supported. 
\ No newline at end of file
--- a/diffusers-0.27.0/examples/textual_inversion/requirements.txt
+++ b/diffusers-0.27.0/examples/textual_inversion/requirements.txt
+accelerate>=0.16.0
+torchvision
+transformers>=4.25.1
+ftfy
+tensorboard
+Jinja2
--- a/diffusers-0.27.0/examples/textual_inversion/requirements_flax.txt
+++ b/diffusers-0.27.0/examples/textual_inversion/requirements_flax.txt
+transformers>=4.25.1
+flax
+optax
+torch
+torchvision
+ftfy
+tensorboard
+Jinja2
--- a/diffusers-0.27.0/examples/textual_inversion/test_textual_inversion.py
+++ b/diffusers-0.27.0/examples/textual_inversion/test_textual_inversion.py
+# coding=utf-8
+# Copyright 2024 HuggingFace Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import logging
+import os
+import sys
+import tempfile
+sys.path.append("..")
+from test_examples_utils import ExamplesTestsAccelerate, run_command  # noqa: E402
+logging.basicConfig(level=logging.DEBUG)
+logger = logging.getLogger()
+stream_handler = logging.StreamHandler(sys.stdout)
+logger.addHandler(stream_handler)
+class TextualInversion(ExamplesTestsAccelerate):
+    def test_textual_inversion(self):
+        with tempfile.TemporaryDirectory() as tmpdir:
+            test_args = f"""
+                examples/textual_inversion/textual_inversion.py
+                --pretrained_model_name_or_path hf-internal-testing/tiny-stable-diffusion-pipe
+                --train_data_dir docs/source/en/imgs
+                --learnable_property object
+                --placeholder_token <cat-toy>
+                --initializer_token a
+                --save_steps 1
+                --num_vectors 2
+                --resolution 64
+                --train_batch_size 1
+                --gradient_accumulation_steps 1
+                --max_train_steps 2
+                --learning_rate 5.0e-04
+                --scale_lr
+                --lr_scheduler constant
+                --lr_warmup_steps 0
+                --output_dir {tmpdir}
+                """.split()
+            run_command(self._launch_args + test_args)
+            # save_pretrained smoke test
+            self.assertTrue(os.path.isfile(os.path.join(tmpdir, "learned_embeds.safetensors")))
+    def test_textual_inversion_checkpointing(self):
+        with tempfile.TemporaryDirectory() as tmpdir:
+            test_args = f"""
+                examples/textual_inversion/textual_inversion.py
+                --pretrained_model_name_or_path hf-internal-testing/tiny-stable-diffusion-pipe
+                --train_data_dir docs/source/en/imgs
+                --learnable_property object
+                --placeholder_token <cat-toy>
+                --initializer_token a
+                --save_steps 1
+                --num_vectors 2
+                --resolution 64
+                --train_batch_size 1
+                --gradient_accumulation_steps 1
+                --max_train_steps 3
+                --learning_rate 5.0e-04
+                --scale_lr
+                --lr_scheduler constant
+                --lr_warmup_steps 0
+                --output_dir {tmpdir}
+                --checkpointing_steps=1
+                --checkpoints_total_limit=2
+                """.split()
+            run_command(self._launch_args + test_args)
+            # check checkpoint directories exist
+            self.assertEqual(
+                {x for x in os.listdir(tmpdir) if "checkpoint" in x},
+                {"checkpoint-2", "checkpoint-3"},
+            )
+    def test_textual_inversion_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints(self):
+        with tempfile.TemporaryDirectory() as tmpdir:
+            test_args = f"""
+                examples/textual_inversion/textual_inversion.py
+                --pretrained_model_name_or_path hf-internal-testing/tiny-stable-diffusion-pipe
+                --train_data_dir docs/source/en/imgs
+                --learnable_property object
+                --placeholder_token <cat-toy>
+                --initializer_token a
+                --save_steps 1
+                --num_vectors 2
+                --resolution 64
+                --train_batch_size 1
+                --gradient_accumulation_steps 1
+                --max_train_steps 2
+                --learning_rate 5.0e-04
+                --scale_lr
+                --lr_scheduler constant
+                --lr_warmup_steps 0
+                --output_dir {tmpdir}
+                --checkpointing_steps=1
+                """.split()
+            run_command(self._launch_args + test_args)
+            # check checkpoint directories exist
+            self.assertEqual(
+                {x for x in os.listdir(tmpdir) if "checkpoint" in x},
+                {"checkpoint-1", "checkpoint-2"},
+            )
+            resume_run_args = f"""
+                examples/textual_inversion/textual_inversion.py
+                --pretrained_model_name_or_path hf-internal-testing/tiny-stable-diffusion-pipe
+                --train_data_dir docs/source/en/imgs
+                --learnable_property object
+                --placeholder_token <cat-toy>
+                --initializer_token a
+                --save_steps 1
+                --num_vectors 2
+                --resolution 64
+                --train_batch_size 1
+                --gradient_accumulation_steps 1
+                --max_train_steps 2
+                --learning_rate 5.0e-04
+                --scale_lr
+                --lr_scheduler constant
+                --lr_warmup_steps 0
+                --output_dir {tmpdir}
+                --checkpointing_steps=1
+                --resume_from_checkpoint=checkpoint-2
+                --checkpoints_total_limit=2
+                """.split()
+            run_command(self._launch_args + resume_run_args)
+            # check checkpoint directories exist
+            self.assertEqual(
+                {x for x in os.listdir(tmpdir) if "checkpoint" in x},
+                {"checkpoint-2", "checkpoint-3"},
+            )
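
Review note: the checkpointing tests above rely on how `--checkpoints_total_limit` prunes old checkpoints. A minimal simulation, assuming the script deletes the oldest checkpoints just before each save so that the count never exceeds the limit. The function `simulate_checkpoints` is hypothetical and models only a fresh run, not `--resume_from_checkpoint`.

```python
from typing import List, Optional


def simulate_checkpoints(max_train_steps: int, checkpointing_steps: int,
                         total_limit: Optional[int] = None) -> List[str]:
    """Return the checkpoint directory names left after a fresh run (assumed pruning rule)."""
    kept: List[int] = []
    for step in range(1, max_train_steps + 1):
        if step % checkpointing_steps != 0:
            continue
        if total_limit is not None and len(kept) >= total_limit:
            # Drop the oldest entries so that saving one more stays within the limit.
            num_to_remove = len(kept) - total_limit + 1
            kept = kept[num_to_remove:]
        kept.append(step)
    return [f"checkpoint-{step}" for step in kept]


# Mirrors test_textual_inversion_checkpointing: 3 steps, save every step, keep 2.
print(simulate_checkpoints(3, 1, total_limit=2))
# Mirrors the first run of the "removes_multiple_checkpoints" test: no limit.
print(simulate_checkpoints(2, 1))
```

Under this rule the first call leaves `checkpoint-2` and `checkpoint-3`, and the second leaves `checkpoint-1` and `checkpoint-2`, which matches the sets the tests assert.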
--- a/diffusers-0.27.0/examples/textual_inversion/test_textual_inversion_sdxl.py
+++ b/diffusers-0.27.0/examples/textual_inversion/test_textual_inversion_sdxl.py
+# coding=utf-8
+# Copyright 2024 HuggingFace Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import logging
+import os
+import sys
+import tempfile
+sys.path.append("..")
+from test_examples_utils import ExamplesTestsAccelerate, run_command  # noqa: E402
+logging.basicConfig(level=logging.DEBUG)
+logger = logging.getLogger()
+stream_handler = logging.StreamHandler(sys.stdout)
+logger.addHandler(stream_handler)
+class TextualInversionSdxl(ExamplesTestsAccelerate):
+    def test_textual_inversion_sdxl(self):
+        with tempfile.TemporaryDirectory() as tmpdir:
+            test_args = f"""
+                examples/textual_inversion/textual_inversion_sdxl.py
+                --pretrained_model_name_or_path hf-internal-testing/tiny-sdxl-pipe
+                --train_data_dir docs/source/en/imgs
+                --learnable_property object
+                --placeholder_token <cat-toy>
+                --initializer_token a
+                --save_steps 1
+                --num_vectors 2
+                --resolution 64
+                --train_batch_size 1
+                --gradient_accumulation_steps 1
+                --max_train_steps 2
+                --learning_rate 5.0e-04
+                --scale_lr
+                --lr_scheduler constant
+                --lr_warmup_steps 0
+                --output_dir {tmpdir}
+                """.split()
+            run_command(self._launch_args + test_args)
+            # save_pretrained smoke test
+            self.assertTrue(os.path.isfile(os.path.join(tmpdir, "learned_embeds.safetensors")))
+    def test_textual_inversion_sdxl_checkpointing(self):
+        with tempfile.TemporaryDirectory() as tmpdir:
+            test_args = f"""
+                examples/textual_inversion/textual_inversion_sdxl.py
+                --pretrained_model_name_or_path hf-internal-testing/tiny-sdxl-pipe
+                --train_data_dir docs/source/en/imgs
+                --learnable_property object
+                --placeholder_token <cat-toy>
+                --initializer_token a
+                --save_steps 1
+                --num_vectors 2
+                --resolution 64
+                --train_batch_size 1
+                --gradient_accumulation_steps 1
+                --max_train_steps 3
+                --learning_rate 5.0e-04
+                --scale_lr
+                --lr_scheduler constant
+                --lr_warmup_steps 0
+                --output_dir {tmpdir}
+                --checkpointing_steps=1
+                --checkpoints_total_limit=2
+                """.split()
+            run_command(self._launch_args + test_args)
+            # check checkpoint directories exist
+            self.assertEqual(
+                {x for x in os.listdir(tmpdir) if "checkpoint" in x},
+                {"checkpoint-2", "checkpoint-3"},
+            )
+    def test_textual_inversion_sdxl_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints(self):
+        with tempfile.TemporaryDirectory() as tmpdir:
+            test_args = f"""
+                examples/textual_inversion/textual_inversion_sdxl.py
+                --pretrained_model_name_or_path hf-internal-testing/tiny-sdxl-pipe
+                --train_data_dir docs/source/en/imgs
+                --learnable_property object
+                --placeholder_token <cat-toy>
+                --initializer_token a
+                --save_steps 1
+                --num_vectors 2
+                --resolution 64
+                --train_batch_size 1
+                --gradient_accumulation_steps 1
+                --max_train_steps 2
+                --learning_rate 5.0e-04
+                --scale_lr
+                --lr_scheduler constant
+                --lr_warmup_steps 0
+                --output_dir {tmpdir}
+                --checkpointing_steps=1
+                """.split()
+            run_command(self._launch_args + test_args)
+            # check checkpoint directories exist
+            self.assertEqual(
+                {x for x in os.listdir(tmpdir) if "checkpoint" in x},
+                {"checkpoint-1", "checkpoint-2"},
+            )
+            resume_run_args = f"""
+                examples/textual_inversion/textual_inversion_sdxl.py
+                --pretrained_model_name_or_path hf-internal-testing/tiny-sdxl-pipe
+                --train_data_dir docs/source/en/imgs
+                --learnable_property object
+                --placeholder_token <cat-toy>
+                --initializer_token a
+                --save_steps 1
+                --num_vectors 2
+                --resolution 64
+                --train_batch_size 1
+                --gradient_accumulation_steps 1
+                --max_train_steps 2
+                --learning_rate 5.0e-04
+                --scale_lr
+                --lr_scheduler constant
+                --lr_warmup_steps 0
+                --output_dir {tmpdir}
+                --checkpointing_steps=1
+                --resume_from_checkpoint=checkpoint-2
+                --checkpoints_total_limit=2
+                """.split()
+            run_command(self._launch_args + resume_run_args)
+            # check checkpoint directories exist
+            self.assertEqual(
+                {x for x in os.listdir(tmpdir) if "checkpoint" in x},
+                {"checkpoint-2", "checkpoint-3"},
+            )
--- a/diffusers-0.27.0/examples/textual_inversion/textual_inversion.py
+++ b/diffusers-0.27.0/examples/textual_inversion/textual_inversion.py
--- a/diffusers-0.27.0/examples/textual_inversion/textual_inversion_flax.py
+++ b/diffusers-0.27.0/examples/textual_inversion/textual_inversion_flax.py
--- a/diffusers-0.27.0/examples/textual_inversion/textual_inversion_sdxl.py
+++ b/diffusers-0.27.0/examples/textual_inversion/textual_inversion_sdxl.py
--- a/diffusers-0.27.0/examples/unconditional_image_generation/README.md
+++ b/diffusers-0.27.0/examples/unconditional_image_generation/README.md
+## Training an unconditional diffusion model
+Creating a training image set is [described in a different document](https://huggingface.co/docs/datasets/image_process#image-datasets).
+### Installing the dependencies
+Before running the scripts, make sure to install the library's training dependencies:
+**Important**
+To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:
+```bash
+git clone https://github.com/huggingface/diffusers
+cd diffusers
+pip install .
+```
+Then `cd` into the example folder and run:
+```bash
+pip install -r requirements.txt
+```
+And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:
+```bash
+accelerate config
+```
+### Unconditional Flowers
+The command to train a DDPM UNet model on the Oxford Flowers dataset:
+```bash
+accelerate launch train_unconditional.py \
+  --dataset_name="huggan/flowers-102-categories" \
+  --resolution=64 --center_crop --random_flip \
+  --output_dir="ddpm-ema-flowers-64" \
+  --train_batch_size=16 \
+  --num_epochs=100 \
+  --gradient_accumulation_steps=1 \
+  --use_ema \
+  --learning_rate=1e-4 \
+  --lr_warmup_steps=500 \
+  --mixed_precision=no \
+  --push_to_hub
+```
+An example trained model: https://huggingface.co/anton-l/ddpm-ema-flowers-64
+A full training run takes 2 hours on 4xV100 GPUs.
+<img src="https://user-images.githubusercontent.com/26864830/180248660-a0b143d0-b89a-42c5-8656-2ebf6ece7e52.png" width="700" />
+### Unconditional Pokemon
+The command to train a DDPM UNet model on the Pokemon dataset:
+```bash
+accelerate launch train_unconditional.py \
+  --dataset_name="huggan/pokemon" \
+  --resolution=64 --center_crop --random_flip \
+  --output_dir="ddpm-ema-pokemon-64" \
+  --train_batch_size=16 \
+  --num_epochs=100 \
+  --gradient_accumulation_steps=1 \
+  --use_ema \
+  --learning_rate=1e-4 \
+  --lr_warmup_steps=500 \
+  --mixed_precision=no \
+  --push_to_hub
+```
+An example trained model: https://huggingface.co/anton-l/ddpm-ema-pokemon-64
+A full training run takes 2 hours on 4xV100 GPUs.
+<img src="https://user-images.githubusercontent.com/26864830/180248200-928953b4-db38-48db-b0c6-8b740fe6786f.png" width="700" />
+### Training with multiple GPUs
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+```bash
+accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \
+  --dataset_name="huggan/pokemon" \
+  --resolution=64 --center_crop --random_flip \
+  --output_dir="ddpm-ema-pokemon-64" \
+  --train_batch_size=16 \
+  --num_epochs=100 \
+  --gradient_accumulation_steps=1 \
+  --use_ema \
+  --learning_rate=1e-4 \
+  --lr_warmup_steps=500 \
+  --mixed_precision="fp16" \
+  --logger="wandb"
+```
+To be able to use Weights and Biases (`wandb`) as a logger you need to install the library: `pip install wandb`.
+### Using your own data
+To use your own dataset, there are 2 ways:
+- you can either provide your own folder as `--train_data_dir`
+- or you can upload your dataset to the hub (possibly as a private repo, if you prefer so), and simply pass the `--dataset_name` argument.
+Below, we explain both in more detail.
+#### Provide the dataset as a folder
+If you provide your own folders with images, the script expects the following directory structure:
+```bash
+data_dir/xxx.png
+data_dir/xxy.png
+data_dir/[...]/xxz.png
+```
+In other words, the script will take care of gathering all images inside the folder. You can then run the script like this:
+```bash
+accelerate launch train_unconditional.py \
+    --train_data_dir <path-to-train-directory> \
+    <other-arguments>
+```
+Internally, the script will use the [`ImageFolder`](https://huggingface.co/docs/datasets/v2.0.0/en/image_process#imagefolder) feature which will automatically turn the folders into 🤗 Dataset objects.
+#### Upload your data to the hub, as a (possibly private) repo
+It's very easy (and convenient) to upload your image dataset to the hub using the [`ImageFolder`](https://huggingface.co/docs/datasets/v2.0.0/en/image_process#imagefolder) feature available in 🤗 Datasets. Simply do the following:
+```python
+from datasets import load_dataset
+# example 1: local folder
+dataset = load_dataset("imagefolder", data_dir="path_to_your_folder")
+# example 2: local files (supported formats are tar, gzip, zip, xz, rar, zstd)
+dataset = load_dataset("imagefolder", data_files="path_to_zip_file")
+# example 3: remote files (supported formats are tar, gzip, zip, xz, rar, zstd)
+dataset = load_dataset("imagefolder", data_files="https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip")
+# example 4: providing several splits
+dataset = load_dataset("imagefolder", data_files={"train": ["path/to/file1", "path/to/file2"], "test": ["path/to/file3", "path/to/file4"]})
+```
+`ImageFolder` will create an `image` column containing the PIL-encoded images.
+Next, push it to the hub!
+```python
+# assuming you have run the huggingface-cli login command in a terminal
+dataset.push_to_hub("name_of_your_dataset")
+# if you want to push to a private repo, simply pass private=True:
+dataset.push_to_hub("name_of_your_dataset", private=True)
+```
+and that's it! You can now train your model by simply setting the `--dataset_name` argument to the name of your dataset on the hub.
+More on this can also be found in [this blog post](https://huggingface.co/blog/image-search-datasets).
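
Review note: the "Provide the dataset as a folder" section of the README above says the script gathers every image under `--train_data_dir`, including nested folders. A small pre-flight check can confirm what a folder contains before launching training. This is a sketch only: it is not how `ImageFolder` works internally, and both the helper `list_images` and the suffix set are assumptions to adjust.

```python
import tempfile
from pathlib import Path

# Assumed set of image suffixes; extend it to match your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}


def list_images(data_dir):
    """Recursively collect image files under data_dir, sorted for stable output."""
    root = Path(data_dir)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)


# Demo on a throwaway folder laid out like the README's example structure.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "nested").mkdir()
    for name in ("xxx.png", "xxy.png", "nested/xxz.png", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [p.relative_to(tmp).as_posix() for p in list_images(tmp)]

print(found)
```

The non-image file `notes.txt` is skipped, while the nested `xxz.png` is picked up alongside the top-level images.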
--- a/diffusers-0.27.0/examples/unconditional_image_generation/requirements.txt
+++ b/diffusers-0.27.0/examples/unconditional_image_generation/requirements.txt
+accelerate>=0.16.0
+torchvision
+datasets
--- a/diffusers-0.27.0/examples/unconditional_image_generation/test_unconditional.py
+++ b/diffusers-0.27.0/examples/unconditional_image_generation/test_unconditional.py
+# coding=utf-8
+# Copyright 2024 HuggingFace Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import logging
+import os
+import sys
+import tempfile
+sys.path.append("..")
+from test_examples_utils import ExamplesTestsAccelerate, run_command  # noqa: E402
+logging.basicConfig(level=logging.DEBUG)
+logger = logging.getLogger()
+stream_handler = logging.StreamHandler(sys.stdout)
+logger.addHandler(stream_handler)
+class Unconditional(ExamplesTestsAccelerate):
+    def test_train_unconditional(self):
+        with tempfile.TemporaryDirectory() as tmpdir:
+            test_args = f"""
+                examples/unconditional_image_generation/train_unconditional.py
+                --dataset_name hf-internal-testing/dummy_image_class_data
+                --model_config_name_or_path diffusers/ddpm_dummy
+                --resolution 64
+                --output_dir {tmpdir}
+                --train_batch_size 2
+                --num_epochs 1
+                --gradient_accumulation_steps 1
+                --ddpm_num_inference_steps 2
+                --learning_rate 1e-3
+                --lr_warmup_steps 5
+                """.split()
+            run_command(self._launch_args + test_args, return_stdout=True)
+            # save_pretrained smoke test
+            self.assertTrue(os.path.isfile(os.path.join(tmpdir, "unet", "diffusion_pytorch_model.safetensors")))
+            self.assertTrue(os.path.isfile(os.path.join(tmpdir, "scheduler", "scheduler_config.json")))
+    def test_unconditional_checkpointing_checkpoints_total_limit(self):
+        with tempfile.TemporaryDirectory() as tmpdir:
+            initial_run_args = f"""
+                examples/unconditional_image_generation/train_unconditional.py
+                --dataset_name hf-internal-testing/dummy_image_class_data
+                --model_config_name_or_path diffusers/ddpm_dummy
+                --resolution 64
+                --output_dir {tmpdir}
+                --train_batch_size 1
+                --num_epochs 1
+                --gradient_accumulation_steps 1
+                --ddpm_num_inference_steps 2
+                --learning_rate 1e-3
+                --lr_warmup_steps 5
+                --checkpointing_steps=2
+                --checkpoints_total_limit=2
+                """.split()
+            run_command(self._launch_args + initial_run_args)
+            # check checkpoint directories exist
+            self.assertEqual(
+                {x for x in os.listdir(tmpdir) if "checkpoint" in x},
+                # checkpoint-2 should have been deleted
+                {"checkpoint-4", "checkpoint-6"},
+            )
+    def test_unconditional_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints(self):
+        with tempfile.TemporaryDirectory() as tmpdir:
+            initial_run_args = f"""
+                examples/unconditional_image_generation/train_unconditional.py
+                --dataset_name hf-internal-testing/dummy_image_class_data
+                --model_config_name_or_path diffusers/ddpm_dummy
+                --resolution 64
+                --output_dir {tmpdir}
+                --train_batch_size 1
+                --num_epochs 1
+                --gradient_accumulation_steps 1
+                --ddpm_num_inference_steps 1
+                --learning_rate 1e-3
+                --lr_warmup_steps 5
+                --checkpointing_steps=2
+                """.split()
+            run_command(self._launch_args + initial_run_args)
+            # check checkpoint directories exist
+            self.assertEqual(
+                {x for x in os.listdir(tmpdir) if "checkpoint" in x},
+                {"checkpoint-2", "checkpoint-4", "checkpoint-6"},
+            )
+            resume_run_args = f"""
+                examples/unconditional_image_generation/train_unconditional.py
+                --dataset_name hf-internal-testing/dummy_image_class_data
+                --model_config_name_or_path diffusers/ddpm_dummy
+                --resolution 64
+                --output_dir {tmpdir}
+                --train_batch_size 1
+                --num_epochs 2
+                --gradient_accumulation_steps 1
+                --ddpm_num_inference_steps 1
+                --learning_rate 1e-3
+                --lr_warmup_steps 5
+                --resume_from_checkpoint=checkpoint-6
+                --checkpointing_steps=2
+                --checkpoints_total_limit=2
+                """.split()
+            run_command(self._launch_args + resume_run_args)
+            # check checkpoint directories exist
+            self.assertEqual(
+                {x for x in os.listdir(tmpdir) if "checkpoint" in x},
+                {"checkpoint-10", "checkpoint-12"},
+            )
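Review note on the checkpoint tests above: the expected sets follow from pruning old checkpoints *before* each new save. The sketch below is a stand-alone simulation, not the training script's code. The helper names `prune_then_save` and `simulate` are hypothetical, and the save steps (2, 4, 6 for the first run; 8, 10, 12 after resuming) are assumed from the sets the assertions expect.

```python
def prune_then_save(existing_steps, new_step, total_limit):
    """Return the checkpoint steps left on disk after saving `new_step`."""
    kept = sorted(existing_steps)
    if total_limit is not None and len(kept) >= total_limit:
        # Drop the oldest checkpoints so that at most `total_limit`
        # remain once the new one has been written.
        num_to_remove = len(kept) - total_limit + 1
        kept = kept[num_to_remove:]
    return kept + [new_step]


def simulate(start_steps, save_steps, total_limit):
    """Replay a sequence of saves and return the surviving directory names."""
    on_disk = list(start_steps)
    for step in save_steps:
        on_disk = prune_then_save(on_disk, step, total_limit)
    return {f"checkpoint-{s}" for s in on_disk}


# First test: one run, limit of 2 (assumed save steps: 2, 4, 6).
first_run = simulate([], [2, 4, 6], total_limit=2)

# Third test: resume on top of three existing checkpoints, limit of 2
# (assumed save steps after resuming: 8, 10, 12).
resumed_run = simulate([2, 4, 6], [8, 10, 12], total_limit=2)

print(sorted(first_run))
print(sorted(resumed_run))
```

With three checkpoints already on disk and a limit of 2, the first save after resuming has to remove two directories at once, which is the case the `removes_multiple_checkpoints` test name refers to.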
--- a/diffusers-0.27.0/examples/unconditional_image_generation/train_unconditional.py
+++ b/diffusers-0.27.0/examples/unconditional_image_generation/train_unconditional.py
--- a/diffusers-0.27.0/examples/wuerstchen/text_to_image/README.md
+++ b/diffusers-0.27.0/examples/wuerstchen/text_to_image/README.md
+# Würstchen text-to-image fine-tuning
+## Running locally with PyTorch
+Before running the scripts, make sure to install the library's training dependencies:
+**Important**
+To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date. To do this, execute the following steps in a new virtual environment:
+```bash
+git clone https://github.com/huggingface/diffusers
+cd diffusers
+pip install .
+```
+Then `cd` into the example folder and run:
+```bash
+cd examples/wuerstchen/text_to_image
+pip install -r requirements.txt
+```
+And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:
+```bash
+accelerate config
+```
+For this example we want to directly store the trained LoRA embeddings on the Hub, so we need to be logged in and add the `--push_to_hub` flag to the training script. To log in, run:
+```bash
+huggingface-cli login
+```
+## Prior training
+You can fine-tune the Würstchen prior model with the `train_text_to_image_prior.py` script. The script currently supports `--gradient_checkpointing` for prior model fine-tuning, which you can enable on setups with limited GPU memory.
+<br>
+<!-- accelerate_snippet_start -->
+```bash
+export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+accelerate launch  train_text_to_image_prior.py \
+  --mixed_precision="fp16" \
+  --dataset_name=$DATASET_NAME \
+  --resolution=768 \
+  --train_batch_size=4 \
+  --gradient_accumulation_steps=4 \
+  --gradient_checkpointing \
+  --dataloader_num_workers=4 \
+  --max_train_steps=15000 \
+  --learning_rate=1e-05 \
+  --max_grad_norm=1 \
+  --checkpoints_total_limit=3 \
+  --lr_scheduler="constant" --lr_warmup_steps=0 \
+  --validation_prompts="A robot pokemon, 4k photo" \
+  --report_to="wandb" \
+  --push_to_hub \
+  --output_dir="wuerstchen-prior-pokemon-model"
+```
+<!-- accelerate_snippet_end -->
+## Training with LoRA
+Low-Rank Adaptation of Large Language Models (or LoRA) was first introduced by Microsoft in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) by *Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen*.
+In a nutshell, LoRA allows adapting pretrained models by adding pairs of rank-decomposition matrices to existing weights and **only** training those newly added weights. This has a couple of advantages:
+- Previous pretrained weights are kept frozen so that the model is not prone to [catastrophic forgetting](https://www.pnas.org/doi/10.1073/pnas.1611835114).
+- Rank-decomposition matrices have significantly fewer parameters than the original model, which means that trained LoRA weights are easily portable.
+- LoRA attention layers let you control how strongly the model is adapted toward new training images via a `scale` parameter.
+### Prior Training
+First, you need to set up your development environment as explained in the [installation](#Running-locally-with-PyTorch) section. Make sure to set the `DATASET_NAME` environment variable. Here, we will use the [Pokemon captions dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions).
+```bash
+export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+accelerate launch train_text_to_image_lora_prior.py \
+  --mixed_precision="fp16" \
+  --dataset_name=$DATASET_NAME --caption_column="text" \
+  --resolution=768 \
+  --train_batch_size=8 \
+  --num_train_epochs=100 --checkpointing_steps=5000 \
+  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
+  --seed=42 \
+  --rank=4 \
+  --validation_prompt="cute dragon creature" \
+  --report_to="wandb" \
+  --push_to_hub \
+  --output_dir="wuerstchen-prior-pokemon-lora"
+```
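Review note on the README's LoRA section: the idea of adding a pair of rank-decomposition matrices to a frozen weight, blended in by a `scale` factor, can be sketched in plain Python. This is an illustration only and not the code path used by `train_text_to_image_lora_prior.py`. The helper names, the toy matrices, and the 1024x1024 layer size are hypothetical; only `rank=4` is taken from the command above.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_b, lora_a, scale=1.0):
    """Return W + scale * (B @ A); W itself is never modified (it stays frozen)."""
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters added by one pair of rank-`rank` matrices."""
    return rank * (d_out + d_in)


# Toy 2x3 frozen weight with a rank-1 update: B is 2x1, A is 1x3.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lora_b = [[1.0], [2.0]]
lora_a = [[0.5, 0.0, 1.0]]

full = lora_effective_weight(w, lora_b, lora_a, scale=1.0)  # adapter fully applied
off = lora_effective_weight(w, lora_b, lora_a, scale=0.0)   # adapter disabled

# Hypothetical 1024x1024 layer at the README's rank of 4.
added = lora_param_count(1024, 1024, rank=4)
print(full, off, added, 1024 * 1024)
```

Setting `scale` to 0 recovers the pretrained weight exactly, and the adapter for the hypothetical layer holds a small fraction of that layer's parameters, which is why the trained LoRA weights are easy to ship separately.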
--- a/diffusers-0.27.0/examples/wuerstchen/text_to_image/__init__.py
+++ b/diffusers-0.27.0/examples/wuerstchen/text_to_image/__init__.py
--- a/diffusers-0.27.0/examples/wuerstchen/text_to_image/modeling_efficient_net_encoder.py
+++ b/diffusers-0.27.0/examples/wuerstchen/text_to_image/modeling_efficient_net_encoder.py
+import torch.nn as nn
+from torchvision.models import efficientnet_v2_l, efficientnet_v2_s
+from diffusers.configuration_utils import ConfigMixin, register_to_config
+from diffusers.models.modeling_utils import ModelMixin
+class EfficientNetEncoder(ModelMixin, ConfigMixin):
+    @register_to_config
+    def __init__(self, c_latent=16, c_cond=1280, effnet="efficientnet_v2_s"):
+        super().__init__()
+        if effnet == "efficientnet_v2_s":
+            self.backbone = efficientnet_v2_s(weights="DEFAULT").features
+        else:
+            self.backbone = efficientnet_v2_l(weights="DEFAULT").features
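+        self.mapper = nn.Sequential(
+            # 1x1 convolution: project the backbone's c_cond feature channels down to c_latent
+            nn.Conv2d(c_cond, c_latent, kernel_size=1, bias=False),
+            # batch-normalize each latent channel (zero mean, unit variance before the learned affine)
+            nn.BatchNorm2d(c_latent),
+        )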
+    def forward(self, x):
+        return self.mapper(self.backbone(x))
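Review note on `EfficientNetEncoder.mapper`: a bias-free 1x1 convolution followed by batch normalization amounts to a per-pixel channel mix, then a per-channel normalization over all pixels in the batch. The sketch below shows this in plain Python so it runs without PyTorch or pretrained weights. It is a simplification under stated assumptions: batch and spatial dimensions are flattened into one list of "pixels", the channel counts are shrunk from 1280 and 16 to 3 and 2, batch norm uses training-mode statistics with no learned affine, and the function names are hypothetical.

```python
import math


def conv1x1(pixels, weight):
    """Bias-free 1x1 convolution: mix channels independently for each pixel."""
    return [[sum(w * x for w, x in zip(w_row, px)) for w_row in weight] for px in pixels]


def batch_norm(pixels, eps=1e-5):
    """Normalize each channel over all pixels (biased variance, no affine)."""
    n = len(pixels)
    columns = list(zip(*pixels))
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)]
    return [
        [(v - m) / math.sqrt(var + eps) for v, m, var in zip(px, means, variances)]
        for px in pixels
    ]


# Four pixels with 3 input channels each, mapped to 2 latent channels.
pixels = [[1.0, 2.0, 3.0], [2.0, 0.0, 1.0], [0.0, 1.0, 4.0], [3.0, 3.0, 0.0]]
weight = [[1.0, 0.0, 1.0], [0.0, 2.0, -1.0]]

latents = batch_norm(conv1x1(pixels, weight))
channel_means = [sum(col) / len(latents) for col in zip(*latents)]
print(latents)
print(channel_means)
```

The number of pixels is unchanged and only the channel dimension shrinks, which matches the role of `mapper` as a projection from backbone features to latent channels.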