08 Apr, 2025 · 8 commits
      [bitsandbytes] improve replacement warnings for bnb (#11132) · 1a048124
      Sayak Paul authored
      * improve replacement warnings for bnb
      
      * updates to docs.
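A hedged sketch of the kind of bitsandbytes setup whose module-replacement warnings the commit above improves; the checkpoint and the skipped module name are illustrative, not taken from the PR.

```python
import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel

# 4-bit bnb quantization; modules listed in llm_int8_skip_modules are kept in
# higher precision, which is the kind of replacement decision the improved
# warnings report. The module name below is a placeholder.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=["proj_out"],
)

model = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```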
      [feat] implement `record_stream` when using CUDA streams during group offloading (#11081) · 4b27c4a4
      Sayak Paul authored
      
      
      * implement record_stream for better performance.
      
      * fix
      
      * style.
      
      * merge #11097
      
      * Update src/diffusers/hooks/group_offloading.py
      Co-authored-by: Aryan <aryan@huggingface.co>
      
      * fixes
      
      * docstring.
      
      * remaining todos in low_cpu_mem_usage
      
      * tests
      
      * updates to docs.
      
      ---------
      Co-authored-by: Aryan <aryan@huggingface.co>
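The commit above wires `Tensor.record_stream` into the group-offloading hooks. A minimal sketch of the underlying PyTorch pattern (not the diffusers hook code itself): a weight is prefetched on a side CUDA stream, and `record_stream` tells the caching allocator the tensor is consumed on another stream, so its memory is not reused too early.

```python
import torch

assert torch.cuda.is_available()
copy_stream = torch.cuda.Stream()
cpu_weight = torch.randn(1024, 1024, pin_memory=True)

# Prefetch the weight to the GPU on a side stream.
with torch.cuda.stream(copy_stream):
    gpu_weight = cpu_weight.to("cuda", non_blocking=True)

# The default stream consumes gpu_weight, which was allocated on copy_stream;
# record_stream marks that cross-stream usage for the caching allocator.
torch.cuda.current_stream().wait_stream(copy_stream)
gpu_weight.record_stream(torch.cuda.current_stream())

out = gpu_weight @ gpu_weight  # runs on the default stream
```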
      Flux quantized with lora (#10990) · 5d49b3e8
      hlky authored
      
      
      * Flux quantized with lora
      
      * fix
      
      * changes
      
      * Apply suggestions from code review
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      * Apply style fixes
      
      * enable model cpu offload()
      
      * Update src/diffusers/loaders/lora_pipeline.py
      Co-authored-by: hlky <hlky@hlky.ac>
      
      * update
      
      * Apply suggestions from code review
      
      * update
      
      * add peft as an additional dependency for gguf
      
      ---------
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
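Roughly what the flow exercised by this PR looks like, as a hedged sketch: a GGUF-quantized Flux transformer with LoRA weights loaded on top. The GGUF file and the LoRA repo id are illustrative, and peft must be installed for the LoRA path.

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# GGUF-quantized Flux transformer (file URL is an example checkpoint).
transformer = FluxTransformer2DModel.from_single_file(
    "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_K_S.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

# Loading LoRA weights on top of the quantized transformer is what this PR fixes.
pipe.load_lora_weights("some-user/flux-lora", adapter_name="example")  # placeholder repo id
image = pipe("a photo of a cat", num_inference_steps=20).images[0]
```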
      [Flux LoRA] fix issues in flux lora scripts (#11111) · 71f34fc5
      Linoy Tsaban authored
      
      
      * remove custom scheduler
      
      * update requirements.txt
      
      * log_validation with mixed precision
      
      * add intermediate embeddings saving when checkpointing is enabled
      
      * remove comment
      
      * fix validation
      
      * add unwrap_model for accelerator, torch.no_grad context for validation, fix accelerator.accumulate call in advanced script
      
      * revert unwrap_model change temp
      
      * add .module to address distributed training bug + replace accelerator.unwrap_model with unwrap_model
      
      * changes to align advanced script with canonical script
      
      * make changes for distributed training + unify unwrap_model calls in advanced script
      
      * add module.dtype fix to dreambooth script
      
      * unify unwrap_model calls in dreambooth script
      
      * fix condition in validation run
      
      * mixed precision
      
      * Update examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      * smol style change
      
      * change autocast
      
      * Apply style fixes
      
      ---------
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
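Several of the fixes above revolve around unwrapping the model before touching attributes such as `.dtype` under DDP or torch.compile. A hedged sketch of that pattern (the helper name follows the commit message; the surrounding script context is assumed):

```python
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="bf16")


def unwrap_model(model):
    # Strip the accelerate/DDP wrapper first...
    model = accelerator.unwrap_model(model)
    # ...then the torch.compile wrapper, if present.
    return model._orig_mod if hasattr(model, "_orig_mod") else model


# Attribute access should go through the unwrapped module; on a raw DDP object
# it would need `model.module.dtype`, the distributed-training bug noted above.
# weight_dtype = unwrap_model(transformer).dtype
```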
      introduce compute arch specific expectations and fix test_sd3_img2img_inference failure (#11227) · c51b6bd8
      Yao Matrix authored
      
      
      * add arch-specific expectations support, to account for each arch's numerical characteristics
      Signed-off-by: YAO Matrix <matrix.yao@intel.com>
      
      * fix typo
      Signed-off-by: YAO Matrix <matrix.yao@intel.com>
      
      * Apply suggestions from code review
      
      * Apply style fixes
      
      * Update src/diffusers/utils/testing_utils.py
      
      ---------
      Signed-off-by: YAO Matrix <matrix.yao@intel.com>
      Co-authored-by: hlky <hlky@hlky.ac>
      Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
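The idea behind arch-specific expectations, as a hypothetical sketch (the names and values below are placeholders, not the actual helper added to testing_utils): expected outputs are keyed by device type and compute capability, and the closest match is selected at test time.

```python
import torch

# Placeholder slices, keyed by (device type, CUDA compute capability major).
EXPECTED_SLICES = {
    ("cuda", 8): [0.1, 0.2, 0.3],
    ("cuda", 9): [0.11, 0.21, 0.31],
    ("cpu", None): [0.1, 0.2, 0.3],
}


def get_expectation():
    if torch.cuda.is_available():
        major, _ = torch.cuda.get_device_capability()
        return EXPECTED_SLICES.get(("cuda", major), EXPECTED_SLICES[("cpu", None)])
    return EXPECTED_SLICES[("cpu", None)]
```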
      [LoRA] Implement hot-swapping of LoRA (#9453) · fb544996
      Benjamin Bossan authored
      * [WIP][LoRA] Implement hot-swapping of LoRA
      
      This PR adds the possibility to hot-swap LoRA adapters. It is WIP.
      
      Description
      
      As of now, users can already load multiple LoRA adapters. They can
      offload existing adapters or they can unload them (i.e. delete them).
      However, they cannot "hotswap" adapters yet, i.e. substitute the weights
      from one LoRA adapter with the weights of another, without the need to
      create a separate LoRA adapter.
      
      Generally, hot-swapping may not appear super useful, but when the model is
      compiled, it is necessary to prevent recompilation. See #9279 for more
      context.
      
      Caveats
      
      To hot-swap a LoRA adapter for another, these two adapters should target
      exactly the same layers and the "hyper-parameters" of the two adapters
      should be identical. For instance, the LoRA alpha has to be the same:
      Given that we keep the alpha from the first adapter, the LoRA scaling
      would be incorrect for the second adapter otherwise.
      
      Theoretically, we could override the scaling dict with the alpha values
      derived from the second adapter's config, but changing the dict will
      trigger a guard for recompilation, defeating the main purpose of the
      feature.
      
      I also found that compilation flags can have an impact on whether this
      works or not. E.g. when passing "reduce-overhead", there will be errors
      of the type:
      
      > input name: arg861_1. data pointer changed from 139647332027392 to
      139647331054592
      
      I don't know enough about compilation to determine whether this is
      problematic or not.
      
      Current state
      
      This is obviously WIP right now to collect feedback and discuss which
      direction to take this. If this PR turns out to be useful, the
      hot-swapping functions will be added to PEFT itself and can be imported
      here (or there is a separate copy in diffusers to avoid the need for a
      min PEFT version to use this feature).
      
      Moreover, more tests need to be added to better cover this feature,
      although we don't necessarily need tests for the hot-swapping
      functionality itself, since those tests will be added to PEFT.
      
      Furthermore, as of now, this is only implemented for the unet. Other
      pipeline components have yet to implement this feature.
      
      Finally, it should be properly documented.
      
      I would like to collect feedback on the current state of the PR before
      putting more time into finalizing it.
      
      * Reviewer feedback
      
      * Reviewer feedback, adjust test
      
      * Fix, doc
      
      * Make fix
      
      * Fix for possible g++ error
      
      * Add test for recompilation w/o hotswapping
      
      * Make hotswap work
      
      Requires https://github.com/huggingface/peft/pull/2366
      
      More changes to make hotswapping work. Together with the mentioned PEFT
      PR, the tests pass for me locally.
      
      List of changes:
      
      - docstring for hotswap
      - remove code copied from PEFT, import from PEFT now
      - adjustments to PeftAdapterMixin.load_lora_adapter (unfortunately, some
        state dict renaming was necessary, LMK if there is a better solution)
      - adjustments to UNet2DConditionLoadersMixin._process_lora: LMK if this
        is even necessary or not, I'm unsure what the overall relationship is
        between this and PeftAdapterMixin.load_lora_adapter
      - also in UNet2DConditionLoadersMixin._process_lora, I saw that there is
        no LoRA unloading when loading the adapter fails, so I added it
        there (in line with what happens in PeftAdapterMixin.load_lora_adapter)
      - rewritten tests to avoid shelling out, make the test more precise by
        making sure that the outputs align, parametrize it
      - also checked the pipeline code mentioned in this comment:
        https://github.com/huggingface/diffusers/pull/9453#issuecomment-2418508871;
        when running this inside the with
        torch._dynamo.config.patch(error_on_recompile=True) context, there is
        no error, so I think hotswapping is now working with pipelines.
      
      * Address reviewer feedback:
      
      - Revert deprecated method
      - Fix PEFT doc link to main
      - Don't use private function
      - Clarify magic numbers
      - Add pipeline test
      
      Moreover:
      - Extend docstrings
      - Extend existing test for outputs != 0
      - Extend existing test for wrong adapter name
      
      * Change order of test decorators
      
      parameterized.expand seems to ignore skip decorators if added in last
      place (i.e. innermost decorator).
      
      * Split model and pipeline tests
      
      Also increase test coverage by also targeting conv2d layers (support of
      which was added recently on the PEFT PR).
      
      * Reviewer feedback: Move decorator to test classes
      
      ... instead of having them on each test method.
      
      * Apply suggestions from code review
      Co-authored-by: hlky <hlky@hlky.ac>
      
      * Reviewer feedback: version check, TODO comment
      
      * Add enable_lora_hotswap method
      
      * Reviewer feedback: check _lora_loadable_modules
      
      * Revert changes in unet.py
      
      * Add possibility to ignore enabled at wrong time
      
      * Fix docstrings
      
      * Log possible PEFT error, test
      
      * Raise helpful error if hotswap not supported
      
      I.e. for the text encoder
      
      * Formatting
      
      * More linter
      
      * More ruff
      
      * Doc-builder complaint
      
      * Update docstring:
      
      - mention no text encoder support yet
      - make it clear that LoRA is meant
      - mention that same adapter name should be passed
      
      * Fix error in docstring
      
      * Update more methods with hotswap argument
      
      - SDXL
      - SD3
      - Flux
      
      No changes were made to load_lora_into_transformer.
      
      * Add hotswap argument to load_lora_into_transformer
      
      For SD3 and Flux. Use shorter docstring for brevity.
      
      * Extend docstrings
      
      * Add version guards to tests
      
      * Formatting
      
      * Fix LoRA loading call to add prefix=None
      
      See:
      https://github.com/huggingface/diffusers/pull/10187#issuecomment-2717571064
      
      
      
      * Run make fix-copies
      
      * Add hot swap documentation to the docs
      
      * Apply suggestions from code review
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
      
      ---------
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      Co-authored-by: hlky <hlky@hlky.ac>
      Co-authored-by: YiYi Xu <yixu310@gmail.com>
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
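Based on the description and the methods named in this commit message (`enable_lora_hotswap` and the `hotswap` argument), a hedged usage sketch; the checkpoint, LoRA repo ids, and target_rank value are illustrative, and the exact signatures may differ.

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Prepare LoRA layers up front so differing ranks/scalings do not trigger
# recompilation later (the target_rank value is a placeholder).
pipe.enable_lora_hotswap(target_rank=64)

# Load the first adapter, then compile the unet (the only component covered here).
pipe.load_lora_weights("some-user/lora-one", adapter_name="default")  # placeholder repo id
pipe.unet = torch.compile(pipe.unet)
_ = pipe("a prompt", num_inference_steps=20).images[0]

# Hot-swap the second adapter in place: same adapter_name, same targeted layers
# and alpha, so the compiled model should be reused without recompilation.
pipe.load_lora_weights("some-user/lora-two", adapter_name="default", hotswap=True)
_ = pipe("a prompt", num_inference_steps=20).images[0]
```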
      [Training] Better image interpolation in training scripts (#11206) · 723dbdd3
      Álvaro Somoza authored
      
      
      * initial
      
      * Update examples/dreambooth/train_dreambooth_lora_sdxl.py
      Co-authored-by: hlky <hlky@hlky.ac>
      
      * update
      
      ---------
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      Co-authored-by: hlky <hlky@hlky.ac>
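The change above concerns which resampling filter the training scripts use when resizing images. A hedged torchvision sketch with an explicit interpolation mode (LANCZOS shown as an example; which mode the scripts ultimately use is not stated here, and torchvision's default for Resize is bilinear):

```python
from torchvision import transforms

resolution = 1024  # placeholder
train_transforms = transforms.Compose(
    [
        transforms.Resize(resolution, interpolation=transforms.InterpolationMode.LANCZOS),
        transforms.CenterCrop(resolution),
        transforms.ToTensor(),
        transforms.Normalize([0.5], [0.5]),
    ]
)
```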
      [train_controlnet.py] Fix the LR schedulers when num_train_epochs is passed in... · fbf61f46
      Bhavay Malhotra authored
      
      [train_controlnet.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env (#8461)
      
      * Create diffusers.yml
      
      * fix num_train_epochs
      
      * Delete diffusers.yml
      
      * Fixed Changes
      
      ---------
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      Co-authored-by: YiYi Xu <yixu310@gmail.com>
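The fix above concerns deriving the total number of training steps from num_train_epochs and scaling the scheduler's step counts by the number of processes, following the pattern used across the diffusers training scripts. A self-contained hedged sketch with placeholder values:

```python
import math

import torch
from diffusers.optimization import get_scheduler

# Stand-ins for the script's arguments and accelerator state (placeholder values).
num_processes = 2
gradient_accumulation_steps = 4
num_train_epochs = 3
lr_warmup_steps = 500
dataloader_len = 1000

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# When only num_train_epochs is given, derive max_train_steps so the LR schedule
# spans the whole run; counts are scaled by num_processes as in the scripts'
# distributed-training handling.
num_update_steps_per_epoch = math.ceil(dataloader_len / gradient_accumulation_steps)
max_train_steps = num_train_epochs * num_update_steps_per_epoch

lr_scheduler = get_scheduler(
    "cosine",
    optimizer=optimizer,
    num_warmup_steps=lr_warmup_steps * num_processes,
    num_training_steps=max_train_steps * num_processes,
)
```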