"test/vscode:/vscode.git/clone" did not exist on "93c3025f935b9dae47ba8b8f78a8003eb3bb1ffd"
- 05 Dec, 2025 1 commit
-
-
swappy authored
* fix: group offloading to support standalone computational layers in block-level offloading
* test: for models with standalone and deeply nested layers in block-level offloading
* feat: support for block-level offloading in group offloading config
* fix: group offload block modules to AutoencoderKL and AutoencoderKLWan
* fix: update group offloading tests to use AutoencoderKL and adjust input dimensions
* refactor: streamline block offloading logic
* Apply style fixes
* update tests
* update
* fix for failing tests
* clean up
* revert to use skip_keys
* clean up
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
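As a minimal sketch of the feature under test, assuming the `apply_group_offloading` helper from `diffusers.hooks` and using `AutoencoderKL` as the updated tests do (checkpoint id illustrative):

```python
import torch
from diffusers import AutoencoderKL
from diffusers.hooks import apply_group_offloading

vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
)

# Block-level offloading keeps only num_blocks_per_group blocks on the GPU at
# a time; the fix ensures standalone computational layers (e.g. conv_in and
# conv_out, which live outside the block lists) are grouped and offloaded too.
apply_group_offloading(
    vae,
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",
    num_blocks_per_group=1,
)
```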
-
- 03 Dec, 2025 2 commits
-
-
Sayak Paul authored
* start zimage model tests.
* up (x12)
* Revert "up": reverts commit bca3e27c96b942db49ccab8ddf824e7a54d43ed1.
* expand upon compilation failure reason.
* Update tests/models/transformers/test_models_transformer_z_image.py Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* reinitialize the padding tokens to ones to prevent NaN problems.
* updates
* up
* skipping ZImage DiT tests
* up (x2)
---------
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
-
Kimbing Ng authored
* Fixes #12673: the wrong default_stream was used, leading to a wrong execution order when record_stream is enabled.
* update
* Update test
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
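A hedged sketch of the stream-ordering rule behind this fix, using plain PyTorch stream APIs (tensor sizes illustrative):

```python
# When a tensor is produced on a side stream but consumed on the default
# stream, it must be synchronized with and recorded against the *consumer*
# stream; recording it on the wrong stream reorders execution.
import torch

side_stream = torch.cuda.Stream()
x_cpu = torch.randn(1024, 1024, pin_memory=True)

with torch.cuda.stream(side_stream):
    x_gpu = x_cpu.to("cuda", non_blocking=True)

# Make the default stream wait for the copy, then mark the tensor as in use on
# it so the caching allocator does not hand the memory back too early.
torch.cuda.current_stream().wait_stream(side_stream)
x_gpu.record_stream(torch.cuda.current_stream())
```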
-
- 17 Oct, 2025 1 commit
-
-
Sayak Paul authored
* up * up * up * up * up * up * up * up * up
-
- 02 Oct, 2025 1 commit
-
-
Benjamin Bossan authored
I noticed that the test should cover the option check_compiled="ignore", but it was passing check_compiled="warn". This has been fixed; the correct argument is now passed. However, the fact that the test passed anyway means it was incorrect to begin with: the way logs were collected did not capture the logger.warning call here (not sure why). To amend this, I'm now using assertNoLogs. With this change, the test correctly fails when the wrong argument is passed.
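A minimal sketch of the assertLogs/assertNoLogs distinction this fix relies on (`assertNoLogs` requires Python 3.10+; the logger name and the check_compiled comments are stand-ins for the diffusers test, not its exact code):

```python
import logging
import unittest

logger = logging.getLogger("diffusers")

class CheckCompiledTests(unittest.TestCase):
    def test_warn_emits_warning(self):
        # check_compiled="warn" path: a warning must be produced
        with self.assertLogs(logger, level="WARNING"):
            logger.warning("model is compiled")

    def test_ignore_is_silent(self):
        # check_compiled="ignore" path: assertNoLogs fails if *any* record is
        # emitted, so it catches the bug where the wrong argument still warned.
        with self.assertNoLogs(logger, level="WARNING"):
            pass
```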
-
- 30 Sep, 2025 1 commit
-
-
Lucain authored
* Allow prerelease when installing transformers from main
* maybe better
* maybe better
* and now?
* just bored
* should be better
* works now
-
- 29 Sep, 2025 1 commit
-
-
Sayak Paul authored
up
-
- 25 Sep, 2025 1 commit
-
-
Lucain authored
* Support huggingface_hub 0.x and 1.x
* httpx
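A sketch of the kind of dual-version guard this requires: huggingface_hub 1.x swaps its requests-based HTTP backend for httpx, so error handling has to branch on the installed major version (exact import paths are an assumption to verify):

```python
from packaging import version

import huggingface_hub

if version.parse(huggingface_hub.__version__) >= version.parse("1.0.0"):
    import httpx  # 1.x HTTP backend

    ConnectionErrorType = httpx.ConnectError
else:
    import requests  # 0.x HTTP backend

    ConnectionErrorType = requests.exceptions.ConnectionError
```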
-
- 03 Sep, 2025 1 commit
-
-
Sayak Paul authored
* feat: add a test for aot.
* up
-
- 28 Aug, 2025 1 commit
-
-
Dhruv Nair authored
* update (x5)
* merge main
* Revert "merge main": reverts commit 65efbcead58644b31596ed2d714f7cee0e0238d3.
-
- 14 Aug, 2025 1 commit
-
-
Sayak Paul authored
* tighten compilation tests for quantization
* feat: model_info but local.
* up
* Revert "tighten compilation tests for quantization": reverts commit 8d431dc967a4118168af74aae9c41f2a68764851.
* up
* reviewer feedback.
* reviewer feedback.
* up
* up
* empty
* update
---------
Co-authored-by: DN6 <dhruv.nair@gmail.com>
-
- 13 Aug, 2025 1 commit
-
-
Sayak Paul authored
* checking.
* checking
* checking
* up
* up
* up
* Apply suggestions from code review Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* up
* up
* fix
* review feedback.
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
-
- 11 Aug, 2025 1 commit
-
-
Sayak Paul authored
* update
* update
* update
* enable compilation in qwen image.
* add tests
---------
Co-authored-by: Aryan <aryan@huggingface.co>
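A minimal sketch of what "enable compilation in qwen image" exercises: compiling only the pipeline's denoiser (checkpoint id illustrative):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")

# Compile only the transformer; fullgraph=True surfaces graph breaks early,
# which is what the added tests check for.
pipe.transformer = torch.compile(pipe.transformer, fullgraph=True)
```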
-
- 29 Jul, 2025 1 commit
-
-
Aryan authored
* update
* try test fix
* add missing link
* fix tests
* Update src/diffusers/hooks/first_block_cache.py
* make style
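For context, a hedged sketch of enabling the cache that src/diffusers/hooks/first_block_cache.py implements; treat the config name and threshold value as assumptions to check against your diffusers version:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.hooks import FirstBlockCacheConfig

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Skip the later transformer blocks on steps where the first block's output
# barely changes; a higher threshold skips more aggressively.
pipe.transformer.enable_cache(FirstBlockCacheConfig(threshold=0.2))
```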
-
- 23 Jul, 2025 1 commit
-
-
Sayak Paul authored
enforce torch version in the compilation tests.
-
- 22 Jul, 2025 1 commit
-
-
Yao Matrix authored
* xx
* fix Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* Update model_loading_utils.py
* Update test_models_unet_2d_condition.py
* Update test_models_unet_2d_condition.py
* fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix comments Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* Update unet_2d_blocks.py
* update Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
-
- 09 Jul, 2025 1 commit
-
-
Sayak Paul authored
* fix memory address problem
* add more tests
* updates
* updates
* update
* _group_id = group_id
* update
* Apply suggestions from code review Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* update
* update
* update
* fix
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
-
- 01 Jul, 2025 1 commit
-
-
Sayak Paul authored
* add resolution changes tests to hotswapping test suite.
* fixes
* docs
* explain duck shapes
* fix
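"Duck shapes" refers to dynamo's duck-sized symbolic shapes; a sketch of why resolution changes matter for compiled hot-swapping (model and sizes illustrative):

```python
import torch

model = torch.nn.Conv2d(4, 4, 3, padding=1).cuda()
# dynamic=True asks dynamo for symbolic (duck-sized) shapes up front, so
# changing the latent resolution does not force a recompile per size.
compiled = torch.compile(model, dynamic=True)

for hw in (64, 96, 128):  # different latent resolutions
    x = torch.randn(1, 4, hw, hw, device="cuda")
    compiled(x)
```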
-
- 26 Jun, 2025 2 commits
-
-
Sayak Paul authored
* add test for checking compile on different shapes.
* update
* update
* Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
-
Animesh Jain authored
* [rfc][compile] compile method for DiffusionPipeline
* Apply suggestions from code review Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Apply style fixes
* Update docs/source/en/optimization/fp16.md
* check
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
-
- 25 Jun, 2025 1 commit
-
-
Sayak Paul authored
skip instead of returning.
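A minimal sketch of the difference: returning early from a test silently "passes", while skipping is reported as a skip (unittest shown; pytest.skip behaves the same way):

```python
import unittest

import torch

class ExampleTests(unittest.TestCase):
    def test_needs_gpu(self):
        if not torch.cuda.is_available():
            self.skipTest("requires CUDA")  # recorded as a skip, not a pass
        # ... actual assertions run only when a GPU is present
```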
-
- 24 Jun, 2025 1 commit
-
-
Aryan authored
* update * update * update
-
- 23 Jun, 2025 1 commit
-
-
Sayak Paul authored
model test updates
-
- 19 Jun, 2025 1 commit
-
-
Sayak Paul authored
* start implementing disk offloading in group.
* delete diff file.
* updates.patch
* offload_to_disk_path
* check if safetensors already exist.
* add test and clarify.
* updates
* update todos.
* update more docs.
* update docs
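A hedged sketch of the disk path added here, assuming the `apply_group_offloading` API: parameter groups are serialized to safetensors under `offload_to_disk_path` and re-read on demand instead of being held in CPU RAM:

```python
import torch
from diffusers import AutoencoderKL
from diffusers.hooks import apply_group_offloading

model = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

apply_group_offloading(
    model,
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
    # Offloaded groups live here as safetensors files; existing files are
    # reused rather than rewritten ("check if safetensors already exist").
    offload_to_disk_path="/tmp/offload",
)
```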
-
- 18 Jun, 2025 2 commits
-
-
Sayak Paul authored
change to 2025 licensing for remaining
-
Sayak Paul authored
* device_map tests for all models.
* updates
* Update tests/models/test_modeling_common.py Co-authored-by: Aryan <aryan@huggingface.co>
* fix device_map in test
---------
Co-authored-by: Aryan <aryan@huggingface.co>
-
- 13 Jun, 2025 1 commit
-
-
Sayak Paul authored
* feat: parse metadata from lora state dicts.
* tests
* fix tests
* key renaming
* fix
* smol update
* smol updates
* load metadata.
* automatically save metadata in save_lora_adapter.
* propagate changes.
* changes
* add test to models too.
* tighter tests.
* updates
* fixes
* rename tests.
* sorted.
* Update src/diffusers/loaders/lora_base.py Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* review suggestions.
* removeprefix.
* propagate changes.
* fix-copies
* sd
* docs.
* fixes
* get review ready.
* one more test to catch error.
* change to a different approach.
* fix-copies.
* todo
* sd3
* update
* revert changes in get_peft_kwargs.
* update
* fixes
* fixes
* simplify _load_sft_state_dict_metadata
* update
* style fix
* update
* update
* update
* empty commit
* _pack_dict_with_prefix
* update
* TODO 1.
* todo: 2.
* todo: 3.
* update
* update
* Apply suggestions from code review Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* reraise.
* move argument.
---------
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
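A hedged sketch of the round trip this enables: the adapter's LoRA config (rank, alpha, target modules) is embedded as metadata when saving, so loading no longer needs the original `LoraConfig`. The model id, directory, and `prefix=None` argument are illustrative assumptions:

```python
from diffusers import UNet2DConditionModel
from peft import LoraConfig

unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet"
)
unet.add_adapter(LoraConfig(r=4, lora_alpha=4, target_modules=["to_q", "to_k"]))

unet.save_lora_adapter("my-lora")  # metadata now saved automatically
unet.load_lora_adapter("my-lora", prefix=None)  # metadata parsed back on load
```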
-
- 06 Jun, 2025 1 commit
-
-
Sayak Paul authored
* add a test for group offloading + compilation.
* tests
-
- 02 Jun, 2025 1 commit
-
-
Sayak Paul authored
chore: rename lora model-level tests.
-
- 26 May, 2025 1 commit
-
-
Sayak Paul authored
* remove compile cuda docker.
* replace compile cuda docker path.
* better manage compilation cache.
* propagate similar to the pipeline tests.
* remove unneeded compile test.
* small.
* don't check for deleted files.
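The cache-management part boils down to resetting dynamo state between tests; a minimal sketch with standard torch APIs:

```python
import torch
import torch._dynamo

# Drop compiled graphs and guards accumulated by earlier tests so one test's
# cache cannot mask recompilations (or cache-limit hits) in the next.
torch._dynamo.reset()
```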
-
- 15 May, 2025 1 commit
-
-
Sayak Paul authored
* add tests for combining layerwise upcasting and group offloading.
* feedback
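A hedged sketch of combining the two memory savers this test covers, assuming the `enable_layerwise_casting` and `enable_group_offload` helpers on diffusers models (checkpoint id illustrative):

```python
import torch
from diffusers import AutoencoderKL

model = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# Store weights in fp8 but run compute in bf16 ...
model.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16
)
# ... while streaming blocks of the model on and off the GPU.
model.enable_group_offload(
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",
    num_blocks_per_group=1,
)
```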
-
- 09 May, 2025 1 commit
-
-
Sayak Paul authored
* refactor hotswap tester.
* fix seeds.
* add to nightly ci.
* move comment.
* move to nightly
-
- 28 Apr, 2025 3 commits
-
-
Sayak Paul authored
fix import.
-
Yao Matrix authored
* enable test_layerwise_casting_memory cases on XPU Signed-off-by: Yao Matrix <matrix.yao@intel.com>
* fix style Signed-off-by: Yao Matrix <matrix.yao@intel.com>
---------
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
-
Sayak Paul authored
[tests] add tests to check for graph breaks, recompilation, cuda syncs in pipelines during torch.compile() (#11085)
* test for better torch.compile stuff.
* fixes
* recompilation and graph break.
* clear compilation cache.
* change to modeling level test.
* allow running compilation tests during nightlies.
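A minimal sketch of the two guard rails these tests lean on, using standard torch APIs (model illustrative):

```python
import torch
import torch._dynamo

model = torch.nn.Linear(8, 8).cuda()
compiled = torch.compile(model, fullgraph=True)  # any graph break -> error

with torch._dynamo.config.patch(error_on_recompile=True):
    compiled(torch.randn(2, 8, device="cuda"))
    compiled(torch.randn(2, 8, device="cuda"))  # same shape: must not recompile
```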
-
- 09 Apr, 2025 2 commits
-
-
Dhruv Nair authored
* update * update * update * update
-
hlky authored
* AutoModel
* ...
* lol
* ...
* add test
* update
* make fix-copies
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
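A sketch of the entry point this adds: `AutoModel` resolves the concrete model class from the checkpoint's config (checkpoint id illustrative):

```python
from diffusers import AutoModel

# Resolves to the transformer class declared in the checkpoint config,
# without naming it explicitly.
transformer = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer"
)
```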
-
- 08 Apr, 2025 2 commits
-
-
Sayak Paul authored
* implement record_stream for better performance.
* fix
* style.
* merge #11097
* Update src/diffusers/hooks/group_offloading.py Co-authored-by: Aryan <aryan@huggingface.co>
* fixes
* docstring.
* remaining todos in low_cpu_mem_usage
* tests
* updates to docs.
---------
Co-authored-by: Aryan <aryan@huggingface.co>
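A hedged sketch of the knob this commit implements, assuming the `apply_group_offloading` signature; `record_stream=True` marks offloaded tensors against the transfer stream instead of forcing extra synchronizations:

```python
import torch
from diffusers import AutoencoderKL
from diffusers.hooks import apply_group_offloading

model = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

apply_group_offloading(
    model,
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
    use_stream=True,     # overlap transfers with compute on a side stream
    record_stream=True,  # the performance option added by this commit
)
```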
-
Benjamin Bossan authored
* [WIP][LoRA] Implement hot-swapping of LoRA

  This PR adds the possibility to hot-swap LoRA adapters. It is WIP.

  Description: As of now, users can already load multiple LoRA adapters. They can offload existing adapters or they can unload them (i.e. delete them). However, they cannot "hotswap" adapters yet, i.e. substitute the weights of one LoRA adapter with the weights of another, without the need to create a separate LoRA adapter. Generally, hot-swapping may not appear super useful, but when the model is compiled it is necessary to prevent recompilation. See #9279 for more context.

  Caveats: To hot-swap one LoRA adapter for another, the two adapters should target exactly the same layers and their "hyper-parameters" should be identical. For instance, the LoRA alpha has to be the same: given that we keep the alpha from the first adapter, the LoRA scaling would otherwise be incorrect for the second adapter. Theoretically, we could override the scaling dict with the alpha values derived from the second adapter's config, but changing the dict will trigger a guard for recompilation, defeating the main purpose of the feature. I also found that compilation flags can have an impact on whether this works or not. E.g. when passing "reduce-overhead", there will be errors of the type:

  > input name: arg861_1. data pointer changed from 139647332027392 to 139647331054592

  I don't know enough about compilation to determine whether this is problematic or not.

  Current state: This is obviously WIP right now, meant to collect feedback and discuss which direction to take this. If this PR turns out to be useful, the hot-swapping functions will be added to PEFT itself and can be imported here (or there is a separate copy in diffusers to avoid the need for a minimum PEFT version to use this feature). Moreover, more tests need to be added to better cover this feature, although we don't necessarily need tests for the hot-swapping functionality itself, since those tests will be added to PEFT. Furthermore, as of now, this is only implemented for the unet; other pipeline components have yet to implement this feature. Finally, it should be properly documented. I would like to collect feedback on the current state of the PR before putting more time into finalizing it.

* Reviewer feedback
* Reviewer feedback, adjust test
* Fix, doc
* Make fix
* Fix for possible g++ error
* Add test for recompilation w/o hotswapping
* Make hotswap work

  Requires https://github.com/huggingface/peft/pull/2366. More changes to make hotswapping work; together with the mentioned PEFT PR, the tests pass for me locally.

  List of changes:
  - docstring for hotswap
  - remove code copied from PEFT, import from PEFT now
  - adjustments to PeftAdapterMixin.load_lora_adapter (unfortunately, some state dict renaming was necessary, LMK if there is a better solution)
  - adjustments to UNet2DConditionLoadersMixin._process_lora: LMK if this is even necessary or not, I'm unsure what the overall relationship is between this and PeftAdapterMixin.load_lora_adapter
  - also in UNet2DConditionLoadersMixin._process_lora, I saw that there is no LoRA unloading when loading the adapter fails, so I added it there (in line with what happens in PeftAdapterMixin.load_lora_adapter)
  - rewrote the tests to avoid shelling out, made them more precise by ensuring that the outputs align, and parametrized them
  - also checked the pipeline code mentioned in this comment: https://github.com/huggingface/diffusers/pull/9453#issuecomment-2418508871; when running it inside the with torch._dynamo.config.patch(error_on_recompile=True) context, there is no error, so I think hotswapping is now working with pipelines.

* Address reviewer feedback:
  - Revert deprecated method
  - Fix PEFT doc link to main
  - Don't use private function
  - Clarify magic numbers
  - Add pipeline test

  Moreover:
  - Extend docstrings
  - Extend existing test for outputs != 0
  - Extend existing test for wrong adapter name

* Change order of test decorators: parameterized.expand seems to ignore skip decorators if added in last place (i.e. as the innermost decorator).
* Split model and pipeline tests; also increase test coverage by targeting conv2d layers too (support for which was added recently in the PEFT PR).
* Reviewer feedback: move decorators to the test classes instead of having them on each test method.
* Apply suggestions from code review Co-authored-by: hlky <hlky@hlky.ac>
* Reviewer feedback: version check, TODO comment
* Add enable_lora_hotswap method
* Reviewer feedback: check _lora_loadable_modules
* Revert changes in unet.py
* Add possibility to ignore enabled at wrong time
* Fix docstrings
* Log possible PEFT error, test
* Raise helpful error if hotswap not supported (i.e. for the text encoder)
* Formatting
* More linter
* More ruff
* Doc-builder complaint
* Update docstring: mention no text encoder support yet, make it clear that LoRA is meant, and mention that the same adapter name should be passed
* Fix error in docstring
* Update more methods with hotswap argument (SDXL, SD3, Flux); no changes were made to load_lora_into_transformer
* Add hotswap argument to load_lora_into_transformer for SD3 and Flux; use shorter docstring for brevity
* Extend docstrings
* Add version guards to tests
* Formatting
* Fix LoRA loading call to add prefix=None. See: https://github.com/huggingface/diffusers/pull/10187#issuecomment-2717571064
* Run make fix-copies
* Add hot-swap documentation to the docs
* Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
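A hedged sketch of the workflow the PR enables, assuming the final public API (`enable_lora_hotswap` and the `hotswap` flag on `load_lora_weights`); checkpoint and LoRA repo ids are illustrative:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Reserve capacity up front so adapters of different rank can be swapped in.
pipe.enable_lora_hotswap(target_rank=64)
pipe.load_lora_weights("user/lora-a", adapter_name="default")
pipe.unet = torch.compile(pipe.unet)

pipe("a photo of a cat")  # compiles once

# Swap weights in place under the *same* adapter name: no recompilation.
pipe.load_lora_weights("user/lora-b", adapter_name="default", hotswap=True)
pipe("a photo of a cat")
```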
-
- 20 Mar, 2025 1 commit
-
-
Fanli Lin authored
* enable bnb on xpu
* add 2 more cases
* add missing change
* add missing change
* add one more
* enable cuda only tests on xpu
* enable big gpu cases
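A sketch of the bitsandbytes path these tests now exercise on XPU as well as CUDA (checkpoint id illustrative):

```python
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
)
# The same test body can then target "cuda" or "xpu" depending on availability.
```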
-