Commits · cfd6ec7465514f75b13696c514132b27c325591a · renzhc / diffusers_dcu

06 Aug, 2025 1 commit
- [refactor] condense group offloading (#11990) · cfd6ec74
  Aryan authored Aug 06, 2025
```
* update

* update

* refactor

* add test

* address review comment

* nit
```
03 Aug, 2025 1 commit

Qwen-Image (#12055) · 8e53cd95

naykun authored Aug 04, 2025



* (feat): qwen-image integration

* fix(qwen-image):
  - remove unused logic related to controlnet/ip-adapter

* fix(qwen-image):
  - compatible with attention dispatcher
  - cond cache support

* fix(qwen-image):
  - cond cache registry
  - attention backend argument
  - fix copies

* fix(qwen-image):
  - remove local test

* Update src/diffusers/models/transformers/transformer_qwenimage.py

---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>


29 Jul, 2025 2 commits

[modular] add Modular flux for text-to-image (#11995) · 203dc520
Sayak Paul authored Jul 29, 2025
```
* start flux.

* more

* up

* up

* up

* up

* get back the deleted files.

* up

* empathy
```

[refactor] some shared parts between hooks + docs (#11968) · 6f3ac305

Aryan authored Jul 29, 2025

* update

* try test fix

* add missing link

* fix tests

* Update src/diffusers/hooks/first_block_cache.py

* make style


25 Jul, 2025 1 commit
- [compile] logger statements create unnecessary guards during dynamo tracing (#11987) · 3d2f8ae9
  Aryan authored Jul 26, 2025
```
* update

* update
```
23 Jul, 2025 1 commit
- [modular diffusers] Wan (#11913) · f36ba9f0
  Aryan authored Jul 23, 2025
```
* update
```
17 Jul, 2025 1 commit

[refactor] Flux/Chroma single file implementation + Attention Dispatcher (#11916) · 18c8f10f

Aryan authored Jul 17, 2025



* update

* update

* add coauthor
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* improve test

* handle ip adapter params correctly

* fix chroma qkv fusion test

* fix fastercache implementation

* fix more tests

* fight more tests

* add back set_attention_backend

* update

* update

* make style

* make fix-copies

* make ip adapter processor compatible with attention dispatcher

* refactor chroma as well

* remove rmsnorm assert

* minify and deprecate npu/xla processors

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>


10 Jul, 2025 1 commit

The Modular Diffusers (#9672) · f33b89ba

YiYi Xu authored Jul 09, 2025



adding modular diffusers as an experimental feature

---------
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>


09 Jul, 2025 1 commit

Fix unique memory address when doing group-offloading with disk (#11767) · 2d3d376b

Sayak Paul authored Jul 09, 2025



* fix memory address problem

* add more tests

* updates

* updates

* update

* _group_id = group_id

* update

* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* update

* update

* update

* fix

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>


08 Jul, 2025 1 commit

First Block Cache (#11180) · 0454fbb3

Aryan authored Jul 09, 2025



* update

* modify flux single blocks to make compatible with cache techniques (without too much model-specific intrusion code)

* remove debug logs

* update

* cache context for different batches of data

* fix hs residual bug for single return outputs; support ltx

* fix controlnet flux

* support flux, ltx i2v, ltx condition

* update

* update

* Update docs/source/en/api/cache.md

* Update src/diffusers/hooks/hooks.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* address review comments pt. 1

* address review comments pt. 2

* cache context refactor; address review pt. 3

* address review comments

* metadata registration with decorators instead of centralized

* support cogvideox

* support mochi

* fix

* remove unused function

* remove central registry based on review

* update

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>


27 Jun, 2025 1 commit

Support dynamically loading/unloading loras with group offloading (#11804) · 76ec3d1f

Aryan authored Jun 27, 2025

* update

* add test

* address review comments

* update

* fixes

* change decorator order to fix tests

* try fix

* fight tests


26 Jun, 2025 1 commit

Follow up for Group Offload to Disk (#11760) · 3649d7b9

Dhruv Nair authored Jun 26, 2025



* update

* update

* update

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>


24 Jun, 2025 1 commit
- [chore] raise as early as possible in group offloading (#11792) · 7392c8ff
  Sayak Paul authored Jun 24, 2025
```
* raise as early as possible in group offloading

* remove check from ModuleGroup
```
19 Jun, 2025 2 commits

make group offloading work with disk/nvme transfers (#11682) · 85a916bb

Sayak Paul authored Jun 19, 2025

* start implementing disk offloading in group.

* delete diff file.

* updates.patch

* offload_to_disk_path

* check if safetensors already exist.

* add test and clarify.

* updates

* update todos.

* update more docs.

* update docs

85a916bb

Update more licenses to 2025 (#11746) · a4df8dbc
Aryan authored Jun 19, 2025
```
update
```

30 May, 2025 1 commit

Fix typos in strings and comments (#11476) · 8183d0f1

co63oc authored May 30, 2025



* Fix typos in strings and comments
Signed-off-by: co63oc <co63oc@users.noreply.github.com>

* Update src/diffusers/hooks/hooks.py
Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Update src/diffusers/hooks/hooks.py
Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Update layerwise_casting.py

* Apply style fixes

* update

---------
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>


27 May, 2025 1 commit
- Make group offloading compatible with torch.compile() (#11605) · 5f5d02fb
  Sayak Paul authored May 26, 2025
```
wip: check if we can make group offloading (go) compile compat
```
01 May, 2025 1 commit

Fix typos in docs and comments (#11416) · 86294d3c

co63oc authored May 01, 2025



* Fix typos in docs and comments

* Apply style fixes

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>


30 Apr, 2025 2 commits

make autoencoders, controlnet_flux and wan_transformer3d_single_file pass on xpu (#11461) · 06beecaf

Yao Matrix authored May 01, 2025



* make autoencoders, controlnet_flux and wan_transformer3d_single_file pass on XPU
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* Apply style fixes

---------
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>


Raise warning instead of error for block offloading with streams (#11425) · 8fe5a14d
Aryan authored Apr 30, 2025
```
raise warning instead of error
```

23 Apr, 2025 1 commit
- Fix group offloading with block_level and use_stream=True (#11375) · 6cef71de
  Aryan authored Apr 23, 2025
```
* fix

* add tests

* add message check
```
08 Apr, 2025 1 commit

[feat] implement `record_stream` when using CUDA streams during group offloading (#11081) · 4b27c4a4

Sayak Paul authored Apr 08, 2025



* implement record_stream for better performance.

* fix

* style.

* merge #11097

* Update src/diffusers/hooks/group_offloading.py
Co-authored-by: Aryan <aryan@huggingface.co>

* fixes

* docstring.

* remaining todos in low_cpu_mem_usage

* tests

* updates to docs.

---------
Co-authored-by: Aryan <aryan@huggingface.co>


24 Mar, 2025 1 commit

Improve information about group offloading and layerwise casting (#11101) · 1ddf3f3a

Aryan authored Mar 24, 2025



* update

* Update docs/source/en/optimization/memory.md

* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* apply review suggestions

* update

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>


21 Mar, 2025 1 commit

[core] FasterCache (#10163) · 844221ae

Aryan authored Mar 21, 2025



* init

* update

* update

* update

* make style

* update

* fix

* make it work with guidance distilled models

* update

* make fix-copies

* add tests

* update

* apply_faster_cache -> apply_fastercache

* fix

* reorder

* update

* refactor

* update docs

* add fastercache to CacheMixin

* update tests

* Apply suggestions from code review

* make style

* try to fix partial import error

* Apply style fixes

* raise warning

* update

---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>


20 Mar, 2025 1 commit
- Provide option to reduce CPU RAM usage in Group Offload (#11106) · 2c1ed50f
  Dhruv Nair authored Mar 20, 2025
```
* update

* update

* clean up
```
18 Mar, 2025 2 commits
- Fix Group offloading behaviour when using streams (#11097) · 3be67060
  Aryan authored Mar 18, 2025
```
* update

* update
```
- Group offloading improvements (#11094) · 813d42cc
  Aryan authored Mar 18, 2025
```
update
```
14 Feb, 2025 1 commit

Module Group Offloading (#10503) · 9a147b82

Aryan authored Feb 14, 2025



* update

* fix

* non_blocking; handle parameters and buffers

* update

* Group offloading with cuda stream prefetching (#10516)

* cuda stream prefetch

* remove breakpoints

* update

* copy model hook implementation from pab

* update; very workaround-based implementation, but it seems to work as expected; needs cleanup and rewrite

* more workarounds to make it actually work

* cleanup

* rewrite

* update

* make sure to sync current stream before overwriting with pinned params; not doing so will lead to erroneous computations on the GPU and cause bad results

* better check

* update

* remove hook implementation to not deal with merge conflict

* re-add hook changes

* why use more memory when less memory do trick

* why still use slightly more memory when less memory do trick

* optimise

* add model tests

* add pipeline tests

* update docs

* add layernorm and groupnorm

* address review comments

* improve tests; add docs

* improve docs

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions from code review

* update tests

* apply suggestions from review

* enable_group_offloading -> enable_group_offload for naming consistency

* raise errors if multiple offloading strategies used; add relevant tests

* handle .to() when group offload applied

* refactor some repeated code

* remove unintentional change from merge conflict

* handle .cuda()

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>


13 Feb, 2025 1 commit

Disable PEFT input autocast when using fp8 layerwise casting (#10685) · a0c22997

Aryan authored Feb 13, 2025

* disable peft input autocast

* use new peft method name; only disable peft input autocast if submodule layerwise casting active

* add test; reference PeftInputAutocastDisableHook in peft docs

* add load_lora_weights test

* casted -> cast

* Update tests/lora/utils.py


27 Jan, 2025 1 commit

[core] Pyramid Attention Broadcast (#9562) · 658e24e8

Aryan authored Jan 28, 2025



* start pyramid attention broadcast

* add coauthor
Co-Authored-By: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>

* update

* make style

* update

* make style

* add docs

* add tests

* update

* Update docs/source/en/api/pipelines/cogvideox.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/cogvideox.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Pyramid Attention Broadcast rewrite + introduce hooks (#9826)

* rewrite implementation with hooks

* make style

* update

* merge pyramid-attention-rewrite-2

* make style

* remove changes from latte transformer

* revert docs changes

* better debug message

* add todos for future

* update tests

* make style

* cleanup

* fix

* improve log message; fix latte test

* refactor

* update

* update

* update

* revert changes to tests

* update docs

* update tests

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update

* fix flux test

* reorder

* refactor

* make fix-copies

* update docs

* fixes

* more fixes

* make style

* update tests

* update code example

* make fix-copies

* refactor based on reviews

* use maybe_free_model_hooks

* CacheMixin

* make style

* update

* add current_timestep property; update docs

* make fix-copies

* update

* improve tests

* try circular import fix

* apply suggestions from review

* address review comments

* Apply suggestions from code review

* refactor hook implementation

* add test suite for hooks

* PAB Refactor (#10667)

* update

* update

* update

---------
Co-authored-by: DN6 <dhruv.nair@gmail.com>

* update

* fix remove hook behaviour

---------
Co-authored-by: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: DN6 <dhruv.nair@gmail.com>


22 Jan, 2025 1 commit

[core] Layerwise Upcasting (#10347) · beacaa55

Aryan authored Jan 22, 2025



* update

* update

* make style

* remove dynamo disable

* add coauthor
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* update

* update

* update

* update mixin

* add some basic tests

* update

* update

* non_blocking

* improvements

* update

* norm.* -> norm

* apply suggestions from review

* add example

* update hook implementation to the latest changes from pyramid attention broadcast

* deinitialize should raise an error

* update doc page

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update docs

* update

* refactor

* fix _always_upcast_modules for asym ae and vq_model

* fix lumina embedding forward to not depend on weight dtype

* refactor tests

* add simple lora inference tests

* _always_upcast_modules -> _precision_sensitive_module_patterns

* remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case

* check layer dtypes in lora test

* fix UNet1DModelTests::test_layerwise_upcasting_inference

* _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback

* skip test in NCSNppModelTests

* skip tests for AutoencoderTinyTests

* skip tests for AutoencoderOobleckTests

* skip tests for UNet1DModelTests - unsupported pytorch operations

* layerwise_upcasting -> layerwise_casting

* skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support

* add layerwise fp8 pipeline test

* use xfail

* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' test to pass)

* add note about memory consumption on tesla CI runner for failing test

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
