Commits · f12d161d6763cff0f45b0ec3b3f6072a2b7c7f9d · renzhc / diffusers_dcu

05 Dec, 2025 1 commit

Fix broken group offloading with block_level for models with standalone layers (#12692) · f12d161d

swappy authored Dec 05, 2025



* fix: group offloading to support standalone computational layers in block-level offloading

* test: for models with standalone and deeply nested layers in block-level offloading

* feat: support for block-level offloading in group offloading config

* fix: group offload block modules to AutoencoderKL and AutoencoderKLWan

* fix: update group offloading tests to use AutoencoderKL and adjust input dimensions

* refactor: streamline block offloading logic

* Apply style fixes

* update tests

* update

* fix for failing tests

* clean up

* revert to use skip_keys

* clean up

---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

f12d161d

03 Dec, 2025 1 commit

Fixes #12673. `record_stream` in group offloading is not working properly (#12721) · 3c05b9f7

Kimbing Ng authored Dec 03, 2025



* Fixes #12673.

    The wrong default_stream is used, leading to the wrong execution order when record_stream is enabled.

* update

* Update test

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

3c05b9f7

25 Nov, 2025 1 commit

Add Support for Z-Image Series (#12703) · 4088e8a8

Jerry Wu authored Nov 25, 2025



* Add Support for Z-Image.

* Reformatting with make style, black & isort.

* Remove init, Modify import utils, Merge forward in transformers block, Remove once func in pipeline.

* modified main model forward, freqs_cis left

* refactored to add B dim

* fixed stack issue

* fixed modulation bug

* fixed modulation bug

* fix bug

* remove value_from_time_aware_config

* styling

* Fix neg embed and divide / bug; Reuse pad zero tensor; Turn cat -> repeat; Add hint for attn processor.

* Replace padding with pad_sequence; Add gradient checkpointing.

* Fix flash_attn3 in dispatch attn backend by _flash_attn_forward, replace its origin implement; Add DocString in pipeline for that.

* Fix Docstring and Make Style.

* Revert "Fix flash_attn3 in dispatch attn backend by _flash_attn_forward, replace its origin implement; Add DocString in pipeline for that."

This reverts commit fbf26b7ed11d55146103c97740bad4a5f91744e0.

* update z-image docstring

* Revert attention dispatcher

* update z-image docstring

* styling

* Restore attention_dispatch.py to its original implementation; fa3 compatibility will come in a separate later commit.

* Fix previous bug, and support passing pre-encoded prompt_embeds in args as a list of torch Tensors.

* Remove einops dependency.

* remove redundant imports & make fix-copies

* fix import

---------
Co-authored-by: liudongyang <liudongyang0114@gmail.com>

4088e8a8

07 Nov, 2025 1 commit

fix the crash in Wan-AI/Wan2.2-TI2V-5B-Diffusers if CP is enabled (#12562) · a9cb08af

Wang, Yi authored Nov 07, 2025



* fix the crash in Wan-AI/Wan2.2-TI2V-5B-Diffusers if CP is enabled
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* address review comment
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* refine
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

a9cb08af

24 Oct, 2025 1 commit

HunyuanImage21 (#12333) · a138d71e

YiYi Xu authored Oct 23, 2025



* add hunyuanimage2.1


---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

a138d71e

08 Oct, 2025 1 commit

fix more torch.distributed imports (#12425) · 345864eb

Sayak Paul authored Oct 08, 2025

* up

* unguard.

345864eb

24 Sep, 2025 1 commit

Context Parallel w/ Ring & Ulysses & Unified Attention (#11941) · dcb6dd9b

Aryan authored Sep 24, 2025



* update

* update

* add coauthor
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* improve test

* handle ip adapter params correctly

* fix chroma qkv fusion test

* fix fastercache implementation

* fix more tests

* fight more tests

* add back set_attention_backend

* update

* update

* make style

* make fix-copies

* make ip adapter processor compatible with attention dispatcher

* refactor chroma as well

* remove rmsnorm assert

* minify and deprecate npu/xla processors

* update

* refactor

* refactor; support flash attention 2 with cp

* fix

* support sage attention with cp

* make torch compile compatible

* update

* refactor

* update

* refactor

* refactor

* add ulysses backward

* try to make dreambooth script work; accelerator backward not playing well

* Revert "try to make dreambooth script work; accelerator backward not playing well"

This reverts commit 768d0ea6fa6a305d12df1feda2afae3ec80aa449.

* workaround compilation problems with triton when doing all-to-all

* support wan

* handle backward correctly

* support qwen

* support ltx

* make fix-copies

* Update src/diffusers/models/modeling_utils.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* apply review suggestions

* update docs

* add explanation

* make fix-copies

* add docstrings

* support passing parallel_config to from_pretrained

* apply review suggestions

* make style

* update

* Update docs/source/en/api/parallel.md
Co-authored-by: Aryan <aryan@huggingface.co>

* up

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>

dcb6dd9b

08 Sep, 2025 1 commit

[Modular] Qwen (#12220) · f50b18ee

YiYi Xu authored Sep 08, 2025

* add qwen modular

f50b18ee

03 Sep, 2025 1 commit

fix some typos (#12265) · 764b6247

co63oc authored Sep 03, 2025

Signed-off-by: co63oc <co63oc@users.noreply.github.com>

764b6247

20 Aug, 2025 1 commit

Bria 3 2 pipeline (#12010) · 7993be9e

galbria authored Aug 20, 2025



* Add Bria model and pipeline to diffusers

- Introduced `BriaTransformer2DModel` and `BriaPipeline` for enhanced image generation capabilities.
- Updated import structures across various modules to include the new Bria components.
- Added utility functions and output classes specific to the Bria pipeline.
- Implemented tests for the Bria pipeline to ensure functionality and output integrity.

* with working tests

* style and quality pass

* adding docs

* add to overview

* fixes from "make fix-copies"

* Refactor transformer_bria.py and pipeline_bria.py: Introduce new EmbedND class for rotary position embedding, and enhance Timestep and TimestepProjEmbeddings classes. Add utility functions for handling negative prompts and generating original sigmas in pipeline_bria.py.

* remove redundant and duplicate tests and fix bf16 slow test

* style fixes

* small doc update

* Enhance Bria 3.2 documentation and implementation

- Updated the GitHub repository link for Bria 3.2.
- Added usage instructions for the gated model access.
- Introduced the BriaTransformerBlock and BriaAttention classes to the model architecture.
- Refactored existing classes to integrate Bria-specific components, including BriaEmbedND and BriaPipeline.
- Updated the pipeline output class to reflect Bria-specific functionality.
- Adjusted test cases to align with the new Bria model structure.

* Refactor Bria model components and update documentation

- Removed outdated inference example from Bria 3.2 documentation.
- Introduced the BriaTransformerBlock class to enhance model architecture.
- Updated attention handling to use `attention_kwargs` instead of `joint_attention_kwargs`.
- Improved import structure in the Bria pipeline to handle optional dependencies.
- Adjusted test cases to reflect changes in model dtype assertions.

* Update Bria model reference in documentation to reflect new file naming convention

* Update docs/source/en/_toctree.yml

* Refactor BriaPipeline to inherit from DiffusionPipeline instead of FluxPipeline, updating imports accordingly.

* move the __call__ func to the end of file

* Update BriaPipeline example to use bfloat16 for precision sensitivity for better result

* make style && make quality && make fix-copies

---------
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>

7993be9e

06 Aug, 2025 3 commits

Helper functions to return skip-layer compatible layers (#12048) · f19421e2

Aryan authored Aug 06, 2025

update
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

f19421e2

Fix group offloading synchronization bug for parameter-only GroupModule's (#12077) · 69cdc257

Aryan authored Aug 06, 2025



* update

* update

* refactor

* fuck yeah

* make style

* Update src/diffusers/hooks/group_offloading.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/hooks/group_offloading.py

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

69cdc257

[refactor] condense group offloading (#11990) · cfd6ec74

Aryan authored Aug 06, 2025

* update

* update

* refactor

* add test

* address review comment

* nit

cfd6ec74

03 Aug, 2025 1 commit

Qwen-Image (#12055) · 8e53cd95

naykun authored Aug 04, 2025



* (feat): qwen-image integration

* fix(qwen-image):
- remove unused logics related to controlnet/ip-adapter

* fix(qwen-image):
- compatible with attention dispatcher
- cond cache support

* fix(qwen-image):
- cond cache registry
- attention backend argument
- fix copies

* fix(qwen-image):
- remove local test

* Update src/diffusers/models/transformers/transformer_qwenimage.py

---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>

8e53cd95

29 Jul, 2025 2 commits

[modular] add Modular flux for text-to-image (#11995) · 203dc520

Sayak Paul authored Jul 29, 2025

* start flux.

* more

* up

* up

* up

* up

* get back the deleted files.

* up

* empathy

203dc520

[refactor] some shared parts between hooks + docs (#11968) · 6f3ac305

Aryan authored Jul 29, 2025

* update

* try test fix

* add missing link

* fix tests

* Update src/diffusers/hooks/first_block_cache.py

* make style

6f3ac305

25 Jul, 2025 1 commit

[compile] logger statements create unnecessary guards during dynamo tracing (#11987) · 3d2f8ae9

Aryan authored Jul 26, 2025

* update

* update

3d2f8ae9

23 Jul, 2025 1 commit

[modular diffusers] Wan (#11913) · f36ba9f0

Aryan authored Jul 23, 2025

* update

f36ba9f0

17 Jul, 2025 1 commit

[refactor] Flux/Chroma single file implementation + Attention Dispatcher (#11916) · 18c8f10f

Aryan authored Jul 17, 2025



* update

* update

* add coauthor
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* improve test

* handle ip adapter params correctly

* fix chroma qkv fusion test

* fix fastercache implementation

* fix more tests

* fight more tests

* add back set_attention_backend

* update

* update

* make style

* make fix-copies

* make ip adapter processor compatible with attention dispatcher

* refactor chroma as well

* remove rmsnorm assert

* minify and deprecate npu/xla processors

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

18c8f10f

10 Jul, 2025 1 commit

The Modular Diffusers (#9672) · f33b89ba

YiYi Xu authored Jul 09, 2025



adding modular diffusers as experimental feature 

---------
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

f33b89ba

09 Jul, 2025 1 commit

Fix unique memory address when doing group-offloading with disk (#11767) · 2d3d376b

Sayak Paul authored Jul 09, 2025



* fix memory address problem

* add more tests

* updates

* updates

* update

* _group_id = group_id

* update

* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* update

* update

* update

* fix

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

2d3d376b

08 Jul, 2025 1 commit

First Block Cache (#11180) · 0454fbb3

Aryan authored Jul 09, 2025



* update

* modify flux single blocks to make compatible with cache techniques (without too much model-specific intrusion code)

* remove debug logs

* update

* cache context for different batches of data

* fix hs residual bug for single return outputs; support ltx

* fix controlnet flux

* support flux, ltx i2v, ltx condition

* update

* update

* Update docs/source/en/api/cache.md

* Update src/diffusers/hooks/hooks.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* address review comments pt. 1

* address review comments pt. 2

* cache context refactor; address review pt. 3

* address review comments

* metadata registration with decorators instead of centralized

* support cogvideox

* support mochi

* fix

* remove unused function

* remove central registry based on review

* update

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

0454fbb3

27 Jun, 2025 1 commit

Support dynamically loading/unloading loras with group offloading (#11804) · 76ec3d1f

Aryan authored Jun 27, 2025

* update

* add test

* address review comments

* update

* fixes

* change decorator order to fix tests

* try fix

* fight tests

76ec3d1f

26 Jun, 2025 1 commit

Follow up for Group Offload to Disk (#11760) · 3649d7b9

Dhruv Nair authored Jun 26, 2025



* update

* update

* update

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

3649d7b9

24 Jun, 2025 1 commit

[chore] raise as early as possible in group offloading (#11792) · 7392c8ff

Sayak Paul authored Jun 24, 2025

* raise as early as possible in group offloading

* remove check from ModuleGroup

7392c8ff

19 Jun, 2025 2 commits

make group offloading work with disk/nvme transfers (#11682) · 85a916bb

Sayak Paul authored Jun 19, 2025

* start implementing disk offloading in group.

* delete diff file.

* updates.patch

* offload_to_disk_path

* check if safetensors already exist.

* add test and clarify.

* updates

* update todos.

* update more docs.

* update docs

85a916bb

Update more licenses to 2025 (#11746) · a4df8dbc

Aryan authored Jun 19, 2025

update

a4df8dbc

30 May, 2025 1 commit

Fix typos in strings and comments (#11476) · 8183d0f1

co63oc authored May 30, 2025



* Fix typos in strings and comments
Signed-off-by: co63oc <co63oc@users.noreply.github.com>

* Update src/diffusers/hooks/hooks.py
Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Update src/diffusers/hooks/hooks.py
Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Update layerwise_casting.py

* Apply style fixes

* update

---------
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

8183d0f1

27 May, 2025 1 commit

Make group offloading compatible with torch.compile() (#11605) · 5f5d02fb

Sayak Paul authored May 26, 2025

wip: check if we can make go compile compat

5f5d02fb

01 May, 2025 1 commit

Fix typos in docs and comments (#11416) · 86294d3c

co63oc authored May 01, 2025



* Fix typos in docs and comments

* Apply style fixes

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

86294d3c

30 Apr, 2025 2 commits

make autoencoders, controlnet_flux and wan_transformer3d_single_file pass on xpu (#11461) · 06beecaf

Yao Matrix authored May 01, 2025



* make autoencoders, controlnet_flux and wan_transformer3d_single_file pass on XPU
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* Apply style fixes

---------
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>

06beecaf

Raise warning instead of error for block offloading with streams (#11425) · 8fe5a14d

Aryan authored Apr 30, 2025

raise warning instead of error

8fe5a14d

23 Apr, 2025 1 commit

Fix group offloading with block_level and use_stream=True (#11375) · 6cef71de

Aryan authored Apr 23, 2025

* fix

* add tests

* add message check

6cef71de

08 Apr, 2025 1 commit

[feat] implement `record_stream` when using CUDA streams during group offloading (#11081) · 4b27c4a4

Sayak Paul authored Apr 08, 2025



* implement record_stream for better performance.

* fix

* style.

* merge #11097

* Update src/diffusers/hooks/group_offloading.py
Co-authored-by: Aryan <aryan@huggingface.co>

* fixes

* docstring.

* remaining todos in low_cpu_mem_usage

* tests

* updates to docs.

---------
Co-authored-by: Aryan <aryan@huggingface.co>

4b27c4a4

24 Mar, 2025 1 commit

Improve information about group offloading and layerwise casting (#11101) · 1ddf3f3a

Aryan authored Mar 24, 2025



* update

* Update docs/source/en/optimization/memory.md

* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* apply review suggestions

* update

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

1ddf3f3a

21 Mar, 2025 1 commit

[core] FasterCache (#10163) · 844221ae

Aryan authored Mar 21, 2025



* init

* update

* update

* update

* make style

* update

* fix

* make it work with guidance distilled models

* update

* make fix-copies

* add tests

* update

* apply_faster_cache -> apply_fastercache

* fix

* reorder

* update

* refactor

* update docs

* add fastercache to CacheMixin

* update tests

* Apply suggestions from code review

* make style

* try to fix partial import error

* Apply style fixes

* raise warning

* update

---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

844221ae

20 Mar, 2025 1 commit

Provide option to reduce CPU RAM usage in Group Offload (#11106) · 2c1ed50f

Dhruv Nair authored Mar 20, 2025

* update

* update

* clean up

2c1ed50f

18 Mar, 2025 2 commits

Fix Group offloading behaviour when using streams (#11097) · 3be67060

Aryan authored Mar 18, 2025

* update

* update

3be67060

Group offloading improvements (#11094) · 813d42cc

Aryan authored Mar 18, 2025

update

813d42cc

14 Feb, 2025 1 commit

Module Group Offloading (#10503) · 9a147b82

Aryan authored Feb 14, 2025



* update

* fix

* non_blocking; handle parameters and buffers

* update

* Group offloading with cuda stream prefetching (#10516)

* cuda stream prefetch

* remove breakpoints

* update

* copy model hook implementation from pab

* update; ~very workaround based implementation but it seems to work as expected; needs cleanup and rewrite

* more workarounds to make it actually work

* cleanup

* rewrite

* update

* make sure to sync current stream before overwriting with pinned params

not doing so will lead to erroneous computations on the GPU and cause bad results

* better check

* update

* remove hook implementation to not deal with merge conflict

* re-add hook changes

* why use more memory when less memory do trick

* why still use slightly more memory when less memory do trick

* optimise

* add model tests

* add pipeline tests

* update docs

* add layernorm and groupnorm

* address review comments

* improve tests; add docs

* improve docs

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions from code review

* update tests

* apply suggestions from review

* enable_group_offloading -> enable_group_offload for naming consistency

* raise errors if multiple offloading strategies used; add relevant tests

* handle .to() when group offload applied

* refactor some repeated code

* remove unintentional change from merge conflict

* handle .cuda()

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

9a147b82