Commits · 3c05b9f71c82e4cdaef579cb13f363b6c1d7964d · renzhc / diffusers_dcu

03 Dec, 2025 1 commit

Fixes #12673. `record_stream` in group offloading is not working properly (#12721) · 3c05b9f7

Kimbing Ng authored Dec 03, 2025



* Fixes #12673.

    The wrong default_stream is used, leading to the wrong execution order when record_stream is enabled.

* update

* Update test

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

3c05b9f7

06 Aug, 2025 2 commits

Fix group offloading synchronization bug for parameter-only GroupModule's (#12077) · 69cdc257

Aryan authored Aug 06, 2025



* update

* update

* refactor

* fuck yeah

* make style

* Update src/diffusers/hooks/group_offloading.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/hooks/group_offloading.py

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

69cdc257

[refactor] condense group offloading (#11990) · cfd6ec74

Aryan authored Aug 06, 2025

* update

* update

* refactor

* add test

* address review comment

* nit

cfd6ec74

29 Jul, 2025 1 commit

[refactor] some shared parts between hooks + docs (#11968) · 6f3ac305

Aryan authored Jul 29, 2025

* update

* try test fix

* add missing link

* fix tests

* Update src/diffusers/hooks/first_block_cache.py

* make style

6f3ac305

25 Jul, 2025 1 commit

[compile] logger statements create unnecessary guards during dynamo tracing (#11987) · 3d2f8ae9

Aryan authored Jul 26, 2025

* update

* update

3d2f8ae9

09 Jul, 2025 1 commit

Fix unique memory address when doing group-offloading with disk (#11767) · 2d3d376b

Sayak Paul authored Jul 09, 2025



* fix memory address problem

* add more tests

* updates

* updates

* update

* _group_id = group_id

* update

* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* update

* update

* update

* fix

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

2d3d376b

27 Jun, 2025 1 commit

Support dynamically loading/unloading loras with group offloading (#11804) · 76ec3d1f

Aryan authored Jun 27, 2025

* update

* add test

* address review comments

* update

* fixes

* change decorator order to fix tests

* try fix

* fight tests

76ec3d1f

26 Jun, 2025 1 commit

Follow up for Group Offload to Disk (#11760) · 3649d7b9

Dhruv Nair authored Jun 26, 2025



* update

* update

* update

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

3649d7b9

24 Jun, 2025 1 commit

[chore] raise as early as possible in group offloading (#11792) · 7392c8ff

Sayak Paul authored Jun 24, 2025

* raise as early as possible in group offloading

* remove check from ModuleGroup

7392c8ff

19 Jun, 2025 2 commits

make group offloading work with disk/nvme transfers (#11682) · 85a916bb

Sayak Paul authored Jun 19, 2025

* start implementing disk offloading in group.

* delete diff file.

* updates.patch

* offload_to_disk_path

* check if safetensors already exist.

* add test and clarify.

* updates

* update todos.

* update more docs.

* update docs

85a916bb

Update more licenses to 2025 (#11746) · a4df8dbc

Aryan authored Jun 19, 2025

update

a4df8dbc

27 May, 2025 1 commit

Make group offloading compatible with torch.compile() (#11605) · 5f5d02fb

Sayak Paul authored May 26, 2025

wip: check if we can make go compile compat

5f5d02fb

01 May, 2025 1 commit

Fix typos in docs and comments (#11416) · 86294d3c

co63oc authored May 01, 2025



* Fix typos in docs and comments

* Apply style fixes

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

86294d3c

30 Apr, 2025 2 commits

make autoencoders, controlnet_flux and wan_transformer3d_single_file pass on xpu (#11461) · 06beecaf

Yao Matrix authored May 01, 2025



* make autoencoders, controlnet_flux and wan_transformer3d_single_file pass on XPU
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* Apply style fixes

---------
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>

06beecaf

Raise warning instead of error for block offloading with streams (#11425) · 8fe5a14d

Aryan authored Apr 30, 2025

raise warning instead of error

8fe5a14d

23 Apr, 2025 1 commit

Fix group offloading with block_level and use_stream=True (#11375) · 6cef71de

Aryan authored Apr 23, 2025

* fix

* add tests

* add message check

6cef71de

08 Apr, 2025 1 commit

[feat] implement `record_stream` when using CUDA streams during group offloading (#11081) · 4b27c4a4

Sayak Paul authored Apr 08, 2025



* implement record_stream for better performance.

* fix

* style.

* merge #11097

* Update src/diffusers/hooks/group_offloading.py
Co-authored-by: Aryan <aryan@huggingface.co>

* fixes

* docstring.

* remaining todos in low_cpu_mem_usage

* tests

* updates to docs.

---------
Co-authored-by: Aryan <aryan@huggingface.co>

4b27c4a4

24 Mar, 2025 1 commit

Improve information about group offloading and layerwise casting (#11101) · 1ddf3f3a

Aryan authored Mar 24, 2025



* update

* Update docs/source/en/optimization/memory.md

* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* apply review suggestions

* update

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

1ddf3f3a

20 Mar, 2025 1 commit

Provide option to reduce CPU RAM usage in Group Offload (#11106) · 2c1ed50f

Dhruv Nair authored Mar 20, 2025

* update

* update

* clean up

2c1ed50f

18 Mar, 2025 2 commits

Fix Group offloading behaviour when using streams (#11097) · 3be67060

Aryan authored Mar 18, 2025

* update

* update

3be67060

Group offloading improvements (#11094) · 813d42cc

Aryan authored Mar 18, 2025

update

813d42cc

14 Feb, 2025 1 commit

Module Group Offloading (#10503) · 9a147b82

Aryan authored Feb 14, 2025



* update

* fix

* non_blocking; handle parameters and buffers

* update

* Group offloading with cuda stream prefetching (#10516)

* cuda stream prefetch

* remove breakpoints

* update

* copy model hook implementation from pab

* update; ~very workaround based implementation but it seems to work as expected; needs cleanup and rewrite

* more workarounds to make it actually work

* cleanup

* rewrite

* update

* make sure to sync current stream before overwriting with pinned params

not doing so will lead to erroneous computations on the GPU and cause bad results

* better check

* update

* remove hook implementation to not deal with merge conflict

* re-add hook changes

* why use more memory when less memory do trick

* why still use slightly more memory when less memory do trick

* optimise

* add model tests

* add pipeline tests

* update docs

* add layernorm and groupnorm

* address review comments

* improve tests; add docs

* improve docs

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions from code review

* update tests

* apply suggestions from review

* enable_group_offloading -> enable_group_offload for naming consistency

* raise errors if multiple offloading strategies used; add relevant tests

* handle .to() when group offload applied

* refactor some repeated code

* remove unintentional change from merge conflict

* handle .cuda()

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

9a147b82