Commits · 43459079ab06d4f94f2d93ba7153e2c5310928d3 · renzhc / diffusers_dcu

"vscode:/vscode.git/clone" did not exist on "026841447cd2e143b6d5d2fa9621b3bd1d975e25"

10 Sep, 2025 1 commit

[core] feat: support group offloading at the pipeline level (#12283) · 43459079

Sayak Paul authored Sep 10, 2025



* feat: support group offloading at the pipeline level.

* add tests

* up

* [docs] Pipeline group offloading (#12286)

init
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

43459079

20 Jun, 2025 2 commits
- [docs] device_map (#11711) · 6184d8a4
  Steven Liu authored Jun 20, 2025
```
draft
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
```
  6184d8a4
- [docs] Quantization + torch.compile + offloading (#11703) · 5a6e3864
  Steven Liu authored Jun 20, 2025
```
* draft

* feedback

* update

* feedback

* fix

* feedback

* feedback

* fix

* feedback
```
  5a6e3864
19 Jun, 2025 2 commits

make group offloading work with disk/nvme transfers (#11682) · 85a916bb

Sayak Paul authored Jun 19, 2025

* start implementing disk offloading in group.

* delete diff file.

* updates.patch

* offload_to_disk_path

* check if safetensors already exist.

* add test and clarify.

* updates

* update todos.

* update more docs.

* update docs

85a916bb

Update more licenses to 2025 (#11746) · a4df8dbc
Aryan authored Jun 19, 2025
```
update
```
a4df8dbc

19 May, 2025 1 commit

[docs] tip for group offloding + quantization (#11576) · 6918f6d1

Sayak Paul authored May 19, 2025



* tip for group offloding + quantization
Co-authored-by: Aryan VS <contact.aryanvs@gmail.com>

* Apply suggestions from code review
Co-authored-by: Aryan <aryan@huggingface.co>

---------
Co-authored-by: Aryan VS <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>

6918f6d1

01 May, 2025 1 commit

[docs] Memory optims (#11385) · b848d479

Steven Liu authored May 01, 2025

* reformat

* initial

* fin

* review

* inference

* feedback

* feedback

* feedback

b848d479

08 Apr, 2025 1 commit

[feat] implement `record_stream` when using CUDA streams during group offloading (#11081) · 4b27c4a4

Sayak Paul authored Apr 08, 2025



* implement record_stream for better performance.

* fix

* style.

* merge #11097

* Update src/diffusers/hooks/group_offloading.py
Co-authored-by: Aryan <aryan@huggingface.co>

* fixes

* docstring.

* remaining todos in low_cpu_mem_usage

* tests

* updates to docs.

---------
Co-authored-by: Aryan <aryan@huggingface.co>

4b27c4a4

24 Mar, 2025 1 commit

Improve information about group offloading and layerwise casting (#11101) · 1ddf3f3a

Aryan authored Mar 24, 2025



* update

* Update docs/source/en/optimization/memory.md

* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* apply review suggestions

* update

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

1ddf3f3a

14 Feb, 2025 1 commit

Module Group Offloading (#10503) · 9a147b82

Aryan authored Feb 14, 2025



* update

* fix

* non_blocking; handle parameters and buffers

* update

* Group offloading with cuda stream prefetching (#10516)

* cuda stream prefetch

* remove breakpoints

* update

* copy model hook implementation from pab

* update; ~very workaround based implementation but it seems to work as expected; needs cleanup and rewrite

* more workarounds to make it actually work

* cleanup

* rewrite

* update

* make sure to sync current stream before overwriting with pinned params

not doing so will lead to erroneous computations on the GPU and cause bad results

* better check

* update

* remove hook implementation to not deal with merge conflict

* re-add hook changes

* why use more memory when less memory do trick

* why still use slightly more memory when less memory do trick

* optimise

* add model tests

* add pipeline tests

* update docs

* add layernorm and groupnorm

* address review comments

* improve tests; add docs

* improve docs

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions from code review

* update tests

* apply suggestions from review

* enable_group_offloading -> enable_group_offload for naming consistency

* raise errors if multiple offloading strategies used; add relevant tests

* handle .to() when group offload applied

* refactor some repeated code

* remove unintentional change from merge conflict

* handle .cuda()

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

9a147b82

22 Jan, 2025 1 commit

[core] Layerwise Upcasting (#10347) · beacaa55

Aryan authored Jan 22, 2025



* update

* update

* make style

* remove dynamo disable

* add coauthor
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* update

* update

* update

* update mixin

* add some basic tests

* update

* update

* non_blocking

* improvements

* update

* norm.* -> norm

* apply suggestions from review

* add example

* update hook implementation to the latest changes from pyramid attention broadcast

* deinitialize should raise an error

* update doc page

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update docs

* update

* refactor

* fix _always_upcast_modules for asym ae and vq_model

* fix lumina embedding forward to not depend on weight dtype

* refactor tests

* add simple lora inference tests

* _always_upcast_modules -> _precision_sensitive_module_patterns

* remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case

* check layer dtypes in lora test

* fix UNet1DModelTests::test_layerwise_upcasting_inference

* _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback

* skip test in NCSNppModelTests

* skip tests for AutoencoderTinyTests

* skip tests for AutoencoderOobleckTests

* skip tests for UNet1DModelTests - unsupported pytorch operations

* layerwise_upcasting -> layerwise_casting

* skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support

* add layerwise fp8 pipeline test

* use xfail

* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' test to pass)

* add note about memory consumption on tesla CI runner for failing test

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

beacaa55

16 Sep, 2024 1 commit

[docs] Replace runwayml/stable-diffusion-v1-5 with Lykon/dreamshaper-8 (#9428) · b52119ae

suzukimain authored Sep 17, 2024



* [docs] Replace runwayml/stable-diffusion-v1-5 with Lykon/dreamshaper-8

Updated documentation as runwayml/stable-diffusion-v1-5 has been removed from Huggingface.

* Update docs/source/en/using-diffusers/inpaint.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Replace with stable-diffusion-v1-5/stable-diffusion-v1-5

* Update inpaint.md

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

b52119ae

10 May, 2024 1 commit

#7535 Update FloatTensor type hints to Tensor (#7883) · be4afa0b

Mark Van Aken authored May 10, 2024

* find & replace all FloatTensors to Tensor

* apply formatting

* Update torch.FloatTensor to torch.Tensor in the remaining files

* formatting

* Fix the rest of the places where FloatTensor is used as well as in documentation

* formatting

* Update new file from FloatTensor to Tensor

be4afa0b

08 Feb, 2024 1 commit
- change to 2024 in the license (#6902) · 30e5e81d
  Sayak Paul authored Feb 08, 2024
```
change to 2024
```
  30e5e81d
16 Nov, 2023 1 commit
- [`Docs`] Update and make improvements (#5819) · c697f524
  M. Tolga Cangöz authored Nov 17, 2023
```
Update and make improvements
```
  c697f524
09 Nov, 2023 1 commit

[`Docs`] Fix typos and update files at Optimization Page (#5674) · 53a8439f

M. Tolga Cangöz authored Nov 10, 2023



* Fix typos, update, trim trailing whitespace

* Trim trailing whitespaces

* Update docs/source/en/optimization/memory.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/memory.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update _toctree.yml

* Update adapt_a_model.md

* Reverse

* Reverse

* Reverse

* Update dreambooth.md

* Update instructpix2pix.md

* Update lora.md

* Update overview.md

* Update t2i_adapters.md

* Update text2image.md

* Update text_inversion.md

* Update create_dataset.md

* Update create_dataset.md

* Update create_dataset.md

* Update create_dataset.md

* Update coreml.md

* Delete docs/source/en/training/create_dataset.md

* Original create_dataset.md

* Update create_dataset.md

* Delete docs/source/en/training/create_dataset.md

* Add original file

* Delete docs/source/en/training/create_dataset.md

* Add original one

* Delete docs/source/en/training/text2image.md

* Delete docs/source/en/training/instructpix2pix.md

* Delete docs/source/en/training/dreambooth.md

* Add original files

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

53a8439f

27 Sep, 2023 1 commit
- [Docs] Improve xformers page (#5196) · ad06e510
  Patrick von Platen authored Sep 27, 2023
```
[Docs] Improve
```
  ad06e510
13 Sep, 2023 1 commit

[docs] Create clearer optimization sections (#4870) · 19edca82

Steven Liu authored Sep 13, 2023

* refactor

* update general optim sections

* update more sections

* few more updates

* benchmark code

19edca82