Commits · 0454fbb30bfbe21aa4ea29c827c396bac57dc518 · renzhc / diffusers_dcu

08 Jul, 2025 2 commits

Aryan authored Jul 09, 2025



* update

* modify flux single blocks to make compatible with cache techniques (without too much model-specific intrusion code)

* remove debug logs

* update

* cache context for different batches of data

* fix hs residual bug for single return outputs; support ltx

* fix controlnet flux

* support flux, ltx i2v, ltx condition

* update

* update

* Update docs/source/en/api/cache.md

* Update src/diffusers/hooks/hooks.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* address review comments pt. 1

* address review comments pt. 2

* cache context refactor; address review pt. 3

* address review comments

* metadata registration with decorators instead of centralized

* support cogvideox

* support mochi

* fix

* remove unused function

* remove central registry based on review

* update

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

0454fbb3

[CI] Fix big GPU test marker (#11786) · cbc8ced2

Dhruv Nair authored Jul 08, 2025

* update

* update

cbc8ced2

11 Jun, 2025 1 commit

enable torchao test cases on XPU and switch to device agnostic APIs for test cases (#11654) · 33e636ce

Yao Matrix authored Jun 11, 2025



* enable torchao cases on XPU
Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* device agnostic APIs
Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* more
Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* enable test_torch_compile_recompilation_and_graph_break on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* resolve comments
Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>

33e636ce

09 Apr, 2025 1 commit
Update Ruff to latest Version (#10919) · edc154da

Dhruv Nair authored Apr 09, 2025

* update

* update

* update

* update

edc154da

21 Mar, 2025 1 commit

[core] FasterCache (#10163) · 844221ae

Aryan authored Mar 21, 2025



* init

* update

* update

* update

* make style

* update

* fix

* make it work with guidance distilled models

* update

* make fix-copies

* add tests

* update

* apply_faster_cache -> apply_fastercache

* fix

* reorder

* update

* refactor

* update docs

* add fastercache to CacheMixin

* update tests

* Apply suggestions from code review

* make style

* try to fix partial import error

* Apply style fixes

* raise warning

* update

---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

844221ae

20 Mar, 2025 1 commit

[tests] make cuda only tests device-agnostic (#11058) · 15ad97f7

Fanli Lin authored Mar 20, 2025

* enable bnb on xpu

* add 2 more cases

* add missing change

* add missing change

* add one more

* enable cuda only tests on xpu

* enable big gpu cases

15ad97f7

04 Mar, 2025 1 commit

[tests] make tests device-agnostic (part 4) (#10508) · 7855ac59

Fanli Lin authored Mar 04, 2025



* initial commit

* fix empty cache

* fix one more

* fix style

* update device functions

* update

* update

* Update src/diffusers/utils/testing_utils.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/controlnet/test_controlnet.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/controlnet/test_controlnet.py
Co-authored-by: hlky <hlky@hlky.ac>

* with gc.collect

* update

* make style

* check_torch_dependencies

* add mps empty cache

* add changes

* bug fix

* enable on xpu

* update more cases

* revert

* revert back

* Update test_stable_diffusion_xl.py

* Update tests/pipelines/stable_diffusion/test_stable_diffusion.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/stable_diffusion/test_stable_diffusion.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/stable_diffusion/test_stable_diffusion_img2img.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/stable_diffusion/test_stable_diffusion_img2img.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/stable_diffusion/test_stable_diffusion_img2img.py
Co-authored-by: hlky <hlky@hlky.ac>

* Apply suggestions from code review
Co-authored-by: hlky <hlky@hlky.ac>

* add test marker

---------
Co-authored-by: hlky <hlky@hlky.ac>

7855ac59

03 Mar, 2025 1 commit

[Tests] Remove more encode prompts tests (#10942) · 7513162b

Sayak Paul authored Mar 03, 2025

* fix-copies went uncaught it seems.

* remove more unneeded encode_prompt() tests

* Revert "fix-copies went uncaught it seems."

This reverts commit eefb302791172a4fb8ef008e400f94878de2c6c9.

* empty

7513162b

14 Feb, 2025 1 commit

Module Group Offloading (#10503) · 9a147b82

Aryan authored Feb 14, 2025



* update

* fix

* non_blocking; handle parameters and buffers

* update

* Group offloading with cuda stream prefetching (#10516)

* cuda stream prefetch

* remove breakpoints

* update

* copy model hook implementation from pab

* update; ~very workaround based implementation but it seems to work as expected; needs cleanup and rewrite

* more workarounds to make it actually work

* cleanup

* rewrite

* update

* make sure to sync current stream before overwriting with pinned params

not doing so will lead to erroneous computations on the GPU and cause bad results

* better check

* update

* remove hook implementation to not deal with merge conflict

* re-add hook changes

* why use more memory when less memory do trick

* why still use slightly more memory when less memory do trick

* optimise

* add model tests

* add pipeline tests

* update docs

* add layernorm and groupnorm

* address review comments

* improve tests; add docs

* improve docs

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions from code review

* update tests

* apply suggestions from review

* enable_group_offloading -> enable_group_offload for naming consistency

* raise errors if multiple offloading strategies used; add relevant tests

* handle .to() when group offload applied

* refactor some repeated code

* remove unintentional change from merge conflict

* handle .cuda()

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

9a147b82

27 Jan, 2025 1 commit

[core] Pyramid Attention Broadcast (#9562) · 658e24e8

Aryan authored Jan 28, 2025



* start pyramid attention broadcast

* add coauthor
Co-Authored-By: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>

* update

* make style

* update

* make style

* add docs

* add tests

* update

* Update docs/source/en/api/pipelines/cogvideox.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/cogvideox.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Pyramid Attention Broadcast rewrite + introduce hooks (#9826)

* rewrite implementation with hooks

* make style

* update

* merge pyramid-attention-rewrite-2

* make style

* remove changes from latte transformer

* revert docs changes

* better debug message

* add todos for future

* update tests

* make style

* cleanup

* fix

* improve log message; fix latte test

* refactor

* update

* update

* update

* revert changes to tests

* update docs

* update tests

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update

* fix flux test

* reorder

* refactor

* make fix-copies

* update docs

* fixes

* more fixes

* make style

* update tests

* update code example

* make fix-copies

* refactor based on reviews

* use maybe_free_model_hooks

* CacheMixin

* make style

* update

* add current_timestep property; update docs

* make fix-copies

* update

* improve tests

* try circular import fix

* apply suggestions from review

* address review comments

* Apply suggestions from code review

* refactor hook implementation

* add test suite for hooks

* PAB Refactor (#10667)

* update

* update

* update

---------
Co-authored-by: DN6 <dhruv.nair@gmail.com>

* update

* fix remove hook behaviour

---------
Co-authored-by: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: DN6 <dhruv.nair@gmail.com>

658e24e8

22 Jan, 2025 1 commit

[core] Layerwise Upcasting (#10347) · beacaa55

Aryan authored Jan 22, 2025



* update

* update

* make style

* remove dynamo disable

* add coauthor
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* update

* update

* update

* update mixin

* add some basic tests

* update

* update

* non_blocking

* improvements

* update

* norm.* -> norm

* apply suggestions from review

* add example

* update hook implementation to the latest changes from pyramid attention broadcast

* deinitialize should raise an error

* update doc page

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update docs

* update

* refactor

* fix _always_upcast_modules for asym ae and vq_model

* fix lumina embedding forward to not depend on weight dtype

* refactor tests

* add simple lora inference tests

* _always_upcast_modules -> _precision_sensitive_module_patterns

* remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case

* check layer dtypes in lora test

* fix UNet1DModelTests::test_layerwise_upcasting_inference

* _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback

* skip test in NCSNppModelTests

* skip tests for AutoencoderTinyTests

* skip tests for AutoencoderOobleckTests

* skip tests for UNet1DModelTests - unsupported pytorch operations

* layerwise_upcasting -> layerwise_casting

* skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support

* add layerwise fp8 pipeline test

* use xfail

* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' test to pass)

* add note about memory consumption on tesla CI runner for failing test

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

beacaa55

12 Jan, 2025 1 commit
[Flux] Improve true cfg condition (#10539) · edb8c1bc

Sayak Paul authored Jan 12, 2025

* improve flux true cfg condition

* add test

edb8c1bc

10 Jan, 2025 1 commit

[LoRA] allow big CUDA tests to run properly for LoRA (and others) (#9845) · a6f043a8

Sayak Paul authored Jan 10, 2025



* allow big lora tests to run on the CI.

* print

* print.

* print

* print

* print

* print

* more

* print

* remove print.

* remove print

* directly place on cuda.

* remove pipeline.

* remove

* fix

* fix

* spaces

* quality

* updates

* directly place flux controlnet pipeline on cuda.

* torch_device instead of cuda.

* style

* device placement.

* fixes

* add big gpu marker for mochi; rename test correctly

* address feedback

* fix

---------
Co-authored-by: Aryan <aryan@huggingface.co>

a6f043a8

21 Dec, 2024 1 commit

Support Flux IP Adapter (#10261) · be207099

hlky authored Dec 21, 2024



* Flux IP-Adapter

* test cfg

* make style

* temp remove copied from

* fix test

* fix test

* v2

* fix

* make style

* temp remove copied from

* Apply suggestions from code review
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Move encoder_hid_proj to inside FluxTransformer2DModel

* merge

* separate encode_prompt, add copied from, image_encoder offload

* make

* fix test

* fix

* Update src/diffusers/pipelines/flux/pipeline_flux.py

* test_flux_prompt_embeds change not needed

* true_cfg -> true_cfg_scale

* fix merge conflict

* test_flux_ip_adapter_inference

* add fast test

* FluxIPAdapterMixin not test mixin

* Update pipeline_flux.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>

be207099

20 Nov, 2024 1 commit

Flux latents fix (#9929) · f6f7afa1

Dhruv Nair authored Nov 20, 2024



* update

* update

* update

* update

* update

* update

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

f6f7afa1

31 Oct, 2024 3 commits

Revert "[LoRA] fix: lora loading when using with a device_mapped mode… (#9823) · d2e5cb3c

YiYi Xu authored Oct 31, 2024

Revert "[LoRA] fix: lora loading when using with a device_mapped model. (#9449)"

This reverts commit 41e4779d.

d2e5cb3c

[LoRA] fix: lora loading when using with a device_mapped model. (#9449) · 41e4779d

Sayak Paul authored Oct 31, 2024



* fix: lora loading when using with a device_mapped model.

* better attribution

* empty
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* minors

* better error messages.

* fix-copies

* add: tests, docs.

* add hardware note.

* quality

* Update docs/source/en/training/distributed_inference.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fixes

* skip properly.

* fixes

---------
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

41e4779d

[CI] add a big GPU marker to run memory-intensive tests separately on CI (#9691) · ff182ad6

Sayak Paul authored Oct 31, 2024



* add a marker for big gpu tests

* update

* trigger on PRs temporarily.

* onnx

* fix

* total memory

* fixes

* reduce memory threshold.

* bigger gpu

* empty

* g6e

* Apply suggestions from code review

* address comments.

* fix

* fix

* fix

* fix

* fix

* okay

* further reduce.

* updates

* remove

* updates

* updates

* updates

* updates

* fixes

* fixes

* updates.

* fix

* workflow fixes.

---------
Co-authored-by: Aryan <aryan@huggingface.co>

ff182ad6

02 Sep, 2024 1 commit
[CI] More fixes for Fast GPU Tests on main (#9300) · 007ad0e2

Dhruv Nair authored Sep 02, 2024

update

007ad0e2

23 Aug, 2024 1 commit
[Core] fuse_qkv_projection() to Flux (#9185) · 2d9ccf39

Sayak Paul authored Aug 23, 2024

* start fusing flux.

* test

* finish fusion

* fix-copies

2d9ccf39

02 Aug, 2024 1 commit

[Flux] allow tests to run (#9050) · 0e460675

Sayak Paul authored Aug 02, 2024

* fix tests

* fix

* float64 skip

* remove sample_size.

* remove

* remove more

* default_sample_size.

* credit black forest for flux model.

* skip

* fix: tests

* remove OriginalModelMixin

* add transformer model test

* add: transformer model tests

0e460675

01 Aug, 2024 1 commit

Flux pipeline (#9043) · 27637a54

Sayak Paul authored Aug 02, 2024



add flux!
Signed-off-by: Adrien <adrien@huggingface.co>
Co-authored-by: Adrien <adrien.69740@gmail.com>
Co-authored-by: Anatoly Belikov <abelikov@singularitynet.io>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>

27637a54

24 Jul, 2024 1 commit

[Core] fix QKV fusion for attention (#8829) · 50d21f7c

Sayak Paul authored Jul 24, 2024

* start debugging the problem,

* start

* fix

* fix

* fix imports.

* handle hunyuan

* remove residuals.

* add a check for making sure there's appropriate procs.

* add more rigor to the tests.

* fix test

* remove redundant check

* fix-copies

* move check_qkv_fusion_matches_attn_procs_length and check_qkv_fusion_processors_exist.

50d21f7c

12 Jun, 2024 1 commit

Add Stable Diffusion 3 (#8483) · 04717fd8

Dhruv Nair authored Jun 13, 2024



* up

* add sd3

* update

* update

* add tests

* fix copies

* fix docs

* update

* add dreambooth lora

* add LoRA

* update

* update

* update

* update

* import fix

* update

* Update src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* import fix 2

* update

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* update

* update

* update

* fix ckpt id

* fix more ids

* update

* missing doc

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update

* fix

* update

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

* note on gated access.

* requirements

* licensing

---------
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

04717fd8