Commits · e5c43b8af7e913a0c8d1fe232ebdda3539f25025 · renzhc / diffusers_dcu

27 Feb, 2025 1 commit

[CI] Fix Fast GPU tests on PR (#10912) · e5c43b8a

Dhruv Nair authored Feb 27, 2025



* update

* update

* update

* update

* update

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

19 Feb, 2025 1 commit

[FEAT] Model loading refactor (#10604) · f5929e03

Marc Sun authored Feb 19, 2025



* first draft model loading refactor

* revert name change

* fix bnb

* revert name

* fix dduf

* fix hunyuan

* style

* Update src/diffusers/models/model_loading_utils.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* suggestions from reviews

* Update src/diffusers/models/modeling_utils.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove safetensors check

* fix default value

* more fix from suggestions

* revert logic for single file

* style

* typing + fix couple of issues

* improve speed

* Update src/diffusers/models/modeling_utils.py
Co-authored-by: Aryan <aryan@huggingface.co>

* fp8 dtype

* add tests

* rename resolved_archive_file to resolved_model_file

* format

* map_location default cpu

* add utility function

* switch to smaller model + test inference

* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* rm comment

* add log

* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* add decorator

* cosine sim instead

* fix use_keep_in_fp32_modules

* comm

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>

14 Feb, 2025 1 commit

Module Group Offloading (#10503) · 9a147b82

Aryan authored Feb 14, 2025



* update

* fix

* non_blocking; handle parameters and buffers

* update

* Group offloading with cuda stream prefetching (#10516)

* cuda stream prefetch

* remove breakpoints

* update

* copy model hook implementation from pab

* update; ~very workaround based implementation but it seems to work as expected; needs cleanup and rewrite

* more workarounds to make it actually work

* cleanup

* rewrite

* update

* make sure to sync current stream before overwriting with pinned params

not doing so will lead to erroneous computations on the GPU and cause bad results

* better check

* update

* remove hook implementation to not deal with merge conflict

* re-add hook changes

* why use more memory when less memory do trick

* why still use slightly more memory when less memory do trick

* optimise

* add model tests

* add pipeline tests

* update docs

* add layernorm and groupnorm

* address review comments

* improve tests; add docs

* improve docs

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions from code review

* update tests

* apply suggestions from review

* enable_group_offloading -> enable_group_offload for naming consistency

* raise errors if multiple offloading strategies used; add relevant tests

* handle .to() when group offload applied

* refactor some repeated code

* remove unintentional change from merge conflict

* handle .cuda()

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

11 Feb, 2025 1 commit
- [Tests] Test layerwise casting with training (#10765) · c80eda9d
  Sayak Paul authored Feb 11, 2025
```
* add a test to check if we can train with layerwise casting.

* updates

* updates

* style
```
28 Jan, 2025 2 commits

[Tests] conditionally check `fp8_e4m3_bf16_max_memory < fp8_e4m3_fp32_max_memory` (#10669) · 7b100ce5
Sayak Paul authored Jan 28, 2025
```
* conditionally check if compute capability is met.

* log info.

* fix condition.

* updates

* updates

* updates

* updates
```

Refactor gradient checkpointing (#10611) · c4d4ac21

Aryan authored Jan 28, 2025

* update

* remove unused fn

* apply suggestions based on review

* update + cleanup 🧹

* more cleanup 🧹

* make fix-copies

* update test

22 Jan, 2025 1 commit

[core] Layerwise Upcasting (#10347) · beacaa55

Aryan authored Jan 22, 2025



* update

* update

* make style

* remove dynamo disable

* add coauthor
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* update

* update

* update

* update mixin

* add some basic tests

* update

* update

* non_blocking

* improvements

* update

* norm.* -> norm

* apply suggestions from review

* add example

* update hook implementation to the latest changes from pyramid attention broadcast

* deinitialize should raise an error

* update doc page

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update docs

* update

* refactor

* fix _always_upcast_modules for asym ae and vq_model

* fix lumina embedding forward to not depend on weight dtype

* refactor tests

* add simple lora inference tests

* _always_upcast_modules -> _precision_sensitive_module_patterns

* remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case

* check layer dtypes in lora test

* fix UNet1DModelTests::test_layerwise_upcasting_inference

* _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback

* skip test in NCSNppModelTests

* skip tests for AutoencoderTinyTests

* skip tests for AutoencoderOobleckTests

* skip tests for UNet1DModelTests - unsupported pytorch operations

* layerwise_upcasting -> layerwise_casting

* skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support

* add layerwise fp8 pipeline test

* use xfail

* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' test to pass)

* add note about memory consumption on tesla CI runner for failing test

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

21 Jan, 2025 2 commits

fix offload gpu tests etc (#10366) · a1f9a712
YiYi Xu authored Jan 21, 2025
```
* add

* style
```

[tests] make tests device-agnostic (part 3) (#10437) · ec37e209

Fanli Lin authored Jan 21, 2025



* initial commit

* fix empty cache

* fix one more

* fix style

* update device functions

* update

* update

* Update src/diffusers/utils/testing_utils.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/controlnet/test_controlnet.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/controlnet/test_controlnet.py
Co-authored-by: hlky <hlky@hlky.ac>

* with gc.collect

* update

* make style

* check_torch_dependencies

* add mps empty cache

* bug fix

* Apply suggestions from code review

---------
Co-authored-by: hlky <hlky@hlky.ac>

23 Dec, 2024 1 commit

[Sana bug] bug fix for 2K model config (#10340) · b58868e6

Junsong Chen authored Dec 23, 2024



* fix the Positional Embedding bug in 2K model

* Change the default model to the BF16 one for more stable training and output

* make style

* subtract buffer size

* add compute_module_persistent_sizes

---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>

20 Dec, 2024 1 commit

Enable Gradient Checkpointing for UNet2DModel (New) (#7201) · 648d968c

dg845 authored Dec 19, 2024



* Port UNet2DModel gradient checkpointing code from #6718.


---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Vincent Neemie <92559302+VincentNeemie@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>

04 Dec, 2024 1 commit

[tests] refactor vae tests (#9808) · c1926cef

Sayak Paul authored Dec 04, 2024



* add: autoencoderkl tests

* autoencodertiny.

* fix

* asymmetric autoencoder.

* more

* integration tests for stable audio decoder.

* consistency decoder vae tests

* remove grad check from consistency decoder.

* cog

* bye test_models_vae.py

* fix

* fix

* remove allegro

* fixes

* fixes

* fixes

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

19 Nov, 2024 1 commit
- [LoRA] feat: `save_lora_adapter()` (#9862) · 7d0b9c4d
  Sayak Paul authored Nov 19, 2024
```
* feat: save_lora_adapter.
```
31 Oct, 2024 1 commit
- [Tests] clean up and refactor gradient checkpointing tests (#9494) · 4adf6aff
  Sayak Paul authored Oct 31, 2024
```
* check.

* fixes

* fixes

* updates

* fixes

* fixes
```
28 Sep, 2024 1 commit

[Core] fix variant-identification. (#9253) · 11542431

Sayak Paul authored Sep 28, 2024



* fix variant-identification.

* fix variant

* fix sharded variant checkpoint loading.

* Apply suggestions from code review

* fixes.

* more fixes.

* remove print.

* fixes

* fixes

* comments

* fixes

* apply suggestions.

* hub_utils.py

* fix test

* updates

* fixes

* fixes

* Apply suggestions from code review
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* updates.

* remove patch file.

---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>

03 Sep, 2024 2 commits

[tests] remove/speedup some low signal tests (#9285) · 24053832

Aryan authored Sep 03, 2024

* remove 2 shapes from SDFunctionTesterMixin::test_vae_tiling

* combine freeu enable/disable test to reduce many inference runs

* remove low signal unet test for signature

* remove low signal embeddings test

* remove low signal progress bar test from PipelineTesterMixin

* combine ip-adapter single and multi tests to save many inferences

* fix broken tests

* Update tests/pipelines/test_pipelines_common.py

* Update tests/pipelines/test_pipelines_common.py

* add progress bar tests

[CI] More Fast GPU Test Fixes (#9346) · f6f16a0c
Dhruv Nair authored Sep 03, 2024
```
* update

* update

* update

* update
```

02 Sep, 2024 1 commit
- [CI] More fixes for Fast GPU Tests on main (#9300) · 007ad0e2
  Dhruv Nair authored Sep 02, 2024
```
update
```
21 Aug, 2024 1 commit

Flux followup (#9074) · c2916175

YiYi Xu authored Aug 21, 2024

* refactor rotary embeds

* adding jsmidt as co-author of this PR for https://github.com/huggingface/diffusers/pull/9133



---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Joseph Smidt <josephsmidt@gmail.com>

24 Jul, 2024 1 commit
- [CI] Skip flaky download tests in PR CI (#8945) · 93983b67
  Dhruv Nair authored Jul 24, 2024
```
update
```
22 Jul, 2024 1 commit
- [Tests] proper skipping of request caching test (#8908) · af400040
  Sayak Paul authored Jul 23, 2024
```
proper skipping of request caching test
```
17 Jul, 2024 1 commit
- [Core] fix: shard loading and saving when variant is provided. (#8869) · 0f09b01a
  Sayak Paul authored Jul 17, 2024
```
fix: shard loading and saving when variant is provided.
```
09 Jul, 2024 1 commit
- [Tests] fix more sharding tests (#8797) · a785992c
  Sayak Paul authored Jul 09, 2024
```
* fix

* fix

* ugly

* okay

* fix more

* fix oops
```
04 Jul, 2024 1 commit
- [Tests] fix sharding tests (#8764) · 31adeb41
  Sayak Paul authored Jul 04, 2024
```
fix sharding tests
```
26 Jun, 2024 2 commits
- Update xformers SD3 test (#8712) · effe4b97
  Dhruv Nair authored Jun 27, 2024
```
update
```
- Add decorator for compile tests (#8703) · 0f0b5318
  Dhruv Nair authored Jun 26, 2024
```
* update

* update

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
```
25 Jun, 2024 1 commit
- [Chore] create a utility for calculating the expected number of shards. (#8692) · 4ad7a1f5
  Sayak Paul authored Jun 25, 2024
```
create a utility for calculating the expected number of shards.
```
21 Jun, 2024 1 commit
- a few fix for shard checkpoints (#8656) · c71c19c5
  YiYi Xu authored Jun 20, 2024
```
fix
Co-authored-by: yiyixuxu <yixu310@gmail.com>
```
18 Jun, 2024 1 commit

Fix sharding when no device_map is passed (#8531) · 96399c3e

Marc Sun authored Jun 18, 2024



* Fix sharding when no device_map is passed

* style

* add tests

* align

* add docstring

* format

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

07 Jun, 2024 1 commit

[Core] support saving and loading of sharded checkpoints (#7830) · 7d887118

Sayak Paul authored Jun 07, 2024



* feat: support saving a model in sharded checkpoints.

* feat: make loading of sharded checkpoints work.

* add tests

* cleanse the loading logic a bit more.

* more resilience while loading from the Hub.

* parallelize shard downloads by using snapshot_download().

* default to a shard size.

* more fix

* Empty-Commit

* debug

* fix

* quality

* more debugging

* fix more

* initial comments from Benjamin

* move certain methods to loading_utils

* add test to check if the correct number of shards are present.

* add a test to check if loading of sharded checkpoints from the Hub is okay

* clarify the unit when passed as an int.

* use hf_hub for sharding.

* remove unnecessary code

* remove unnecessary function

* lucain's comments.

* fixes

* address high-level comments.

* fix test

* subfolder shenanigans.

* Update src/diffusers/utils/hub_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>

* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>

* remove _huggingface_hub_version as not needed.

* address more feedback.

* add a test for local_files_only=True

* need hf hub to be at least 0.23.2

* style

* final comment.

* clean up subfolder.

* deal with suffixes in code.

* _add_variant default.

* use weights_name_pattern

* remove add_suffix_keyword

* clean up downloading of sharded ckpts.

* don't return something special when using index.json

* fix more

* don't use bare except

* remove comments and catch the errors better

* fix a couple of things when using is_file()

* empty

---------
Co-authored-by: Lucain <lucainp@gmail.com>

31 May, 2024 1 commit

[Core] Introduce class variants for `Transformer2DModel` (#7647) · 983dec3b

Sayak Paul authored May 31, 2024

* init for patches

* finish patched model.

* continuous transformer

* vectorized transformer2d.

* style.

* inits.

* fix-copies.

* introduce DiTTransformer2DModel.

* fixes

* use REMAPPING as suggested by @DN6

* better logging.

* add pixart transformer model.

* inits.

* caption_channels.

* attention masking.

* fix use_additional_conditions.

* remove print.

* debug

* flatten

* fix: assertion for sigma

* handle remapping for modeling_utils

* add tests for dit transformer2d

* quality

* placeholder for pixart tests

* pixart tests

* add _no_split_modules

* add docs.

* check

* check

* check

* check

* fix tests

* fix tests

* move Transformer output to modeling_output

* move errors better and bring back use_additional_conditions attribute.

* add unnecessary things from DiT.

* clean up pixart

* fix remapping

* fix device_map things in pixart2d.

* replace Transformer2DModel with appropriate classes in dit, pixart tests

* empty

* legacy mixin classes.

* use a remapping dict for fetching class names.

* change to specific model types in the pipeline implementations.

* move _fetch_remapped_cls_from_config to modeling_loading_utils.py

* fix dependency problems.

* add deprecation note.

03 May, 2024 1 commit

Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when... · 58237364

HelloWorldBeginner authored May 04, 2024


Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed. (#7816)

* Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed.

* fix check code quality

* Decouple the NPU flash attention and make it an independent module.

* add doc and unit tests for npu flash attention.

---------
Co-authored-by: mhh001 <mahonghao1@huawei.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

01 May, 2024 1 commit
- [Tests] fix: device map tests for models (#7825) · 8909ab4b
  Sayak Paul authored May 01, 2024
```
* fix: device module tests

* remove patch file

* Empty-Commit
```
30 Apr, 2024 1 commit

[Core] introduce _no_split_modules to `ModelMixin` (#6396) · 3fd31eef

Sayak Paul authored Apr 30, 2024

* introduce _no_split_modules.

* unnecessary spaces.

* remove unnecessary kwargs and style

* fix: accelerate imports.

* change to _determine_device_map

* add the blocks that have residual connections.

* add: CrossAttnUpBlock2D

* add: testing

* style

* line-spaces

* quality

* add disk offload test without safetensors.

* checking disk offloading percentages.

* change model split

* add: utility for checking multi-gpu requirement.

* model parallelism test

* splits.

* splits.

* splits

* splits.

* splits.

* splits.

* offload folder to test_disk_offload_with_safetensors

* add _no_split_modules

* fix-copies

26 Mar, 2024 1 commit
- [tests] skip dynamo tests when python is 3.12. (#7458) · 484c8ef3
  Sayak Paul authored Mar 26, 2024
```
skip dynamo tests when python is 3.12.
```
08 Feb, 2024 1 commit
- change to 2024 in the license (#6902) · 30e5e81d
  Sayak Paul authored Feb 08, 2024
```
change to 2024
```
26 Jan, 2024 1 commit

[Hub] feat: explicitly tag to diffusers when using push_to_hub (#6678) · d4c7ab7b

Sayak Paul authored Jan 27, 2024



* feat: explicitly tag to diffusers when using push_to_hub

* remove tags.

* reset repo.

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix: tests

* fix: push_to_hub behaviour for tagging from save_pretrained

* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>

* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>

* import fixes.

* add library name to existing model card.

* add: standalone test for generate_model_card

* fix tests for standalone method

* moved library_name to a better place.

* merge create_model_card and generate_model_card.

* fix test

* address lucain's comments

* fix return indentation

* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>

* address further comments.

* Update src/diffusers/pipelines/pipeline_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>

---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lucain <lucainp@gmail.com>

05 Dec, 2023 1 commit

Device agnostic testing (#5612) · f427345a

Arsalan authored Dec 05, 2023

* utils and test modifications to enable device agnostic testing

* device for manual seed in unet1d

* fix generator condition in vae test

* consistency changes to testing

* make style

* add device agnostic testing changes to source and one model test

* make dtype check fns private, log cuda fp16 case

* remove dtype checks from import utils, move to testing_utils

* adding tests for most model classes and one pipeline

* fix vae import

09 Nov, 2023 1 commit

consistency decoder (#5694) · 2fd46405

Will Berman authored Nov 09, 2023



* consistency decoder

* rename

* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/pipelines/consistency_models/pipeline_consistency_models.py

* uP

* Apply suggestions from code review

* uP

* uP

* uP

---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

07 Nov, 2023 1 commit
- Model tests xformers fixes (#5679) · 71f56c77
  Dhruv Nair authored Nov 07, 2023
```
* fix model xformers test

* update
```