1. 14 Feb, 2025 1 commit
    • Module Group Offloading (#10503) · 9a147b82
      Aryan authored
      
      
      * update
      
      * fix
      
      * non_blocking; handle parameters and buffers
      
      * update
      
      * Group offloading with cuda stream prefetching (#10516)
      
      * cuda stream prefetch
      
      * remove breakpoints
      
      * update
      
      * copy model hook implementation from pab
      
      * update; very workaround-based implementation, but it seems to work as expected; needs cleanup and rewrite
      
      * more workarounds to make it actually work
      
      * cleanup
      
      * rewrite
      
      * update
      
      * make sure to sync current stream before overwriting with pinned params
      
      not doing so will lead to erroneous computations on the GPU and cause bad results
      
      * better check
      
      * update
      
      * remove hook implementation to not deal with merge conflict
      
      * re-add hook changes
      
      * why use more memory when less memory do trick
      
      * why still use slightly more memory when less memory do trick
      
      * optimise
      
      * add model tests
      
      * add pipeline tests
      
      * update docs
      
      * add layernorm and groupnorm
      
      * address review comments
      
      * improve tests; add docs
      
      * improve docs
      
      * Apply suggestions from code review
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * apply suggestions from code review
      
      * update tests
      
      * apply suggestions from review
      
      * enable_group_offloading -> enable_group_offload for naming consistency
      
      * raise errors if multiple offloading strategies used; add relevant tests
      
      * handle .to() when group offload applied
      
      * refactor some repeated code
      
      * remove unintentional change from merge conflict
      
      * handle .cuda()
      
      ---------
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
  2. 13 Feb, 2025 1 commit
    • Disable PEFT input autocast when using fp8 layerwise casting (#10685) · a0c22997
      Aryan authored
      * disable peft input autocast
      
      * use new peft method name; only disable peft input autocast if submodule layerwise casting active
      
      * add test; reference PeftInputAutocastDisableHook in peft docs
      
      * add load_lora_weights test
      
      * casted -> cast
      
      * Update tests/lora/utils.py
  3. 12 Feb, 2025 2 commits
  4. 11 Feb, 2025 2 commits
  5. 03 Feb, 2025 2 commits
  6. 27 Jan, 2025 1 commit
  7. 23 Jan, 2025 3 commits
  8. 22 Jan, 2025 1 commit
    • [core] Layerwise Upcasting (#10347) · beacaa55
      Aryan authored
      
      
      * update
      
      * update
      
      * make style
      
      * remove dynamo disable
      
      * add coauthor
      Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>
      
      * update
      
      * update
      
      * update
      
      * update mixin
      
      * add some basic tests
      
      * update
      
      * update
      
      * non_blocking
      
      * improvements
      
      * update
      
      * norm.* -> norm
      
      * apply suggestions from review
      
      * add example
      
      * update hook implementation to the latest changes from pyramid attention broadcast
      
      * deinitialize should raise an error
      
      * update doc page
      
      * Apply suggestions from code review
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * update docs
      
      * update
      
      * refactor
      
      * fix _always_upcast_modules for asym ae and vq_model
      
      * fix lumina embedding forward to not depend on weight dtype
      
      * refactor tests
      
      * add simple lora inference tests
      
      * _always_upcast_modules -> _precision_sensitive_module_patterns
      
      * remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case
      
      * check layer dtypes in lora test
      
      * fix UNet1DModelTests::test_layerwise_upcasting_inference
      
      * _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback
      
      * skip test in NCSNppModelTests
      
      * skip tests for AutoencoderTinyTests
      
      * skip tests for AutoencoderOobleckTests
      
      * skip tests for UNet1DModelTests - unsupported pytorch operations
      
      * layerwise_upcasting -> layerwise_casting
      
      * skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support
      
      * add layerwise fp8 pipeline test
      
      * use xfail
      
      * Apply suggestions from code review
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      
      * add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' test to pass)
      
      * add note about memory consumption on tesla CI runner for failing test
      
      ---------
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
  9. 20 Jan, 2025 1 commit
  10. 19 Jan, 2025 1 commit
  11. 16 Jan, 2025 2 commits
  12. 14 Jan, 2025 1 commit
    • [FEAT] DDUF format (#10037) · fbff43ac
      Marc Sun authored
      
      
      * load and save dduf archive
      
      * style
      
      * switch to zip uncompressed
      
      * updates
      
      * Update src/diffusers/pipelines/pipeline_utils.py
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      * Update src/diffusers/pipelines/pipeline_utils.py
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      * first draft
      
      * remove print
      
      * switch to dduf_file for consistency
      
      * switch to huggingface hub api
      
      * fix log
      
      * add a basic test
      
      * Update src/diffusers/configuration_utils.py
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      * Update src/diffusers/pipelines/pipeline_utils.py
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      * Update src/diffusers/pipelines/pipeline_utils.py
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      * fix
      
      * fix variant
      
      * change saving logic
      
      * DDUF - Load transformers components manually (#10171)
      
      * update hfh version
      
      * Load transformers components manually
      
      * load encoder from_pretrained with state_dict
      
      * working version with transformers and tokenizer !
      
      * add generation_config case
      
      * fix tests
      
      * remove saving for now
      
      * typing
      
      * need next version from transformers
      
      * Update src/diffusers/configuration_utils.py
      Co-authored-by: Lucain <lucain@huggingface.co>
      
      * check path correctly
      
      * Apply suggestions from code review
      Co-authored-by: Lucain <lucain@huggingface.co>
      
      * update
      
      * typing
      
      * remove check for subfolder
      
      * quality
      
      * revert setup changes
      
      * oups
      
      * more readable condition
      
      * add loading from the hub test
      
      * add basic docs.
      
      * Apply suggestions from code review
      Co-authored-by: Lucain <lucain@huggingface.co>
      
      * add example
      
      * add
      
      * make functions private
      
      * Apply suggestions from code review
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * minor.
      
      * fixes
      
      * fix
      
      * change the precedence of parameterized.
      
      * error out when custom pipeline is passed with dduf_file.
      
      * updates
      
      * fix
      
      * updates
      
      * fixes
      
      * updates
      
      * fix xfail condition.
      
      * fix xfail
      
      * fixes
      
      * sharded checkpoint compat
      
      * add test for sharded checkpoint
      
      * add suggestions
      
      * Update src/diffusers/models/model_loading_utils.py
      Co-authored-by: YiYi Xu <yixu310@gmail.com>
      
      * from suggestions
      
      * add class attributes to flag dduf tests
      
      * last one
      
      * fix logic
      
      * remove comment
      
      * revert changes
      
      ---------
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      Co-authored-by: Lucain <lucain@huggingface.co>
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
      Co-authored-by: YiYi Xu <yixu310@gmail.com>
  13. 13 Jan, 2025 2 commits
  14. 09 Jan, 2025 1 commit
  15. 08 Jan, 2025 1 commit
  16. 06 Jan, 2025 3 commits
  17. 02 Jan, 2025 1 commit
  18. 31 Dec, 2024 3 commits
  19. 28 Dec, 2024 1 commit
  20. 25 Dec, 2024 1 commit
  21. 24 Dec, 2024 1 commit
  22. 23 Dec, 2024 5 commits
  23. 21 Dec, 2024 1 commit
  24. 20 Dec, 2024 2 commits