1. 06 Dec, 2025 1 commit
    • [Feat] TaylorSeer Cache (#12648) · 6290fdfd
      Tran Thanh Luan authored
      
      
      * init taylor_seer cache
      
      * make compatible with any tuple size returned
      
      * use logger for printing, add warmup feature
      
      * still update in warmup steps
      
      * refactor, add docs
      
      * add configurable cache, skip compute module
      
      * allow special cache ids only
      
      * add stop_predicts (cooldown)
      
      * update docs
      
      * apply ruff
      
      * update to handle multiple calls per timestep
      
      * refactor to use state manager
      
      * fix format & doc
      
      * chores: naming, remove redundancy
      
      * add docs
      
      * quality & style
      
      * fix taylor precision
      
      * Apply style fixes
      
      * add tests
      
      * Apply style fixes
      
      * Remove TaylorSeerCacheTesterMixin from flux2 tests
      
      * rename identifiers, use more expressive taylor predict loop
      
      * torch compile compatible
      
      * Apply style fixes
      
      * Update src/diffusers/hooks/taylorseer_cache.py
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      
      * update docs
      
      * make fix-copies
      
      * fix example usage.
      
      * remove tests on flux kontext
      
      ---------
      Co-authored-by: toilaluan <toilaluan@github.com>
      Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
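      The squashed messages outline the mechanism: cache a module's output at refresh steps, then predict it on skipped steps with a finite-difference Taylor expansion (warmup steps always compute; stop_predicts adds a cooldown at the end of sampling). Below is a minimal standalone sketch of that predict loop, with hypothetical helper names; it illustrates the technique, not the actual diffusers hook code.
      
      ```python
      import torch
      
      def refresh_derivatives(prev, new_output, step_gap, order=2):
          # Hypothetical helper: after a real forward pass, update the
          # finite-difference estimates of the output's time derivatives.
          # prev[i] is the previous i-th derivative estimate; step_gap is
          # the number of steps since the last refresh.
          derivs = [new_output]
          for i in range(order):
              if prev is None or i >= len(prev):
                  break
              derivs.append((derivs[i] - prev[i]) / step_gap)
          return derivs
      
      def taylor_predict(derivs, elapsed):
          # Extrapolate the cached output `elapsed` steps past the refresh:
          # y(t + k) ~= sum_i derivs[i] * k**i / i!
          pred = torch.zeros_like(derivs[0])
          factorial = 1.0
          for i, d in enumerate(derivs):
              factorial *= max(i, 1)
              pred = pred + d * (elapsed ** i) / factorial
          return pred
      ```
      
      Warmup steps run the real module and refresh the derivatives; cached steps return taylor_predict's output in place of the forward pass.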
  2. 04 Dec, 2025 1 commit
  3. 03 Dec, 2025 1 commit
  4. 24 Nov, 2025 1 commit
  5. 27 Oct, 2025 1 commit
  6. 16 Oct, 2025 1 commit
  7. 30 Sep, 2025 1 commit
  8. 26 Sep, 2025 1 commit
  9. 24 Sep, 2025 1 commit
    • Introduce cache-dit to community optimization (#12366) · 310fdaf5
      DefTruth authored
      * docs: introduce cache-dit to diffusers
      
      * misc: update examples link
      
      * Refine documentation for CacheDiT features
      
      Updated the wording for clarity and consistency in the documentation. Adjusted sections on cache acceleration, automatic block adapter, patch functor, and hybrid cache configuration.
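      Per the community optimization page this PR adds, cache-dit bolts hybrid cache acceleration onto a DiT-style pipeline with essentially one call. A sketch of that usage, assuming the one-line enable_cache entry point from the cache-dit README (the API may differ across versions):
      
      ```python
      import torch
      from diffusers import FluxPipeline
      import cache_dit  # assumes: pip install cache-dit
      
      pipe = FluxPipeline.from_pretrained(
          "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
      ).to("cuda")
      
      # One-line cache acceleration; cache-dit's automatic block adapter
      # locates the transformer blocks to cache (per its docs).
      cache_dit.enable_cache(pipe)
      
      image = pipe("an astronaut riding a horse on mars").images[0]
      ```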
  10. 23 Sep, 2025 1 commit
  11. 10 Sep, 2025 1 commit
  12. 25 Aug, 2025 2 commits
  13. 18 Jul, 2025 1 commit
  14. 11 Jul, 2025 1 commit
  15. 26 Jun, 2025 2 commits
  16. 20 Jun, 2025 2 commits
  17. 19 Jun, 2025 2 commits
  18. 16 Jun, 2025 1 commit
    • Add Pruna optimization framework documentation (#11688) · 9b834f87
      David Berenstein authored
      
      
      * Add Pruna optimization framework documentation
      
      - Introduced a new section for Pruna in the table of contents.
      - Added comprehensive documentation for Pruna, detailing its optimization techniques, installation instructions, and examples for optimizing and evaluating models.
      
      * Enhance Pruna documentation with image alt text and code block formatting
      
      - Added alt text to images for better accessibility and context.
      - Changed code block syntax from diff to python for improved clarity.
      
      * Add installation section to Pruna documentation
      
      - Introduced a new installation section in the Pruna documentation to guide users on how to install the framework.
      - Enhanced the overall clarity and usability of the documentation for new users.
      
      * Update pruna.md
      
      * Update pruna.md
      
      * Update Pruna documentation for model optimization and evaluation
      
      - Changed section titles for consistency and clarity, from "Optimizing models" to "Optimize models" and "Evaluating and benchmarking optimized models" to "Evaluate and benchmark models".
      - Enhanced descriptions to clarify the use of `diffusers` models and the evaluation process.
      - Added a new example for evaluating standalone `diffusers` models.
      - Updated references and links for better navigation within the documentation.
      
      * Refactor Pruna documentation for clarity and consistency
      
      - Removed outdated references to FLUX-juiced and streamlined the explanation of benchmarking.
      - Enhanced the description of evaluating standalone `diffusers` models.
      - Cleaned up code examples by removing unnecessary imports and comments for better readability.
      
      * Apply suggestions from code review
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Enhance Pruna documentation with new examples and clarifications
      
      - Added an image to illustrate the optimization process.
      - Updated the explanation for sharing and loading optimized models on the Hugging Face Hub.
      - Clarified the evaluation process for optimized models using the EvaluationAgent.
      - Improved descriptions for defining metrics and evaluating standalone diffusers models.
      
      ---------
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
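      The documented workflow is two-stage: smash a pipeline with a SmashConfig, then score it with an EvaluationAgent. A sketch of the optimize step, assuming Pruna's public smash/SmashConfig API; the algorithm names are illustrative and vary by Pruna version:
      
      ```python
      import torch
      from diffusers import StableDiffusionPipeline
      from pruna import SmashConfig, smash  # assumes: pip install pruna
      
      pipe = StableDiffusionPipeline.from_pretrained(
          "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
      ).to("cuda")
      
      # Select the optimization algorithms to combine; consult Pruna's docs
      # for the algorithms available in your install.
      smash_config = SmashConfig()
      smash_config["cacher"] = "deepcache"
      
      smashed_pipe = smash(model=pipe, smash_config=smash_config)
      image = smashed_pipe("a serene mountain lake at dawn").images[0]
      ```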
  19. 02 Jun, 2025 1 commit
  20. 28 May, 2025 1 commit
  21. 23 May, 2025 1 commit
  22. 19 May, 2025 2 commits
  23. 15 May, 2025 1 commit
  24. 01 May, 2025 1 commit
  25. 08 Apr, 2025 2 commits
  26. 24 Mar, 2025 1 commit
  27. 14 Feb, 2025 1 commit
    • Module Group Offloading (#10503) · 9a147b82
      Aryan authored
      
      
      * update
      
      * fix
      
      * non_blocking; handle parameters and buffers
      
      * update
      
      * Group offloading with cuda stream prefetching (#10516)
      
      * cuda stream prefetch
      
      * remove breakpoints
      
      * update
      
      * copy model hook implementation from pab
      
      * update; ~very workaround based implementation but it seems to work as expected; needs cleanup and rewrite
      
      * more workarounds to make it actually work
      
      * cleanup
      
      * rewrite
      
      * update
      
      * make sure to sync current stream before overwriting with pinned params
      
      not doing so will lead to erroneous computations on the GPU and cause bad results
      
      * better check
      
      * update
      
      * remove hook implementation to not deal with merge conflict
      
      * re-add hook changes
      
      * why use more memory when less memory do trick
      
      * why still use slightly more memory when less memory do trick
      
      * optimise
      
      * add model tests
      
      * add pipeline tests
      
      * update docs
      
      * add layernorm and groupnorm
      
      * address review comments
      
      * improve tests; add docs
      
      * improve docs
      
      * Apply suggestions from code review
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * apply suggestions from code review
      
      * update tests
      
      * apply suggestions from review
      
      * enable_group_offloading -> enable_group_offload for naming consistency
      
      * raise errors if multiple offloading strategies used; add relevant tests
      
      * handle .to() when group offload applied
      
      * refactor some repeated code
      
      * remove unintentional change from merge conflict
      
      * handle .cuda()
      
      ---------
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
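      The commits pin down the public surface: enable_group_offload on a model, CUDA-stream prefetching (#10516) behind use_stream, and errors raised when it is combined with another offloading strategy. A sketch of that usage on a diffusers model, under the signature the renaming commit settles on:
      
      ```python
      import torch
      from diffusers import CogVideoXPipeline
      
      pipe = CogVideoXPipeline.from_pretrained(
          "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
      )
      
      # Keep groups of layers on CPU and onload each group to the GPU just
      # before it runs; use_stream enables the CUDA-stream prefetching so the
      # next group's transfer overlaps with the current group's compute.
      pipe.transformer.enable_group_offload(
          onload_device=torch.device("cuda"),
          offload_device=torch.device("cpu"),
          offload_type="leaf_level",
          use_stream=True,
      )
      ```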
  28. 23 Jan, 2025 1 commit
  29. 22 Jan, 2025 1 commit
    • [core] Layerwise Upcasting (#10347) · beacaa55
      Aryan authored
      
      
      * update
      
      * update
      
      * make style
      
      * remove dynamo disable
      
      * add coauthor
      Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>
      
      * update
      
      * update
      
      * update
      
      * update mixin
      
      * add some basic tests
      
      * update
      
      * update
      
      * non_blocking
      
      * improvements
      
      * update
      
      * norm.* -> norm
      
      * apply suggestions from review
      
      * add example
      
      * update hook implementation to the latest changes from pyramid attention broadcast
      
      * deinitialize should raise an error
      
      * update doc page
      
      * Apply suggestions from code review
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * update docs
      
      * update
      
      * refactor
      
      * fix _always_upcast_modules for asym ae and vq_model
      
      * fix lumina embedding forward to not depend on weight dtype
      
      * refactor tests
      
      * add simple lora inference tests
      
      * _always_upcast_modules -> _precision_sensitive_module_patterns
      
      * remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case
      
      * check layer dtypes in lora test
      
      * fix UNet1DModelTests::test_layerwise_upcasting_inference
      
      * _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback
      
      * skip test in NCSNppModelTests
      
      * skip tests for AutoencoderTinyTests
      
      * skip tests for AutoencoderOobleckTests
      
      * skip tests for UNet1DModelTests - unsupported pytorch operations
      
      * layerwise_upcasting -> layerwise_casting
      
      * skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support
      
      * add layerwise fp8 pipeline test
      
      * use xfail
      
      * Apply suggestions from code review
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      
      * add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' test to pass)
      
      * add note about memory consumption on tesla CI runner for failing test
      
      ---------
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
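      The final API (layerwise_upcasting was renamed to layerwise_casting mid-review) stores weights in a low-precision dtype and upcasts them layer by layer at forward time, skipping the patterns listed in a model's _skip_layerwise_casting_patterns (e.g. norms). A sketch of the documented usage:
      
      ```python
      import torch
      from diffusers import FluxPipeline
      
      pipe = FluxPipeline.from_pretrained(
          "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
      ).to("cuda")
      
      # Weights are kept in fp8 storage and upcast to bfloat16 per layer
      # during forward, roughly halving transformer weight memory.
      pipe.transformer.enable_layerwise_casting(
          storage_dtype=torch.float8_e4m3fn,
          compute_dtype=torch.bfloat16,
      )
      ```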
  30. 16 Jan, 2025 1 commit
  31. 25 Oct, 2024 1 commit
  32. 12 Oct, 2024 1 commit
  33. 23 Sep, 2024 1 commit
  34. 16 Sep, 2024 1 commit