1. 06 Dec, 2025 1 commit
    • [Feat] TaylorSeer Cache (#12648) · 6290fdfd
      Tran Thanh Luan authored
      
      
      * init taylor_seer cache
      
      * make compatible with any tuple size returned
      
      * use logger for printing, add warmup feature
      
      * still update in warmup steps
      
      * refactor, add docs
      
      * add configurable cache, skip compute module
      
      * allow special cache ids only
      
      * add stop_predicts (cooldown)
      
      * update docs
      
      * apply ruff
      
      * update to handle multiple calls per timestep
      
      * refactor to use state manager
      
      * fix format & doc
      
      * chores: naming, remove redundancy
      
      * add docs
      
      * quality & style
      
      * fix taylor precision
      
      * Apply style fixes
      
      * add tests
      
      * Apply style fixes
      
      * Remove TaylorSeerCacheTesterMixin from flux2 tests
      
      * rename identifiers, use more expressive taylor predict loop
      
      * torch compile compatible
      
      * Apply style fixes
      
      * Update src/diffusers/hooks/taylorseer_cache.py
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      
      * update docs
      
      * make fix-copies
      
      * fix example usage.
      
      * remove tests on flux kontext
      
      ---------
      Co-authored-by: toilaluan <toilaluan@github.com>
      Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
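      The squashed messages outline the mechanism: cache a module's output at refresh steps, then predict it on skipped steps with a finite-difference Taylor expansion (warmup steps always compute; stop_predicts adds a cooldown at the end of sampling). Below is a minimal standalone sketch of that predict loop, with hypothetical helper names; it illustrates the technique, not the actual diffusers hook code.
      
      ```python
      import torch
      
      def refresh_derivatives(prev, new_output, step_gap, order=2):
          # Hypothetical helper: after a real forward pass, update the
          # finite-difference estimates of the output's time derivatives.
          # prev[i] is the previous i-th derivative estimate; step_gap is
          # the number of steps since the last refresh.
          derivs = [new_output]
          for i in range(order):
              if prev is None or i >= len(prev):
                  break
              derivs.append((derivs[i] - prev[i]) / step_gap)
          return derivs
      
      def taylor_predict(derivs, elapsed):
          # Extrapolate the cached output `elapsed` steps past the refresh:
          # y(t + k) ~= sum_i derivs[i] * k**i / i!
          pred = torch.zeros_like(derivs[0])
          factorial = 1.0
          for i, d in enumerate(derivs):
              factorial *= max(i, 1)
              pred = pred + d * (elapsed ** i) / factorial
          return pred
      ```
      
      Warmup steps run the real module and refresh the derivatives; cached steps return taylor_predict's output in place of the forward pass.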
  2. 04 Dec, 2025 1 commit
  3. 03 Dec, 2025 1 commit
  4. 24 Nov, 2025 1 commit
  5. 27 Oct, 2025 1 commit
  6. 16 Oct, 2025 1 commit
  7. 30 Sep, 2025 1 commit
  8. 26 Sep, 2025 1 commit
  9. 24 Sep, 2025 1 commit
    • Introduce cache-dit to community optimization (#12366) · 310fdaf5
      DefTruth authored
      * docs: introduce cache-dit to diffusers
      
      * misc: update examples link
      
      * Refine documentation for CacheDiT features
      
      Updated the wording for clarity and consistency in the documentation. Adjusted sections on cache acceleration, automatic block adapter, patch functor, and hybrid cache configuration.
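      Per the community optimization page this PR adds, cache-dit bolts hybrid cache acceleration onto a DiT-style pipeline with essentially one call. A sketch of that usage, assuming the one-line enable_cache entry point from the cache-dit README (the API may differ across versions):
      
      ```python
      import torch
      from diffusers import FluxPipeline
      import cache_dit  # assumes: pip install cache-dit
      
      pipe = FluxPipeline.from_pretrained(
          "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
      ).to("cuda")
      
      # One-line cache acceleration; cache-dit's automatic block adapter
      # locates the transformer blocks to cache (per its docs).
      cache_dit.enable_cache(pipe)
      
      image = pipe("an astronaut riding a horse on mars").images[0]
      ```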
  10. 23 Sep, 2025 1 commit
  11. 10 Sep, 2025 1 commit
  12. 25 Aug, 2025 2 commits
  13. 18 Jul, 2025 1 commit
  14. 11 Jul, 2025 1 commit
  15. 26 Jun, 2025 2 commits
  16. 20 Jun, 2025 2 commits
  17. 19 Jun, 2025 2 commits
  18. 16 Jun, 2025 1 commit
    • Add Pruna optimization framework documentation (#11688) · 9b834f87
      David Berenstein authored
      
      
      * Add Pruna optimization framework documentation
      
      - Introduced a new section for Pruna in the table of contents.
      - Added comprehensive documentation for Pruna, detailing its optimization techniques, installation instructions, and examples for optimizing and evaluating models.
      
      * Enhance Pruna documentation with image alt text and code block formatting
      
      - Added alt text to images for better accessibility and context.
      - Changed code block syntax from diff to python for improved clarity.
      
      * Add installation section to Pruna documentation
      
      - Introduced a new installation section in the Pruna documentation to guide users on how to install the framework.
      - Enhanced the overall clarity and usability of the documentation for new users.
      
      * Update pruna.md
      
      * Update pruna.md
      
      * Update Pruna documentation for model optimization and evaluation
      
      - Changed section titles for consistency and clarity, from "Optimizing models" to "Optimize models" and "Evaluating and benchmarking optimized models" to "Evaluate and benchmark models".
      - Enhanced descriptions to clarify the use of `diffusers` models and the evaluation process.
      - Added a new example for evaluating standalone `diffusers` models.
      - Updated references and links for better navigation within the documentation.
      
      * Refactor Pruna documentation for clarity and consistency
      
      - Removed outdated references to FLUX-juiced and streamlined the explanation of benchmarking.
      - Enhanced the description of evaluating standalone `diffusers` models.
      - Cleaned up code examples by removing unnecessary imports and comments for better readability.
      
      * Apply suggestions from code review
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Enhance Pruna documentation with new examples and clarifications
      
      - Added an image to illustrate the optimization process.
      - Updated the explanation for sharing and loading optimized models on the Hugging Face Hub.
      - Clarified the evaluation process for optimized models using the EvaluationAgent.
      - Improved descriptions for defining metrics and evaluating standalone diffusers models.
      
      ---------
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
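      The documented workflow is two-stage: smash a pipeline with a SmashConfig, then score it with an EvaluationAgent. A sketch of the optimize step, assuming Pruna's public smash/SmashConfig API; the algorithm names are illustrative and vary by Pruna version:
      
      ```python
      import torch
      from diffusers import StableDiffusionPipeline
      from pruna import SmashConfig, smash  # assumes: pip install pruna
      
      pipe = StableDiffusionPipeline.from_pretrained(
          "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
      ).to("cuda")
      
      # Select the optimization algorithms to combine; consult Pruna's docs
      # for the algorithms available in your install.
      smash_config = SmashConfig()
      smash_config["cacher"] = "deepcache"
      
      smashed_pipe = smash(model=pipe, smash_config=smash_config)
      image = smashed_pipe("a serene mountain lake at dawn").images[0]
      ```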
  19. 02 Jun, 2025 1 commit
  20. 28 May, 2025 1 commit
  21. 23 May, 2025 1 commit
  22. 19 May, 2025 2 commits
  23. 15 May, 2025 1 commit
  24. 01 May, 2025 1 commit
  25. 08 Apr, 2025 2 commits
  26. 24 Mar, 2025 1 commit
  27. 14 Feb, 2025 1 commit
    • Module Group Offloading (#10503) · 9a147b82
      Aryan authored
      
      
      * update
      
      * fix
      
      * non_blocking; handle parameters and buffers
      
      * update
      
      * Group offloading with cuda stream prefetching (#10516)
      
      * cuda stream prefetch
      
      * remove breakpoints
      
      * update
      
      * copy model hook implementation from pab
      
      * update; ~very workaround based implementation but it seems to work as expected; needs cleanup and rewrite
      
      * more workarounds to make it actually work
      
      * cleanup
      
      * rewrite
      
      * update
      
      * make sure to sync current stream before overwriting with pinned params
      
      not doing so will lead to erroneous computations on the GPU and cause bad results
      
      * better check
      
      * update
      
      * remove hook implementation to not deal with merge conflict
      
      * re-add hook changes
      
      * why use more memory when less memory do trick
      
      * why still use slightly more memory when less memory do trick
      
      * optimise
      
      * add model tests
      
      * add pipeline tests
      
      * update docs
      
      * add layernorm and groupnorm
      
      * address review comments
      
      * improve tests; add docs
      
      * improve docs
      
      * Apply suggestions from code review
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * apply suggestions from code review
      
      * update tests
      
      * apply suggestions from review
      
      * enable_group_offloading -> enable_group_offload for naming consistency
      
      * raise errors if multiple offloading strategies used; add relevant tests
      
      * handle .to() when group offload applied
      
      * refactor some repeated code
      
      * remove unintentional change from merge conflict
      
      * handle .cuda()
      
      ---------
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
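      The commits pin down the public surface: enable_group_offload on a model, CUDA-stream prefetching (#10516) behind use_stream, and errors raised when it is combined with another offloading strategy. A sketch of that usage on a diffusers model, under the signature the renaming commit settles on:
      
      ```python
      import torch
      from diffusers import CogVideoXPipeline
      
      pipe = CogVideoXPipeline.from_pretrained(
          "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
      )
      
      # Keep groups of layers on CPU and onload each group to the GPU just
      # before it runs; use_stream enables the CUDA-stream prefetching so the
      # next group's transfer overlaps with the current group's compute.
      pipe.transformer.enable_group_offload(
          onload_device=torch.device("cuda"),
          offload_device=torch.device("cpu"),
          offload_type="leaf_level",
          use_stream=True,
      )
      ```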
  28. 23 Jan, 2025 1 commit
  29. 22 Jan, 2025 1 commit
    • [core] Layerwise Upcasting (#10347) · beacaa55
      Aryan authored
      
      
      * update
      
      * update
      
      * make style
      
      * remove dynamo disable
      
      * add coauthor
      Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>
      
      * update
      
      * update
      
      * update
      
      * update mixin
      
      * add some basic tests
      
      * update
      
      * update
      
      * non_blocking
      
      * improvements
      
      * update
      
      * norm.* -> norm
      
      * apply suggestions from review
      
      * add example
      
      * update hook implementation to the latest changes from pyramid attention broadcast
      
      * deinitialize should raise an error
      
      * update doc page
      
      * Apply suggestions from code review
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * update docs
      
      * update
      
      * refactor
      
      * fix _always_upcast_modules for asym ae and vq_model
      
      * fix lumina embedding forward to not depend on weight dtype
      
      * refactor tests
      
      * add simple lora inference tests
      
      * _always_upcast_modules -> _precision_sensitive_module_patterns
      
      * remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case
      
      * check layer dtypes in lora test
      
      * fix UNet1DModelTests::test_layerwise_upcasting_inference
      
      * _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback
      
      * skip test in NCSNppModelTests
      
      * skip tests for AutoencoderTinyTests
      
      * skip tests for AutoencoderOobleckTests
      
      * skip tests for UNet1DModelTests - unsupported pytorch operations
      
      * layerwise_upcasting -> layerwise_casting
      
      * skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support
      
      * add layerwise fp8 pipeline test
      
      * use xfail
      
      * Apply suggestions from code review
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      
      * add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' test to pass)
      
      * add note about memory consumption on tesla CI runner for failing test
      
      ---------
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
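      The final API (layerwise_upcasting was renamed to layerwise_casting mid-review) stores weights in a low-precision dtype and upcasts them layer by layer at forward time, skipping the patterns listed in a model's _skip_layerwise_casting_patterns (e.g. norms). A sketch of the documented usage:
      
      ```python
      import torch
      from diffusers import FluxPipeline
      
      pipe = FluxPipeline.from_pretrained(
          "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
      ).to("cuda")
      
      # Weights are kept in fp8 storage and upcast to bfloat16 per layer
      # during forward, roughly halving transformer weight memory.
      pipe.transformer.enable_layerwise_casting(
          storage_dtype=torch.float8_e4m3fn,
          compute_dtype=torch.bfloat16,
      )
      ```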
  30. 16 Jan, 2025 1 commit
  31. 25 Oct, 2024 1 commit
  32. 12 Oct, 2024 1 commit
  33. 23 Sep, 2024 1 commit
  34. 16 Sep, 2024 1 commit