Commits · 0454fbb30bfbe21aa4ea29c827c396bac57dc518 · renzhc / diffusers_dcu

08 Jul, 2025 1 commit

Aryan authored Jul 09, 2025



* update

* modify flux single blocks to make compatible with cache techniques (without too much model-specific intrusion code)

* remove debug logs

* update

* cache context for different batches of data

* fix hs residual bug for single return outputs; support ltx

* fix controlnet flux

* support flux, ltx i2v, ltx condition

* update

* update

* Update docs/source/en/api/cache.md

* Update src/diffusers/hooks/hooks.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* address review comments pt. 1

* address review comments pt. 2

* cache context refacotr; address review pt. 3

* address review comments

* metadata registration with decorators instead of centralized

* support cogvideox

* support mochi

* fix

* remove unused function

* remove central registry based on review

* update

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

0454fbb3

24 Jun, 2025 1 commit
- [tests] Fix HunyuanVideo Framepack device tests (#11789) · 474a248f
  Aryan authored Jun 24, 2025
```
update
```
  474a248f
19 Jun, 2025 1 commit
- Update more licenses to 2025 (#11746) · a4df8dbc
  Aryan authored Jun 19, 2025
```
update
```
  a4df8dbc
06 May, 2025 1 commit

Hunyuan Video Framepack (#11428) · d7ffe601

Aryan authored May 06, 2025

* add transformer

* add pipeline

* fixes

* make fix-copies

* update

* add flux mu shift

* update example snippet

* debug

* cleanup

* batch_size=1 optimization

* add pipeline test

* fix for model cpu offloading'

* add last_image support; credits: https://github.com/lllyasviel/FramePack/pull/167

* update example with flf2v

* update penguin url

* fix test

* address review comment: https://github.com/huggingface/diffusers/pull/11428#discussion_r2071032371

* address review comment: https://github.com/huggingface/diffusers/pull/11428#discussion_r2071087689



* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video_framepack.py

---------
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>

d7ffe601

16 Apr, 2025 1 commit
- Hunyuan I2V fast tests fix (#11341) · 59f1b7b1
  Dhruv Nair authored Apr 16, 2025
```
* update

* update
```
  59f1b7b1
24 Mar, 2025 1 commit

New HunyuanVideo-I2V (#11066) · 8907a70a

Aryan authored Mar 24, 2025

* update

* update

* update

* add tests

* update docs

* raise value error

* warning for true cfg and guidance scale

* fix test

8907a70a

21 Mar, 2025 1 commit

[core] FasterCache (#10163) · 844221ae

Aryan authored Mar 21, 2025



* init

* update

* update

* update

* make style

* update

* fix

* make it work with guidance distilled models

* update

* make fix-copies

* add tests

* update

* apply_faster_cache -> apply_fastercache

* fix

* reorder

* update

* refactor

* update docs

* add fastercache to CacheMixin

* update tests

* Apply suggestions from code review

* make style

* try to fix partial import error

* Apply style fixes

* raise warning

* update

---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

844221ae

07 Mar, 2025 1 commit

Hunyuan I2V (#10983) · 2e5203be

Aryan authored Mar 07, 2025

* update

* update

* update

* add tests

* update

* add model tests

* update docs

* update

* update example

* fix defaults

* update

2e5203be

21 Feb, 2025 1 commit

SkyReels Hunyuan T2V & I2V (#10837) · e3bc4aab

Aryan authored Feb 21, 2025



* update

* make fix-copies

* update

* tests

* update

* update

* add co-author
Co-Authored-By: Langdx <82783347+Langdx@users.noreply.github.com>

* add co-author
Co-Authored-By: howe <howezhang2018@gmail.com>

* update

---------
Co-authored-by: Langdx <82783347+Langdx@users.noreply.github.com>
Co-authored-by: howe <howezhang2018@gmail.com>

e3bc4aab

14 Feb, 2025 1 commit

Module Group Offloading (#10503) · 9a147b82

Aryan authored Feb 14, 2025



* update

* fix

* non_blocking; handle parameters and buffers

* update

* Group offloading with cuda stream prefetching (#10516)

* cuda stream prefetch

* remove breakpoints

* update

* copy model hook implementation from pab

* update; ~very workaround based implementation but it seems to work as expected; needs cleanup and rewrite

* more workarounds to make it actually work

* cleanup

* rewrite

* update

* make sure to sync current stream before overwriting with pinned params

not doing so will lead to erroneous computations on the GPU and cause bad results

* better check

* update

* remove hook implementation to not deal with merge conflict

* re-add hook changes

* why use more memory when less memory do trick

* why still use slightly more memory when less memory do trick

* optimise

* add model tests

* add pipeline tests

* update docs

* add layernorm and groupnorm

* address review comments

* improve tests; add docs

* improve docs

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions from code review

* update tests

* apply suggestions from review

* enable_group_offloading -> enable_group_offload for naming consistency

* raise errors if multiple offloading strategies used; add relevant tests

* handle .to() when group offload applied

* refactor some repeated code

* remove unintentional change from merge conflict

* handle .cuda()

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

9a147b82

29 Jan, 2025 1 commit
- [tests] update llamatokenizer in hunyuanvideo tests (#10681) · e6037e82
  Sayak Paul authored Jan 29, 2025
```
update llamatokenizer in hunyuanvideo tests
```
  e6037e82
27 Jan, 2025 1 commit

[core] Pyramid Attention Broadcast (#9562) · 658e24e8

Aryan authored Jan 28, 2025



* start pyramid attention broadcast

* add coauthor
Co-Authored-By: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>

* update

* make style

* update

* make style

* add docs

* add tests

* update

* Update docs/source/en/api/pipelines/cogvideox.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/cogvideox.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Pyramid Attention Broadcast rewrite + introduce hooks (#9826)

* rewrite implementation with hooks

* make style

* update

* merge pyramid-attention-rewrite-2

* make style

* remove changes from latte transformer

* revert docs changes

* better debug message

* add todos for future

* update tests

* make style

* cleanup

* fix

* improve log message; fix latte test

* refactor

* update

* update

* update

* revert changes to tests

* update docs

* update tests

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update

* fix flux test

* reorder

* refactor

* make fix-copies

* update docs

* fixes

* more fixes

* make style

* update tests

* update code example

* make fix-copies

* refactor based on reviews

* use maybe_free_model_hooks

* CacheMixin

* make style

* update

* add current_timestep property; update docs

* make fix-copies

* update

* improve tests

* try circular import fix

* apply suggestions from review

* address review comments

* Apply suggestions from code review

* refactor hook implementation

* add test suite for hooks

* PAB Refactor (#10667)

* update

* update

* update

---------
Co-authored-by: DN6 <dhruv.nair@gmail.com>

* update

* fix remove hook behaviour

---------
Co-authored-by: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: DN6 <dhruv.nair@gmail.com>

658e24e8

22 Jan, 2025 1 commit

[core] Layerwise Upcasting (#10347) · beacaa55

Aryan authored Jan 22, 2025



* update

* update

* make style

* remove dynamo disable

* add coauthor
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* update

* update

* update

* update mixin

* add some basic tests

* update

* update

* non_blocking

* improvements

* update

* norm.* -> norm

* apply suggestions from review

* add example

* update hook implementation to the latest changes from pyramid attention broadcast

* deinitialize should raise an error

* update doc page

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update docs

* update

* refactor

* fix _always_upcast_modules for asym ae and vq_model

* fix lumina embedding forward to not depend on weight dtype

* refactor tests

* add simple lora inference tests

* _always_upcast_modules -> _precision_sensitive_module_patterns

* remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case

* check layer dtypes in lora test

* fix UNet1DModelTests::test_layerwise_upcasting_inference

* _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback

* skip test in NCSNppModelTests

* skip tests for AutoencoderTinyTests

* skip tests for AutoencoderOobleckTests

* skip tests for UNet1DModelTests - unsupported pytorch operations

* layerwise_upcasting -> layerwise_casting

* skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support

* add layerwise fp8 pipeline test

* use xfail

* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' test to pass)

* add note about memory consumption on tesla CI runner for failing test

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

beacaa55

16 Dec, 2024 1 commit

[core] Hunyuan Video (#10136) · aace1f41

Aryan authored Dec 16, 2024



* copy transformer

* copy vae

* copy pipeline

* make fix-copies

* refactor; make original code work with diffusers; test latents for comparison generated with this commit

* move rope into pipeline; remove flash attention; refactor

* begin conversion script

* make style

* refactor attention

* refactor

* refactor final layer

* their mlp -> our feedforward

* make style

* add docs

* refactor layer names

* refactor modulation

* cleanup

* refactor norms

* refactor activations

* refactor single blocks attention

* refactor attention processor

* make style

* cleanup a bit

* refactor double transformer block attention

* update mochi attn proc

* use diffusers attention implementation in all modules; checkpoint for all values matching original

* remove helper functions in vae

* refactor upsample

* refactor causal conv

* refactor resnet

* refactor

* refactor

* refactor

* grad checkpointing

* autoencoder test

* fix scaling factor

* refactor clip

* refactor llama text encoding

* add coauthor
Co-Authored-By: "Gregory D. Hunkins" <greg@ollano.com>

* refactor rope; diff: 0.14990234375; reason and fix: create rope grid on cpu and move to device

Note: The following line diverges from original behaviour. We create the grid on the device, whereas
original implementation creates it on CPU and then moves it to device. This results in numerical
differences in layerwise debugging outputs, but visually it is the same.

* use diffusers timesteps embedding; diff: 0.10205078125

* rename

* convert

* update

* add tests for transformer

* add pipeline tests; text encoder 2 is not optional

* fix attention implementation for torch

* add example

* update docs

* update docs

* apply suggestions from review

* refactor vae

* update

* Apply suggestions from code review
Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
Co-authored-by: hlky <hlky@hlky.ac>

* make fix-copies

* update

---------
Co-authored-by: "Gregory D. Hunkins" <greg@ollano.com>
Co-authored-by: hlky <hlky@hlky.ac>

aace1f41