Commits · aafed3f8dd042bfe786f6c3e902c5cdb5de1fb08 · renzhc / diffusers_dcu

16 Dec, 2024 1 commit

[core] Hunyuan Video (#10136) · aace1f41

Aryan authored Dec 16, 2024



* copy transformer

* copy vae

* copy pipeline

* make fix-copies

* refactor; make original code work with diffusers; test latents for comparison generated with this commit

* move rope into pipeline; remove flash attention; refactor

* begin conversion script

* make style

* refactor attention

* refactor

* refactor final layer

* their mlp -> our feedforward

* make style

* add docs

* refactor layer names

* refactor modulation

* cleanup

* refactor norms

* refactor activations

* refactor single blocks attention

* refactor attention processor

* make style

* cleanup a bit

* refactor double transformer block attention

* update mochi attn proc

* use diffusers attention implementation in all modules; checkpoint for all values matching original

* remove helper functions in vae

* refactor upsample

* refactor causal conv

* refactor resnet

* refactor

* refactor

* refactor

* grad checkpointing

* autoencoder test

* fix scaling factor

* refactor clip

* refactor llama text encoding

* add coauthor
Co-Authored-By: "Gregory D. Hunkins" <greg@ollano.com>

* refactor rope; diff: 0.14990234375; reason and fix: create rope grid on cpu and move to device

Note: The following line diverges from original behaviour. We create the grid on the device, whereas
original implementation creates it on CPU and then moves it to device. This results in numerical
differences in layerwise debugging outputs, but visually it is the same.

* use diffusers timesteps embedding; diff: 0.10205078125

* rename

* convert

* update

* add tests for transformer

* add pipeline tests; text encoder 2 is not optional

* fix attention implementation for torch

* add example

* update docs

* update docs

* apply suggestions from review

* refactor vae

* update

* Apply suggestions from code review
Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
Co-authored-by: hlky <hlky@hlky.ac>

* make fix-copies

* update

---------
Co-authored-by: "Gregory D. Hunkins" <greg@ollano.com>
Co-authored-by: hlky <hlky@hlky.ac>

aace1f41

15 Dec, 2024 1 commit

[Sana] Add Sana, including `SanaPipeline`, `SanaPAGPipeline`,... · 5a196e3d

Junsong Chen authored Dec 16, 2024


[Sana] Add Sana, including `SanaPipeline`, `SanaPAGPipeline`, `LinearAttentionProcessor`, `Flow-based DPM-sovler` and so on. (#9982)

* first add a script for DC-AE;

* DC-AE init

* replace triton with custom implementation

* 1. rename file and remove un-used codes;

* no longer rely on omegaconf and dataclass

* replace custom activation with diffuers activation

* remove dc_ae attention in attention_processor.py

* iinherit from ModelMixin

* inherit from ConfigMixin

* dc-ae reduce to one file

* update downsample and upsample

* clean code

* support DecoderOutput

* remove get_same_padding and val2tuple

* remove autocast and some assert

* update ResBlock

* remove contents within super().__init__

* Update src/diffusers/models/autoencoders/dc_ae.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove opsequential

* update other blocks to support the removal of build_norm

* remove build encoder/decoder project in/out

* remove inheritance of RMSNorm2d from LayerNorm

* remove reset_parameters for RMSNorm2d
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove device and dtype in RMSNorm2d __init__
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove op_list & build_block

* remove build_stage_main

* change file name to autoencoder_dc

* move LiteMLA to attention.py

* align with other vae decode output;

* add DC-AE into init files;

* update

* make quality && make style;

* quick push before dgx disappears again

* update

* make style

* update

* update

* fix

* refactor

* refactor

* refactor

* update

* possibly change to nn.Linear

* refactor

* make fix-copies

* replace vae with ae

* replace get_block_from_block_type to get_block

* replace downsample_block_type from Conv to conv for consistency

* add scaling factors

* incorporate changes for all checkpoints

* make style

* move mla to attention processor file; split qkv conv to linears

* refactor

* add tests

* from original file loader

* add docs

* add standard autoencoder methods

* combine attention processor

* fix tests

* update

* minor fix

* minor fix

* minor fix & in/out shortcut rename

* minor fix

* make style

* fix paper link

* update docs

* update single file loading

* make style

* remove single file loading support; todo for DN6

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add abstract

* 1. add DCAE into diffusers;
2. make style and make quality;

* add DCAE_HF into diffusers;

* bug fixed;

* add SanaPipeline, SanaTransformer2D into diffusers;

* add sanaLinearAttnProcessor2_0;

* first update for SanaTransformer;

* first update for SanaPipeline;

* first success run SanaPipeline;

* model output finally match with original model with the same intput;

* code update;

* code update;

* add a flow dpm-solver scripts

* 🎉[important update]
1. Integrate flow-dpm-sovler into diffusers;
2. finally run successfully on both `FlowMatchEulerDiscreteScheduler` and `FlowDPMSolverMultistepScheduler`;

* 🎉🔧

[important update & fix huge bugs!!]
1. add SanaPAGPipeline & several related Sana linear attention operators;
2. `SanaTransformer2DModel` not supports multi-resolution input;
2. fix the multi-scale HW bugs in SanaPipeline and SanaPAGPipeline;
3. fix the flow-dpm-solver set_timestep() init `model_output` and `lower_order_nums` bugs;

* remove prints;

* add convert sana official checkpoint to diffusers format Safetensor.

* Update src/diffusers/models/transformers/sana_transformer_2d.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/transformers/sana_transformer_2d.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/transformers/sana_transformer_2d.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/pipelines/pag/pipeline_pag_sana.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/transformers/sana_transformer_2d.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/transformers/sana_transformer_2d.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/pipelines/sana/pipeline_sana.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/pipelines/sana/pipeline_sana.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update Sana for DC-AE's recent commit;

* make style && make quality

* Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG (#9932)

* fix progress bar updates in SD 1.5 PAG Img2Img pipeline

---------
Co-authored-by: Vinh H. Pham <phamvinh257@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* make the vae can be None in `__init__` of `SanaPipeline`

* Update src/diffusers/models/transformers/sana_transformer_2d.py
Co-authored-by: hlky <hlky@hlky.ac>

* change the ae related code due to the latest update of DCAE branch;

* change the ae related code due to the latest update of DCAE branch;

* 1. change code based on AutoencoderDC;
2. fix the bug of new GLUMBConv;
3. run success;

* update for solving conversation.

* 1. fix bugs and run convert script success;
2. Downloading ckpt from hub automatically;

* make style && make quality;

* 1. remove un-unsed parameters in init;
2. code update;

* remove test file

* refactor; add docs; add tests; update conversion script

* make style

* make fix-copies

* refactor

* udpate pipelines

* pag tests and refactor

* remove sana pag conversion script

* handle weight casting in conversion script

* update conversion script

* add a processor

* 1. add bf16 pth file path;
2. add complex human instruct in pipeline;

* fix fast \tests

* change gemma-2-2b-it ckpt to a non-gated repo;

* fix the pth path bug in conversion script;

* change grad ckpt to original; make style

* fix the complex_human_instruct bug and typo;

* remove dpmsolver flow scheduler

* apply review suggestions

* change the `FlowMatchEulerDiscreteScheduler` to default `DPMSolverMultistepScheduler` with flow matching scheduler.

* fix the tokenizer.padding_side='right' bug;

* update docs

* make fix-copies

* fix imports

* fix docs

* add integration test

* update docs

* update examples

* fix convert_model_output in schedulers

* fix failing tests

---------
Co-authored-by: Junyu Chen <chenjydl2003@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: chenjy2003 <70215701+chenjy2003@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>

5a196e3d

12 Dec, 2024 1 commit

[core] LTX Video (#10021) · 96c376a5

Aryan authored Dec 12, 2024



* transformer

* make style & make fix-copies

* transformer

* add transformer tests

* 80% vae

* make style

* make fix-copies

* fix

* undo cogvideox changes

* update

* update

* match vae

* add docs

* t2v pipeline working; scheduler needs to be checked

* docs

* add pipeline test

* update

* update

* make fix-copies

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update

* copy t2v to i2v pipeline

* update

* apply review suggestions

* update

* make style

* remove framewise encoding/decoding

* pack/unpack latents

* image2video

* update

* make fix-copies

* update

* update

* rope scale fix

* debug layerwise code

* remove debug

* Apply suggestions from code review
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* propagate precision changes to i2v pipeline

* remove downcast

* address review comments

* fix comment

* address review comments

* [Single File] LTX support for loading original weights (#10135)

* from original file mixin for ltx

* undo config mapping fn changes

* update

* add single file to pipelines

* update docs

* Update src/diffusers/models/autoencoders/autoencoder_kl_ltx.py

* Update src/diffusers/models/autoencoders/autoencoder_kl_ltx.py

* rename classes based on ltx review

* point to original repository for inference

* make style

* resolve conflicts correctly

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

96c376a5

11 Dec, 2024 1 commit
- Add ControlNetUnion (#10131) · 914a585b
  hlky authored Dec 11, 2024
```
* ControlNetUnion model
```
  914a585b
06 Dec, 2024 1 commit

[DC-AE] Add the official Deep Compression Autoencoder code(32x,64x,128x compression ratio); (#9708) · cd892041

Junsong Chen authored Dec 07, 2024



* first add a script for DC-AE;

* DC-AE init

* replace triton with custom implementation

* 1. rename file and remove un-used codes;

* no longer rely on omegaconf and dataclass

* replace custom activation with diffuers activation

* remove dc_ae attention in attention_processor.py

* iinherit from ModelMixin

* inherit from ConfigMixin

* dc-ae reduce to one file

* update downsample and upsample

* clean code

* support DecoderOutput

* remove get_same_padding and val2tuple

* remove autocast and some assert

* update ResBlock

* remove contents within super().__init__

* Update src/diffusers/models/autoencoders/dc_ae.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove opsequential

* update other blocks to support the removal of build_norm

* remove build encoder/decoder project in/out

* remove inheritance of RMSNorm2d from LayerNorm

* remove reset_parameters for RMSNorm2d
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove device and dtype in RMSNorm2d __init__
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove op_list & build_block

* remove build_stage_main

* change file name to autoencoder_dc

* move LiteMLA to attention.py

* align with other vae decode output;

* add DC-AE into init files;

* update

* make quality && make style;

* quick push before dgx disappears again

* update

* make style

* update

* update

* fix

* refactor

* refactor

* refactor

* update

* possibly change to nn.Linear

* refactor

* make fix-copies

* replace vae with ae

* replace get_block_from_block_type to get_block

* replace downsample_block_type from Conv to conv for consistency

* add scaling factors

* incorporate changes for all checkpoints

* make style

* move mla to attention processor file; split qkv conv to linears

* refactor

* add tests

* from original file loader

* add docs

* add standard autoencoder methods

* combine attention processor

* fix tests

* update

* minor fix

* minor fix

* minor fix & in/out shortcut rename

* minor fix

* make style

* fix paper link

* update docs

* update single file loading

* make style

* remove single file loading support; todo for DN6

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add abstract

---------
Co-authored-by: Junyu Chen <chenjydl2003@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: chenjy2003 <70215701+chenjy2003@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

cd892041

18 Nov, 2024 1 commit

Add server example (#9918) · c3c94fe7

Grant Sherrick authored Nov 18, 2024



* Add server example.

* Minor updates to README.

* Add fixes after local testing.

* Apply suggestions from code review

Updates to README from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* More doc updates.

* Maybe this will work to build the docs correctly?

* Fix style issues.

* Fix toc.

* Minor reformatting.

* Move docs to proper loc.

* Fix missing tick.

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Sync docs changes back to README.

* Very minor update to docs to add space.

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

c3c94fe7

05 Nov, 2024 1 commit

[core] Mochi T2V (#9769) · 3f329a42

Aryan authored Nov 05, 2024



* update

* udpate

* update transformer

* make style

* fix

* add conversion script

* update

* fix

* update

* fix

* update

* fixes

* make style

* update

* update

* update

* init

* update

* update

* add

* up

* up

* up

* update

* mochi transformer

* remove original implementation

* make style

* update inits

* update conversion script

* docs

* Update src/diffusers/pipelines/mochi/pipeline_mochi.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Update src/diffusers/pipelines/mochi/pipeline_mochi.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* fix docs

* pipeline fixes

* make style

* invert sigmas in scheduler; fix pipeline

* fix pipeline num_frames

* flip proj and gate in swiglu

* make style

* fix

* make style

* fix tests

* latent mean and std fix

* update

* cherry-pick 1069d210e1b9e84a366cdc7a13965626ea258178

* remove additional sigma already handled by flow match scheduler

* fix

* remove hardcoded value

* replace conv1x1 with linear

* Update src/diffusers/pipelines/mochi/pipeline_mochi.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* framewise decoding and conv_cache

* make style

* Apply suggestions from code review

* mochi vae encoder changes

* rebase correctly

* Update scripts/convert_mochi_to_diffusers.py

* fix tests

* fixes

* make style

* update

* make style

* update

* add framewise and tiled encoding

* make style

* make original vae implementation behaviour the default; note: framewise encoding does not work

* remove framewise encoding implementation due to presence of attn layers

* fight test 1

* fight test 2

---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>

3f329a42

29 Oct, 2024 1 commit

[core] Allegro T2V (#9736) · 0d1d267b

Aryan authored Oct 29, 2024



* update

* refactor transformer part 1

* refactor part 2

* refactor part 3

* make style

* refactor part 4; modeling tests

* make style

* refactor part 5

* refactor part 6

* gradient checkpointing

* pipeline tests (broken atm)

* update

* add coauthor
Co-Authored-By: Huan Yang <hyang@fastmail.com>

* refactor part 7

* add docs

* make style

* add coauthor
Co-Authored-By: YiYi Xu <yixu310@gmail.com>

* make fix-copies

* undo unrelated change

* revert changes to embeddings, normalization, transformer

* refactor part 8

* make style

* refactor part 9

* make style

* fix

* apply suggestions from review

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update example

* remove attention mask for self-attention

* update

* copied from

* update

* update

---------
Co-authored-by: Huan Yang <hyang@fastmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

0d1d267b

25 Oct, 2024 1 commit

Add a doc for AWS Neuron in Diffusers (#9766) · 52d44498

Jingya HUANG authored Oct 25, 2024



* start draft

* add doc

* Update docs/source/en/optimization/neuron.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/neuron.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/neuron.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/neuron.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/neuron.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/neuron.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/neuron.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* bref intro of ON

* Update docs/source/en/optimization/neuron.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

52d44498

21 Oct, 2024 1 commit

[Quantization] Add quantization support for `bitsandbytes` (#9213) · b821f006

Sayak Paul authored Oct 21, 2024

* quantization config.

* fix-copies

* fix

* modules_to_not_convert

* add bitsandbytes utilities.

* make progress.

* fixes

* quality

* up

rotary embedding refactor 2: update comments, fix dtype for use_real=False (#9312)

fix notes and dtype

* minor

* up

* fix

* provide credits where due.

* make configurations work.

* fixes

* fix

* update_missing_keys

* fix

* make it work.

* fix

* provide credits to transformers.

* empty commit

* handle to() better.

* tests

* change to bnb from bitsandbytes

* fix tests

fix slow quality tests

SD3 remark

fix

complete int4 tests

add a readme to the test files.

add model cpu offload tests

warning test

* better safeguard.

* change merging status

* courtesy to transformers.

* move upper.

* better

* make the unused kwargs warning friendlier.

* harmonize changes with https://github.com/huggingface/transformers/pull/33122

* style

* trainin tests

* feedback part i.

* Add Flux inpainting and Flux Img2Img (#9135)

---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>

Update `UNet2DConditionModel`'s error messages (#9230)

* refactor

[CI] Update Single file Nightly Tests (#9357)

* update

feedback.

improve README for flux dreambooth lora (#9290)

* improve readme

fix one uncaught deprecation warning for accessing vae_latent_channels in VaeImagePreprocessor (#9372)

deprecation warning vae_latent_channels

add mixed int8 tests and more tests to nf4.

[core] Freenoise memory improvements (#9262)

* update

* implement prompt interpolation

* make style

* resnet memory optimizations

* more memory optimizations; todo: refactor

* update

* update animatediff controlnet with latest changes

* refactor chunked inference changes

* remove print statements

* update

* chunk -> split

* remove changes from incorrect conflict resolution

* add explanation of SplitInferenceModule

* update docs

* Revert "update docs"

This reverts commit c55a50a271b2cefa8fe340a4f2a3ab9b9d374ec0.

* update docstring for freenoise split inference

* apply suggestions from review

* add tests

* apply suggestions from review

quantization docs.

docs.

* Revert "Add Flux inpainting and Flux Img2Img (#9135)"

This reverts commit 5799954dd4b3d753c7c1b8d722941350fe4f62ca.

* tests

* don

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* contribution guide.

* changes

* empty

* fix tests

* harmonize with https://github.com/huggingface/transformers/pull/33546

* numpy_cosine_distance

* config_dict modification.

* remove if config comment.

* note for load_state_dict changes.

* float8 check.

* quantizer.

* raise an error for non-True low_cpu_mem_usage values when using quant.

* low_cpu_mem_usage shenanigans when using fp32 modules.

* don't re-assign _pre_quantization_type.

* make comments clear.

* remove comments.

* handle mixed types better when moving to cpu.

* add tests to check if we're throwing warning rightly.

* better check.

* fix 8bit test_quality.

* handle dtype more robustly.

* better message when keep_in_fp32_modules.

* handle dtype casting.

* fix dtype checks in pipeline.

* fix warning message.

* Update src/diffusers/models/modeling_utils.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* mitigate the confusing cpu warning

---------
Co-authored-by: Vishnu V Jaddipal <95531133+Gothos@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

b821f006

16 Oct, 2024 1 commit

Docs: CogVideoX (#9578) · 0d935df6

glide-the authored Oct 16, 2024



* CogVideoX docs


---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

0d935df6

14 Oct, 2024 1 commit

CogView3Plus DiT (#9570) · 8d81564b

Yuxuan.Zhang authored Oct 14, 2024

* merge 9588

* max_shard_size="5GB" for colab running

* conversion script updates; modeling test; refactor transformer

* make fix-copies

* Update convert_cogview3_to_diffusers.py

* initial pipeline draft

* make style

* fight bugs 🐛

🪳

* add example

* add tests; refactor

* make style

* make fix-copies

* add co-author

YiYi Xu <yixu310@gmail.com>

* remove files

* add docs

* add co-author
Co-Authored-By: YiYi Xu <yixu310@gmail.com>

* fight docs

* address reviews

* make style

* make model work

* remove qkv fusion

* remove qkv fusion tets

* address review comments

* fix make fix-copies error

* remove None and TODO

* for FP16(draft)

* make style

* remove dynamic cfg

* remove pooled_projection_dim as a parameter

* fix tests

---------
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

8d81564b

25 Sep, 2024 1 commit
- [docs] Model sharding (#9521) · d9c96917
  Steven Liu authored Sep 25, 2024
```
* flux shard

* feedback
```
  d9c96917
09 Sep, 2024 1 commit

[docs] Add xDiT in section optimization (#9365) · 2c6a6c97

Jinzhe Pan authored Sep 10, 2024



* docs: add xDiT to optimization methods

* fix: picture layout problem

* docs: add more introduction about xdit & apply suggestions

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

2c6a6c97

25 Aug, 2024 1 commit
- [Flux] Support Union ControlNet (#9175) · c1e6a32a
  王奇勋 authored Aug 25, 2024
```
* refactor
---------
Co-authored-by: haofanwang <haofanwang.ai@gmail.com>
```
  c1e6a32a
08 Aug, 2024 1 commit
- [docs] Organize model toctree (#9118) · ba7e4845
  Steven Liu authored Aug 07, 2024
```
* toctree

* fix
```
  ba7e4845
07 Aug, 2024 1 commit

Add CogVideoX text-to-video generation model (#9082) · 2dad462d

zR authored Aug 07, 2024



* add CogVideoX

---------
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

2dad462d

06 Aug, 2024 1 commit
- [Docs] Add community projects section to docs (#9013) · 325a5de3
  Dhruv Nair authored Aug 06, 2024
```
* update

* update

* update
```
  325a5de3
05 Aug, 2024 1 commit
- [Docs] add stable cascade unet doc. (#9066) · 5934873b
  Sayak Paul authored Aug 05, 2024
```
* add stable cascade unet doc.

* fix path
```
  5934873b
01 Aug, 2024 1 commit

Flux pipeline (#9043) · 27637a54

Sayak Paul authored Aug 02, 2024



add flux!
Signed-off-by: Adrien <adrien@huggingface.co>
Co-authored-by: Adrien <adrien.69740@gmail.com>
Co-authored-by: Anatoly Belikov <abelikov@singularitynet.io>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>

27637a54

30 Jul, 2024 1 commit

Stable Audio integration (#8716) · 69e72b1d

Yoach Lacombe authored Jul 30, 2024



* WIP modeling code and pipeline

* add custom attention processor + custom activation + add to init

* correct ProjectionModel forward

* add stable audio to __initèè

* add autoencoder and update pipeline and modeling code

* add half Rope

* add partial rotary v2

* add temporary modfis to scheduler

* add EDM DPM Solver

* remove TODOs

* clean GLU

* remove att.group_norm to attn processor

* revert back src/diffusers/schedulers/scheduling_dpmsolver_multistep.py

* refactor GLU -> SwiGLU

* remove redundant args

* add channel multiples in autoencoder docstrings

* changes in docsrtings and copyright headers

* clean pipeline

* further cleaning

* remove peft and lora and fromoriginalmodel

* Delete src/diffusers/pipelines/stable_audio/diffusers.code-workspace

* make style

* dummy models

* fix copied from

* add fast oobleck tests

* add brownian tree

* oobleck autoencoder slow tests

* remove TODO

* fast stable audio pipeline tests

* add slow tests

* make style

* add first version of docs

* wrap is_torchsde_available to the scheduler

* fix slow test

* test with input waveform

* add input waveform

* remove some todos

* create stableaudio gaussian projection + make style

* add pipeline to toctree

* fix copied from

* make quality

* refactor timestep_features->time_proj

* refactor joint_attention_kwargs->cross_attention_kwargs

* remove forward_chunk

* move StableAudioDitModel to transformers folder

* correct convert + remove partial rotary embed

* apply suggestions from yiyixuxu -> removing attn.kv_heads

* remove temb

* remove cross_attention_kwargs

* further removal of cross_attention_kwargs

* remove text encoder autocast to fp16

* continue removing autocast

* make style

* refactor how text and audio are embedded

* add paper

* update example code

* make style

* unify projection model forward + fix device placement

* make style

* remove fuse qkv

* apply suggestions from review

* Update src/diffusers/pipelines/stable_audio/pipeline_stable_audio.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* make style

* smaller models in fast tests

* pass sequential offloading fast tests

* add docs for vae and autoencoder

* make style and update example

* remove useless import

* add cosine scheduler

* dummy classes

* cosine scheduler docs

* better description of scheduler

---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>

69e72b1d

26 Jul, 2024 1 commit

[core] AnimateDiff SparseCtrl (#8897) · 5c53ca5e

Aryan authored Jul 26, 2024

* initial sparse control model draft

* remove unnecessary implementation

* copy animatediff pipeline

* remove deprecated callbacks

* update

* update pipeline implementation progress

* make style

* make fix-copies

* update progress

* add partially working pipeline

* remove debug prints

* add model docs

* dummy objects

* improve motion lora conversion script

* fix bugs

* update docstrings

* remove unnecessary model params; docs

* address review comment

* add copied from to zero_module

* copy animatediff test

* add fast tests

* update docs

* update

* update pipeline docs

* fix expected slice values

* fix license

* remove get_down_block usage

* remove temporal_double_self_attention from get_down_block

* update

* update docs with org and documentation images

* make from_unet work in sparsecontrolnetmodel

* add latest freeinit test from #8969

* make fix-copies

* LoraLoaderMixin -> StableDiffsuionLoraLoaderMixin

5c53ca5e

18 Jul, 2024 1 commit

[docs] pipeline docs for latte (#8844) · 12625c1c

Aryan authored Jul 18, 2024

* add pipeline docs for latte

* add inference time to latte docs

* apply review suggestions

12625c1c

12 Jul, 2024 1 commit

[Docs] add AuraFlow docs (#8851) · 973a62d4

Sayak Paul authored Jul 12, 2024

* add pipeline documentation.

* add api spec for pipeline

* model documentation

* model spec

973a62d4

11 Jul, 2024 2 commits

[Core] Add Kolors (#8812) · 87b9db64
Álvaro Somoza authored Jul 11, 2024
```
* initial draft
```
87b9db64

Latte: Latent Diffusion Transformer for Video Generation (#8404) · b8cf84a3

Xin Ma authored Jul 11, 2024



* add Latte to diffusers

* remove print

* remove print

* remove print

* remove unuse codes

* remove layer_norm_latte and add a flag

* remove layer_norm_latte and add a flag

* update latte_pipeline

* update latte_pipeline

* remove unuse squeeze

* add norm_hidden_states.ndim == 2: # for Latte

* fixed test latte pipeline bugs

* fixed test latte pipeline bugs

* delete sh

* add doc for latte

* add licensing

* Move Transformer3DModelOutput to modeling_outputs

* give a default value to sample_size

* remove the einops dependency

* change norm2 for latte

* modify pipeline of latte

* update test for Latte

* modify some codes for latte

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* video_length -> num_frames; update prepare_latents copied from

* make fix-copies

* make style

* typo: videe -> video

* update

* modify for Latte pipeline

* modify latte pipeline

* modify latte pipeline

* modify latte pipeline

* modify latte pipeline

* modify for Latte pipeline

* Delete .vscode directory

* make style

* make fix-copies

* add latte transformer 3d to docs _toctree.yml

* update example

* reduce frames for test

* fixed bug of _text_preprocessing

* set num frame to 1 for testing

* remove unuse print

* add text = self._clean_caption(text) again

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>

b8cf84a3

08 Jul, 2024 1 commit

[Alpha-VLLM Team] Add Lumina-T2X to diffusers (#8652) · 98388670

PommesPeter authored Jul 08, 2024




---------
Co-authored-by: zhuole1025 <zhuole1025@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

98388670

26 Jun, 2024 2 commits

[Tencent Hunyuan Team] Add Hunyuan-DiT ControlNet Inference (#8694) · fa2abfdb

XCL authored Jun 26, 2024



* add controlnet support

---------
Co-authored-by: xingchaoliu <xingchaoliu@tencent.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>

fa2abfdb

add docs on model sharding (#8658) · e8284281

Sayak Paul authored Jun 26, 2024



* add docs on model sharding

* add entry to _toctree.

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* simplify wording

* add a note on transformer library handling

* move device placement section

* Update docs/source/en/training/distributed_inference.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

e8284281

25 Jun, 2024 1 commit

add PAG support (#7944) · 540399f5

YiYi Xu authored Jun 25, 2024



* first draft


---------
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Junhwa Song <ethan9867@gmail.com>
Co-authored-by: Ahn Donghoon (안동훈 / suno) <suno.vivid@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

540399f5

19 Jun, 2024 1 commit
- Support SD3 ControlNet and Multi-ControlNet. (#8566) · e5564d45
  王奇勋 authored Jun 19, 2024
```
* sd3 controlnet



---------
Co-authored-by: haofanwang <haofanwang.ai@gmail.com>
```
  e5564d45
12 Jun, 2024 1 commit

Add Stable Diffusion 3 (#8483) · 04717fd8

Dhruv Nair authored Jun 13, 2024



* up

* add sd3

* update

* update

* add tests

* fix copies

* fix docs

* update

* add dreambooth lora

* add LoRA

* update

* update

* update

* update

* import fix

* update

* Update src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* import fix 2

* update

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* update

* update

* update

* fix ckpt id

* fix more ids

* update

* missing doc

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update'

* fix

* update

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

* note on gated access.

* requirements

* licensing

---------
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

04717fd8

04 Jun, 2024 2 commits

[HunyuanDiT] minor docs changes in hunyuandit (#8395) · 3ff39e8e
Sayak Paul authored Jun 04, 2024
```
minor docs changes in hunyuandit
```
3ff39e8e

Update transformer2d.md title (#8375) · dc89434b

Marçal Comajoan Cara authored Jun 03, 2024

* Update transformer2d.md title

For the other classes (e.g., UNet2DModel) the title of the documentation coincides with the name of the class, but that was not the case for Transformer2DModel.

* Update model docs titles for consistency with class names

dc89434b

03 Jun, 2024 1 commit

Tencent Hunyuan Team - Updated Doc for HunyuanDiT (#8383) · 174cf868

XCL authored Jun 03, 2024

* add hunyuandit doc

* update hunyuandit doc

* update hunyuandit 2d model

* update toctree.yml for hunyuandit

174cf868

31 May, 2024 1 commit

[Core] Introduce class variants for `Transformer2DModel` (#7647) · 983dec3b

Sayak Paul authored May 31, 2024

* init for patches

* finish patched model.

* continuous transformer

* vectorized transformer2d.

* style.

* inits.

* fix-copies.

* introduce DiTTransformer2DModel.

* fixes

* use REMAPPING as suggested by @DN6

* better logging.

* add pixart transformer model.

* inits.

* caption_channels.

* attention masking.

* fix use_additional_conditions.

* remove print.

* debug

* flatten

* fix: assertion for sigma

* handle remapping for modeling_utils

* add tests for dit transformer2d

* quality

* placeholder for pixart tests

* pixart tests

* add _no_split_modules

* add docs.

* check

* check

* check

* check

* fix tests

* fix tests

* move Transformer output to modeling_output

* move errors better and bring back use_additional_conditions attribute.

* add unnecessary things from DiT.

* clean up pixart

* fix remapping

* fix device_map things in pixart2d.

* replace Transformer2DModel with appropriate classes in dit, pixart tests

* empty

* legacy mixin classes./

* use a remapping dict for fetching class names.

* change to specifc model types in the pipeline implementations.

* move _fetch_remapped_cls_from_config to modeling_loading_utils.py

* fix dependency problems.

* add deprecation note.

983dec3b

29 May, 2024 1 commit
- [docs] Files and formats (#7874) · 9e00b727
  Steven Liu authored May 29, 2024
```
* files and formats

* fix callout

* feedback

* code sample

* feedback
```
  9e00b727
28 May, 2024 2 commits
- [docs] Outpaint (#7964) · 1fa8dbc6
  Steven Liu authored May 28, 2024
```
* first draft

* edits
```
  1fa8dbc6
- [docs] Scheduler features (#7990) · 0ab6dc0f
  Steven Liu authored May 28, 2024
```
* noise schedule

* sigmas and zero snr

* feedback

* feedback
```
  0ab6dc0f
27 May, 2024 1 commit

[Pipeline] Marigold depth and normals estimation (#7847) · b3d10d6d

Anton Obukhov authored May 27, 2024



* implement marigold depth and normals pipelines in diffusers core

* remove bibtex

* remove deprecations

* remove save_memory argument

* remove validate_vae

* remove config output

* remove batch_size autodetection

* remove presets logic
move default denoising_steps and processing_resolution into the model config
make default ensemble_size 1

* remove no_grad

* add fp16 to the example usage

* implement is_matplotlib_available
use is_matplotlib_available, is_scipy_available for conditional imports in the marigold depth pipeline

* move colormap, visualize_depth, and visualize_normals into export_utils.py

* make the denoising loop more lucid
fix the outputs to always be 4d tensors or lists of pil images
support a 4d input_image case
attempt to support model_cpu_offload_seq
move check_inputs into a separate function
change default batch_size to 1, remove any logic to make it bigger implicitly

* style

* rename denoising_steps into num_inference_steps

* rename input_image into image

* rename input_latent into latents

* remove decode_image
change decode_prediction to use the AutoencoderKL.decode method

* move clean_latent outside of progress_bar

* refactor marigold-reusable image processing bits into MarigoldImageProcessor class

* clean up the usage example docstring

* make ensemble functions members of the pipelines

* add early checks in check_inputs
rename E into ensemble_size in depth ensembling

* fix vae_scale_factor computation

* better compatibility with torch.compile
better variable naming

* move export_depth_to_png to export_utils

* remove encode_prediction

* improve visualize_depth and visualize_normals to accept multi-dimensional data and lists
remove visualization functions from the pipelines
move exporting depth as 16-bit PNGs functionality from the depth pipeline
update example docstrings

* do not shortcut vae.config variables

* change all asserts to raise ValueError

* rename output_prediction_type to output_type

* better variable names
clean up variable deletion code

* better variable names

* pass desc and leave kwargs into the diffusers progress_bar
implement nested progress bar for images and steps loops

* implement scale_invariant and shift_invariant flags in the ensemble_depth function
add scale_invariant and shift_invariant flags readout from the model config
further refactor ensemble_depth
support ensembling without alignment
add ensemble_depth docstring

* fix generator device placement checks

* move encode_empty_text body into the pipeline call

* minor empty text encoding simplifications

* adjust pipelines' class docstrings to explain the added construction arguments

* improve the scipy failure condition
add comments
improve docstrings
change the default use_full_z_range to True

* make input image values range check configurable in the preprocessor
refactor load_image_canonical in preprocessor to reject unknown types and return the image in the expected 4D format of tensor and on right device
support a list of everything as inputs to the pipeline, change type to PipelineImageInput
implement a check that all input list elements have the same dimensions
improve docstrings of pipeline outputs
remove check_input pipeline argument

* remove forgotten print

* add prediction_type model config

* add uncertainty visualization into export utils
fix NaN values in normals uncertainties

* change default of output_uncertainty to False
better handle the case of an attempt to export or visualize none

* fix `output_uncertainty=False`

* remove kwargs
fix check_inputs according to the new inputs of the pipeline

* rename prepare_latent into prepare_latents as in other pipelines
annotate prepare_latents in normals pipeline with "Copied from"
annotate encode_image in normals pipeline with "Copied from"

* move nested-capable `progress_bar` method into the pipelines
revert the original `progress_bar` method in pipeline_utils

* minor message improvement

* fix cpu offloading

* move colormap, visualize_depth, export_depth_to_16bit_png, visualize_normals, visualize_uncertainty to marigold_image_processing.py
update example docstrings

* fix missing comma

* change torch.FloatTensor to torch.Tensor

* fix importing of MarigoldImageProcessor

* fix vae offloading
fix batched image encoding
remove separate encode_image function and use vae.encode instead

* implement marigold's intial tests
relax generator checks in line with other pipelines
implement return_dict __call__ argument in line with other pipelines

* fix num_images computation

* remove MarigoldImageProcessor and outputs from import structure
update tests

* update docstrings

* update init

* update

* style

* fix

* fix

* up

* up

* up

* add simple test

* up

* update expected np input/output to be channel last

* move expand_tensor_or_array into the MarigoldImageProcessor

* rewrite tests to follow conventions - hardcoded slices instead of image artifacts
write more smoke tests

* add basic docs.

* add anton's contribution statement

* remove todos.

* fix assertion values for marigold depth slow tests

* fix assertion values for depth normals.

* remove print

* support AutoencoderTiny in the pipelines

* update documentation page
add Available Pipelines section
add Available Checkpoints section
add warning about num_inference_steps

* fix missing import in docstring
fix wrong value in visualize_depth docstring

* [doc] add marigold to pipelines overview

* [doc] add section "usage examples"

* fix an issue with latents check in the pipelines

* add "Frame-by-frame Video Processing with Consistency" section

* grammarly

* replace tables with images with css-styled images (blindly)

* style

* print

* fix the assertions.

* take from the github runner.

* take the slices from action artifacts

* style.

* update with the slices from the runner.

* remove unnecessary code blocks.

* Revert "[doc] add marigold to pipelines overview"

This reverts commit a505165150afd8dab23c474d1a054ea505a56a5f.

* remove invitation for new modalities

* split out marigold usage examples

* doc cleanup

---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>

b3d10d6d