Commits · cbc8ced20f7a67254caac36af79e666ac2379833 · renzhc / diffusers_dcu

19 Jun, 2025 1 commit
- Update more licenses to 2025 (#11746) · a4df8dbc
  Aryan authored Jun 19, 2025
```
update
```
  a4df8dbc
19 May, 2025 1 commit

Quentin Gallouédec authored May 19, 2025



* Use HF Papers

* Apply style fixes

---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

c8bb1ff5

02 Apr, 2025 1 commit
- Fix enable_sequential_cpu_offload in CogView4Pipeline (#11195) · 54dac3a8
  hlky authored Apr 02, 2025
```
* Fix enable_sequential_cpu_offload in CogView4Pipeline

* make fix-copies
```
  54dac3a8
15 Mar, 2025 1 commit

CogView4 Control Block (#10809) · 82188cef

Yuxuan Zhang authored Mar 16, 2025




* cogview4 control training


---------
Co-authored-by: OleehyO <leehy0357@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>

82188cef

10 Mar, 2025 1 commit
- [LoRA] CogView4 (#10981) · 8eefed65
  Aryan authored Mar 10, 2025
```
* update

* make fix-copies

* update
```
  8eefed65
04 Mar, 2025 2 commits
- update check_input for cogview4 (#10966) · 24c062aa
  YiYi Xu authored Mar 04, 2025
```
fix
```
  24c062aa
- [Docs] CogView4 comment fix (#10957) · a74f02fb
  Yuxuan Zhang authored Mar 05, 2025
```
* Update pipeline_cogview4.py

* Use GLM instead of T5 in doc
```
  a74f02fb
03 Mar, 2025 1 commit
- Update pipeline_cogview4.py (#10944) · f92e599c
  Yuxuan Zhang authored Mar 04, 2025
  
  f92e599c
15 Feb, 2025 1 commit

CogView4 (supports different length c and uc) (#10649) · d90cd362

Yuxuan Zhang authored Feb 16, 2025



* init

* encode with glm

* draft schedule

* feat(scheduler): Add CogView scheduler implementation

* feat(embeddings): add CogView 2D rotary positional embedding

* 1

* Update pipeline_cogview4.py

* fix the timestep init and sigma

* update latent

* draft patch(not work)

* fix

* [WIP][cogview4]: implement initial CogView4 pipeline

Implement the basic CogView4 pipeline structure with the following changes:
- Add CogView4 pipeline implementation
- Implement DDIM scheduler for CogView4
- Add CogView3Plus transformer architecture
- Update embedding models

Current limitations:
- CFG implementation uses padding for sequence length alignment
- Need to verify transformer inference alignment with Megatron

TODO:
- Consider separate forward passes for condition/uncondition
  instead of padding approach

* [WIP][cogview4][refactor]: Split condition/uncondition forward pass in CogView4 pipeline

Split the forward pass for conditional and unconditional predictions in the CogView4 pipeline to match the original implementation. The noise prediction is now done separately for each case before combining them for guidance. However, the results still need improvement.

This is a work in progress as the generated images are not yet matching expected quality.

* use with -2 hidden state

* remove text_projector

* 1

* [WIP] Add tensor-reload to align input from transformer block

* [WIP] for older glm

* use with cogview4 transformers forward twice of u and uc

* Update convert_cogview4_to_diffusers.py

* remove this

* use main example

* change back

* reset

* setback

* back

* back 4

* Fix qkv conversion logic for CogView4 to Diffusers format

* back5

* revert to sat to cogview4 version

* update a new convert from megatron

* [WIP][cogview4]: implement CogView4 attention processor

Add CogView4AttnProcessor class for implementing scaled dot-product attention
with rotary embeddings for the CogVideoX model. This processor concatenates
encoder and hidden states, applies QKV projections and RoPE, but does not
include spatial normalization.

TODO:
- Fix incorrect QKV projection weights
- Resolve ~25% error in RoPE implementation compared to Megatron

* [cogview4] implement CogView4 transformer block

Implement CogView4 transformer block following the Megatron architecture:
- Add multi-modulate and multi-gate mechanisms for adaptive layer normalization
- Implement dual-stream attention with encoder-decoder structure
- Add feed-forward network with GELU activation
- Support rotary position embeddings for image tokens

The implementation follows the original CogView4 architecture while adapting
it to work within the diffusers framework.

* with new attn

* [bugfix] fix dimension mismatch in CogView4 attention

* [cogview4][WIP]: update final normalization in CogView4 transformer

Refactored the final normalization layer in CogView4 transformer to use separate layernorm and AdaLN operations instead of combined AdaLayerNormContinuous. This matches the original implementation but needs validation.

Needs verification against reference implementation.

* 1

* put back

* Update transformer_cogview4.py

* change time_shift

* Update pipeline_cogview4.py

* change timesteps

* fix

* change text_encoder_id

* [cogview4][rope] align RoPE implementation with Megatron

- Implement apply_rope method in attention processor to match Megatron's implementation
- Update position embeddings to ensure compatibility with Megatron-style rotary embeddings
- Ensure consistent rotary position encoding across attention layers

This change improves compatibility with Megatron-based models and provides
better alignment with the original implementation's positional encoding approach.

* [cogview4][bugfix] apply silu activation to time embeddings in CogView4

Applied silu activation to time embeddings before splitting into conditional
and unconditional parts in CogView4Transformer2DModel. This matches the
original implementation and helps ensure correct time conditioning behavior.

* [cogview4][chore] clean up pipeline code

- Remove commented out code and debug statements
- Remove unused retrieve_timesteps function
- Clean up code formatting and documentation

This commit focuses on code cleanup in the CogView4 pipeline implementation, removing unnecessary commented code and improving readability without changing functionality.

* [cogview4][scheduler] Implement CogView4 scheduler and pipeline

* now It work

* add timestep

* batch

* change convert scipt

* refactor pt. 1; make style

* refactor pt. 2

* refactor pt. 3

* add tests

* make fix-copies

* update toctree.yml

* use flow match scheduler instead of custom

* remove scheduling_cogview.py

* add tiktoken to test dependencies

* Update src/diffusers/models/embeddings.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* apply suggestions from review

* use diffusers apply_rotary_emb

* update flow match scheduler to accept timesteps

* fix comment

* apply review sugestions

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

---------
Co-authored-by: 三洋三洋 <1258009915@qq.com>
Co-authored-by: OleehyO <leehy0357@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

d90cd362

08 Jan, 2025 1 commit
- PyTorch/XLA support (#10498) · 95c5ce4e
  hlky authored Jan 08, 2025
```
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
```
  95c5ce4e
07 Jan, 2025 1 commit

Use pipelines without vae (#10441) · ee7e141d

hlky authored Jan 07, 2025



* Use pipelines without vae

* getattr

* vqvae

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

ee7e141d

16 Dec, 2024 1 commit
- Fix checkpoint in CogView3PlusPipeline example (#10211) · e9a3911b
  hlky authored Dec 16, 2024
  
  e9a3911b
21 Oct, 2024 1 commit

[docs] add docstrings in `pipline_stable_diffusion.py` (#9590) · bcd61fd3

timdalxx authored Oct 22, 2024



* fix the issue on flux dreambooth lora training

* update : origin main code

* docs: update pipeline_stable_diffusion docstring

* docs: update pipeline_stable_diffusion docstring

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: style

* fix: style

* fix: copies

* make fix-copies

* remove extra newline

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

bcd61fd3

14 Oct, 2024 1 commit

CogView3Plus DiT (#9570) · 8d81564b

Yuxuan.Zhang authored Oct 14, 2024

* merge 9588

* max_shard_size="5GB" for colab running

* conversion script updates; modeling test; refactor transformer

* make fix-copies

* Update convert_cogview3_to_diffusers.py

* initial pipeline draft

* make style

* fight bugs 🐛

🪳

* add example

* add tests; refactor

* make style

* make fix-copies

* add co-author

YiYi Xu <yixu310@gmail.com>

* remove files

* add docs

* add co-author
Co-Authored-By: YiYi Xu <yixu310@gmail.com>

* fight docs

* address reviews

* make style

* make model work

* remove qkv fusion

* remove qkv fusion tets

* address review comments

* fix make fix-copies error

* remove None and TODO

* for FP16(draft)

* make style

* remove dynamic cfg

* remove pooled_projection_dim as a parameter

* fix tests

---------
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

8d81564b

23 Sep, 2024 1 commit

[Cog] some minor fixes and nits (#9466) · ba5af5ae

Sayak Paul authored Sep 23, 2024

* fix positional arguments in check_inputs().

* add video and latetns to check_inputs().

* prep latents_in_channels.

* quality

* multiple fixes.

* fix

ba5af5ae

19 Sep, 2024 1 commit

[training] CogVideoX Lora (#9302) · 2b443a5d

Aryan authored Sep 19, 2024



* cogvideox lora training draft

* update

* update

* update

* update

* update

* make fix-copies

* update

* update

* apply suggestions from review

* apply suggestions from reveiw

* fix typo

* Update examples/cogvideo/train_cogvideox_lora.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* fix lora alpha

* use correct lora scaling for final test pipeline

* Update examples/cogvideo/train_cogvideox_lora.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* apply suggestions from review; prodigy optimizer

YiYi Xu <yixu310@gmail.com>

* add tests

* make style

* add README

* update

* update

* make style

* fix

* update

* add test skeleton

* revert lora utils changes

* add cleaner modifications to lora testing utils

* update lora tests

* deepspeed stuff

* add requirements.txt

* deepspeed refactor

* add lora stuff to img2vid pipeline to fix tests

* fight tests

* add co-authors
Co-Authored-By: Fu-Yun Wang <1697256461@qq.com>
Co-Authored-By: zR <2448370773@qq.com>

* fight lora runner tests

* import Dummy optim and scheduler only wheh required

* update docs

* add coauthors
Co-Authored-By: Fu-Yun Wang <1697256461@qq.com>

* remove option to train text encoder
Co-Authored-By: bghira <bghira@users.github.com>

* update tests

* fight more tests

* update

* fix vid2vid

* fix typo

* remove lora tests; todo in follow-up PR

* undo img2vid changes

* remove text encoder related changes in lora loader mixin

* Revert "remove text encoder related changes in lora loader mixin"

This reverts commit f8a8444487db27859be812866db4e8cec7f25691.

* update

* round 1 of fighting tests

* round 2 of fighting tests

* fix copied from comment

* fix typo in lora test

* update styling
Co-Authored-By: YiYi Xu <yixu310@gmail.com>

---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: zR <2448370773@qq.com>
Co-authored-by: Fu-Yun Wang <1697256461@qq.com>
Co-authored-by: bghira <bghira@users.github.com>

2b443a5d

02 Sep, 2024 1 commit

[core] Support VideoToVideo with CogVideoX (#9333) · 0e6a8403

Aryan authored Sep 02, 2024

* add vid2vid pipeline for cogvideox

* make fix-copies

* update docs

* fake context parallel cache, vae encode tiling

* add test for cog vid2vid

* use video link from HF docs repo

* add copied from comments; correctly rename test class

0e6a8403

25 Aug, 2024 1 commit
- refactor 3d rope for cogvideox (#9269) · 1ca0a755
  YiYi Xu authored Aug 25, 2024
```
* refactor 3d rope

* repeat -> expand
```
  1ca0a755
23 Aug, 2024 1 commit

Cogvideox-5B Model adapter change (#9203) · 960c149c

zR authored Aug 23, 2024



* draft of embedding

---------
Co-authored-by: Aryan <aryan@huggingface.co>

960c149c

13 Aug, 2024 1 commit

[refactor] CogVideoX followups + tiled decoding support (#9150) · a85b34e7

Aryan authored Aug 14, 2024

* refactor context parallel cache; update torch compile time benchmark

* add tiling support

* make style

* remove num_frames % 8 == 0 requirement

* update default num_frames to original value

* add explanations + refactor

* update torch compile example

* update docs

* update

* clean up if-statements

* address review comments

* add test for vae tiling

* update docs

* update docs

* update docstrings

* add modeling test for cogvideox transformer

* make style

a85b34e7

07 Aug, 2024 1 commit

Add CogVideoX text-to-video generation model (#9082) · 2dad462d

zR authored Aug 07, 2024



* add CogVideoX

---------
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

2dad462d