- 19 Jun, 2025 1 commit
-
-
Aryan authored
update
-
- 19 May, 2025 1 commit
-
-
Quentin Gallouédec authored
* Use HF Papers * Apply style fixes --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
-
- 02 Apr, 2025 1 commit
-
-
hlky authored
* Fix enable_sequential_cpu_offload in CogView4Pipeline * make fix-copies
-
- 15 Mar, 2025 1 commit
-
-
Yuxuan Zhang authored
* cogview4 control training --------- Co-authored-by: OleehyO <leehy0357@gmail.com> Co-authored-by: yiyixuxu <yixu310@gmail.com>
-
- 10 Mar, 2025 1 commit
-
-
Aryan authored
* update * make fix-copies * update
-
- 04 Mar, 2025 2 commits
-
-
YiYi Xu authored
fix
-
Yuxuan Zhang authored
* Update pipeline_cogview4.py * Use GLM instead of T5 in doc
-
- 03 Mar, 2025 1 commit
-
-
Yuxuan Zhang authored
-
- 15 Feb, 2025 1 commit
-
-
Yuxuan Zhang authored
* init * encode with glm * draft schedule * feat(scheduler): Add CogView scheduler implementation * feat(embeddings): add CogView 2D rotary positional embedding * 1 * Update pipeline_cogview4.py * fix the timestep init and sigma * update latent * draft patch(not work) * fix * [WIP][cogview4]: implement initial CogView4 pipeline Implement the basic CogView4 pipeline structure with the following changes: - Add CogView4 pipeline implementation - Implement DDIM scheduler for CogView4 - Add CogView3Plus transformer architecture - Update embedding models Current limitations: - CFG implementation uses padding for sequence length alignment - Need to verify transformer inference alignment with Megatron TODO: - Consider separate forward passes for condition/uncondition instead of padding approach * [WIP][cogview4][refactor]: Split condition/uncondition forward pass in CogView4 pipeline Split the forward pass for conditional and unconditional predictions in the CogView4 pipeline to match the original implementation. The noise prediction is now done separately for each case before combining them for guidance. However, the results still need improvement. This is a work in progress as the generated images are not yet matching expected quality. * use with -2 hidden state * remove text_projector * 1 * [WIP] Add tensor-reload to align input from transformer block * [WIP] for older glm * use with cogview4 transformers forward twice of u and uc * Update convert_cogview4_to_diffusers.py * remove this * use main example * change back * reset * setback * back * back 4 * Fix qkv conversion logic for CogView4 to Diffusers format * back5 * revert to sat to cogview4 version * update a new convert from megatron * [WIP][cogview4]: implement CogView4 attention processor Add CogView4AttnProcessor class for implementing scaled dot-product attention with rotary embeddings for the CogVideoX model. 
This processor concatenates encoder and hidden states, applies QKV projections and RoPE, but does not include spatial normalization. TODO: - Fix incorrect QKV projection weights - Resolve ~25% error in RoPE implementation compared to Megatron * [cogview4] implement CogView4 transformer block Implement CogView4 transformer block following the Megatron architecture: - Add multi-modulate and multi-gate mechanisms for adaptive layer normalization - Implement dual-stream attention with encoder-decoder structure - Add feed-forward network with GELU activation - Support rotary position embeddings for image tokens The implementation follows the original CogView4 architecture while adapting it to work within the diffusers framework. * with new attn * [bugfix] fix dimension mismatch in CogView4 attention * [cogview4][WIP]: update final normalization in CogView4 transformer Refactored the final normalization layer in CogView4 transformer to use separate layernorm and AdaLN operations instead of combined AdaLayerNormContinuous. This matches the original implementation but needs validation. Needs verification against reference implementation. * 1 * put back * Update transformer_cogview4.py * change time_shift * Update pipeline_cogview4.py * change timesteps * fix * change text_encoder_id * [cogview4][rope] align RoPE implementation with Megatron - Implement apply_rope method in attention processor to match Megatron's implementation - Update position embeddings to ensure compatibility with Megatron-style rotary embeddings - Ensure consistent rotary position encoding across attention layers This change improves compatibility with Megatron-based models and provides better alignment with the original implementation's positional encoding approach. * [cogview4][bugfix] apply silu activation to time embeddings in CogView4 Applied silu activation to time embeddings before splitting into conditional and unconditional parts in CogView4Transformer2DModel. 
This matches the original implementation and helps ensure correct time conditioning behavior. * [cogview4][chore] clean up pipeline code - Remove commented out code and debug statements - Remove unused retrieve_timesteps function - Clean up code formatting and documentation This commit focuses on code cleanup in the CogView4 pipeline implementation, removing unnecessary commented code and improving readability without changing functionality. * [cogview4][scheduler] Implement CogView4 scheduler and pipeline * now It work * add timestep * batch * change convert scipt * refactor pt. 1; make style * refactor pt. 2 * refactor pt. 3 * add tests * make fix-copies * update toctree.yml * use flow match scheduler instead of custom * remove scheduling_cogview.py * add tiktoken to test dependencies * Update src/diffusers/models/embeddings.py Co-authored-by:
YiYi Xu <yixu310@gmail.com> * apply suggestions from review * use diffusers apply_rotary_emb * update flow match scheduler to accept timesteps * fix comment * apply review suggestions * Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py Co-authored-by: YiYi Xu <yixu310@gmail.com> --------- Co-authored-by: 三洋三洋 <1258009915@qq.com> Co-authored-by: OleehyO <leehy0357@gmail.com> Co-authored-by: Aryan <aryan@huggingface.co> Co-authored-by: YiYi Xu <yixu310@gmail.com>
-
- 08 Jan, 2025 1 commit
-
-
hlky authored
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
-
- 07 Jan, 2025 1 commit
-
-
hlky authored
* Use pipelines without vae * getattr * vqvae --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
-
- 16 Dec, 2024 1 commit
-
-
hlky authored
-
- 21 Oct, 2024 1 commit
-
-
timdalxx authored
* fix the issue on flux dreambooth lora training * update : origin main code * docs: update pipeline_stable_diffusion docstring * docs: update pipeline_stable_diffusion docstring * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix: style * fix: style * fix: copies * make fix-copies * remove extra newline --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Aryan <aryan@huggingface.co> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
-
- 14 Oct, 2024 1 commit
-
-
Yuxuan.Zhang authored
* merge 9588 * max_shard_size="5GB" for colab running * conversion script updates; modeling test; refactor transformer * make fix-copies * Update convert_cogview3_to_diffusers.py * initial pipeline draft * make style * fight bugs 🐛 🪳 * add example * add tests; refactor * make style * make fix-copies * add co-author YiYi Xu <yixu310@gmail.com> * remove files * add docs * add co-author Co-Authored-By: YiYi Xu <yixu310@gmail.com> * fight docs * address reviews * make style * make model work * remove qkv fusion * remove qkv fusion tests * address review comments * fix make fix-copies error * remove None and TODO * for FP16(draft) * make style * remove dynamic cfg * remove pooled_projection_dim as a parameter * fix tests --------- Co-authored-by: Aryan <aryan@huggingface.co> Co-authored-by: YiYi Xu <yixu310@gmail.com>
-
- 23 Sep, 2024 1 commit
-
-
Sayak Paul authored
* fix positional arguments in check_inputs(). * add video and latents to check_inputs(). * prep latents_in_channels. * quality * multiple fixes. * fix
-
- 19 Sep, 2024 1 commit
-
-
Aryan authored
* cogvideox lora training draft * update * update * update * update * update * make fix-copies * update * update * apply suggestions from review * apply suggestions from review * fix typo * Update examples/cogvideo/train_cogvideox_lora.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * fix lora alpha * use correct lora scaling for final test pipeline * Update examples/cogvideo/train_cogvideox_lora.py Co-authored-by: YiYi Xu <yixu310@gmail.com> * apply suggestions from review; prodigy optimizer YiYi Xu <yixu310@gmail.com> * add tests * make style * add README * update * update * make style * fix * update * add test skeleton * revert lora utils changes * add cleaner modifications to lora testing utils * update lora tests * deepspeed stuff * add requirements.txt * deepspeed refactor * add lora stuff to img2vid pipeline to fix tests * fight tests * add co-authors Co-Authored-By: Fu-Yun Wang <1697256461@qq.com> Co-Authored-By: zR <2448370773@qq.com> * fight lora runner tests * import Dummy optim and scheduler only when required * update docs * add coauthors Co-Authored-By: Fu-Yun Wang <1697256461@qq.com> * remove option to train text encoder Co-Authored-By: bghira <bghira@users.github.com> * update tests * fight more tests * update * fix vid2vid * fix typo * remove lora tests; todo in follow-up PR * undo img2vid changes * remove text encoder related changes in lora loader mixin * Revert "remove text encoder related changes in lora loader mixin" This reverts commit f8a8444487db27859be812866db4e8cec7f25691. * update * round 1 of fighting tests * round 2 of fighting tests * fix copied from comment * fix typo in lora test * update styling Co-Authored-By: YiYi Xu <yixu310@gmail.com> --------- Co-authored-by: YiYi Xu <yixu310@gmail.com> Co-authored-by: zR <2448370773@qq.com> Co-authored-by: Fu-Yun Wang <1697256461@qq.com> Co-authored-by: bghira <bghira@users.github.com>
-
- 02 Sep, 2024 1 commit
-
-
Aryan authored
* add vid2vid pipeline for cogvideox * make fix-copies * update docs * fake context parallel cache, vae encode tiling * add test for cog vid2vid * use video link from HF docs repo * add copied from comments; correctly rename test class
-
- 25 Aug, 2024 1 commit
-
-
YiYi Xu authored
* refactor 3d rope * repeat -> expand
-
- 23 Aug, 2024 1 commit
-
-
zR authored
* draft of embedding --------- Co-authored-by: Aryan <aryan@huggingface.co>
-
- 13 Aug, 2024 1 commit
-
-
Aryan authored
* refactor context parallel cache; update torch compile time benchmark * add tiling support * make style * remove num_frames % 8 == 0 requirement * update default num_frames to original value * add explanations + refactor * update torch compile example * update docs * update * clean up if-statements * address review comments * add test for vae tiling * update docs * update docs * update docstrings * add modeling test for cogvideox transformer * make style
-
- 07 Aug, 2024 1 commit
-
-
zR authored
* add CogVideoX --------- Co-authored-by: Aryan <aryan@huggingface.co> Co-authored-by: sayakpaul <spsayakpaul@gmail.com> Co-authored-by: Aryan <contact.aryanvs@gmail.com> Co-authored-by: yiyixuxu <yixu310@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
-