1. 14 Jun, 2025 1 commit
    • Edna's avatar
      Chroma Pipeline (#11698) · 8adc6003
      Edna authored
      
      
      * working state from hameerabbasi and iddl
      
      * working state form hameerabbasi and iddl (transformer)
      
      * working state (normalization)
      
      * working state (embeddings)
      
      * add chroma loader
      
      * add chroma to mappings
      
      * add chroma to transformer init
      
      * take out variant stuff
      
      * get decently far in changing variant stuff
      
      * add chroma init
      
      * make chroma output class
      
      * add chroma transformer to dummy tp
      
      * add chroma to init
      
      * add chroma to init
      
      * fix single file
      
      * update
      
      * update
      
      * add chroma to auto pipeline
      
      * add chroma to pipeline init
      
      * change to chroma transformer
      
      * take out variant from blocks
      
      * swap embedder location
      
      * remove prompt_2
      
      * work on swapping text encoders
      
      * remove mask function
      
      * dont modify mask (for now)
      
      * wrap attn mask
      
      * no attn mask (can't get it to work)
      
      * remove pooled prompt embeds
      
      * change to my own unpooled embeddeer
      
      * fix load
      
      * take pooled projections out of transformer
      
      * ensure correct dtype for chroma embeddings
      
      * update
      
      * use dn6 attn mask + fix true_cfg_scale
      
      * use chroma pipeline output
      
      * use DN6 embeddings
      
      * remove guidance
      
      * remove guidance embed (pipeline)
      
      * remove guidance from embeddings
      
      * don't return length
      
      * dont change dtype
      
      * remove unused stuff, fix up docs
      
      * add chroma autodoc
      
      * add .md (oops)
      
      * initial chroma docs
      
      * undo don't change dtype
      
      * undo arxiv change
      
      unsure why that happened
      
      * fix hf papers regression in more places
      
      * Update docs/source/en/api/pipelines/chroma.md
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      
      * do_cfg -> self.do_classifier_free_guidance
      
      * Update docs/source/en/api/models/chroma_transformer.md
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      
      * Update chroma.md
      
      * Move chroma layers into transformer
      
      * Remove pruned AdaLayerNorms
      
      * Add chroma fast tests
      
      * (untested) batch cond and uncond
      
      * Add # Copied from for shift
      
      * Update # Copied from statements
      
      * update norm imports
      
      * Revert cond + uncond batching
      
      * Add transformer tests
      
      * move chroma test (oops)
      
      * chroma init
      
      * fix chroma pipeline fast tests
      
      * Update src/diffusers/models/transformers/transformer_chroma.py
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      
      * Move Approximator and Embeddings
      
      * Fix auto pipeline + make style, quality
      
      * make style
      
      * Apply style fixes
      
      * switch to new input ids
      
      * fix # Copied from error
      
      * remove # Copied from on protected members
      
      * try to fix import
      
      * fix import
      
      * make fix-copes
      
      * revert style fix
      
      * update chroma transformer params
      
      * update chroma transformer approximator init params
      
      * update to pad tokens
      
      * fix batch inference
      
      * Make more pipeline tests work
      
      * Make most transformer tests work
      
      * fix docs
      
      * make style, make quality
      
      * skip batch tests
      
      * fix test skipping
      
      * fix test skipping again
      
      * fix for tests
      
      * Fix all pipeline test
      
      * update
      
      * push local changes, fix docs
      
      * add encoder test, remove pooled dim
      
      * default proj dim
      
      * fix tests
      
      * fix equal size list input
      
      * update
      
      * push local changes, fix docs
      
      * add encoder test, remove pooled dim
      
      * default proj dim
      
      * fix tests
      
      * fix equal size list input
      
      * Revert "fix equal size list input"
      
      This reverts commit 3fe4ad67d58d83715bc238f8654f5e90bfc5653c.
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      ---------
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      Co-authored-by: default avatargithub-actions[bot] <github-actions[bot]@users.noreply.github.com>
      8adc6003
  2. 13 Jun, 2025 1 commit
    • Aryan's avatar
      Cosmos Predict2 (#11695) · 9f91305f
      Aryan authored
      * support text-to-image
      
      * update example
      
      * make fix-copies
      
      * support use_flow_sigmas in EDM scheduler instead of maintain cosmos-specific scheduler
      
      * support video-to-world
      
      * update
      
      * rename text2image pipeline
      
      * make fix-copies
      
      * add t2i test
      
      * add test for v2w pipeline
      
      * support edm dpmsolver multistep
      
      * update
      
      * update
      
      * update
      
      * update tests
      
      * fix tests
      
      * safety checker
      
      * make conversion script work without guardrail
      9f91305f
  3. 06 Jun, 2025 1 commit
    • Aryan's avatar
      Wan VACE (#11582) · 73a9d585
      Aryan authored
      * initial support
      
      * make fix-copies
      
      * fix no split modules
      
      * add conversion script
      
      * refactor
      
      * add pipeline test
      
      * refactor
      
      * fix bug with mask
      
      * fix for reference images
      
      * remove print
      
      * update docs
      
      * update slices
      
      * update
      
      * update
      
      * update example
      73a9d585
  4. 27 May, 2025 1 commit
    • Linoy Tsaban's avatar
      [Sana Sprint] add image-to-image pipeline (#11602) · 28ef0165
      Linoy Tsaban authored
      
      
      * sana sprint img2img
      
      * fix import
      
      * fix name
      
      * fix image encoding
      
      * fix image encoding
      
      * fix image encoding
      
      * fix image encoding
      
      * fix image encoding
      
      * fix image encoding
      
      * try w/o strength
      
      * try scaling differently
      
      * try with strength
      
      * revert unnecessary changes to scheduler
      
      * revert unnecessary changes to scheduler
      
      * Apply style fixes
      
      * remove comment
      
      * add copy statements
      
      * add copy statements
      
      * add to doc
      
      * add to doc
      
      * add to doc
      
      * add to doc
      
      * Apply style fixes
      
      * empty commit
      
      * fix copies
      
      * fix copies
      
      * fix copies
      
      * fix copies
      
      * fix copies
      
      * docs
      
      * make fix-copies.
      
      * fix doc building error.
      
      * initial commit - add img2img test
      
      * initial commit - add img2img test
      
      * fix import
      
      * fix imports
      
      * Apply style fixes
      
      * empty commit
      
      * remove
      
      * empty commit
      
      * test vocab size
      
      * fix
      
      * fix prompt missing from last commits
      
      * small changes
      
      * fix image processing when input is tensor
      
      * fix order
      
      * Apply style fixes
      
      * empty commit
      
      * fix shape
      
      * remove comment
      
      * image processing
      
      * remove comment
      
      * skip vae tiling test for now
      
      * Apply style fixes
      
      * empty commit
      
      ---------
      Co-authored-by: default avatargithub-actions[bot] <github-actions[bot]@users.noreply.github.com>
      Co-authored-by: default avatarsayakpaul <spsayakpaul@gmail.com>
      28ef0165
  5. 13 May, 2025 1 commit
    • Aryan's avatar
      LTX Video 0.9.7 (#11516) · 06fee551
      Aryan authored
      
      
      * add upsampling pipeline
      
      * ltx upsample pipeline conversion; pipeline fixes
      
      * make fix-copies
      
      * remove print
      
      * add vae convenience methods
      
      * update
      
      * add tests
      
      * support denoising strength for upscaling & video-to-video
      
      * update docs
      
      * update doc checkpoints
      
      * update docs
      
      * fix
      
      ---------
      Co-authored-by: default avatarLinoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
      06fee551
  6. 12 May, 2025 1 commit
    • Zhong-Yu Li's avatar
      Add VisualCloze (#11377) · 4f438de3
      Zhong-Yu Li authored
      * VisualCloze
      
      * style quality
      
      * add docs
      
      * add docs
      
      * typo
      
      * Update docs/source/en/api/pipelines/visualcloze.md
      
      * delete einops
      
      * style quality
      
      * Update src/diffusers/pipelines/visualcloze/pipeline_visualcloze.py
      
      * reorg
      
      * refine doc
      
      * style quality
      
      * typo
      
      * typo
      
      * Update src/diffusers/image_processor.py
      
      * add comment
      
      * test
      
      * style
      
      * Modified based on review
      
      * style
      
      * restore image_processor
      
      * update example url
      
      * style
      
      * fix-copies
      
      * VisualClozeGenerationPipeline
      
      * combine
      
      * tests docs
      
      * remove VisualClozeUpsamplingPipeline
      
      * style
      
      * quality
      
      * test examples
      
      * quality style
      
      * typo
      
      * make fix-copies
      
      * fix test_callback_cfg and test_save_load_dduf in VisualClozePipelineFastTests
      
      * add EXAMPLE_DOC_STRING to VisualClozeGenerationPipeline
      
      * delete maybe_free_model_hooks from pipeline_visualcloze_combined
      
      * Apply suggestions from code review
      
      * fix test_save_load_local test; add reason for skipping cfg test
      
      * more save_load test fixes
      
      * fix tests in generation pipeline tests
      4f438de3
  7. 07 May, 2025 1 commit
    • Aryan's avatar
      Cosmos (#10660) · 7b904941
      Aryan authored
      
      
      * begin transformer conversion
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * update
      
      * add conversion script
      
      * add pipeline
      
      * make fix-copies
      
      * remove einops
      
      * update docs
      
      * gradient checkpointing
      
      * add transformer test
      
      * update
      
      * debug
      
      * remove prints
      
      * match sigmas
      
      * add vae pt. 1
      
      * finish CV* vae
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * make fix-copies
      
      * update
      
      * make fix-copies
      
      * fix
      
      * update
      
      * update
      
      * make fix-copies
      
      * update
      
      * update tests
      
      * handle device and dtype for safety checker; required in latest diffusers
      
      * remove enable_gqa and use repeat_interleave instead
      
      * enforce safety checker; use dummy checker in fast tests
      
      * add review suggestion for ONNX export
      Co-Authored-By: default avatarAsfiya Baig <asfiyab@nvidia.com>
      
      * fix safety_checker issues when not passed explicitly
      
      We could either do what's done in this commit, or update the Cosmos examples to explicitly pass the safety checker
      
      * use cosmos guardrail package
      
      * auto format docs
      
      * update conversion script to support 14B models
      
      * update name CosmosPipeline -> CosmosTextToWorldPipeline
      
      * update docs
      
      * fix docs
      
      * fix group offload test failing for vae
      
      ---------
      Co-authored-by: default avatarAsfiya Baig <asfiyab@nvidia.com>
      7b904941
  8. 06 May, 2025 1 commit
  9. 13 Apr, 2025 1 commit
    • Ishan Modi's avatar
      [ControlNet] Adds controlnet for SanaTransformer (#11040) · f1f38ffb
      Ishan Modi authored
      
      
      * added controlnet for sana transformer
      
      * improve code quality
      
      * addressed PR comments
      
      * bug fixes
      
      * added test cases
      
      * update
      
      * added dummy objects
      
      * addressed PR comments
      
      * update
      
      * Forcing update
      
      * add to docs
      
      * code quality
      
      * addressed PR comments
      
      * addressed PR comments
      
      * update
      
      * addressed PR comments
      
      * added proper styling
      
      * update
      
      * Revert "added proper styling"
      
      This reverts commit 344ee8a7014ada095b295034ef84341f03b0e359.
      
      * manually ordered
      
      * Apply suggestions from code review
      
      ---------
      Co-authored-by: default avatarAryan <contact.aryanvs@gmail.com>
      f1f38ffb
  10. 11 Apr, 2025 1 commit
  11. 09 Apr, 2025 1 commit
  12. 01 Apr, 2025 1 commit
    • Dhruv Nair's avatar
      [WIP] Add Wan Video2Video (#11053) · df1d7b01
      Dhruv Nair authored
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      df1d7b01
  13. 21 Mar, 2025 1 commit
  14. 18 Mar, 2025 1 commit
  15. 15 Mar, 2025 1 commit
  16. 13 Mar, 2025 1 commit
  17. 07 Mar, 2025 1 commit
    • Aryan's avatar
      Hunyuan I2V (#10983) · 2e5203be
      Aryan authored
      * update
      
      * update
      
      * update
      
      * add tests
      
      * update
      
      * add model tests
      
      * update docs
      
      * update
      
      * update example
      
      * fix defaults
      
      * update
      2e5203be
  18. 03 Mar, 2025 1 commit
  19. 02 Mar, 2025 1 commit
  20. 26 Feb, 2025 1 commit
  21. 21 Feb, 2025 1 commit
  22. 15 Feb, 2025 1 commit
    • Yuxuan Zhang's avatar
      CogView4 (supports different length c and uc) (#10649) · d90cd362
      Yuxuan Zhang authored
      
      
      * init
      
      * encode with glm
      
      * draft schedule
      
      * feat(scheduler): Add CogView scheduler implementation
      
      * feat(embeddings): add CogView 2D rotary positional embedding
      
      * 1
      
      * Update pipeline_cogview4.py
      
      * fix the timestep init and sigma
      
      * update latent
      
      * draft patch(not work)
      
      * fix
      
      * [WIP][cogview4]: implement initial CogView4 pipeline
      
      Implement the basic CogView4 pipeline structure with the following changes:
      - Add CogView4 pipeline implementation
      - Implement DDIM scheduler for CogView4
      - Add CogView3Plus transformer architecture
      - Update embedding models
      
      Current limitations:
      - CFG implementation uses padding for sequence length alignment
      - Need to verify transformer inference alignment with Megatron
      
      TODO:
      - Consider separate forward passes for condition/uncondition
        instead of padding approach
      
      * [WIP][cogview4][refactor]: Split condition/uncondition forward pass in CogView4 pipeline
      
      Split the forward pass for conditional and unconditional predictions in the CogView4 pipeline to match the original implementation. The noise prediction is now done separately for each case before combining them for guidance. However, the results still need improvement.
      
      This is a work in progress as the generated images are not yet matching expected quality.
      
      * use with -2 hidden state
      
      * remove text_projector
      
      * 1
      
      * [WIP] Add tensor-reload to align input from transformer block
      
      * [WIP] for older glm
      
      * use with cogview4 transformers forward twice of u and uc
      
      * Update convert_cogview4_to_diffusers.py
      
      * remove this
      
      * use main example
      
      * change back
      
      * reset
      
      * setback
      
      * back
      
      * back 4
      
      * Fix qkv conversion logic for CogView4 to Diffusers format
      
      * back5
      
      * revert to sat to cogview4 version
      
      * update a new convert from megatron
      
      * [WIP][cogview4]: implement CogView4 attention processor
      
      Add CogView4AttnProcessor class for implementing scaled dot-product attention
      with rotary embeddings for the CogVideoX model. This processor concatenates
      encoder and hidden states, applies QKV projections and RoPE, but does not
      include spatial normalization.
      
      TODO:
      - Fix incorrect QKV projection weights
      - Resolve ~25% error in RoPE implementation compared to Megatron
      
      * [cogview4] implement CogView4 transformer block
      
      Implement CogView4 transformer block following the Megatron architecture:
      - Add multi-modulate and multi-gate mechanisms for adaptive layer normalization
      - Implement dual-stream attention with encoder-decoder structure
      - Add feed-forward network with GELU activation
      - Support rotary position embeddings for image tokens
      
      The implementation follows the original CogView4 architecture while adapting
      it to work within the diffusers framework.
      
      * with new attn
      
      * [bugfix] fix dimension mismatch in CogView4 attention
      
      * [cogview4][WIP]: update final normalization in CogView4 transformer
      
      Refactored the final normalization layer in CogView4 transformer to use separate layernorm and AdaLN operations instead of combined AdaLayerNormContinuous. This matches the original implementation but needs validation.
      
      Needs verification against reference implementation.
      
      * 1
      
      * put back
      
      * Update transformer_cogview4.py
      
      * change time_shift
      
      * Update pipeline_cogview4.py
      
      * change timesteps
      
      * fix
      
      * change text_encoder_id
      
      * [cogview4][rope] align RoPE implementation with Megatron
      
      - Implement apply_rope method in attention processor to match Megatron's implementation
      - Update position embeddings to ensure compatibility with Megatron-style rotary embeddings
      - Ensure consistent rotary position encoding across attention layers
      
      This change improves compatibility with Megatron-based models and provides
      better alignment with the original implementation's positional encoding approach.
      
      * [cogview4][bugfix] apply silu activation to time embeddings in CogView4
      
      Applied silu activation to time embeddings before splitting into conditional
      and unconditional parts in CogView4Transformer2DModel. This matches the
      original implementation and helps ensure correct time conditioning behavior.
      
      * [cogview4][chore] clean up pipeline code
      
      - Remove commented out code and debug statements
      - Remove unused retrieve_timesteps function
      - Clean up code formatting and documentation
      
      This commit focuses on code cleanup in the CogView4 pipeline implementation, removing unnecessary commented code and improving readability without changing functionality.
      
      * [cogview4][scheduler] Implement CogView4 scheduler and pipeline
      
      * now It work
      
      * add timestep
      
      * batch
      
      * change convert scipt
      
      * refactor pt. 1; make style
      
      * refactor pt. 2
      
      * refactor pt. 3
      
      * add tests
      
      * make fix-copies
      
      * update toctree.yml
      
      * use flow match scheduler instead of custom
      
      * remove scheduling_cogview.py
      
      * add tiktoken to test dependencies
      
      * Update src/diffusers/models/embeddings.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * apply suggestions from review
      
      * use diffusers apply_rotary_emb
      
      * update flow match scheduler to accept timesteps
      
      * fix comment
      
      * apply review sugestions
      
      * Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      ---------
      Co-authored-by: default avatar三洋三洋 <1258009915@qq.com>
      Co-authored-by: default avatarOleehyO <leehy0357@gmail.com>
      Co-authored-by: default avatarAryan <aryan@huggingface.co>
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      d90cd362
  23. 11 Feb, 2025 2 commits
  24. 19 Jan, 2025 1 commit
  25. 18 Dec, 2024 1 commit
  26. 16 Dec, 2024 1 commit
    • Aryan's avatar
      [core] Hunyuan Video (#10136) · aace1f41
      Aryan authored
      
      
      * copy transformer
      
      * copy vae
      
      * copy pipeline
      
      * make fix-copies
      
      * refactor; make original code work with diffusers; test latents for comparison generated with this commit
      
      * move rope into pipeline; remove flash attention; refactor
      
      * begin conversion script
      
      * make style
      
      * refactor attention
      
      * refactor
      
      * refactor final layer
      
      * their mlp -> our feedforward
      
      * make style
      
      * add docs
      
      * refactor layer names
      
      * refactor modulation
      
      * cleanup
      
      * refactor norms
      
      * refactor activations
      
      * refactor single blocks attention
      
      * refactor attention processor
      
      * make style
      
      * cleanup a bit
      
      * refactor double transformer block attention
      
      * update mochi attn proc
      
      * use diffusers attention implementation in all modules; checkpoint for all values matching original
      
      * remove helper functions in vae
      
      * refactor upsample
      
      * refactor causal conv
      
      * refactor resnet
      
      * refactor
      
      * refactor
      
      * refactor
      
      * grad checkpointing
      
      * autoencoder test
      
      * fix scaling factor
      
      * refactor clip
      
      * refactor llama text encoding
      
      * add coauthor
      Co-Authored-By: default avatar"Gregory D. Hunkins" <greg@ollano.com>
      
      * refactor rope; diff: 0.14990234375; reason and fix: create rope grid on cpu and move to device
      
      Note: The following line diverges from original behaviour. We create the grid on the device, whereas
      original implementation creates it on CPU and then moves it to device. This results in numerical
      differences in layerwise debugging outputs, but visually it is the same.
      
      * use diffusers timesteps embedding; diff: 0.10205078125
      
      * rename
      
      * convert
      
      * update
      
      * add tests for transformer
      
      * add pipeline tests; text encoder 2 is not optional
      
      * fix attention implementation for torch
      
      * add example
      
      * update docs
      
      * update docs
      
      * apply suggestions from review
      
      * refactor vae
      
      * update
      
      * Apply suggestions from code review
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      
      * Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      
      * Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      
      * make fix-copies
      
      * update
      
      ---------
      Co-authored-by: default avatar"Gregory D. Hunkins" <greg@ollano.com>
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      aace1f41
  27. 15 Dec, 2024 1 commit
    • Junsong Chen's avatar
      [Sana] Add Sana, including `SanaPipeline`, `SanaPAGPipeline`,... · 5a196e3d
      Junsong Chen authored
      
      [Sana] Add Sana, including `SanaPipeline`, `SanaPAGPipeline`, `LinearAttentionProcessor`, `Flow-based DPM-sovler` and so on. (#9982)
      
      * first add a script for DC-AE;
      
      * DC-AE init
      
      * replace triton with custom implementation
      
      * 1. rename file and remove un-used codes;
      
      * no longer rely on omegaconf and dataclass
      
      * replace custom activation with diffuers activation
      
      * remove dc_ae attention in attention_processor.py
      
      * iinherit from ModelMixin
      
      * inherit from ConfigMixin
      
      * dc-ae reduce to one file
      
      * update downsample and upsample
      
      * clean code
      
      * support DecoderOutput
      
      * remove get_same_padding and val2tuple
      
      * remove autocast and some assert
      
      * update ResBlock
      
      * remove contents within super().__init__
      
      * Update src/diffusers/models/autoencoders/dc_ae.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * remove opsequential
      
      * update other blocks to support the removal of build_norm
      
      * remove build encoder/decoder project in/out
      
      * remove inheritance of RMSNorm2d from LayerNorm
      
      * remove reset_parameters for RMSNorm2d
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * remove device and dtype in RMSNorm2d __init__
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * Update src/diffusers/models/autoencoders/dc_ae.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * Update src/diffusers/models/autoencoders/dc_ae.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * Update src/diffusers/models/autoencoders/dc_ae.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * remove op_list & build_block
      
      * remove build_stage_main
      
      * change file name to autoencoder_dc
      
      * move LiteMLA to attention.py
      
      * align with other vae decode output;
      
      * add DC-AE into init files;
      
      * update
      
      * make quality && make style;
      
      * quick push before dgx disappears again
      
      * update
      
      * make style
      
      * update
      
      * update
      
      * fix
      
      * refactor
      
      * refactor
      
      * refactor
      
      * update
      
      * possibly change to nn.Linear
      
      * refactor
      
      * make fix-copies
      
      * replace vae with ae
      
      * replace get_block_from_block_type to get_block
      
      * replace downsample_block_type from Conv to conv for consistency
      
      * add scaling factors
      
      * incorporate changes for all checkpoints
      
      * make style
      
      * move mla to attention processor file; split qkv conv to linears
      
      * refactor
      
      * add tests
      
      * from original file loader
      
      * add docs
      
      * add standard autoencoder methods
      
      * combine attention processor
      
      * fix tests
      
      * update
      
      * minor fix
      
      * minor fix
      
      * minor fix & in/out shortcut rename
      
      * minor fix
      
      * make style
      
      * fix paper link
      
      * update docs
      
      * update single file loading
      
      * make style
      
      * remove single file loading support; todo for DN6
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * add abstract
      
      * 1. add DCAE into diffusers;
      2. make style and make quality;
      
      * add DCAE_HF into diffusers;
      
      * bug fixed;
      
      * add SanaPipeline, SanaTransformer2D into diffusers;
      
      * add sanaLinearAttnProcessor2_0;
      
      * first update for SanaTransformer;
      
      * first update for SanaPipeline;
      
      * first success run SanaPipeline;
      
      * model output finally match with original model with the same intput;
      
      * code update;
      
      * code update;
      
      * add a flow dpm-solver scripts
      
      * 🎉[important update]
      1. Integrate flow-dpm-sovler into diffusers;
      2. finally run successfully on both `FlowMatchEulerDiscreteScheduler` and `FlowDPMSolverMultistepScheduler`;
      
      * 🎉🔧
      
      [important update & fix huge bugs!!]
      1. add SanaPAGPipeline & several related Sana linear attention operators;
      2. `SanaTransformer2DModel` not supports multi-resolution input;
      2. fix the multi-scale HW bugs in SanaPipeline and SanaPAGPipeline;
      3. fix the flow-dpm-solver set_timestep() init `model_output` and `lower_order_nums` bugs;
      
      * remove prints;
      
      * add convert sana official checkpoint to diffusers format Safetensor.
      
      * Update src/diffusers/models/transformers/sana_transformer_2d.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/models/transformers/sana_transformer_2d.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/models/transformers/sana_transformer_2d.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/pipelines/pag/pipeline_pag_sana.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/models/transformers/sana_transformer_2d.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/models/transformers/sana_transformer_2d.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/pipelines/sana/pipeline_sana.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/pipelines/sana/pipeline_sana.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * update Sana for DC-AE's recent commit;
      
      * make style && make quality
      
      * Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG (#9932)
      
      * fix progress bar updates in SD 1.5 PAG Img2Img pipeline
      
      ---------
      Co-authored-by: default avatarVinh H. Pham <phamvinh257@gmail.com>
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * make the vae can be None in `__init__` of `SanaPipeline`
      
      * Update src/diffusers/models/transformers/sana_transformer_2d.py
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      
      * change the ae related code due to the latest update of DCAE branch;
      
      * change the ae related code due to the latest update of DCAE branch;
      
      * 1. change code based on AutoencoderDC;
      2. fix the bug of new GLUMBConv;
      3. run success;
      
      * update for solving conversation.
      
      * 1. fix bugs and run convert script success;
      2. Downloading ckpt from hub automatically;
      
      * make style && make quality;
      
      * 1. remove un-unsed parameters in init;
      2. code update;
      
      * remove test file
      
      * refactor; add docs; add tests; update conversion script
      
      * make style
      
      * make fix-copies
      
      * refactor
      
      * udpate pipelines
      
      * pag tests and refactor
      
      * remove sana pag conversion script
      
      * handle weight casting in conversion script
      
      * update conversion script
      
      * add a processor
      
      * 1. add bf16 pth file path;
      2. add complex human instruct in pipeline;
      
      * fix fast \tests
      
      * change gemma-2-2b-it ckpt to a non-gated repo;
      
      * fix the pth path bug in conversion script;
      
      * change grad ckpt to original; make style
      
      * fix the complex_human_instruct bug and typo;
      
      * remove dpmsolver flow scheduler
      
      * apply review suggestions
      
      * change the `FlowMatchEulerDiscreteScheduler` to default `DPMSolverMultistepScheduler` with flow matching scheduler.
      
      * fix the tokenizer.padding_side='right' bug;
      
      * update docs
      
      * make fix-copies
      
      * fix imports
      
      * fix docs
      
      * add integration test
      
      * update docs
      
      * update examples
      
      * fix convert_model_output in schedulers
      
      * fix failing tests
      
      ---------
      Co-authored-by: default avatarJunyu Chen <chenjydl2003@gmail.com>
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      Co-authored-by: default avatarchenjy2003 <70215701+chenjy2003@users.noreply.github.com>
      Co-authored-by: default avatarAryan <aryan@huggingface.co>
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      5a196e3d
  28. 12 Dec, 2024 1 commit
    • Aryan's avatar
      [core] LTX Video (#10021) · 96c376a5
      Aryan authored
      
      
      * transformer
      
      * make style & make fix-copies
      
      * transformer
      
      * add transformer tests
      
      * 80% vae
      
      * make style
      
      * make fix-copies
      
      * fix
      
      * undo cogvideox changes
      
      * update
      
      * update
      
      * match vae
      
      * add docs
      
      * t2v pipeline working; scheduler needs to be checked
      
      * docs
      
      * add pipeline test
      
      * update
      
      * update
      
      * make fix-copies
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * update
      
      * copy t2v to i2v pipeline
      
      * update
      
      * apply review suggestions
      
      * update
      
      * make style
      
      * remove framewise encoding/decoding
      
      * pack/unpack latents
      
      * image2video
      
      * update
      
      * make fix-copies
      
      * update
      
      * update
      
      * rope scale fix
      
      * debug layerwise code
      
      * remove debug
      
      * Apply suggestions from code review
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * propagate precision changes to i2v pipeline
      
      * remove downcast
      
      * address review comments
      
      * fix comment
      
      * address review comments
      
      * [Single File] LTX support for loading original weights (#10135)
      
      * from original file mixin for ltx
      
      * undo config mapping fn changes
      
      * update
      
      * add single file to pipelines
      
      * update docs
      
      * Update src/diffusers/models/autoencoders/autoencoder_kl_ltx.py
      
      * Update src/diffusers/models/autoencoders/autoencoder_kl_ltx.py
      
      * rename classes based on ltx review
      
      * point to original repository for inference
      
      * make style
      
      * resolve conflicts correctly
      
      ---------
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      96c376a5
  29. 11 Dec, 2024 1 commit
  30. 10 Dec, 2024 1 commit
  31. 03 Dec, 2024 1 commit
  32. 23 Nov, 2024 1 commit
  33. 05 Nov, 2024 1 commit
    • Aryan's avatar
      [core] Mochi T2V (#9769) · 3f329a42
      Aryan authored
      
      
      * update
      
      * udpate
      
      * update transformer
      
      * make style
      
      * fix
      
      * add conversion script
      
      * update
      
      * fix
      
      * update
      
      * fix
      
      * update
      
      * fixes
      
      * make style
      
      * update
      
      * update
      
      * update
      
      * init
      
      * update
      
      * update
      
      * add
      
      * up
      
      * up
      
      * up
      
      * update
      
      * mochi transformer
      
      * remove original implementation
      
      * make style
      
      * update inits
      
      * update conversion script
      
      * docs
      
      * Update src/diffusers/pipelines/mochi/pipeline_mochi.py
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      
      * Update src/diffusers/pipelines/mochi/pipeline_mochi.py
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      
      * fix docs
      
      * pipeline fixes
      
      * make style
      
      * invert sigmas in scheduler; fix pipeline
      
      * fix pipeline num_frames
      
      * flip proj and gate in swiglu
      
      * make style
      
      * fix
      
      * make style
      
      * fix tests
      
      * latent mean and std fix
      
      * update
      
      * cherry-pick 1069d210e1b9e84a366cdc7a13965626ea258178
      
      * remove additional sigma already handled by flow match scheduler
      
      * fix
      
      * remove hardcoded value
      
      * replace conv1x1 with linear
      
      * Update src/diffusers/pipelines/mochi/pipeline_mochi.py
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      
      * framewise decoding and conv_cache
      
      * make style
      
      * Apply suggestions from code review
      
      * mochi vae encoder changes
      
      * rebase correctly
      
      * Update scripts/convert_mochi_to_diffusers.py
      
      * fix tests
      
      * fixes
      
      * make style
      
      * update
      
      * make style
      
      * update
      
      * add framewise and tiled encoding
      
      * make style
      
      * make original vae implementation behaviour the default; note: framewise encoding does not work
      
      * remove framewise encoding implementation due to presence of attn layers
      
      * fight test 1
      
      * fight test 2
      
      ---------
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      Co-authored-by: default avataryiyixuxu <yixu310@gmail.com>
      3f329a42
  34. 29 Oct, 2024 1 commit
  35. 16 Oct, 2024 1 commit
  36. 14 Oct, 2024 1 commit
    • Yuxuan.Zhang's avatar
      CogView3Plus DiT (#9570) · 8d81564b
      Yuxuan.Zhang authored
      * merge 9588
      
      * max_shard_size="5GB" for colab running
      
      * conversion script updates; modeling test; refactor transformer
      
      * make fix-copies
      
      * Update convert_cogview3_to_diffusers.py
      
      * initial pipeline draft
      
      * make style
      
      * fight bugs 🐛
      
      🪳
      
      * add example
      
      * add tests; refactor
      
      * make style
      
      * make fix-copies
      
      * add co-author
      
      YiYi Xu <yixu310@gmail.com>
      
      * remove files
      
      * add docs
      
      * add co-author
      Co-Authored-By: default avatarYiYi Xu <yixu310@gmail.com>
      
      * fight docs
      
      * address reviews
      
      * make style
      
      * make model work
      
      * remove qkv fusion
      
      * remove qkv fusion tets
      
      * address review comments
      
      * fix make fix-copies error
      
      * remove None and TODO
      
      * for FP16(draft)
      
      * make style
      
      * remove dynamic cfg
      
      * remove pooled_projection_dim as a parameter
      
      * fix tests
      
      ---------
      Co-authored-by: default avatarAryan <aryan@huggingface.co>
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      8d81564b
  37. 09 Oct, 2024 1 commit
  38. 01 Oct, 2024 1 commit
  39. 17 Sep, 2024 1 commit