1. 02 Jul, 2025 1 commit
  2. 26 Jun, 2025 2 commits
  3. 18 Jun, 2025 1 commit
    • Dhruv Nair's avatar
      Chroma Follow Up (#11725) · 66394bf6
      Dhruv Nair authored
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * updte
      
      * update
      
      * update
      
      * update
      66394bf6
  4. 14 Jun, 2025 1 commit
    • Edna's avatar
      Chroma Pipeline (#11698) · 8adc6003
      Edna authored
      
      
      * working state from hameerabbasi and iddl
      
      * working state form hameerabbasi and iddl (transformer)
      
      * working state (normalization)
      
      * working state (embeddings)
      
      * add chroma loader
      
      * add chroma to mappings
      
      * add chroma to transformer init
      
      * take out variant stuff
      
      * get decently far in changing variant stuff
      
      * add chroma init
      
      * make chroma output class
      
      * add chroma transformer to dummy tp
      
      * add chroma to init
      
      * add chroma to init
      
      * fix single file
      
      * update
      
      * update
      
      * add chroma to auto pipeline
      
      * add chroma to pipeline init
      
      * change to chroma transformer
      
      * take out variant from blocks
      
      * swap embedder location
      
      * remove prompt_2
      
      * work on swapping text encoders
      
      * remove mask function
      
      * dont modify mask (for now)
      
      * wrap attn mask
      
      * no attn mask (can't get it to work)
      
      * remove pooled prompt embeds
      
      * change to my own unpooled embeddeer
      
      * fix load
      
      * take pooled projections out of transformer
      
      * ensure correct dtype for chroma embeddings
      
      * update
      
      * use dn6 attn mask + fix true_cfg_scale
      
      * use chroma pipeline output
      
      * use DN6 embeddings
      
      * remove guidance
      
      * remove guidance embed (pipeline)
      
      * remove guidance from embeddings
      
      * don't return length
      
      * dont change dtype
      
      * remove unused stuff, fix up docs
      
      * add chroma autodoc
      
      * add .md (oops)
      
      * initial chroma docs
      
      * undo don't change dtype
      
      * undo arxiv change
      
      unsure why that happened
      
      * fix hf papers regression in more places
      
      * Update docs/source/en/api/pipelines/chroma.md
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      
      * do_cfg -> self.do_classifier_free_guidance
      
      * Update docs/source/en/api/models/chroma_transformer.md
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      
      * Update chroma.md
      
      * Move chroma layers into transformer
      
      * Remove pruned AdaLayerNorms
      
      * Add chroma fast tests
      
      * (untested) batch cond and uncond
      
      * Add # Copied from for shift
      
      * Update # Copied from statements
      
      * update norm imports
      
      * Revert cond + uncond batching
      
      * Add transformer tests
      
      * move chroma test (oops)
      
      * chroma init
      
      * fix chroma pipeline fast tests
      
      * Update src/diffusers/models/transformers/transformer_chroma.py
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      
      * Move Approximator and Embeddings
      
      * Fix auto pipeline + make style, quality
      
      * make style
      
      * Apply style fixes
      
      * switch to new input ids
      
      * fix # Copied from error
      
      * remove # Copied from on protected members
      
      * try to fix import
      
      * fix import
      
      * make fix-copes
      
      * revert style fix
      
      * update chroma transformer params
      
      * update chroma transformer approximator init params
      
      * update to pad tokens
      
      * fix batch inference
      
      * Make more pipeline tests work
      
      * Make most transformer tests work
      
      * fix docs
      
      * make style, make quality
      
      * skip batch tests
      
      * fix test skipping
      
      * fix test skipping again
      
      * fix for tests
      
      * Fix all pipeline test
      
      * update
      
      * push local changes, fix docs
      
      * add encoder test, remove pooled dim
      
      * default proj dim
      
      * fix tests
      
      * fix equal size list input
      
      * update
      
      * push local changes, fix docs
      
      * add encoder test, remove pooled dim
      
      * default proj dim
      
      * fix tests
      
      * fix equal size list input
      
      * Revert "fix equal size list input"
      
      This reverts commit 3fe4ad67d58d83715bc238f8654f5e90bfc5653c.
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      ---------
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      Co-authored-by: default avatargithub-actions[bot] <github-actions[bot]@users.noreply.github.com>
      8adc6003
  5. 13 Jun, 2025 1 commit
    • Aryan's avatar
      Cosmos Predict2 (#11695) · 9f91305f
      Aryan authored
      * support text-to-image
      
      * update example
      
      * make fix-copies
      
      * support use_flow_sigmas in EDM scheduler instead of maintain cosmos-specific scheduler
      
      * support video-to-world
      
      * update
      
      * rename text2image pipeline
      
      * make fix-copies
      
      * add t2i test
      
      * add test for v2w pipeline
      
      * support edm dpmsolver multistep
      
      * update
      
      * update
      
      * update
      
      * update tests
      
      * fix tests
      
      * safety checker
      
      * make conversion script work without guardrail
      9f91305f
  6. 06 Jun, 2025 1 commit
    • Aryan's avatar
      Wan VACE (#11582) · 73a9d585
      Aryan authored
      * initial support
      
      * make fix-copies
      
      * fix no split modules
      
      * add conversion script
      
      * refactor
      
      * add pipeline test
      
      * refactor
      
      * fix bug with mask
      
      * fix for reference images
      
      * remove print
      
      * update docs
      
      * update slices
      
      * update
      
      * update
      
      * update example
      73a9d585
  7. 05 Jun, 2025 1 commit
  8. 27 May, 2025 1 commit
    • Linoy Tsaban's avatar
      [Sana Sprint] add image-to-image pipeline (#11602) · 28ef0165
      Linoy Tsaban authored
      
      
      * sana sprint img2img
      
      * fix import
      
      * fix name
      
      * fix image encoding
      
      * fix image encoding
      
      * fix image encoding
      
      * fix image encoding
      
      * fix image encoding
      
      * fix image encoding
      
      * try w/o strength
      
      * try scaling differently
      
      * try with strength
      
      * revert unnecessary changes to scheduler
      
      * revert unnecessary changes to scheduler
      
      * Apply style fixes
      
      * remove comment
      
      * add copy statements
      
      * add copy statements
      
      * add to doc
      
      * add to doc
      
      * add to doc
      
      * add to doc
      
      * Apply style fixes
      
      * empty commit
      
      * fix copies
      
      * fix copies
      
      * fix copies
      
      * fix copies
      
      * fix copies
      
      * docs
      
      * make fix-copies.
      
      * fix doc building error.
      
      * initial commit - add img2img test
      
      * initial commit - add img2img test
      
      * fix import
      
      * fix imports
      
      * Apply style fixes
      
      * empty commit
      
      * remove
      
      * empty commit
      
      * test vocab size
      
      * fix
      
      * fix prompt missing from last commits
      
      * small changes
      
      * fix image processing when input is tensor
      
      * fix order
      
      * Apply style fixes
      
      * empty commit
      
      * fix shape
      
      * remove comment
      
      * image processing
      
      * remove comment
      
      * skip vae tiling test for now
      
      * Apply style fixes
      
      * empty commit
      
      ---------
      Co-authored-by: default avatargithub-actions[bot] <github-actions[bot]@users.noreply.github.com>
      Co-authored-by: default avatarsayakpaul <spsayakpaul@gmail.com>
      28ef0165
  9. 13 May, 2025 1 commit
    • Aryan's avatar
      LTX Video 0.9.7 (#11516) · 06fee551
      Aryan authored
      
      
      * add upsampling pipeline
      
      * ltx upsample pipeline conversion; pipeline fixes
      
      * make fix-copies
      
      * remove print
      
      * add vae convenience methods
      
      * update
      
      * add tests
      
      * support denoising strength for upscaling & video-to-video
      
      * update docs
      
      * update doc checkpoints
      
      * update docs
      
      * fix
      
      ---------
      Co-authored-by: default avatarLinoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
      06fee551
  10. 12 May, 2025 1 commit
    • Zhong-Yu Li's avatar
      Add VisualCloze (#11377) · 4f438de3
      Zhong-Yu Li authored
      * VisualCloze
      
      * style quality
      
      * add docs
      
      * add docs
      
      * typo
      
      * Update docs/source/en/api/pipelines/visualcloze.md
      
      * delete einops
      
      * style quality
      
      * Update src/diffusers/pipelines/visualcloze/pipeline_visualcloze.py
      
      * reorg
      
      * refine doc
      
      * style quality
      
      * typo
      
      * typo
      
      * Update src/diffusers/image_processor.py
      
      * add comment
      
      * test
      
      * style
      
      * Modified based on review
      
      * style
      
      * restore image_processor
      
      * update example url
      
      * style
      
      * fix-copies
      
      * VisualClozeGenerationPipeline
      
      * combine
      
      * tests docs
      
      * remove VisualClozeUpsamplingPipeline
      
      * style
      
      * quality
      
      * test examples
      
      * quality style
      
      * typo
      
      * make fix-copies
      
      * fix test_callback_cfg and test_save_load_dduf in VisualClozePipelineFastTests
      
      * add EXAMPLE_DOC_STRING to VisualClozeGenerationPipeline
      
      * delete maybe_free_model_hooks from pipeline_visualcloze_combined
      
      * Apply suggestions from code review
      
      * fix test_save_load_local test; add reason for skipping cfg test
      
      * more save_load test fixes
      
      * fix tests in generation pipeline tests
      4f438de3
  11. 07 May, 2025 1 commit
    • Aryan's avatar
      Cosmos (#10660) · 7b904941
      Aryan authored
      
      
      * begin transformer conversion
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * update
      
      * add conversion script
      
      * add pipeline
      
      * make fix-copies
      
      * remove einops
      
      * update docs
      
      * gradient checkpointing
      
      * add transformer test
      
      * update
      
      * debug
      
      * remove prints
      
      * match sigmas
      
      * add vae pt. 1
      
      * finish CV* vae
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * make fix-copies
      
      * update
      
      * make fix-copies
      
      * fix
      
      * update
      
      * update
      
      * make fix-copies
      
      * update
      
      * update tests
      
      * handle device and dtype for safety checker; required in latest diffusers
      
      * remove enable_gqa and use repeat_interleave instead
      
      * enforce safety checker; use dummy checker in fast tests
      
      * add review suggestion for ONNX export
      Co-Authored-By: default avatarAsfiya Baig <asfiyab@nvidia.com>
      
      * fix safety_checker issues when not passed explicitly
      
      We could either do what's done in this commit, or update the Cosmos examples to explicitly pass the safety checker
      
      * use cosmos guardrail package
      
      * auto format docs
      
      * update conversion script to support 14B models
      
      * update name CosmosPipeline -> CosmosTextToWorldPipeline
      
      * update docs
      
      * fix docs
      
      * fix group offload test failing for vae
      
      ---------
      Co-authored-by: default avatarAsfiya Baig <asfiyab@nvidia.com>
      7b904941
  12. 06 May, 2025 1 commit
  13. 15 Apr, 2025 2 commits
  14. 14 Apr, 2025 1 commit
  15. 13 Apr, 2025 1 commit
    • Ishan Modi's avatar
      [ControlNet] Adds controlnet for SanaTransformer (#11040) · f1f38ffb
      Ishan Modi authored
      
      
      * added controlnet for sana transformer
      
      * improve code quality
      
      * addressed PR comments
      
      * bug fixes
      
      * added test cases
      
      * update
      
      * added dummy objects
      
      * addressed PR comments
      
      * update
      
      * Forcing update
      
      * add to docs
      
      * code quality
      
      * addressed PR comments
      
      * addressed PR comments
      
      * update
      
      * addressed PR comments
      
      * added proper styling
      
      * update
      
      * Revert "added proper styling"
      
      This reverts commit 344ee8a7014ada095b295034ef84341f03b0e359.
      
      * manually ordered
      
      * Apply suggestions from code review
      
      ---------
      Co-authored-by: default avatarAryan <contact.aryanvs@gmail.com>
      f1f38ffb
  16. 11 Apr, 2025 1 commit
  17. 09 Apr, 2025 2 commits
  18. 01 Apr, 2025 1 commit
    • Dhruv Nair's avatar
      [WIP] Add Wan Video2Video (#11053) · df1d7b01
      Dhruv Nair authored
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      df1d7b01
  19. 21 Mar, 2025 2 commits
  20. 18 Mar, 2025 1 commit
  21. 15 Mar, 2025 1 commit
  22. 13 Mar, 2025 1 commit
  23. 11 Mar, 2025 1 commit
  24. 10 Mar, 2025 1 commit
  25. 07 Mar, 2025 1 commit
    • Aryan's avatar
      Hunyuan I2V (#10983) · 2e5203be
      Aryan authored
      * update
      
      * update
      
      * update
      
      * add tests
      
      * update
      
      * add model tests
      
      * update docs
      
      * update
      
      * update example
      
      * fix defaults
      
      * update
      2e5203be
  26. 03 Mar, 2025 1 commit
  27. 02 Mar, 2025 1 commit
  28. 26 Feb, 2025 1 commit
  29. 24 Feb, 2025 1 commit
  30. 21 Feb, 2025 1 commit
  31. 15 Feb, 2025 1 commit
    • Yuxuan Zhang's avatar
      CogView4 (supports different length c and uc) (#10649) · d90cd362
      Yuxuan Zhang authored
      
      
      * init
      
      * encode with glm
      
      * draft schedule
      
      * feat(scheduler): Add CogView scheduler implementation
      
      * feat(embeddings): add CogView 2D rotary positional embedding
      
      * 1
      
      * Update pipeline_cogview4.py
      
      * fix the timestep init and sigma
      
      * update latent
      
      * draft patch(not work)
      
      * fix
      
      * [WIP][cogview4]: implement initial CogView4 pipeline
      
      Implement the basic CogView4 pipeline structure with the following changes:
      - Add CogView4 pipeline implementation
      - Implement DDIM scheduler for CogView4
      - Add CogView3Plus transformer architecture
      - Update embedding models
      
      Current limitations:
      - CFG implementation uses padding for sequence length alignment
      - Need to verify transformer inference alignment with Megatron
      
      TODO:
      - Consider separate forward passes for condition/uncondition
        instead of padding approach
      
      * [WIP][cogview4][refactor]: Split condition/uncondition forward pass in CogView4 pipeline
      
      Split the forward pass for conditional and unconditional predictions in the CogView4 pipeline to match the original implementation. The noise prediction is now done separately for each case before combining them for guidance. However, the results still need improvement.
      
      This is a work in progress as the generated images are not yet matching expected quality.
      
      * use with -2 hidden state
      
      * remove text_projector
      
      * 1
      
      * [WIP] Add tensor-reload to align input from transformer block
      
      * [WIP] for older glm
      
      * use with cogview4 transformers forward twice of u and uc
      
      * Update convert_cogview4_to_diffusers.py
      
      * remove this
      
      * use main example
      
      * change back
      
      * reset
      
      * setback
      
      * back
      
      * back 4
      
      * Fix qkv conversion logic for CogView4 to Diffusers format
      
      * back5
      
      * revert to sat to cogview4 version
      
      * update a new convert from megatron
      
      * [WIP][cogview4]: implement CogView4 attention processor
      
      Add CogView4AttnProcessor class for implementing scaled dot-product attention
      with rotary embeddings for the CogVideoX model. This processor concatenates
      encoder and hidden states, applies QKV projections and RoPE, but does not
      include spatial normalization.
      
      TODO:
      - Fix incorrect QKV projection weights
      - Resolve ~25% error in RoPE implementation compared to Megatron
      
      * [cogview4] implement CogView4 transformer block
      
      Implement CogView4 transformer block following the Megatron architecture:
      - Add multi-modulate and multi-gate mechanisms for adaptive layer normalization
      - Implement dual-stream attention with encoder-decoder structure
      - Add feed-forward network with GELU activation
      - Support rotary position embeddings for image tokens
      
      The implementation follows the original CogView4 architecture while adapting
      it to work within the diffusers framework.
      
      * with new attn
      
      * [bugfix] fix dimension mismatch in CogView4 attention
      
      * [cogview4][WIP]: update final normalization in CogView4 transformer
      
      Refactored the final normalization layer in CogView4 transformer to use separate layernorm and AdaLN operations instead of combined AdaLayerNormContinuous. This matches the original implementation but needs validation.
      
      Needs verification against reference implementation.
      
      * 1
      
      * put back
      
      * Update transformer_cogview4.py
      
      * change time_shift
      
      * Update pipeline_cogview4.py
      
      * change timesteps
      
      * fix
      
      * change text_encoder_id
      
      * [cogview4][rope] align RoPE implementation with Megatron
      
      - Implement apply_rope method in attention processor to match Megatron's implementation
      - Update position embeddings to ensure compatibility with Megatron-style rotary embeddings
      - Ensure consistent rotary position encoding across attention layers
      
      This change improves compatibility with Megatron-based models and provides
      better alignment with the original implementation's positional encoding approach.
      
      * [cogview4][bugfix] apply silu activation to time embeddings in CogView4
      
      Applied silu activation to time embeddings before splitting into conditional
      and unconditional parts in CogView4Transformer2DModel. This matches the
      original implementation and helps ensure correct time conditioning behavior.
      
      * [cogview4][chore] clean up pipeline code
      
      - Remove commented out code and debug statements
      - Remove unused retrieve_timesteps function
      - Clean up code formatting and documentation
      
      This commit focuses on code cleanup in the CogView4 pipeline implementation, removing unnecessary commented code and improving readability without changing functionality.
      
      * [cogview4][scheduler] Implement CogView4 scheduler and pipeline
      
      * now It work
      
      * add timestep
      
      * batch
      
      * change convert scipt
      
      * refactor pt. 1; make style
      
      * refactor pt. 2
      
      * refactor pt. 3
      
      * add tests
      
      * make fix-copies
      
      * update toctree.yml
      
      * use flow match scheduler instead of custom
      
      * remove scheduling_cogview.py
      
      * add tiktoken to test dependencies
      
      * Update src/diffusers/models/embeddings.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * apply suggestions from review
      
      * use diffusers apply_rotary_emb
      
      * update flow match scheduler to accept timesteps
      
      * fix comment
      
      * apply review sugestions
      
      * Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      ---------
      Co-authored-by: default avatar三洋三洋 <1258009915@qq.com>
      Co-authored-by: default avatarOleehyO <leehy0357@gmail.com>
      Co-authored-by: default avatarAryan <aryan@huggingface.co>
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      d90cd362
  32. 11 Feb, 2025 2 commits
  33. 27 Jan, 2025 1 commit
  34. 19 Jan, 2025 1 commit
  35. 23 Dec, 2024 1 commit