1. 15 May, 2025 2 commits
  2. 13 May, 2025 4 commits
  3. 12 May, 2025 2 commits
    • Add VisualCloze (#11377) · 4f438de3
      Zhong-Yu Li authored
      * VisualCloze
      
      * style quality
      
      * add docs
      
      * add docs
      
      * typo
      
      * Update docs/source/en/api/pipelines/visualcloze.md
      
      * delete einops
      
      * style quality
      
      * Update src/diffusers/pipelines/visualcloze/pipeline_visualcloze.py
      
      * reorg
      
      * refine doc
      
      * style quality
      
      * typo
      
      * typo
      
      * Update src/diffusers/image_processor.py
      
      * add comment
      
      * test
      
      * style
      
      * Modified based on review
      
      * style
      
      * restore image_processor
      
      * update example url
      
      * style
      
      * fix-copies
      
      * VisualClozeGenerationPipeline
      
      * combine
      
      * tests docs
      
      * remove VisualClozeUpsamplingPipeline
      
      * style
      
      * quality
      
      * test examples
      
      * quality style
      
      * typo
      
      * make fix-copies
      
      * fix test_callback_cfg and test_save_load_dduf in VisualClozePipelineFastTests
      
      * add EXAMPLE_DOC_STRING to VisualClozeGenerationPipeline
      
      * delete maybe_free_model_hooks from pipeline_visualcloze_combined
      
      * Apply suggestions from code review
      
      * fix test_save_load_local test; add reason for skipping cfg test
      
      * more save_load test fixes
      
      * fix tests in generation pipeline tests
      4f438de3
    • Hunyuan Video Framepack F1 (#11534) · e48f6aee
      Aryan authored
      * support framepack f1
      
      * update docs
      
      * update toctree
      
      * remove typo
      e48f6aee
  4. 11 May, 2025 1 commit
  5. 09 May, 2025 4 commits
    • Aryan authored
      92fe689f
    • [LTXPipeline] Update latents dtype to match VAE dtype (#11533) · 3c0a0129
      James Xu authored
      fix: update latents dtype to match vae
      3c0a0129
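
      The fix above is essentially a one-line dtype cast. A minimal sketch of the idea (illustrative names, not the actual diffusers patch):

      ```python
      import torch


      def cast_latents_to_vae_dtype(latents: torch.Tensor, vae: torch.nn.Module) -> torch.Tensor:
          # A float32 latent tensor fed to a float16/bfloat16 VAE hits a dtype
          # mismatch during decode, so align the latents with the VAE's parameter dtype.
          vae_dtype = next(vae.parameters()).dtype
          return latents.to(dtype=vae_dtype)
      ```
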
    • [LoRA] support non-diffusers hidream loras (#11532) · 0c47c954
      Sayak Paul authored
      * support non-diffusers hidream loras
      
      * make fix-copies
      0c47c954
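
      A hedged usage sketch for the commit above: original-format (non-diffusers) HiDream LoRAs should load through the same `load_lora_weights()` entry point as diffusers-format ones. The repo ids below are placeholders, and real HiDream checkpoints may require extra components to be passed explicitly.

      ```python
      import torch
      from diffusers import DiffusionPipeline

      # Placeholder ids for illustration only.
      pipe = DiffusionPipeline.from_pretrained(
          "HiDream-ai/HiDream-I1-Full", torch_dtype=torch.bfloat16
      )
      pipe.load_lora_weights("some-user/some-hidream-lora")  # original-format LoRA
      ```
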
    • feat: pipeline-level quantization config (#11130) · 599c8871
      Sayak Paul authored
      
      
      * feat: pipeline-level quant config.
      Co-authored-by: SunMarc <marc.sun@hotmail.fr>
      
      condition better.
      
      support mapping.
      
      improvements.
      
      [Quantization] Add Quanto backend (#10756)
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * Update docs/source/en/quantization/quanto.md
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * Update src/diffusers/quantizers/quanto/utils.py
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      * update
      
      * update
      
      ---------
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      [Single File] Add single file loading for SANA Transformer (#10947)
      
      * added support for from_single_file
      
      * added diffusers mapping script
      
      * added testcase
      
      * bug fix
      
      * updated tests
      
      * corrected code quality
      
      * corrected code quality
      
      ---------
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      
      [LoRA] Improve warning messages when LoRA loading becomes a no-op (#10187)
      
      * updates
      
      * updates
      
      * updates
      
      * updates
      
      * notebooks revert
      
      * fix-copies.
      
      * seeing
      
      * fix
      
      * revert
      
      * fixes
      
      * fixes
      
      * fixes
      
      * remove print
      
      * fix
      
      * conflicts ii.
      
      * updates
      
      * fixes
      
      * better filtering of prefix.
      
      ---------
      Co-authored-by: hlky <hlky@hlky.ac>
      
      [LoRA] CogView4 (#10981)
      
      * update
      
      * make fix-copies
      
      * update
      
      [Tests] improve quantization tests by additionally measuring the inference memory savings (#11021)
      
      * memory usage tests
      
      * fixes
      
      * gguf
      
      [`Research Project`] Add AnyText: Multilingual Visual Text Generation And Editing (#8998)
      
      * Add initial template
      
      * Second template
      
      * feat: Add TextEmbeddingModule to AnyTextPipeline
      
      * feat: Add AuxiliaryLatentModule template to AnyTextPipeline
      
      * Add bert tokenizer from the anytext repo for now
      
      * feat: Update AnyTextPipeline's modify_prompt method
      
      This commit adds improvements to the modify_prompt method in the AnyTextPipeline class. The method now handles special characters and replaces selected string prompts with a placeholder. Additionally, it includes a check for Chinese text and translation using the trans_pipe.
      
      * Fill in the `forward` pass of `AuxiliaryLatentModule`
      
      * `make style && make quality`
      
      * `chore: Update bert_tokenizer.py with a TODO comment suggesting the use of the transformers library`
      
      * Update error handling to raise and logging
      
      * Add `create_glyph_lines` function into `TextEmbeddingModule`
      
      * make style
      
      * Up
      
      * Up
      
      * Up
      
      * Up
      
      * Remove several comments
      
      * refactor: Remove ControlNetConditioningEmbedding and update code accordingly
      
      * Up
      
      * Up
      
      * up
      
      * refactor: Update AnyTextPipeline to include new optional parameters
      
      * up
      
      * feat: Add OCR model and its components
      
      * chore: Update `TextEmbeddingModule` to include OCR model components and dependencies
      
      * chore: Update `AuxiliaryLatentModule` to include VAE model and its dependencies for masked image in the editing task
      
      * `make style`
      
      * refactor: Update `AnyTextPipeline`'s docstring
      
      * Update `AuxiliaryLatentModule` to include info dictionary so that text processing is done once
      
      * simplify
      
      * `make style`
      
      * Converting `TextEmbeddingModule` to ordinary `encode_prompt()` function
      
      * Simplify for now
      
      * `make style`
      
      * Up
      
      * feat: Add scripts to convert AnyText controlnet to diffusers
      
      * `make style`
      
      * Fix: Move glyph rendering to `TextEmbeddingModule` from `AuxiliaryLatentModule`
      
      * make style
      
      * Up
      
      * Simplify
      
      * Up
      
      * feat: Add safetensors module for loading model file
      
      * Fix device issues
      
      * Up
      
      * Up
      
      * refactor: Simplify
      
      * refactor: Simplify code for loading models and handling data types
      
      * `make style`
      
      * refactor: Update to() method in FrozenCLIPEmbedderT3 and TextEmbeddingModule
      
      * refactor: Update dtype in embedding_manager.py to match proj.weight
      
      * Up
      
      * Add attribution and adaptation information to pipeline_anytext.py
      
      * Update usage example
      
      * Will refactor `controlnet_cond_embedding` initialization
      
      * Add `AnyTextControlNetConditioningEmbedding` template
      
      * Refactor organization
      
      * style
      
      * style
      
      * Move custom blocks from `AuxiliaryLatentModule` to `AnyTextControlNetConditioningEmbedding`
      
      * Follow one-file policy
      
      * style
      
      * [Docs] Update README and pipeline_anytext.py to use AnyTextControlNetModel
      
      * [Docs] Update import statement for AnyTextControlNetModel in pipeline_anytext.py
      
      * [Fix] Update import path for ControlNetModel, ControlNetOutput in anytext_controlnet.py
      
      * Refactor AnyTextControlNet to use configurable conditioning embedding channels
      
      * Complete control net conditioning embedding in AnyTextControlNetModel
      
      * up
      
      * [FIX] Ensure embeddings use correct device in AnyTextControlNetModel
      
      * up
      
      * up
      
      * style
      
      * [UPDATE] Revise README and example code for AnyTextPipeline integration with DiffusionPipeline
      
      * [UPDATE] Update example code in anytext.py to use correct font file and improve clarity
      
      * down
      
      * [UPDATE] Refactor BasicTokenizer usage to a new Checker class for text processing
      
      * update pillow
      
      * [UPDATE] Remove commented-out code and unnecessary docstring in anytext.py and anytext_controlnet.py for improved clarity
      
      * [REMOVE] Delete frozen_clip_embedder_t3.py as it is in the anytext.py file
      
      * [UPDATE] Replace edict with dict for configuration in anytext.py and RecModel.py for consistency
      
      * 🆙
      
      
      
      * style
      
      * [UPDATE] Revise README.md for clarity, remove unused imports in anytext.py, and add author credits in anytext_controlnet.py
      
      * style
      
      * Update examples/research_projects/anytext/README.md
      Co-authored-by: Aryan <contact.aryanvs@gmail.com>
      
      * Remove commented-out image preparation code in AnyTextPipeline
      
      * Remove unnecessary blank line in README.md
      
      [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6  (#11018)
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      fix: mixture tiling sdxl pipeline - adjust generating time_ids & embeddings (#11012)
      
      small fix on generating time_ids & embeddings
      
      [LoRA] support wan i2v loras from the world. (#11025)
      
      * support wan i2v loras from the world.
      
      * remove copied from.
      
      * updates
      
      * add lora.
      
      Fix SD3 IPAdapter feature extractor (#11027)
      
      chore: fix help messages in advanced diffusion examples (#10923)
      
      Fix missing **kwargs in lora_pipeline.py (#11011)
      
      * Update lora_pipeline.py
      
      * Apply style fixes
      
      * fix-copies
      
      ---------
      Co-authored-by: hlky <hlky@hlky.ac>
      Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
      
      Fix for multi-GPU WAN inference (#10997)
      
      Ensure that hidden_state and shift/scale are on the same device when running with multiple GPUs
      
      Co-authored-by: Jimmy <39@🇺🇸.com>
      
      [Refactor] Clean up import utils boilerplate (#11026)
      
      * update
      
      * update
      
      * update
      
      Use `output_size` in `repeat_interleave` (#11030)
      
      [hybrid inference 🍯🐝] Add VAE encode (#11017)
      
      * [hybrid inference 🍯🐝] Add VAE encode
      
      * _toctree: add vae encode
      
      * Add endpoints, tests
      
      * vae_encode docs
      
      * vae encode benchmarks
      
      * api reference
      
      * changelog
      
      * Update docs/source/en/hybrid_inference/overview.md
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      * update
      
      ---------
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      Wan Pipeline scaling fix, type hint warning, multi generator fix (#11007)
      
      * Wan Pipeline scaling fix, type hint warning, multi generator fix
      
      * Apply suggestions from code review
      
      [LoRA] change to warning from info when notifying the users about a LoRA no-op (#11044)
      
      * move to warning.
      
      * test related changes.
      
      Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline (#10827)
      
      * Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline
      
      ---------
      Co-authored-by: YiYi Xu <yixu310@gmail.com>
      
      making `formatted_images` initialization compact (#10801)
      
      compact writing
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      Co-authored-by: YiYi Xu <yixu310@gmail.com>
      
      Fix aclnnRepeatInterleaveIntWithDim error on NPU for get_1d_rotary_pos_embed (#10820)
      
      * get_1d_rotary_pos_embed support npu
      
      * Update src/diffusers/models/embeddings.py
      
      ---------
      Co-authored-by: Kai zheng <kaizheng@KaideMacBook-Pro.local>
      Co-authored-by: hlky <hlky@hlky.ac>
      Co-authored-by: YiYi Xu <yixu310@gmail.com>
      
      [Tests] restrict memory tests for quanto for certain schemes. (#11052)
      
      * restrict memory tests for quanto for certain schemes.
      
      * Apply suggestions from code review
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      
      * fixes
      
      * style
      
      ---------
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      
      [LoRA] feat: support non-diffusers wan t2v loras. (#11059)
      
      feat: support non-diffusers wan t2v loras.
      
      [examples/controlnet/train_controlnet_sd3.py] Fixes #11050 - Cast prompt_embeds and pooled_prompt_embeds to weight_dtype to prevent dtype mismatch (#11051)
      
      Fix: dtype mismatch of prompt embeddings in sd3 controlnet training
      Co-authored-by: Andreas Jörg <andreasjoerg@MacBook-Pro-von-Andreas-2.fritz.box>
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      reverts accidental change that removes attn_mask in attn. Improves fl… (#11065)
      
      reverts accidental change that removes attn_mask in attn. Improves flux ptxla by using flash block sizes. Moves encoding outside the for loop.
      Co-authored-by: Juan Acevedo <jfacevedo@google.com>
      
      Fix deterministic issue when getting pipeline dtype and device (#10696)
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
      
      [Tests] add requires peft decorator. (#11037)
      
      * add requires peft decorator.
      
      * install peft conditionally.
      
      * conditional deps.
      Co-authored-by: DN6 <dhruv.nair@gmail.com>
      
      ---------
      Co-authored-by: DN6 <dhruv.nair@gmail.com>
      
      CogView4 Control Block (#10809)
      
      * cogview4 control training
      
      ---------
      Co-authored-by: Oleehy0 <leehy0357@gmail.com>
      Co-authored-by: yiyixuxu <yixu310@gmail.com>
      
      [CI] pin transformers version for benchmarking. (#11067)
      
      pin transformers version for benchmarking.
      
      updates
      
      Fix Wan I2V Quality (#11087)
      
      * fix_wan_i2v_quality
      
      * Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py
      Co-authored-by: YiYi Xu <yixu310@gmail.com>
      
      * Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py
      Co-authored-by: YiYi Xu <yixu310@gmail.com>
      
      * Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py
      Co-authored-by: YiYi Xu <yixu310@gmail.com>
      
      * Update pipeline_wan_i2v.py
      
      ---------
      Co-authored-by: YiYi Xu <yixu310@gmail.com>
      Co-authored-by: hlky <hlky@hlky.ac>
      
      LTX 0.9.5 (#10968)
      
      * update
      
      ---------
      Co-authored-by: YiYi Xu <yixu310@gmail.com>
      Co-authored-by: hlky <hlky@hlky.ac>
      
      make PR GPU tests conditioned on styling. (#11099)
      
      Group offloading improvements (#11094)
      
      update
      
      Fix pipeline_flux_controlnet.py (#11095)
      
      * Fix pipeline_flux_controlnet.py
      
      * Fix style
      
      update readme instructions. (#11096)
      Co-authored-by: Juan Acevedo <jfacevedo@google.com>
      
      Resolve stride mismatch in UNet's ResNet to support Torch DDP (#11098)
      
      Modify UNet's ResNet implementation to resolve stride mismatch in Torch's DDP
      
      Fix Group offloading behaviour when using streams (#11097)
      
      * update
      
      * update
      
      Quality options in `export_to_video` (#11090)
      
      * Quality options in `export_to_video`
      
      * make style
      
      improve more.
      
      add placeholders for docstrings.
      
      formatting.
      
      smol fix.
      
      solidify validation and annotation
      
      * Revert "feat: pipeline-level quant config."
      
      This reverts commit 316ff46b7648bfa24525ac02c284afcf440404aa.
      
      * feat: implement pipeline-level quantization config
      Co-authored-by: SunMarc <marc@huggingface.co>
      
      * update
      
      * fixes
      
      * fix validation.
      
      * add tests and other improvements.
      
      * add tests
      
      * import quality
      
      * remove prints.
      
      * add docs.
      
      * fixes to docs.
      
      * doc fixes.
      
      * doc fixes.
      
      * add validation to the input quantization_config.
      
      * clarify recommendations.
      
      * docs
      
      * add to ci.
      
      * todo.
      
      ---------
      Co-authored-by: SunMarc <marc@huggingface.co>
      599c8871
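
      A hedged sketch of how the pipeline-level quantization config in the commit above is meant to be used; the backend name, kwargs, import path, and checkpoint id are illustrative and may differ from the merged API.

      ```python
      import torch

      from diffusers import DiffusionPipeline
      from diffusers.quantizers import PipelineQuantizationConfig

      # Quantize only selected components when loading the whole pipeline.
      quant_config = PipelineQuantizationConfig(
          quant_backend="bitsandbytes_4bit",
          quant_kwargs={
              "load_in_4bit": True,
              "bnb_4bit_quant_type": "nf4",
              "bnb_4bit_compute_dtype": torch.bfloat16,
          },
          components_to_quantize=["transformer", "text_encoder_2"],
      )

      pipe = DiffusionPipeline.from_pretrained(
          "black-forest-labs/FLUX.1-dev",  # illustrative checkpoint
          quantization_config=quant_config,
          torch_dtype=torch.bfloat16,
      )
      ```
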
  6. 08 May, 2025 4 commits
  7. 07 May, 2025 2 commits
    • clean up the __init__ for stable_diffusion (#11500) · 53bd367b
      YiYi Xu authored
      up
      53bd367b
    • Cosmos (#10660) · 7b904941
      Aryan authored
      
      
      * begin transformer conversion
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * update
      
      * add conversion script
      
      * add pipeline
      
      * make fix-copies
      
      * remove einops
      
      * update docs
      
      * gradient checkpointing
      
      * add transformer test
      
      * update
      
      * debug
      
      * remove prints
      
      * match sigmas
      
      * add vae pt. 1
      
      * finish CV* vae
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * make fix-copies
      
      * update
      
      * make fix-copies
      
      * fix
      
      * update
      
      * update
      
      * make fix-copies
      
      * update
      
      * update tests
      
      * handle device and dtype for safety checker; required in latest diffusers
      
      * remove enable_gqa and use repeat_interleave instead
      
      * enforce safety checker; use dummy checker in fast tests
      
      * add review suggestion for ONNX export
      Co-Authored-By: Asfiya Baig <asfiyab@nvidia.com>
      
      * fix safety_checker issues when not passed explicitly
      
      We could either do what's done in this commit, or update the Cosmos examples to explicitly pass the safety checker
      
      * use cosmos guardrail package
      
      * auto format docs
      
      * update conversion script to support 14B models
      
      * update name CosmosPipeline -> CosmosTextToWorldPipeline
      
      * update docs
      
      * fix docs
      
      * fix group offload test failing for vae
      
      ---------
      Co-authored-by: Asfiya Baig <asfiyab@nvidia.com>
      7b904941
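
      A hedged usage sketch of the CosmosTextToWorldPipeline named in the commit above; the checkpoint id and generation arguments are illustrative, and the pipeline relies on the cosmos-guardrail safety checker mentioned in the commit.

      ```python
      import torch
      from diffusers import CosmosTextToWorldPipeline
      from diffusers.utils import export_to_video

      # Illustrative checkpoint id; the conversion script also mentions 14B variants.
      pipe = CosmosTextToWorldPipeline.from_pretrained(
          "nvidia/Cosmos-1.0-Diffusion-7B-Text2World", torch_dtype=torch.bfloat16
      )
      pipe.to("cuda")

      video = pipe(prompt="A robot arm assembling a small engine on a workbench").frames[0]
      export_to_video(video, "cosmos_t2w.mp4", fps=30)
      ```
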
  8. 06 May, 2025 4 commits
  9. 05 May, 2025 2 commits
  10. 01 May, 2025 3 commits
  11. 30 Apr, 2025 4 commits
  12. 28 Apr, 2025 3 commits
  13. 24 Apr, 2025 2 commits
  14. 23 Apr, 2025 3 commits