1. 21 Mar, 2025 1 commit
  2. 15 Dec, 2024 1 commit
    • Junsong Chen's avatar
      [Sana] Add Sana, including `SanaPipeline`, `SanaPAGPipeline`,... · 5a196e3d
      Junsong Chen authored
      
      [Sana] Add Sana, including `SanaPipeline`, `SanaPAGPipeline`, `LinearAttentionProcessor`, `Flow-based DPM-sovler` and so on. (#9982)
      
      * first add a script for DC-AE;
      
      * DC-AE init
      
      * replace triton with custom implementation
      
      * 1. rename file and remove un-used codes;
      
      * no longer rely on omegaconf and dataclass
      
      * replace custom activation with diffuers activation
      
      * remove dc_ae attention in attention_processor.py
      
      * iinherit from ModelMixin
      
      * inherit from ConfigMixin
      
      * dc-ae reduce to one file
      
      * update downsample and upsample
      
      * clean code
      
      * support DecoderOutput
      
      * remove get_same_padding and val2tuple
      
      * remove autocast and some assert
      
      * update ResBlock
      
      * remove contents within super().__init__
      
      * Update src/diffusers/models/autoencoders/dc_ae.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * remove opsequential
      
      * update other blocks to support the removal of build_norm
      
      * remove build encoder/decoder project in/out
      
      * remove inheritance of RMSNorm2d from LayerNorm
      
      * remove reset_parameters for RMSNorm2d
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * remove device and dtype in RMSNorm2d __init__
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * Update src/diffusers/models/autoencoders/dc_ae.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * Update src/diffusers/models/autoencoders/dc_ae.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * Update src/diffusers/models/autoencoders/dc_ae.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * remove op_list & build_block
      
      * remove build_stage_main
      
      * change file name to autoencoder_dc
      
      * move LiteMLA to attention.py
      
      * align with other vae decode output;
      
      * add DC-AE into init files;
      
      * update
      
      * make quality && make style;
      
      * quick push before dgx disappears again
      
      * update
      
      * make style
      
      * update
      
      * update
      
      * fix
      
      * refactor
      
      * refactor
      
      * refactor
      
      * update
      
      * possibly change to nn.Linear
      
      * refactor
      
      * make fix-copies
      
      * replace vae with ae
      
      * replace get_block_from_block_type to get_block
      
      * replace downsample_block_type from Conv to conv for consistency
      
      * add scaling factors
      
      * incorporate changes for all checkpoints
      
      * make style
      
      * move mla to attention processor file; split qkv conv to linears
      
      * refactor
      
      * add tests
      
      * from original file loader
      
      * add docs
      
      * add standard autoencoder methods
      
      * combine attention processor
      
      * fix tests
      
      * update
      
      * minor fix
      
      * minor fix
      
      * minor fix & in/out shortcut rename
      
      * minor fix
      
      * make style
      
      * fix paper link
      
      * update docs
      
      * update single file loading
      
      * make style
      
      * remove single file loading support; todo for DN6
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * add abstract
      
      * 1. add DCAE into diffusers;
      2. make style and make quality;
      
      * add DCAE_HF into diffusers;
      
      * bug fixed;
      
      * add SanaPipeline, SanaTransformer2D into diffusers;
      
      * add sanaLinearAttnProcessor2_0;
      
      * first update for SanaTransformer;
      
      * first update for SanaPipeline;
      
      * first success run SanaPipeline;
      
      * model output finally match with original model with the same intput;
      
      * code update;
      
      * code update;
      
      * add a flow dpm-solver scripts
      
      * 🎉[important update]
      1. Integrate flow-dpm-sovler into diffusers;
      2. finally run successfully on both `FlowMatchEulerDiscreteScheduler` and `FlowDPMSolverMultistepScheduler`;
      
      * 🎉🔧
      
      [important update & fix huge bugs!!]
      1. add SanaPAGPipeline & several related Sana linear attention operators;
      2. `SanaTransformer2DModel` not supports multi-resolution input;
      2. fix the multi-scale HW bugs in SanaPipeline and SanaPAGPipeline;
      3. fix the flow-dpm-solver set_timestep() init `model_output` and `lower_order_nums` bugs;
      
      * remove prints;
      
      * add convert sana official checkpoint to diffusers format Safetensor.
      
      * Update src/diffusers/models/transformers/sana_transformer_2d.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/models/transformers/sana_transformer_2d.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/models/transformers/sana_transformer_2d.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/pipelines/pag/pipeline_pag_sana.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/models/transformers/sana_transformer_2d.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/models/transformers/sana_transformer_2d.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/pipelines/sana/pipeline_sana.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update src/diffusers/pipelines/sana/pipeline_sana.py
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * update Sana for DC-AE's recent commit;
      
      * make style && make quality
      
      * Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG (#9932)
      
      * fix progress bar updates in SD 1.5 PAG Img2Img pipeline
      
      ---------
      Co-authored-by: default avatarVinh H. Pham <phamvinh257@gmail.com>
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * make the vae can be None in `__init__` of `SanaPipeline`
      
      * Update src/diffusers/models/transformers/sana_transformer_2d.py
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      
      * change the ae related code due to the latest update of DCAE branch;
      
      * change the ae related code due to the latest update of DCAE branch;
      
      * 1. change code based on AutoencoderDC;
      2. fix the bug of new GLUMBConv;
      3. run success;
      
      * update for solving conversation.
      
      * 1. fix bugs and run convert script success;
      2. Downloading ckpt from hub automatically;
      
      * make style && make quality;
      
      * 1. remove un-unsed parameters in init;
      2. code update;
      
      * remove test file
      
      * refactor; add docs; add tests; update conversion script
      
      * make style
      
      * make fix-copies
      
      * refactor
      
      * udpate pipelines
      
      * pag tests and refactor
      
      * remove sana pag conversion script
      
      * handle weight casting in conversion script
      
      * update conversion script
      
      * add a processor
      
      * 1. add bf16 pth file path;
      2. add complex human instruct in pipeline;
      
      * fix fast \tests
      
      * change gemma-2-2b-it ckpt to a non-gated repo;
      
      * fix the pth path bug in conversion script;
      
      * change grad ckpt to original; make style
      
      * fix the complex_human_instruct bug and typo;
      
      * remove dpmsolver flow scheduler
      
      * apply review suggestions
      
      * change the `FlowMatchEulerDiscreteScheduler` to default `DPMSolverMultistepScheduler` with flow matching scheduler.
      
      * fix the tokenizer.padding_side='right' bug;
      
      * update docs
      
      * make fix-copies
      
      * fix imports
      
      * fix docs
      
      * add integration test
      
      * update docs
      
      * update examples
      
      * fix convert_model_output in schedulers
      
      * fix failing tests
      
      ---------
      Co-authored-by: default avatarJunyu Chen <chenjydl2003@gmail.com>
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      Co-authored-by: default avatarchenjy2003 <70215701+chenjy2003@users.noreply.github.com>
      Co-authored-by: default avatarAryan <aryan@huggingface.co>
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      5a196e3d
  3. 11 Jul, 2024 1 commit
    • Xin Ma's avatar
      Latte: Latent Diffusion Transformer for Video Generation (#8404) · b8cf84a3
      Xin Ma authored
      
      
      * add Latte to diffusers
      
      * remove print
      
      * remove print
      
      * remove print
      
      * remove unuse codes
      
      * remove layer_norm_latte and add a flag
      
      * remove layer_norm_latte and add a flag
      
      * update latte_pipeline
      
      * update latte_pipeline
      
      * remove unuse squeeze
      
      * add norm_hidden_states.ndim == 2: # for Latte
      
      * fixed test latte pipeline bugs
      
      * fixed test latte pipeline bugs
      
      * delete sh
      
      * add doc for latte
      
      * add licensing
      
      * Move Transformer3DModelOutput to modeling_outputs
      
      * give a default value to sample_size
      
      * remove the einops dependency
      
      * change norm2 for latte
      
      * modify pipeline of latte
      
      * update test for Latte
      
      * modify some codes for latte
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * modify for Latte pipeline
      
      * video_length -> num_frames; update prepare_latents copied from
      
      * make fix-copies
      
      * make style
      
      * typo: videe -> video
      
      * update
      
      * modify for Latte pipeline
      
      * modify latte pipeline
      
      * modify latte pipeline
      
      * modify latte pipeline
      
      * modify latte pipeline
      
      * modify for Latte pipeline
      
      * Delete .vscode directory
      
      * make style
      
      * make fix-copies
      
      * add latte transformer 3d to docs _toctree.yml
      
      * update example
      
      * reduce frames for test
      
      * fixed bug of _text_preprocessing
      
      * set num frame to 1 for testing
      
      * remove unuse print
      
      * add text = self._clean_caption(text) again
      
      ---------
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      Co-authored-by: default avatarAryan <contact.aryanvs@gmail.com>
      Co-authored-by: default avatarAryan <aryan@huggingface.co>
      b8cf84a3
  4. 01 Jun, 2024 1 commit
  5. 31 Jan, 2024 1 commit
  6. 07 Nov, 2023 1 commit
  7. 06 Nov, 2023 1 commit
    • Sayak Paul's avatar
      [Feat] PixArt-Alpha (#5642) · d61889fc
      Sayak Paul authored
      
      
      * init pixart alpha pipeline
      
      * fix: import
      
      * script
      
      * script
      
      * script
      
      * add: vae to the pipeline
      
      * add: vae_scale_factor
      
      * add: checkpoint_path
      
      * clean conversion script a bit.
      
      * size embeddings.
      
      * fix: size embedding
      
      * update scrip
      
      * support for interpolation of position embedding.
      
      * support for conditioning.
      
      * ..
      
      * ..
      
      * ..
      
      * final layer
      
      * final layer
      
      * align if encode_prompt
      
      * support for caption embedding
      
      * refactor
      
      * refactor
      
      * refactor
      
      * start cross attention
      
      * start cross attention
      
      * cross_attention_dim
      
      * cross
      
      * cross
      
      * support for resolution and aspect_ratio
      
      * support for caption projection
      
      * refactor patch embeddings
      
      * batch_size
      
      * up
      
      * commit
      
      * commit
      
      * commit.
      
      * squeeze
      
      * squeeze
      
      * squeeze
      
      * squeeze
      
      * squeeze
      
      * squeeze
      
      * squeeze
      
      * squeeze
      
      * squeeze
      
      * squeeze
      
      * squeeze
      
      * squeeze.
      
      * squeeze.
      
      * fix final block./
      
      * fix final block./
      
      * fix final block./
      
      * clean
      
      * fix: interpolation scale.
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging'
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * make --checkpoint_path non-required.
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * debugging
      
      * remove num_tokens
      
      * timesteps -> timestep
      
      * timesteps -> timestep
      
      * timesteps -> timestep
      
      * timesteps -> timestep
      
      * timesteps -> timestep
      
      * timesteps -> timestep
      
      * debug
      
      * debug
      
      * update conversion script.
      
      * update conversion script.
      
      * update conversion script.
      
      * debug
      
      * debug
      
      * debug
      
      * clean
      
      * debug
      
      * debug
      
      * debug
      
      * debug
      
      * debug
      
      * debug
      
      * debug
      
      * debug
      
      * deug
      
      * debug
      
      * debug
      
      * debug
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * clean
      
      * fix
      
      * fix
      
      * boom
      
      * boom
      
      * some changes
      
      * boom
      
      * save
      
      * up
      
      * remove i
      
      * fix more tests
      
      * DPMSolverMultistepScheduler
      
      * fix
      
      * offloading
      
      * fix conversion script
      
      * fix conversion script
      
      * remove print
      
      * remove support for negative prompt embeds.
      
      * typo.
      
      * remove extra kwargs
      
      * bring conversion script to where it was
      
      * fix
      
      * trying mu luck
      
      * trying my luck again
      
      * again
      
      * again
      
      * again
      
      * clean up
      
      * up
      
      * up
      
      * update example
      
      * support for 512
      
      * remove spacing
      
      * finalize docs.
      
      * test debug
      
      * fix: assertion values.
      
      * debug
      
      * debug
      
      * debug
      
      * fix: repeat
      
      * remove prints.
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      
      * Correct more
      
      * Apply suggestions from code review
      
      * Change all
      
      * Clean more
      
      * fix more
      
      * Fix more
      
      * Fix more
      
      * Correct more
      
      * address patrick's comments.
      
      * remove unneeded args
      
      * clean up pipeline.
      
      * sty;e
      
      * make the use of additional conditions better conditioned.
      
      * None better
      
      * dtype
      
      * height and width validation
      
      * add a note about size brackets.
      
      * fix
      
      * spit out slow test outputs.
      
      * fix?
      
      * fix optional test
      
      * fix more
      
      * remove unneeded comment
      
      * debug
      
      ---------
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      d61889fc