1. 09 Jan, 2026 1 commit
  2. 18 Jun, 2025 1 commit
  3. 19 May, 2025 1 commit
  4. 20 Feb, 2025 1 commit
  5. 16 Dec, 2024 1 commit
    • Aryan's avatar
      [core] Hunyuan Video (#10136) · aace1f41
      Aryan authored
      
      
      * copy transformer
      
      * copy vae
      
      * copy pipeline
      
      * make fix-copies
      
      * refactor; make original code work with diffusers; test latents for comparison generated with this commit
      
      * move rope into pipeline; remove flash attention; refactor
      
      * begin conversion script
      
      * make style
      
      * refactor attention
      
      * refactor
      
      * refactor final layer
      
      * their mlp -> our feedforward
      
      * make style
      
      * add docs
      
      * refactor layer names
      
      * refactor modulation
      
      * cleanup
      
      * refactor norms
      
      * refactor activations
      
      * refactor single blocks attention
      
      * refactor attention processor
      
      * make style
      
      * cleanup a bit
      
      * refactor double transformer block attention
      
      * update mochi attn proc
      
      * use diffusers attention implementation in all modules; checkpoint for all values matching original
      
      * remove helper functions in vae
      
      * refactor upsample
      
      * refactor causal conv
      
      * refactor resnet
      
      * refactor
      
      * refactor
      
      * refactor
      
      * grad checkpointing
      
      * autoencoder test
      
      * fix scaling factor
      
      * refactor clip
      
      * refactor llama text encoding
      
      * add coauthor
      Co-Authored-By: default avatar"Gregory D. Hunkins" <greg@ollano.com>
      
      * refactor rope; diff: 0.14990234375; reason and fix: create rope grid on cpu and move to device
      
      Note: The following line diverges from original behaviour. We create the grid on the device, whereas
      original implementation creates it on CPU and then moves it to device. This results in numerical
      differences in layerwise debugging outputs, but visually it is the same.
      
      * use diffusers timesteps embedding; diff: 0.10205078125
      
      * rename
      
      * convert
      
      * update
      
      * add tests for transformer
      
      * add pipeline tests; text encoder 2 is not optional
      
      * fix attention implementation for torch
      
      * add example
      
      * update docs
      
      * update docs
      
      * apply suggestions from review
      
      * refactor vae
      
      * update
      
      * Apply suggestions from code review
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      
      * Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      
      * Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      
      * make fix-copies
      
      * update
      
      ---------
      Co-authored-by: default avatar"Gregory D. Hunkins" <greg@ollano.com>
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      aace1f41
  6. 13 Dec, 2024 1 commit
  7. 05 Nov, 2024 1 commit
    • Aryan's avatar
      [core] Mochi T2V (#9769) · 3f329a42
      Aryan authored
      
      
      * update
      
      * udpate
      
      * update transformer
      
      * make style
      
      * fix
      
      * add conversion script
      
      * update
      
      * fix
      
      * update
      
      * fix
      
      * update
      
      * fixes
      
      * make style
      
      * update
      
      * update
      
      * update
      
      * init
      
      * update
      
      * update
      
      * add
      
      * up
      
      * up
      
      * up
      
      * update
      
      * mochi transformer
      
      * remove original implementation
      
      * make style
      
      * update inits
      
      * update conversion script
      
      * docs
      
      * Update src/diffusers/pipelines/mochi/pipeline_mochi.py
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      
      * Update src/diffusers/pipelines/mochi/pipeline_mochi.py
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      
      * fix docs
      
      * pipeline fixes
      
      * make style
      
      * invert sigmas in scheduler; fix pipeline
      
      * fix pipeline num_frames
      
      * flip proj and gate in swiglu
      
      * make style
      
      * fix
      
      * make style
      
      * fix tests
      
      * latent mean and std fix
      
      * update
      
      * cherry-pick 1069d210e1b9e84a366cdc7a13965626ea258178
      
      * remove additional sigma already handled by flow match scheduler
      
      * fix
      
      * remove hardcoded value
      
      * replace conv1x1 with linear
      
      * Update src/diffusers/pipelines/mochi/pipeline_mochi.py
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      
      * framewise decoding and conv_cache
      
      * make style
      
      * Apply suggestions from code review
      
      * mochi vae encoder changes
      
      * rebase correctly
      
      * Update scripts/convert_mochi_to_diffusers.py
      
      * fix tests
      
      * fixes
      
      * make style
      
      * update
      
      * make style
      
      * update
      
      * add framewise and tiled encoding
      
      * make style
      
      * make original vae implementation behaviour the default; note: framewise encoding does not work
      
      * remove framewise encoding implementation due to presence of attn layers
      
      * fight test 1
      
      * fight test 2
      
      ---------
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      Co-authored-by: default avataryiyixuxu <yixu310@gmail.com>
      3f329a42
  8. 30 Jul, 2024 1 commit
    • Yoach Lacombe's avatar
      Stable Audio integration (#8716) · 69e72b1d
      Yoach Lacombe authored
      
      
      * WIP modeling code and pipeline
      
      * add custom attention processor + custom activation + add to init
      
      * correct ProjectionModel forward
      
      * add stable audio to __initèè
      
      * add autoencoder and update pipeline and modeling code
      
      * add half Rope
      
      * add partial rotary v2
      
      * add temporary modfis to scheduler
      
      * add EDM DPM Solver
      
      * remove TODOs
      
      * clean GLU
      
      * remove att.group_norm to attn processor
      
      * revert back src/diffusers/schedulers/scheduling_dpmsolver_multistep.py
      
      * refactor GLU -> SwiGLU
      
      * remove redundant args
      
      * add channel multiples in autoencoder docstrings
      
      * changes in docsrtings and copyright headers
      
      * clean pipeline
      
      * further cleaning
      
      * remove peft and lora and fromoriginalmodel
      
      * Delete src/diffusers/pipelines/stable_audio/diffusers.code-workspace
      
      * make style
      
      * dummy models
      
      * fix copied from
      
      * add fast oobleck tests
      
      * add brownian tree
      
      * oobleck autoencoder slow tests
      
      * remove TODO
      
      * fast stable audio pipeline tests
      
      * add slow tests
      
      * make style
      
      * add first version of docs
      
      * wrap is_torchsde_available to the scheduler
      
      * fix slow test
      
      * test with input waveform
      
      * add input waveform
      
      * remove some todos
      
      * create stableaudio gaussian projection + make style
      
      * add pipeline to toctree
      
      * fix copied from
      
      * make quality
      
      * refactor timestep_features->time_proj
      
      * refactor joint_attention_kwargs->cross_attention_kwargs
      
      * remove forward_chunk
      
      * move StableAudioDitModel to transformers folder
      
      * correct convert + remove partial rotary embed
      
      * apply suggestions from yiyixuxu -> removing attn.kv_heads
      
      * remove temb
      
      * remove cross_attention_kwargs
      
      * further removal of cross_attention_kwargs
      
      * remove text encoder autocast to fp16
      
      * continue removing autocast
      
      * make style
      
      * refactor how text and audio are embedded
      
      * add paper
      
      * update example code
      
      * make style
      
      * unify projection model forward + fix device placement
      
      * make style
      
      * remove fuse qkv
      
      * apply suggestions from review
      
      * Update src/diffusers/pipelines/stable_audio/pipeline_stable_audio.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * make style
      
      * smaller models in fast tests
      
      * pass sequential offloading fast tests
      
      * add docs for vae and autoencoder
      
      * make style and update example
      
      * remove useless import
      
      * add cosine scheduler
      
      * dummy classes
      
      * cosine scheduler docs
      
      * better description of scheduler
      
      ---------
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      69e72b1d
  9. 01 Jun, 2024 1 commit
  10. 03 May, 2024 1 commit
  11. 13 Mar, 2024 1 commit
    • Sayak Paul's avatar
      [LoRA] use the PyTorch classes wherever needed and start depcrecation cycles (#7204) · 531e7191
      Sayak Paul authored
      * fix PyTorch classes and start deprecsation cycles.
      
      * remove args crafting for accommodating scale.
      
      * remove scale check in feedforward.
      
      * assert against nn.Linear and not CompatibleLinear.
      
      * remove conv_cls and lineaR_cls.
      
      * remove scale
      
      * 👋
      
       scale.
      
      * fix: unet2dcondition
      
      * fix attention.py
      
      * fix: attention.py again
      
      * fix: unet_2d_blocks.
      
      * fix-copies.
      
      * more fixes.
      
      * fix: resnet.py
      
      * more fixes
      
      * fix i2vgenxl unet.
      
      * depcrecate scale gently.
      
      * fix-copies
      
      * Apply suggestions from code review
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * quality
      
      * throw warning when scale is passed to the the BasicTransformerBlock class.
      
      * remove scale from signature.
      
      * cross_attention_kwargs, very nice catch by Yiyi
      
      * fix: logger.warn
      
      * make deprecation message clearer.
      
      * address final comments.
      
      * maintain same depcrecation message and also add it to activations.
      
      * address yiyi
      
      * fix copies
      
      * Apply suggestions from code review
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * more depcrecation
      
      * fix-copies
      
      ---------
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      531e7191
  12. 08 Feb, 2024 1 commit
  13. 04 Dec, 2023 1 commit
    • takuoko's avatar
      [Feature] Support IP-Adapter Plus (#5915) · 0a08d419
      takuoko authored
      
      
      * Support IP-Adapter Plus
      
      * fix format
      
      * restore before black format
      
      * restore before black format
      
      * generic
      
      * Refactor PerceiverAttention
      
      * format
      
      * fix test and refactor PerceiverAttention
      
      * generic encode_image
      
      * keep attention implementation
      
      * merge tests
      
      * encode_image backward compatible
      
      * code quality
      
      * fix controlnet inpaint pipeline
      
      * refactor FFN
      
      * refactor FFN
      
      ---------
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      0a08d419
  14. 26 Oct, 2023 1 commit
    • Chi's avatar
      Remove multiple if-else statement in the get_activation function. (#5446) · ce7f3344
      Chi authored
      
      
      * I added a new doc string to the class. This is more flexible to understanding other developers what are doing and where it's using.
      
      * Update src/diffusers/models/unet_2d_blocks.py
      
      This changes suggest by maintener.
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * Update src/diffusers/models/unet_2d_blocks.py
      
      Add suggested text
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * Update unet_2d_blocks.py
      
      I changed the Parameter to Args text.
      
      * Update unet_2d_blocks.py
      
      proper indentation set in this file.
      
      * Update unet_2d_blocks.py
      
      a little bit of change in the act_fun argument line.
      
      * I run the black command to reformat style in the code
      
      * Update unet_2d_blocks.py
      
      similar doc-string add to have in the original diffusion repository.
      
      * I use a lower method in the activation function.
      
      * Replace multiple if-else statements with a dictionary of activation functions, and call one if statement to retrieve the appropriate function.
      
      * I am using black package to reforamted my file
      
      * I defined the ACTIVATION_FUNCTIONS variable outside of the function
      
      * activation function variable convert to lower case
      
      * First, I resolved the conflict issue. Then, I ran the Black package to reformat my file.
      
      ---------
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      ce7f3344
  15. 24 Oct, 2023 1 commit
  16. 09 Oct, 2023 1 commit
  17. 02 Aug, 2023 1 commit
    • Sayak Paul's avatar
      [Feat] add tiny Autoencoder for (almost) instant decoding (#4384) · 18fc40c1
      Sayak Paul authored
      
      
      * add: model implementation of tiny autoencoder.
      
      * add: inits.
      
      * push the latest devs.
      
      * add: conversion script and finish.
      
      * add: scaling factor args.
      
      * debugging
      
      * fix denormalization.
      
      * fix: positional argument.
      
      * handle use_torch_2_0_or_xformers.
      
      * handle post_quant_conv
      
      * handle dtype
      
      * fix: sdxl image processor for tiny ae.
      
      * fix: sdxl image processor for tiny ae.
      
      * unify upcasting logic.
      
      * copied from madness.
      
      * remove trailing whitespace.
      
      * set is_tiny_vae = False
      
      * address PR comments.
      
      * change to AutoencoderTiny
      
      * make act_fn an str throughout
      
      * fix: apply_forward_hook decorator call
      
      * get rid of the special is_tiny_vae flag.
      
      * directly scale the output.
      
      * fix dummies?
      
      * fix: act_fn.
      
      * get rid of the Clamp() layer.
      
      * bring back copied from.
      
      * movement of the blocks to appropriate modules.
      
      * add: docstrings to AutoencoderTiny
      
      * add: documentation.
      
      * changes to the conversion script.
      
      * add doc entry.
      
      * settle tests.
      
      * style
      
      * add one slow test.
      
      * fix
      
      * fix 2
      
      * fix 2
      
      * fix: 4
      
      * fix: 5
      
      * finish integration tests
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * style
      
      ---------
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      18fc40c1
  18. 05 Jun, 2023 1 commit