1. 20 Dec, 2024 1 commit
  2. 19 Dec, 2024 1 commit
  3. 16 Dec, 2024 1 commit
    • Aryan's avatar
      [core] Hunyuan Video (#10136) · aace1f41
      Aryan authored
      
      
      * copy transformer
      
      * copy vae
      
      * copy pipeline
      
      * make fix-copies
      
      * refactor; make original code work with diffusers; test latents for comparison generated with this commit
      
      * move rope into pipeline; remove flash attention; refactor
      
      * begin conversion script
      
      * make style
      
      * refactor attention
      
      * refactor
      
      * refactor final layer
      
      * their mlp -> our feedforward
      
      * make style
      
      * add docs
      
      * refactor layer names
      
      * refactor modulation
      
      * cleanup
      
      * refactor norms
      
      * refactor activations
      
      * refactor single blocks attention
      
      * refactor attention processor
      
      * make style
      
      * cleanup a bit
      
      * refactor double transformer block attention
      
      * update mochi attn proc
      
      * use diffusers attention implementation in all modules; checkpoint for all values matching original
      
      * remove helper functions in vae
      
      * refactor upsample
      
      * refactor causal conv
      
      * refactor resnet
      
      * refactor
      
      * refactor
      
      * refactor
      
      * grad checkpointing
      
      * autoencoder test
      
      * fix scaling factor
      
      * refactor clip
      
      * refactor llama text encoding
      
      * add coauthor
      Co-Authored-By: default avatar"Gregory D. Hunkins" <greg@ollano.com>
      
      * refactor rope; diff: 0.14990234375; reason and fix: create rope grid on cpu and move to device
      
      Note: The following line diverges from original behaviour. We create the grid on the device, whereas
      original implementation creates it on CPU and then moves it to device. This results in numerical
      differences in layerwise debugging outputs, but visually it is the same.
      
      * use diffusers timesteps embedding; diff: 0.10205078125
      
      * rename
      
      * convert
      
      * update
      
      * add tests for transformer
      
      * add pipeline tests; text encoder 2 is not optional
      
      * fix attention implementation for torch
      
      * add example
      
      * update docs
      
      * update docs
      
      * apply suggestions from review
      
      * refactor vae
      
      * update
      
      * Apply suggestions from code review
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      
      * Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      
      * Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      
      * make fix-copies
      
      * update
      
      ---------
      Co-authored-by: default avatar"Gregory D. Hunkins" <greg@ollano.com>
      Co-authored-by: default avatarhlky <hlky@hlky.ac>
      aace1f41
  4. 06 Dec, 2024 1 commit
    • Junsong Chen's avatar
      [DC-AE] Add the official Deep Compression Autoencoder code(32x,64x,128x compression ratio); (#9708) · cd892041
      Junsong Chen authored
      
      
      * first add a script for DC-AE;
      
      * DC-AE init
      
      * replace triton with custom implementation
      
      * 1. rename file and remove un-used codes;
      
      * no longer rely on omegaconf and dataclass
      
      * replace custom activation with diffuers activation
      
      * remove dc_ae attention in attention_processor.py
      
      * iinherit from ModelMixin
      
      * inherit from ConfigMixin
      
      * dc-ae reduce to one file
      
      * update downsample and upsample
      
      * clean code
      
      * support DecoderOutput
      
      * remove get_same_padding and val2tuple
      
      * remove autocast and some assert
      
      * update ResBlock
      
      * remove contents within super().__init__
      
      * Update src/diffusers/models/autoencoders/dc_ae.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * remove opsequential
      
      * update other blocks to support the removal of build_norm
      
      * remove build encoder/decoder project in/out
      
      * remove inheritance of RMSNorm2d from LayerNorm
      
      * remove reset_parameters for RMSNorm2d
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * remove device and dtype in RMSNorm2d __init__
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * Update src/diffusers/models/autoencoders/dc_ae.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * Update src/diffusers/models/autoencoders/dc_ae.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * Update src/diffusers/models/autoencoders/dc_ae.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * remove op_list & build_block
      
      * remove build_stage_main
      
      * change file name to autoencoder_dc
      
      * move LiteMLA to attention.py
      
      * align with other vae decode output;
      
      * add DC-AE into init files;
      
      * update
      
      * make quality && make style;
      
      * quick push before dgx disappears again
      
      * update
      
      * make style
      
      * update
      
      * update
      
      * fix
      
      * refactor
      
      * refactor
      
      * refactor
      
      * update
      
      * possibly change to nn.Linear
      
      * refactor
      
      * make fix-copies
      
      * replace vae with ae
      
      * replace get_block_from_block_type to get_block
      
      * replace downsample_block_type from Conv to conv for consistency
      
      * add scaling factors
      
      * incorporate changes for all checkpoints
      
      * make style
      
      * move mla to attention processor file; split qkv conv to linears
      
      * refactor
      
      * add tests
      
      * from original file loader
      
      * add docs
      
      * add standard autoencoder methods
      
      * combine attention processor
      
      * fix tests
      
      * update
      
      * minor fix
      
      * minor fix
      
      * minor fix & in/out shortcut rename
      
      * minor fix
      
      * make style
      
      * fix paper link
      
      * update docs
      
      * update single file loading
      
      * make style
      
      * remove single file loading support; todo for DN6
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * add abstract
      
      ---------
      Co-authored-by: default avatarJunyu Chen <chenjydl2003@gmail.com>
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      Co-authored-by: default avatarchenjy2003 <70215701+chenjy2003@users.noreply.github.com>
      Co-authored-by: default avatarAryan <aryan@huggingface.co>
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      cd892041
  5. 04 Dec, 2024 1 commit
    • Sayak Paul's avatar
      [tests] refactor vae tests (#9808) · c1926cef
      Sayak Paul authored
      
      
      * add: autoencoderkl tests
      
      * autoencodertiny.
      
      * fix
      
      * asymmetric autoencoder.
      
      * more
      
      * integration tests for stable audio decoder.
      
      * consistency decoder vae tests
      
      * remove grad check from consistency decoder.
      
      * cog
      
      * bye test_models_vae.py
      
      * fix
      
      * fix
      
      * remove allegro
      
      * fixes
      
      * fixes
      
      * fixes
      
      ---------
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      c1926cef
  6. 03 Dec, 2024 1 commit
  7. 31 Oct, 2024 1 commit
  8. 24 Sep, 2024 1 commit
  9. 21 Sep, 2024 1 commit
    • Sayak Paul's avatar
      [CI] fix nightly model tests (#9483) · aa73072f
      Sayak Paul authored
      * check if default attn procs fix it.
      
      * print
      
      * print
      
      * replace
      
      * style./
      
      * replace revision with variant.
      
      * replace with stable-diffusion-v1-5/stable-diffusion-inpainting.
      
      * replace with stable-diffusion-v1-5/stable-diffusion-v1-5.
      
      * fix
      aa73072f
  10. 12 Sep, 2024 1 commit
  11. 04 Sep, 2024 1 commit
  12. 03 Sep, 2024 1 commit
  13. 30 Jul, 2024 2 commits
    • Yoach Lacombe's avatar
      Fix Stable Audio repository id (#9016) · ea1b4ea7
      Yoach Lacombe authored
      Fix Stable Audio repo id
      ea1b4ea7
    • Yoach Lacombe's avatar
      Stable Audio integration (#8716) · 69e72b1d
      Yoach Lacombe authored
      
      
      * WIP modeling code and pipeline
      
      * add custom attention processor + custom activation + add to init
      
      * correct ProjectionModel forward
      
      * add stable audio to __initèè
      
      * add autoencoder and update pipeline and modeling code
      
      * add half Rope
      
      * add partial rotary v2
      
      * add temporary modfis to scheduler
      
      * add EDM DPM Solver
      
      * remove TODOs
      
      * clean GLU
      
      * remove att.group_norm to attn processor
      
      * revert back src/diffusers/schedulers/scheduling_dpmsolver_multistep.py
      
      * refactor GLU -> SwiGLU
      
      * remove redundant args
      
      * add channel multiples in autoencoder docstrings
      
      * changes in docsrtings and copyright headers
      
      * clean pipeline
      
      * further cleaning
      
      * remove peft and lora and fromoriginalmodel
      
      * Delete src/diffusers/pipelines/stable_audio/diffusers.code-workspace
      
      * make style
      
      * dummy models
      
      * fix copied from
      
      * add fast oobleck tests
      
      * add brownian tree
      
      * oobleck autoencoder slow tests
      
      * remove TODO
      
      * fast stable audio pipeline tests
      
      * add slow tests
      
      * make style
      
      * add first version of docs
      
      * wrap is_torchsde_available to the scheduler
      
      * fix slow test
      
      * test with input waveform
      
      * add input waveform
      
      * remove some todos
      
      * create stableaudio gaussian projection + make style
      
      * add pipeline to toctree
      
      * fix copied from
      
      * make quality
      
      * refactor timestep_features->time_proj
      
      * refactor joint_attention_kwargs->cross_attention_kwargs
      
      * remove forward_chunk
      
      * move StableAudioDitModel to transformers folder
      
      * correct convert + remove partial rotary embed
      
      * apply suggestions from yiyixuxu -> removing attn.kv_heads
      
      * remove temb
      
      * remove cross_attention_kwargs
      
      * further removal of cross_attention_kwargs
      
      * remove text encoder autocast to fp16
      
      * continue removing autocast
      
      * make style
      
      * refactor how text and audio are embedded
      
      * add paper
      
      * update example code
      
      * make style
      
      * unify projection model forward + fix device placement
      
      * make style
      
      * remove fuse qkv
      
      * apply suggestions from review
      
      * Update src/diffusers/pipelines/stable_audio/pipeline_stable_audio.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * make style
      
      * smaller models in fast tests
      
      * pass sequential offloading fast tests
      
      * add docs for vae and autoencoder
      
      * make style and update example
      
      * remove useless import
      
      * add cosine scheduler
      
      * dummy classes
      
      * cosine scheduler docs
      
      * better description of scheduler
      
      ---------
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      69e72b1d
  14. 04 Jul, 2024 1 commit
  15. 15 May, 2024 1 commit
    • Isamu Isozaki's avatar
      Adding VQGAN Training script (#5483) · d27e996c
      Isamu Isozaki authored
      
      
      * Init commit
      
      * Removed einops
      
      * Added default movq config for training
      
      * Update explanation of prompts
      
      * Fixed inheritance of discriminator and init_tracker
      
      * Fixed incompatible api between muse and here
      
      * Fixed output
      
      * Setup init training
      
      * Basic structure done
      
      * Removed attention for quick tests
      
      * Style fixes
      
      * Fixed vae/vqgan styles
      
      * Removed redefinition of wandb
      
      * Fixed log_validation and tqdm
      
      * Nothing commit
      
      * Added commit loss to lookup_from_codebook
      
      * Update src/diffusers/models/vq_model.py
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * Adding perliminary README
      
      * Fixed one typo
      
      * Local changes
      
      * Fixed main issues
      
      * Merging
      
      * Update src/diffusers/models/vq_model.py
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * Testing+Fixed bugs in training script
      
      * Some style fixes
      
      * Added wandb to docs
      
      * Fixed timm test
      
      * get testing suite ready.
      
      * remove return loss
      
      * remove return_loss
      
      * Remove diffs
      
      * Remove diffs
      
      * fix ruff format
      
      ---------
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      Co-authored-by: default avatarDhruv Nair <dhruv.nair@gmail.com>
      d27e996c
  16. 09 May, 2024 1 commit
    • Dhruv Nair's avatar
      [Refactor] Better align `from_single_file` logic with `from_pretrained` (#7496) · cb0f3b49
      Dhruv Nair authored
      
      
      * refactor unet single file loading a bit.
      
      * retrieve the unet from create_diffusers_unet_model_from_ldm
      
      * update
      
      * update
      
      * updae
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * tests
      
      * update
      
      * update
      
      * update
      
      * Update docs/source/en/api/single_file.md
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * Update docs/source/en/api/single_file.md
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * Update docs/source/en/api/loaders/single_file.md
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * Update src/diffusers/loaders/single_file.py
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      
      * Update docs/source/en/api/loaders/single_file.md
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * Update docs/source/en/api/loaders/single_file.md
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * Update docs/source/en/api/loaders/single_file.md
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * Update docs/source/en/api/loaders/single_file.md
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      ---------
      Co-authored-by: default avatarsayakpaul <spsayakpaul@gmail.com>
      Co-authored-by: default avatarYiYi Xu <yixu310@gmail.com>
      cb0f3b49
  17. 24 Apr, 2024 1 commit
  18. 05 Apr, 2024 1 commit
    • Sayak Paul's avatar
      [Tests] reduce block sizes of UNet and VAE tests (#7560) · 1c60e094
      Sayak Paul authored
      * reduce block sizes for unet1d.
      
      * reduce blocks for unet_2d.
      
      * reduce block size for unet_motion
      
      * increase channels.
      
      * correctly increase channels.
      
      * reduce number of layers in unet2dconditionmodel tests.
      
      * reduce block sizes for unet2dconditionmodel tests
      
      * reduce block sizes for unet3dconditionmodel.
      
      * fix: test_feed_forward_chunking
      
      * fix: test_forward_with_norm_groups
      
      * skip spatiotemporal tests on MPS.
      
      * reduce block size in AutoencoderKL.
      
      * reduce block sizes for vqmodel.
      
      * further reduce block size.
      
      * make style.
      
      * Empty-Commit
      
      * reduce sizes for ConsistencyDecoderVAETests
      
      * further reduction.
      
      * further block reductions in AutoencoderKL and AssymetricAutoencoderKL.
      
      * massively reduce the block size in unet2dcontionmodel.
      
      * reduce sizes for unet3d
      
      * fix tests in unet3d.
      
      * reduce blocks further in motion unet.
      
      * fix: output shape
      
      * add attention_head_dim to the test configuration.
      
      * remove unexpected keyword arg
      
      * up a bit.
      
      * groups.
      
      * up again
      
      * fix
      1c60e094
  19. 29 Mar, 2024 2 commits
  20. 26 Mar, 2024 1 commit
    • M. Tolga Cangöz's avatar
      Fix Tiling in `ConsistencyDecoderVAE` (#7290) · 443aa14e
      M. Tolga Cangöz authored
      
      
      * Fix typos
      
      * Add docstring to `decode` method in `ConsistencyDecoderVAE`
      
      * Fix tiling
      
      * Enable tiled VAE decoding with customizable tile sample size and overlap factor
      
      * Revert "Enable tiled VAE decoding with customizable tile sample size and overlap factor"
      
      This reverts commit 181049675e83cea7b33ae2bbeba2aff7ae1b1761.
      
      * Add VAE tiling test for `ConsistencyDecoderVAE`
      
      ---------
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      443aa14e
  21. 25 Mar, 2024 1 commit
  22. 14 Mar, 2024 1 commit
  23. 27 Feb, 2024 1 commit
  24. 08 Feb, 2024 1 commit
  25. 01 Feb, 2024 1 commit