    Stable Audio integration (#8716) · 69e72b1d
    Yoach Lacombe authored
    
    
    * WIP modeling code and pipeline
    
    * add custom attention processor + custom activation + add to init
    
    * correct ProjectionModel forward
    
    * add stable audio to __init__
    
    * add autoencoder and update pipeline and modeling code
    
    * add half Rope
    
    * add partial rotary v2
    
    * add temporary modifications to scheduler
    
    * add EDM DPM Solver
    
    * remove TODOs
    
    * clean GLU
    
    * remove att.group_norm to attn processor
    
    * revert back src/diffusers/schedulers/scheduling_dpmsolver_multistep.py
    
    * refactor GLU -> SwiGLU
    
    * remove redundant args
    
    * add channel multiples in autoencoder docstrings
    
    * changes in docstrings and copyright headers
    
    * clean pipeline
    
    * further cleaning
    
    * remove peft and lora and fromoriginalmodel
    
    * Delete src/diffusers/pipelines/stable_audio/diffusers.code-workspace
    
    * make style
    
    * dummy models
    
    * fix copied from
    
    * add fast oobleck tests
    
    * add brownian tree
    
    * oobleck autoencoder slow tests
    
    * remove TODO
    
    * fast stable audio pipeline tests
    
    * add slow tests
    
    * make style
    
    * add first version of docs
    
    * wrap is_torchsde_available to the scheduler
    
    * fix slow test
    
    * test with input waveform
    
    * add input waveform
    
    * remove some todos
    
    * create stableaudio gaussian projection + make style
    
    * add pipeline to toctree
    
    * fix copied from
    
    * make quality
    
    * refactor timestep_features->time_proj
    
    * refactor joint_attention_kwargs->cross_attention_kwargs
    
    * remove forward_chunk
    
    * move StableAudioDitModel to transformers folder
    
    * correct convert + remove partial rotary embed
    
    * apply suggestions from yiyixuxu -> removing attn.kv_heads
    
    * remove temb
    
    * remove cross_attention_kwargs
    
    * further removal of cross_attention_kwargs
    
    * remove text encoder autocast to fp16
    
    * continue removing autocast
    
    * make style
    
    * refactor how text and audio are embedded
    
    * add paper
    
    * update example code
    
    * make style
    
    * unify projection model forward + fix device placement
    
    * make style
    
    * remove fuse qkv
    
    * apply suggestions from review
    
    * Update src/diffusers/pipelines/stable_audio/pipeline_stable_audio.py
    Co-authored-by: YiYi Xu <yixu310@gmail.com>
    
    * make style
    
    * smaller models in fast tests
    
    * pass sequential offloading fast tests
    
    * add docs for vae and autoencoder
    
    * make style and update example
    
    * remove useless import
    
    * add cosine scheduler
    
    * dummy classes
    
    * cosine scheduler docs
    
    * better description of scheduler
    
    ---------
    Co-authored-by: YiYi Xu <yixu310@gmail.com>
embeddings.py 54.8 KB