    Add AudioLDM (#2232) · b94880e5
    Sanchit Gandhi authored
    
    
    * Add AudioLDM
    
    * up
    
    * add vocoder
    
    * start unet
    
    * unconditional unet
    
    * clap, vocoder and vae
    
    * clean-up: conversion scripts
    
    * fix: conversion script token_type_ids
    
    * clean-up: pipeline docstring
    
    * tests: from SD
    
    * clean-up: cpu offload vocoder instead of safety checker
    
    * feat: adapt tests to audioldm
    
    * feat: add docs
    
    * clean-up: amend pipeline docstrings
    
    * clean-up: make style
    
    * clean-up: make fix-copies
    
    * fix: add doc path to toctree
    
    * clean-up: args for conversion script
    
    * clean-up: paths to checkpoints
    
    * fix: use conditional unet
    
    * clean-up: make style
    
    * fix: type hints for UNet
    
    * clean-up: docstring for UNet
    
    * clean-up: make style
    
    * clean-up: remove duplicate in docstring
    
    * clean-up: make style
    
    * clean-up: make fix-copies
    
    * clean-up: move imports to start in code snippet
    
    * fix: pass cross_attention_dim as a list/tuple to unet
    
    * clean-up: make fix-copies
    
    * fix: update checkpoint path
    
    * fix: unet cross_attention_dim in tests
    
    * film embeddings -> class embeddings
    
    * Apply suggestions from code review
    Co-authored-by: Will Berman <wlbberman@gmail.com>
    
    * fix: unet film embed to use existing args
    
    * fix: unet tests to use existing args
    
    * fix: make style
    
    * fix: transformers import and version in init
    
    * clean-up: make style
    
    * Revert "clean-up: make style"
    
    This reverts commit 5d6d1f8b324f5583e7805dc01e2c86e493660d66.
    
    * clean-up: make style
    
    * clean-up: use pipeline tester mixin tests where poss
    
    * clean-up: skip attn slicing test
    
    * fix: add torch dtype to docs
    
    * fix: remove conversion script out of src
    
    * fix: remove .detach from 1d waveform
    
    * fix: reduce default num inf steps
    
    * fix: swap height/width -> audio_length_in_s
    
    * clean-up: make style
    
    * fix: remove nightly tests
    
    * fix: imports in conversion script
    
    * clean-up: slim-down to two slow tests
    
    * clean-up: slim-down fast tests
    
    * fix: batch consistent tests
    
    * clean-up: make style
    
    * clean-up: remove vae slicing fast test
    
    * clean-up: propagate changes to doc
    
    * fix: increase test tol to 1e-2
    
    * clean-up: finish docs
    
    * clean-up: make style
    
    * feat: vocoder / VAE compatibility check
    
    * feat: possibly expand / cut audio waveform
    
    * fix: pipeline call signature test
    
    * fix: slow tests output len
    
    * clean-up: make style
    
    * make style
    
    ---------
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    Co-authored-by: William Berman <WLBberman@gmail.com>
unet_2d_condition.py 32.3 KB