  1. 30 Sep, 2025 1 commit
  2. 19 Jun, 2025 1 commit
  3. 19 May, 2025 1 commit
  4. 06 Jan, 2025 1 commit
  5. 24 Jun, 2024 1 commit
  6. 08 Apr, 2024 1 commit
    • Add AudioLDM2 TTS (#5381) · 56a76082
      Nguyễn Công Tú Anh authored
      
      
      * add audioldm2 tts
      
      * change gpt2 max new tokens
      
      * remove unnecessary pipeline and class
      
      * add TTS to AudioLDM2Pipeline
      
      * add TTS docs
      
      * delete unnecessary file
      
      * remove unnecessary import
      
      * add audioldm2 slow testcase
      
      * fix code quality
      
      * remove AudioLDMLearnablePositionalEmbedding
      
      * add variable check vits encoder
      
      * add use_learned_position_embedding
      
      ---------
      Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
  7. 08 Feb, 2024 1 commit
  8. 14 Nov, 2023 1 commit
  9. 23 Oct, 2023 1 commit
  10. 24 Aug, 2023 1 commit
  11. 21 Aug, 2023 1 commit
    • Add AudioLDM 2 (#4549) · 7a24977c
      Sanchit Gandhi authored
      
      
      * from audioldm
      
      * unet down + mid
      
      * vae, clap, flan-t5
      
      * start sequence audio mae
      
      * iterate on audioldm encoder
      
      * finish encoder
      
      * finish weight conversion
      
      * text pre-processing
      
      * gpt2 pre-processing
      
      * fix projection model
      
      * working
      
      * unet equivalence
      
      * finish in base
      
      * add unet cond
      
      * finish unet
      
      * finish custom unet
      
      * start clean-up
      
      * revert base unet changes
      
      * refactor pre-processing
      
      * tests: from audioldm
      
      * fix some tests
      
      * more fixes
      
      * iterate on tests
      
      * make fix copies
      
      * harden fast tests
      
      * slow integration tests
      
      * finish tests
      
      * update checkpoint
      
      * update copyright
      
      * docs
      
      * remove outdated method
      
      * add docstring
      
      * make style
      
      * remove decode latents
      
      * enable cpu offload
      
      * (text_encoder_1, tokenizer_1) -> (text_encoder, tokenizer)
      
      * more clean up
      
      * more refactor
      
      * build pr docs
      
      * Update docs/source/en/api/pipelines/audioldm2.md
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
      
      * small clean
      
      * tidy conversion
      
      * update for large checkpoint
      
      * generate -> generate_language_model
      
      * full clap model
      
      * shrink clap-audio in tests
      
      * fix large integration test
      
      * fix fast tests
      
      * use generation config
      
      * make style
      
      * update docs
      
      * finish docs
      
      * finish doc
      
      * update tests
      
      * fix last test
      
      * syntax
      
      * finalise tests
      
      * refactor projection model in prep for TTS
      
      * fix fast tests
      
      * style
      
      ---------
      Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>