1. 28 Mar, 2023 20 commits
  2. 27 Mar, 2023 5 commits
  3. 24 Mar, 2023 7 commits
  4. 23 Mar, 2023 8 commits
    • Add AudioLDM (#2232) · b94880e5
      Sanchit Gandhi authored
      
      
      * Add AudioLDM
      
      * up
      
      * add vocoder
      
      * start unet
      
      * unconditional unet
      
      * clap, vocoder and vae
      
      * clean-up: conversion scripts
      
      * fix: conversion script token_type_ids
      
      * clean-up: pipeline docstring
      
      * tests: from SD
      
      * clean-up: cpu offload vocoder instead of safety checker
      
      * feat: adapt tests to audioldm
      
      * feat: add docs
      
      * clean-up: amend pipeline docstrings
      
      * clean-up: make style
      
      * clean-up: make fix-copies
      
      * fix: add doc path to toctree
      
      * clean-up: args for conversion script
      
      * clean-up: paths to checkpoints
      
      * fix: use conditional unet
      
      * clean-up: make style
      
      * fix: type hints for UNet
      
      * clean-up: docstring for UNet
      
      * clean-up: make style
      
      * clean-up: remove duplicate in docstring
      
      * clean-up: make style
      
      * clean-up: make fix-copies
      
      * clean-up: move imports to start in code snippet
      
      * fix: pass cross_attention_dim as a list/tuple to unet
      
      * clean-up: make fix-copies
      
      * fix: update checkpoint path
      
      * fix: unet cross_attention_dim in tests
      
      * film embeddings -> class embeddings
      
      * Apply suggestions from code review
      Co-authored-by: Will Berman <wlbberman@gmail.com>
      
      * fix: unet film embed to use existing args
      
      * fix: unet tests to use existing args
      
      * fix: make style
      
      * fix: transformers import and version in init
      
      * clean-up: make style
      
      * Revert "clean-up: make style"
      
      This reverts commit 5d6d1f8b324f5583e7805dc01e2c86e493660d66.
      
      * clean-up: make style
      
      * clean-up: use pipeline tester mixin tests where poss
      
      * clean-up: skip attn slicing test
      
      * fix: add torch dtype to docs
      
      * fix: remove conversion script out of src
      
      * fix: remove .detach from 1d waveform
      
      * fix: reduce default num inf steps
      
      * fix: swap height/width -> audio_length_in_s
      
      * clean-up: make style
      
      * fix: remove nightly tests
      
      * fix: imports in conversion script
      
      * clean-up: slim-down to two slow tests
      
      * clean-up: slim-down fast tests
      
      * fix: batch consistent tests
      
      * clean-up: make style
      
      * clean-up: remove vae slicing fast test
      
      * clean-up: propagate changes to doc
      
      * fix: increase test tol to 1e-2
      
      * clean-up: finish docs
      
      * clean-up: make style
      
      * feat: vocoder / VAE compatibility check
      
      * feat: possibly expand / cut audio waveform
      
      * fix: pipeline call signature test
      
      * fix: slow tests output len
      
      * clean-up: make style
      
      * make style
      
      ---------
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: William Berman <WLBberman@gmail.com>
    • [docs] Add Colab notebooks and Spaces (#2713) · 1870fb05
      Steven Liu authored
      * add colab notebook and spaces
      
      * fix image link
    • Flax controlnet (#2727) · df91c447
      YiYi Xu authored
      
      
      * add controlnet flax
      
      ---------
      Co-authored-by: yiyixuxu <yixu310@gmail,com>
    • Skip `mps` in text-to-video tests (#2792) · aa0531fa
      Pedro Cuenca authored
      * Skip mps in text-to-video tests.
      
      * style
      
      * Skip UNet3D mps tests.
    • Update train_text_to_image_lora.py (#2767) · dc5b4e23
      Haofan Wang authored
      * Update train_text_to_image_lora.py
      
      * Update train_text_to_image_lora.py
      
      * Update train_text_to_image_lora.py
      
      * Update train_text_to_image_lora.py
      
      * format
    • [Docs] small fixes to the text to video doc. (#2787) · 0d7aac3e
      Sayak Paul authored
      * small fixes to the text to video doc.
      
      * add: Spaces link.
      
      * add: warning on research-only model.
    • [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipeline (#2779) · 055c90f5
      Nipun Jindal authored
      
      
      [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipelines
      Co-authored-by: njindal <njindal@adobe.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    • Music Spectrogram diffusion pipeline (#1044) · 2ef9bdd7
      Kashif Rasul authored
      
      
      * initial TokenEncoder and ContinuousEncoder
      
      * initial modules
      
      * added ContinuousContextTransformer
      
      * fix copy paste error
      
      * use numpy for get_sequence_length
      
      * initial terminal relative positional encodings
      
      * fix weights keys
      
      * fix assert
      
      * cross attend style: concat encodings
      
      * make style
      
      * concat once
      
      * fix formatting
      
      * Initial SpectrogramPipeline
      
      * fix input_tokens
      
      * make style
      
      * added mel output
      
      * ignore weights for config
      
      * move mel to numpy
      
      * import pipeline
      
      * fix class names and import
      
      * moved models to models folder
      
      * import ContinuousContextTransformer and SpectrogramDiffusionPipeline
      
      * initial spec diffusion conversion script
      
      * renamed config to t5config
      
      * added weight loading
      
      * use arguments instead of t5config
      
      * broadcast noise time to batch dim
      
      * fix call
      
      * added scale_to_features
      
      * fix weights
      
      * transpose layernorm weight
      
      * scale is a vector
      
      * scale the query outputs
      
      * added comment
      
      * undo scaling
      
      * undo depth_scaling
      
      * initial get_extended_attention_mask
      
      * attention_mask is none in self-attention
      
      * cleanup
      
      * manually invert attention
      
      * nn.Linear needs bias=False
      
      * added T5LayerFFCond
      
      * remove to fix conflict
      
      * make style and dummy
      
      * remove unused variables
      
      * remove predict_epsilon
      
      * Move accelerate to a soft-dependency (#1134)
      
      * finish
      
      * finish
      
      * Update src/diffusers/modeling_utils.py
      
      * Update src/diffusers/pipeline_utils.py
      Co-authored-by: Anton Lozhkov <anton@huggingface.co>
      
      * more fixes
      
      * fix
      Co-authored-by: Anton Lozhkov <anton@huggingface.co>
      
      * fix order
      
      * added initial midi to note token data pipeline
      
      * added int to int tokenizer
      
      * remove duplicate
      
      * added logic for segments
      
      * add melgan to pipeline
      
      * move autoregressive gen into pipeline
      
      * added note_representation_processor_chain
      
      * fix dtypes
      
      * remove immutabledict req
      
      * initial doc
      
      * use np.where
      
      * require note_seq
      
      * fix typo
      
      * update dependency
      
      * added note-seq to test
      
      * added is_note_seq_available
      
      * fix import
      
      * added toc
      
      * added example usage
      
      * undo for now
      
      * moved docs
      
      * fix merge
      
      * fix imports
      
      * predict first segment
      
      * avoid un-needed copy to and from cpu
      
      * make style
      
      * Copyright
      
      * fix style
      
      * add test and fix inference steps
      
      * remove bogus files
      
      * reorder models
      
      * up
      
      * remove transformers dependency
      
      * make work with diffusers cross attention
      
      * clean more
      
      * remove @
      
      * improve further
      
      * up
      
      * uP
      
      * Apply suggestions from code review
      
      * Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py
      
      * loop over all tokens
      
      * make style
      
      * Added a section on the model
      
      * fix formatting
      
      * grammar
      
      * formatting
      
      * make fix-copies
      
      * Update src/diffusers/pipelines/__init__.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/diffusers/pipelines/spectrogram_diffusion/pipeline_spectrogram_diffusion.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * added callback and optional onnx
      
      * do not squeeze batch dim
      
      * clean up more
      
      * upload
      
      * convert jax to numpy
      
      * make style
      
      * fix warning
      
      * make fix-copies
      
      * fix warning
      
      * add initial fast tests
      
      * add initial pipeline_params
      
      * eval mode due to dropout
      
      * skip batch tests as pipeline runs on a single file
      
      * make style
      
      * fix relative path
      
      * fix doc tests
      
      * Update src/diffusers/models/t5_film_transformer.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/diffusers/models/t5_film_transformer.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update docs/source/en/api/pipelines/spectrogram_diffusion.mdx
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * add MidiProcessor
      
      * format
      
      * fix org
      
      * Apply suggestions from code review
      
      * Update tests/pipelines/spectrogram_diffusion/test_spectrogram_diffusion.py
      
      * make style
      
      * pin protobuf to <4
      
      * fix formatting
      
      * white space
      
      * tensorboard needs protobuf
      
      ---------
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Anton Lozhkov <anton@huggingface.co>