1. 11 Apr, 2023 4 commits
    • Will Berman's avatar
      Attention processor cross attention norm group norm (#3021) · 98c5e5da
      Will Berman authored
      add group norm type to attention processor cross attention norm
      
      This lets the cross attention norm use both a group norm block and a
      layer norm block.
      
      The group norm operates along the channels dimension
      and requires input shape (batch size, channels, *) where as the layer norm with a single
      `normalized_shape` dimension only operates over the least significant
      dimension i.e. (*, channels).
      
      The channels we want to normalize are the hidden dimension of the encoder hidden states.
      
      By convention, the encoder hidden states are always passed as (batch size, sequence
      length, hidden states).
      
      This means the layer norm can operate on the tensor without modification, but the group
      norm requires flipping the last two dimensions to operate on (batch size, hidden states, sequence length).
      
      All existing attention processors will have the same logic and we can
      consolidate it in a helper function `prepare_encoder_hidden_states`
      
      prepare_encoder_hidden_states -> norm_encoder_hidden_states re: @patrickvonplaten
      
      move norm_cross defined check to outside norm_encoder_hidden_states
      
      add missing attn.norm_cross check
      98c5e5da
    • Will Berman's avatar
      unet time embedding activation function (#3048) · 2d52e81c
      Will Berman authored
      * unet time embedding activation function
      
      * typo act_fn -> time_embedding_act_fn
      
      * flatten conditional
      2d52e81c
    • Will Berman's avatar
      add only cross attention to simple attention blocks (#3011) · c6180a31
      Will Berman authored
      * add only cross attention to simple attention blocks
      
      * add test for only_cross_attention re: @patrickvonplaten
      
      * mid_block_only_cross_attention better default
      
      allow mid_block_only_cross_attention to default to
      `only_cross_attention` when `only_cross_attention` is given
      as a single boolean
      c6180a31
    • Patrick von Platen's avatar
      Fix config prints and save, load of pipelines (#2849) · 8b451eb6
      Patrick von Platen authored
      * [Config] Fix config prints and save, load
      
      * Only use potential nn.Modules for dtype and device
      
      * Correct vae image processor
      
      * make sure in_channels is not accessed directly
      
      * make sure in channels is only accessed via config
      
      * Make sure schedulers only access config attributes
      
      * Make sure to access config in SAG
      
      * Fix vae processor and make style
      
      * add tests
      
      * uP
      
      * make style
      
      * Fix more naming issues
      
      * Final fix with vae config
      
      * change more
      8b451eb6
  2. 10 Apr, 2023 3 commits
  3. 27 Mar, 2023 1 commit
  4. 23 Mar, 2023 1 commit
    • Sanchit Gandhi's avatar
      Add AudioLDM (#2232) · b94880e5
      Sanchit Gandhi authored
      
      
      * Add AudioLDM
      
      * up
      
      * add vocoder
      
      * start unet
      
      * unconditional unet
      
      * clap, vocoder and vae
      
      * clean-up: conversion scripts
      
      * fix: conversion script token_type_ids
      
      * clean-up: pipeline docstring
      
      * tests: from SD
      
      * clean-up: cpu offload vocoder instead of safety checker
      
      * feat: adapt tests to audioldm
      
      * feat: add docs
      
      * clean-up: amend pipeline docstrings
      
      * clean-up: make style
      
      * clean-up: make fix-copies
      
      * fix: add doc path to toctree
      
      * clean-up: args for conversion script
      
      * clean-up: paths to checkpoints
      
      * fix: use conditional unet
      
      * clean-up: make style
      
      * fix: type hints for UNet
      
      * clean-up: docstring for UNet
      
      * clean-up: make style
      
      * clean-up: remove duplicate in docstring
      
      * clean-up: make style
      
      * clean-up: make fix-copies
      
      * clean-up: move imports to start in code snippet
      
      * fix: pass cross_attention_dim as a list/tuple to unet
      
      * clean-up: make fix-copies
      
      * fix: update checkpoint path
      
      * fix: unet cross_attention_dim in tests
      
      * film embeddings -> class embeddings
      
      * Apply suggestions from code review
      Co-authored-by: default avatarWill Berman <wlbberman@gmail.com>
      
      * fix: unet film embed to use existing args
      
      * fix: unet tests to use existing args
      
      * fix: make style
      
      * fix: transformers import and version in init
      
      * clean-up: make style
      
      * Revert "clean-up: make style"
      
      This reverts commit 5d6d1f8b324f5583e7805dc01e2c86e493660d66.
      
      * clean-up: make style
      
      * clean-up: use pipeline tester mixin tests where poss
      
      * clean-up: skip attn slicing test
      
      * fix: add torch dtype to docs
      
      * fix: remove conversion script out of src
      
      * fix: remove .detach from 1d waveform
      
      * fix: reduce default num inf steps
      
      * fix: swap height/width -> audio_length_in_s
      
      * clean-up: make style
      
      * fix: remove nightly tests
      
      * fix: imports in conversion script
      
      * clean-up: slim-down to two slow tests
      
      * clean-up: slim-down fast tests
      
      * fix: batch consistent tests
      
      * clean-up: make style
      
      * clean-up: remove vae slicing fast test
      
      * clean-up: propagate changes to doc
      
      * fix: increase test tol to 1e-2
      
      * clean-up: finish docs
      
      * clean-up: make style
      
      * feat: vocoder / VAE compatibility check
      
      * feat: possibly expand / cut audio waveform
      
      * fix: pipeline call signature test
      
      * fix: slow tests output len
      
      * clean-up: make style
      
      * make style
      
      ---------
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: default avatarWilliam Berman <WLBberman@gmail.com>
      b94880e5
  5. 21 Mar, 2023 1 commit
  6. 15 Mar, 2023 1 commit
  7. 14 Mar, 2023 1 commit
  8. 07 Mar, 2023 1 commit
  9. 02 Mar, 2023 1 commit
    • Takuma Mori's avatar
      Add a ControlNet model & pipeline (#2407) · 8dfff7c0
      Takuma Mori authored
      
      
      * add scaffold
      - copied convert_controlnet_to_diffusers.py from
      convert_original_stable_diffusion_to_diffusers.py
      
      * Add support to load ControlNet (WIP)
      - this makes Missking Key error on ControlNetModel
      
      * Update to convert ControlNet without error msg
      - init impl for StableDiffusionControlNetPipeline
      - init impl for ControlNetModel
      
      * cleanup of commented out
      
      * split create_controlnet_diffusers_config()
      from create_unet_diffusers_config()
      
      - add config: hint_channels
      
      * Add input_hint_block, input_zero_conv and
      middle_block_out
      - this makes missing key error on loading model
      
      * add unet_2d_blocks_controlnet.py
      - copied from unet_2d_blocks.py as impl CrossAttnDownBlock2D,DownBlock2D
      - this makes missing key error on loading model
      
      * Add loading for input_hint_block, zero_convs
      and middle_block_out
      
      - this makes no error message on model loading
      
      * Copy from UNet2DConditionalModel except __init__
      
      * Add ultra primitive test for ControlNetModel
      inference
      
      * Support ControlNetModel inference
      - without exceptions
      
      * copy forward() from UNet2DConditionModel
      
      * Impl ControlledUNet2DConditionModel inference
      - test_controlled_unet_inference passed
      
      * Frozen weight & biases for training
      
      * Minimized version of ControlNet/ControlledUnet
      - test_modules_controllnet.py passed
      
      * make style
      
      * Add support model loading for minimized ver
      
      * Remove all previous version files
      
      * from_pretrained and inference test passed
      
      * copied from pipeline_stable_diffusion.py
      except `__init__()`
      
      * Impl pipeline, pixel match test (almost) passed.
      
      * make style
      
      * make fix-copies
      
      * Fix to add import ControlNet blocks
      for `make fix-copies`
      
      * Remove einops dependency
      
      * Support  np.ndarray, PIL.Image for controlnet_hint
      
      * set default config file as lllyasviel's
      
      * Add support grayscale (hw) numpy array
      
      * Add and update docstrings
      
      * add control_net.mdx
      
      * add control_net.mdx to toctree
      
      * Update copyright year
      
      * Fix to add PIL.Image RGB->BGR conversion
      - thanks @Mystfit
      
      * make fix-copies
      
      * add basic fast test for controlnet
      
      * add slow test for controlnet/unet
      
      * Ignore down/up_block len check on ControlNet
      
      * add a copy from test_stable_diffusion.py
      
      * Accept controlnet_hint is None
      
      * merge pipeline_stable_diffusion.py diff
      
      * Update class name to SDControlNetPipeline
      
      * make style
      
      * Baseline fast test almost passed (w long desc)
      
      * still needs investigate.
      
      Following didn't passed descriped in TODO comment:
      - test_stable_diffusion_long_prompt
      - test_stable_diffusion_no_safety_checker
      
      Following didn't passed same as stable_diffusion_pipeline:
      - test_attention_slicing_forward_pass
      - test_inference_batch_single_identical
      - test_xformers_attention_forwardGenerator_pass
      these seems come from calc accuracy.
      
      * Add note comment related vae_scale_factor
      
      * add test_stable_diffusion_controlnet_ddim
      
      * add assertion for vae_scale_factor != 8
      
      * slow test of pipeline almost passed
      Failed: test_stable_diffusion_pipeline_with_model_offloading
      - ImportError: `enable_model_offload` requires `accelerate v0.17.0` or higher
      
      but currently latest version == 0.16.0
      
      * test_stable_diffusion_long_prompt passed
      
      * test_stable_diffusion_no_safety_checker passed
      
      - due to its model size, move to slow test
      
      * remove PoC test files
      
      * fix num_of_image, prompt length issue add add test
      
      * add support List[PIL.Image] for controlnet_hint
      
      * wip
      
      * all slow test passed
      
      * make style
      
      * update for slow test
      
      * RGB(PIL)->BGR(ctrlnet) conversion
      
      * fixes
      
      * remove manual num_images_per_prompt test
      
      * add document
      
      * add `image` argument docstring
      
      * make style
      
      * Add line to correct conversion
      
      * add controlnet_conditioning_scale (aka control_scales
      strength)
      
      * rgb channel ordering by default
      
      * image batching logic
      
      * Add control image descriptions for each checkpoint
      
      * Only save controlnet model in conversion script
      
      * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py
      
      typo
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * add gerated image example
      
      * a depth mask -> a depth map
      
      * rename control_net.mdx to controlnet.mdx
      
      * fix toc title
      
      * add ControlNet abstruct and link
      
      * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py
      Co-authored-by: default avatardqueue <dbyqin@gmail.com>
      
      * remove controlnet constructor arguments re: @patrickvonplaten
      
      * [integration tests] test canny
      
      * test_canny fixes
      
      * [integration tests] test_depth
      
      * [integration tests] test_hed
      
      * [integration tests] test_mlsd
      
      * add channel order config to controlnet
      
      * [integration tests] test normal
      
      * [integration tests] test_openpose test_scribble
      
      * change height and width to default to conditioning image
      
      * [integration tests] test seg
      
      * style
      
      * test_depth fix
      
      * [integration tests] size fixes
      
      * [integration tests] cpu offloading
      
      * style
      
      * generalize controlnet embedding
      
      * fix conversion script
      
      * Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      
      * Style adapted to the documentation of pix2pix
      
      * merge main by hand
      
      * style
      
      * [docs] controlling generation doc nits
      
      * correct some things
      
      * add: controlnetmodel to autodoc.
      
      * finish docs
      
      * finish
      
      * finish 2
      
      * correct images
      
      * finish controlnet
      
      * Apply suggestions from code review
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * uP
      
      * upload model
      
      * up
      
      * up
      
      ---------
      Co-authored-by: default avatarWilliam Berman <WLBberman@gmail.com>
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      Co-authored-by: default avatardqueue <dbyqin@gmail.com>
      Co-authored-by: default avatarSayak Paul <spsayakpaul@gmail.com>
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      8dfff7c0
  10. 01 Mar, 2023 1 commit
  11. 14 Feb, 2023 2 commits
  12. 07 Feb, 2023 1 commit
    • YiYi Xu's avatar
      Stable Diffusion Latent Upscaler (#2059) · 1051ca81
      YiYi Xu authored
      
      
      * Modify UNet2DConditionModel
      
      - allow skipping mid_block
      
      - adding a norm_group_size argument so that we can set the `num_groups` for group norm using `num_channels//norm_group_size`
      
      - allow user to set dimension for the timestep embedding (`time_embed_dim`)
      
      - the kernel_size for `conv_in` and `conv_out` is now configurable
      
      - add random fourier feature layer (`GaussianFourierProjection`) for `time_proj`
      
      - allow user to add the time and class embeddings before passing through the projection layer together - `time_embedding(t_emb + class_label))`
      
      - added 2 arguments `attn1_types` and `attn2_types`
      
        * currently we have argument `only_cross_attention`: when it's set to `True`, we will have a to the
      `BasicTransformerBlock` block with 2 cross-attention , otherwise we
      get a self-attention followed by a cross-attention; in k-upscaler, we need to have blocks that include just one cross-attention, or self-attention -> cross-attention;
      so I added `attn1_types` and `attn2_types` to the unet's argument list to allow user specify the attention types for the 2 positions in each block;  note that I stil kept
      the `only_cross_attention` argument for unet for easy configuration, but it will be converted to `attn1_type` and `attn2_type` when passing down to the down blocks
      
      - the position of downsample layer and upsample layer is now configurable
      
      - in k-upscaler unet, there is only one skip connection per each up/down block (instead of each layer in stable diffusion unet), added `skip_freq = "block"` to support
      this use case
      
      - if user passes attention_mask to unet, it will prepare the mask and pass a flag to cross attention processer to skip the `prepare_attention_mask` step
      inside cross attention block
      
      add up/down blocks for k-upscaler
      
      modify CrossAttention class
      
      - make the `dropout` layer in `to_out` optional
      
      - `use_conv_proj` - use conv instead of linear for all projection layers (i.e. `to_q`, `to_k`, `to_v`, `to_out`) whenever possible. note that when it's used to do cross
      attention, to_k, to_v has to be linear because the `encoder_hidden_states` is not 2d
      
      - `cross_attention_norm` - add an optional layernorm on encoder_hidden_states
      
      - `attention_dropout`: add an optional dropout on attention score
      
      adapt BasicTransformerBlock
      
      - add an ada groupnorm layer  to conditioning attention input with timestep embedding
      
      - allow skipping the FeedForward layer in between the attentions
      
      - replaced the only_cross_attention argument with attn1_type and attn2_type for more flexible configuration
      
      update timestep embedding: add new act_fn  gelu and an optional act_2
      
      modified ResnetBlock2D
      
      - refactored with AdaGroupNorm class (the timestep scale shift normalization)
      
      - add `mid_channel` argument - allow the first conv to have a different output dimension from the second conv
      
      - add option to use input AdaGroupNorm on the input instead of groupnorm
      
      - add options to add a dropout layer after each conv
      
      - allow user to set the bias in conv_shortcut (needed for k-upscaler)
      
      - add gelu
      
      adding conversion script for k-upscaler unet
      
      add pipeline
      
      * fix attention mask
      
      * fix a typo
      
      * fix a bug
      
      * make sure model can be used with GPU
      
      * make pipeline work with fp16
      
      * fix an error in BasicTransfomerBlock
      
      * make style
      
      * fix typo
      
      * some more fixes
      
      * uP
      
      * up
      
      * correct more
      
      * some clean-up
      
      * clean time proj
      
      * up
      
      * uP
      
      * more changes
      
      * remove the upcast_attention=True from unet config
      
      * remove attn1_types, attn2_types etc
      
      * fix
      
      * revert incorrect changes up/down samplers
      
      * make style
      
      * remove outdated files
      
      * Apply suggestions from code review
      
      * attention refactor
      
      * refactor cross attention
      
      * Apply suggestions from code review
      
      * update
      
      * up
      
      * update
      
      * Apply suggestions from code review
      
      * finish
      
      * Update src/diffusers/models/cross_attention.py
      
      * more fixes
      
      * up
      
      * up
      
      * up
      
      * finish
      
      * more corrections of conversion state
      
      * act_2 -> act_2_fn
      
      * remove dropout_after_conv from ResnetBlock2D
      
      * make style
      
      * simplify KAttentionBlock
      
      * add fast test for latent upscaler pipeline
      
      * add slow test
      
      * slow test fp16
      
      * make style
      
      * add doc string for pipeline_stable_diffusion_latent_upscale
      
      * add api doc page for latent upscaler pipeline
      
      * deprecate attention mask
      
      * clean up embeddings
      
      * simplify resnet
      
      * up
      
      * clean up resnet
      
      * up
      
      * correct more
      
      * up
      
      * up
      
      * improve a bit more
      
      * correct more
      
      * more clean-ups
      
      * Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * add docstrings for new unet config
      
      * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * # Copied from
      
      * encode the image if not latent
      
      * remove force casting vae to fp32
      
      * fix
      
      * add comments about preconditioning parameters from k-diffusion paper
      
      * attn1_type, attn2_type -> add_self_attention
      
      * clean up get_down_block and get_up_block
      
      * fix
      
      * fixed a typo(?) in ada group norm
      
      * update slice attention processer for cross attention
      
      * update slice
      
      * fix fast test
      
      * update the checkpoint
      
      * finish tests
      
      * fix-copies
      
      * fix-copy for modeling_text_unet.py
      
      * make style
      
      * make style
      
      * fix f-string
      
      * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * fix import
      
      * correct changes
      
      * fix resnet
      
      * make fix-copies
      
      * correct euler scheduler
      
      * add missing #copied from for preprocess
      
      * revert
      
      * fix
      
      * fix copies
      
      * Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update src/diffusers/models/cross_attention.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * clean up conversion script
      
      * KDownsample2d,KUpsample2d -> KDownsample2D,KUpsample2D
      
      * more
      
      * Update src/diffusers/models/unet_2d_condition.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * remove prepare_extra_step_kwargs
      
      * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * fix a typo in timestep embedding
      
      * remove num_image_per_prompt
      
      * fix fasttest
      
      * make style + fix-copies
      
      * fix
      
      * fix xformer test
      
      * fix style
      
      * doc string
      
      * make style
      
      * fix-copies
      
      * docstring for time_embedding_norm
      
      * make style
      
      * final finishes
      
      * make fix-copies
      
      * fix tests
      
      ---------
      Co-authored-by: default avataryiyixuxu <yixu@yis-macbook-pro.lan>
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      1051ca81
  13. 27 Jan, 2023 1 commit
  14. 26 Jan, 2023 1 commit
    • Pedro Cuenca's avatar
      Allow `UNet2DModel` to use arbitrary class embeddings (#2080) · 915a5636
      Pedro Cuenca authored
      * Allow `UNet2DModel` to use arbitrary class embeddings.
      
      We can currently use class conditioning in `UNet2DConditionModel`, but
      not in `UNet2DModel`. However, `UNet2DConditionModel` requires text
      conditioning too, which is unrelated to other types of conditioning.
      This commit makes it possible for `UNet2DModel` to be conditioned on
      entities other than timesteps. This is useful for training /
      research purposes. We can currently train models to perform
      unconditional image generation or text-to-image generation, but it's not
      straightforward to train a model to perform class-conditioned image
      generation, if text conditioning is not required.
      
      We could potentiall use `UNet2DConditionModel` for class-conditioning
      without text embeddings by using down/up blocks without
      cross-conditioning. However:
      - The mid block currently requires cross attention.
      - We are required to provide `encoder_hidden_states` to `forward`.
      
      * Style
      
      * Align class conditioning, add docstring for `num_class_embeds`.
      
      * Copy docstring to versatile_diffusion UNetFlatConditionModel
      915a5636
  15. 18 Jan, 2023 1 commit
  16. 30 Dec, 2022 1 commit
  17. 20 Dec, 2022 1 commit
  18. 18 Dec, 2022 1 commit
    • Will Berman's avatar
      kakaobrain unCLIP (#1428) · 2dcf64b7
      Will Berman authored
      
      
      * [wip] attention block updates
      
      * [wip] unCLIP unet decoder and super res
      
      * [wip] unCLIP prior transformer
      
      * [wip] scheduler changes
      
      * [wip] text proj utility class
      
      * [wip] UnCLIPPipeline
      
      * [wip] kakaobrain unCLIP convert script
      
      * [unCLIP pipeline] fixes re: @patrickvonplaten
      
      remove callbacks
      
      move denoising loops into call function
      
      * UNCLIPScheduler re: @patrickvonplaten
      
      Revert changes to DDPMScheduler. Make UNCLIPScheduler, a modified
      DDPM scheduler with changes to support karlo
      
      * mask -> attention_mask re: @patrickvonplaten
      
      * [DDPMScheduler] remove leftover change
      
      * [docs] PriorTransformer
      
      * [docs] UNet2DConditionModel and UNet2DModel
      
      * [nit] UNCLIPScheduler -> UnCLIPScheduler
      
      matches existing unclip naming better
      
      * [docs] SchedulingUnCLIP
      
      * [docs] UnCLIPTextProjModel
      
      * refactor
      
      * finish licenses
      
      * rename all to attention_mask and prep in models
      
      * more renaming
      
      * don't expose unused configs
      
      * final renaming fixes
      
      * remove x attn mask when not necessary
      
      * configure kakao script to use new class embedding config
      
      * fix copies
      
      * [tests] UnCLIPScheduler
      
      * finish x attn
      
      * finish
      
      * remove more
      
      * rename condition blocks
      
      * clean more
      
      * Apply suggestions from code review
      
      * up
      
      * fix
      
      * [tests] UnCLIPPipelineFastTests
      
      * remove unused imports
      
      * [tests] UnCLIPPipelineIntegrationTests
      
      * correct
      
      * make style
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      2dcf64b7
  19. 07 Dec, 2022 2 commits
  20. 05 Dec, 2022 2 commits
  21. 02 Dec, 2022 3 commits
  22. 24 Nov, 2022 2 commits
    • Anton Lozhkov's avatar
      Support SD2 attention slicing (#1397) · d50e3217
      Anton Lozhkov authored
      * Support SD2 attention slicing
      
      * Support SD2 attention slicing
      
      * Add more copies
      
      * Use attn_num_head_channels in blocks
      
      * fix-copies
      
      * Update tests
      
      * fix imports
      d50e3217
    • Suraj Patil's avatar
      Adapt UNet2D for supre-resolution (#1385) · cecdd8bd
      Suraj Patil authored
      * allow disabling self attention
      
      * add class_embedding
      
      * fix copies
      
      * fix condition
      
      * fix copies
      
      * do_self_attention -> only_cross_attention
      
      * fix copies
      
      * num_classes -> num_class_embeds
      
      * fix default value
      cecdd8bd
  23. 23 Nov, 2022 3 commits
    • Suraj Patil's avatar
      update unet2d (#1376) · f07a16e0
      Suraj Patil authored
      * boom boom
      
      * remove duplicate arg
      
      * add use_linear_proj arg
      
      * fix copies
      
      * style
      
      * add fast tests
      
      * use_linear_proj -> use_linear_projection
      f07a16e0
    • Patrick von Platen's avatar
      [Versatile Diffusion] Add versatile diffusion model (#1283) · 2625fb59
      Patrick von Platen authored
      
      
      * up
      
      * convert dual unet
      
      * revert dual attn
      
      * adapt for vd-official
      
      * test the full pipeline
      
      * mixed inference
      
      * mixed inference for text2img
      
      * add image prompting
      
      * fix clip norm
      
      * split text2img and img2img
      
      * fix format
      
      * refactor text2img
      
      * mega pipeline
      
      * add optimus
      
      * refactor image var
      
      * wip text_unet
      
      * text unet end to end
      
      * update tests
      
      * reshape
      
      * fix image to text
      
      * add some first docs
      
      * dual guided pipeline
      
      * fix token ratio
      
      * propose change
      
      * dual transformer as a native module
      
      * DualTransformer(nn.Module)
      
      * DualTransformer(nn.Module)
      
      * correct unconditional image
      
      * save-load with mega pipeline
      
      * remove image to text
      
      * up
      
      * uP
      
      * fix
      
      * up
      
      * final fix
      
      * remove_unused_weights
      
      * test updates
      
      * save progress
      
      * uP
      
      * fix dual prompts
      
      * some fixes
      
      * finish
      
      * style
      
      * finish renaming
      
      * up
      
      * fix
      
      * fix
      
      * fix
      
      * finish
      Co-authored-by: default avataranton-l <anton@huggingface.co>
      2625fb59
    • Penn's avatar
      Fix using non-square images with UNet2DModel and DDIM/DDPM pipelines (#1289) · 8fd3a743
      Penn authored
      
      
      * fix non square images with UNet2DModel and DDIM/DDPM pipelines
      
      * fix unet_2d `sample_size` docstring
      
      * update pipeline tests for unet uncond
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      8fd3a743
  24. 16 Nov, 2022 1 commit
  25. 14 Nov, 2022 1 commit
  26. 02 Nov, 2022 1 commit
    • MatthieuTPHR's avatar
      Up to 2x speedup on GPUs using memory efficient attention (#532) · 98c42134
      MatthieuTPHR authored
      
      
      * 2x speedup using memory efficient attention
      
      * remove einops dependency
      
      * Swap K, M in op instantiation
      
      * Simplify code, remove unnecessary maybe_init call and function, remove unused self.scale parameter
      
      * make xformers a soft dependency
      
      * remove one-liner functions
      
      * change one letter variable to appropriate names
      
      * Remove Env variable dependency, remove MemoryEfficientCrossAttention class and use enable_xformers_memory_efficient_attention method
      
      * Add memory efficient attention toggle to img2img and inpaint pipelines
      
      * Clearer management of xformers' availability
      
      * update optimizations markdown to add info about memory efficient attention
      
      * add benchmarks for TITAN RTX
      
      * More detailed explanation of how the mem eff benchmark were ran
      
      * Removing autocast from optimization markdown
      
      * import_utils: import torch only if is available
      Co-authored-by: default avatarNouamane Tazi <nouamane98@gmail.com>
      98c42134
  27. 25 Oct, 2022 1 commit