    Stable Diffusion Latent Upscaler (#2059) · 1051ca81
    YiYi Xu authored
    
    
    * Modify UNet2DConditionModel
    
    - allow skipping mid_block
    
    - adding a norm_group_size argument so that we can set the `num_groups` for group norm using `num_channels//norm_group_size`
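The `num_channels // norm_group_size` relationship can be sketched as below (a hypothetical helper for illustration; the actual argument handling in diffusers may differ):

```python
# Sketch of how a norm_group_size argument could map to GroupNorm's
# num_groups (hypothetical helper, not the exact diffusers code).
def resolve_num_groups(num_channels: int, norm_group_size: int) -> int:
    if num_channels % norm_group_size != 0:
        raise ValueError(
            f"num_channels ({num_channels}) must be divisible by "
            f"norm_group_size ({norm_group_size})"
        )
    return num_channels // norm_group_size

# e.g. a 320-channel feature map with norm_group_size=32 -> 10 groups
groups = resolve_num_groups(320, 32)
```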
    
    - allow user to set dimension for the timestep embedding (`time_embed_dim`)
    
    - the kernel_size for `conv_in` and `conv_out` is now configurable
    
    - add random fourier feature layer (`GaussianFourierProjection`) for `time_proj`
    
    - allow the user to add the time and class embeddings together before passing them through the projection layer - `time_embedding(t_emb + class_label)`
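A minimal sketch of this shared-projection idea, assuming a standard two-layer MLP time embedding (the layer names and dims here are illustrative, not the exact diffusers code):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: sum the timestep and class embeddings first, then run
# them through one shared projection, i.e. time_embedding(t_emb + class_emb),
# instead of projecting each separately and adding the results afterwards.
embed_dim, time_embed_dim = 320, 1280
time_embedding = nn.Sequential(
    nn.Linear(embed_dim, time_embed_dim),
    nn.SiLU(),
    nn.Linear(time_embed_dim, time_embed_dim),
)

t_emb = torch.randn(2, embed_dim)      # sinusoidal/fourier timestep features
class_emb = torch.randn(2, embed_dim)  # class-conditioning features

emb = time_embedding(t_emb + class_emb)  # single shared projection
```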
    
    - added 2 arguments `attn1_types` and `attn2_types`
    
      * currently we have the argument `only_cross_attention`: when it's set to `True`, the `BasicTransformerBlock` is built with 2 cross-attention layers; otherwise we
    get a self-attention followed by a cross-attention. In the k-upscaler, we need blocks that include just one cross-attention, or self-attention -> cross-attention;
    so I added `attn1_types` and `attn2_types` to the unet's argument list to allow the user to specify the attention types for the 2 positions in each block; note that I still kept
    the `only_cross_attention` argument for the unet for easy configuration, but it will be converted to `attn1_type` and `attn2_type` when passed down to the down blocks
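The described conversion from the single flag to per-position types could look roughly like this (a hypothetical helper sketching the mapping, not the PR's actual code; the PR later replaced these arguments anyway):

```python
# Hypothetical sketch of expanding the single only_cross_attention flag into
# the two per-position attention types a BasicTransformerBlock would use.
def expand_attention_types(only_cross_attention: bool):
    if only_cross_attention:
        # both attention positions in the block do cross-attention
        return "cross", "cross"
    # default: self-attention followed by cross-attention
    return "self", "cross"

attn1_type, attn2_type = expand_attention_types(only_cross_attention=False)
```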
    
    - the positions of the downsample and upsample layers are now configurable
    
    - in the k-upscaler unet, there is only one skip connection per up/down block (instead of one per layer as in the stable diffusion unet); added `skip_freq = "block"` to support
    this use case
    
    - if the user passes an attention_mask to the unet, it will prepare the mask and pass a flag to the cross attention processor to skip the `prepare_attention_mask` step
    inside the cross attention block
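A rough sketch of what preparing the mask once in the unet could look like (assumed shapes and a hypothetical function name, not the exact diffusers implementation): a boolean `(batch, seq_len)` mask becomes an additive bias the attention processor can consume directly, so the processor's own prepare step can be skipped via a flag.

```python
import torch

# Hypothetical mask preparation: boolean mask -> additive attention bias,
# broadcast over heads, with masked positions set to -inf.
def prepare_attention_mask(mask: torch.Tensor, num_heads: int) -> torch.Tensor:
    bias = torch.zeros_like(mask, dtype=torch.float32)
    bias = bias.masked_fill(~mask, float("-inf"))    # masked positions -> -inf
    bias = bias[:, None, :]                          # (batch, 1, seq_len)
    return bias.repeat_interleave(num_heads, dim=0)  # (batch*heads, 1, seq_len)

mask = torch.tensor([[True, True, False]])
bias = prepare_attention_mask(mask, num_heads=2)
```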
    
    add up/down blocks for k-upscaler
    
    modify CrossAttention class
    
    - make the `dropout` layer in `to_out` optional
    
    - `use_conv_proj` - use conv instead of linear for all projection layers (i.e. `to_q`, `to_k`, `to_v`, `to_out`) whenever possible. Note that when it's used for cross
    attention, `to_k` and `to_v` have to be linear because `encoder_hidden_states` is not 2d
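The shape constraint can be sketched as follows (assumed dims for illustration): the query comes from a spatial `(b, c, h, w)` feature map, so a 1x1 conv works, while the text conditioning is a `(b, seq_len, dim)` sequence with no spatial dims, so `to_k`/`to_v` stay linear.

```python
import torch
import torch.nn as nn

# Sketch of the use_conv_proj idea (assumed shapes, not the exact API).
channels, cross_dim = 64, 768
to_q = nn.Conv2d(channels, channels, kernel_size=1)  # conv projection works here
to_k = nn.Linear(cross_dim, channels)                # must stay linear
to_v = nn.Linear(cross_dim, channels)

hidden_states = torch.randn(1, channels, 8, 8)       # spatial feature map
encoder_hidden_states = torch.randn(1, 77, cross_dim)  # token sequence

q = to_q(hidden_states)          # (1, 64, 8, 8)
k = to_k(encoder_hidden_states)  # (1, 77, 64)
v = to_v(encoder_hidden_states)  # (1, 77, 64)
```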
    
    - `cross_attention_norm` - add an optional layernorm on encoder_hidden_states
    
    - `attention_dropout`: add an optional dropout on the attention scores
    
    adapt BasicTransformerBlock
    
    - add an ada groupnorm layer to condition the attention input on the timestep embedding
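A minimal sketch of such an ada-groupnorm layer, assuming the usual scale-shift modulation `norm(x) * (1 + scale) + shift` (the class structure here is illustrative, not the exact diffusers `AdaGroupNorm`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical ada-groupnorm: the timestep embedding is projected to a
# per-channel scale and shift that modulate the group-normalized features.
class AdaGroupNorm(nn.Module):
    def __init__(self, embed_dim: int, num_channels: int, num_groups: int):
        super().__init__()
        self.num_groups = num_groups
        self.linear = nn.Linear(embed_dim, num_channels * 2)

    def forward(self, x: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        # (b, 2c) -> (b, 2c, 1, 1) -> two (b, c, 1, 1) tensors
        scale, shift = self.linear(emb)[:, :, None, None].chunk(2, dim=1)
        x = F.group_norm(x, self.num_groups)
        return x * (1 + scale) + shift

norm = AdaGroupNorm(embed_dim=1280, num_channels=64, num_groups=8)
out = norm(torch.randn(2, 64, 16, 16), torch.randn(2, 1280))
```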
    
    - allow skipping the FeedForward layer in between the attentions
    
    - replaced the only_cross_attention argument with attn1_type and attn2_type for more flexible configuration
    
    update timestep embedding: add a new act_fn `gelu` and an optional `act_2`
    
    modified ResnetBlock2D
    
    - refactored with AdaGroupNorm class (the timestep scale shift normalization)
    
    - add `mid_channel` argument - allow the first conv to have a different output dimension from the second conv
    
    - add an option to use AdaGroupNorm on the input instead of groupnorm
    
    - add an option to add a dropout layer after each conv
    
    - allow user to set the bias in conv_shortcut (needed for k-upscaler)
    
    - add gelu
    
    adding conversion script for k-upscaler unet
    
    add pipeline
    
    * fix attention mask
    
    * fix a typo
    
    * fix a bug
    
    * make sure model can be used with GPU
    
    * make pipeline work with fp16
    
    * fix an error in BasicTransformerBlock
    
    * make style
    
    * fix typo
    
    * some more fixes
    
    * uP
    
    * up
    
    * correct more
    
    * some clean-up
    
    * clean time proj
    
    * up
    
    * uP
    
    * more changes
    
    * remove the upcast_attention=True from unet config
    
    * remove attn1_types, attn2_types etc
    
    * fix
    
    * revert incorrect changes up/down samplers
    
    * make style
    
    * remove outdated files
    
    * Apply suggestions from code review
    
    * attention refactor
    
    * refactor cross attention
    
    * Apply suggestions from code review
    
    * update
    
    * up
    
    * update
    
    * Apply suggestions from code review
    
    * finish
    
    * Update src/diffusers/models/cross_attention.py
    
    * more fixes
    
    * up
    
    * up
    
    * up
    
    * finish
    
    * more corrections of conversion state
    
    * act_2 -> act_2_fn
    
    * remove dropout_after_conv from ResnetBlock2D
    
    * make style
    
    * simplify KAttentionBlock
    
    * add fast test for latent upscaler pipeline
    
    * add slow test
    
    * slow test fp16
    
    * make style
    
    * add doc string for pipeline_stable_diffusion_latent_upscale
    
    * add api doc page for latent upscaler pipeline
    
    * deprecate attention mask
    
    * clean up embeddings
    
    * simplify resnet
    
    * up
    
    * clean up resnet
    
    * up
    
    * correct more
    
    * up
    
    * up
    
    * improve a bit more
    
    * correct more
    
    * more clean-ups
    
    * Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    
    * Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    
    * add docstrings for new unet config
    
    * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    
    * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    
    * # Copied from
    
    * encode the image if not latent
    
    * remove force casting vae to fp32
    
    * fix
    
    * add comments about preconditioning parameters from k-diffusion paper
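For reference, the preconditioning parameters this commit refers to are the ones from Karras et al. 2022 ("Elucidating the Design Space of Diffusion-Based Generative Models", the k-diffusion paper); a small sketch of those coefficients (`sigma_data` is the assumed data standard deviation):

```python
import math

# Preconditioning coefficients from the k-diffusion (EDM) paper:
#   c_skip scales the skip connection, c_out the model output,
#   c_in the model input, all as functions of the noise level sigma.
def precondition(sigma: float, sigma_data: float = 0.5):
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / math.sqrt(sigma**2 + sigma_data**2)
    c_in = 1.0 / math.sqrt(sigma**2 + sigma_data**2)
    return c_skip, c_out, c_in

c_skip, c_out, c_in = precondition(0.5, sigma_data=0.5)
```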
    
    * attn1_type, attn2_type -> add_self_attention
    
    * clean up get_down_block and get_up_block
    
    * fix
    
    * fixed a typo(?) in ada group norm
    
    * update slice attention processer for cross attention
    
    * update slice
    
    * fix fast test
    
    * update the checkpoint
    
    * finish tests
    
    * fix-copies
    
    * fix-copy for modeling_text_unet.py
    
    * make style
    
    * make style
    
    * fix f-string
    
    * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    
    * fix import
    
    * correct changes
    
    * fix resnet
    
    * make fix-copies
    
    * correct euler scheduler
    
    * add missing #copied from for preprocess
    
    * revert
    
    * fix
    
    * fix copies
    
    * Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
    Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
    
    * Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
    Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
    
    * Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
    Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
    
    * Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx
    Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
    
    * Update src/diffusers/models/cross_attention.py
    Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
    
    * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
    Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
    
    * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
    Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
    
    * clean up conversion script
    
    * KDownsample2d,KUpsample2d -> KDownsample2D,KUpsample2D
    
    * more
    
    * Update src/diffusers/models/unet_2d_condition.py
    Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
    
    * remove prepare_extra_step_kwargs
    
    * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
    Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
    
    * Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    
    * fix a typo in timestep embedding
    
    * remove num_image_per_prompt
    
    * fix fasttest
    
    * make style + fix-copies
    
    * fix
    
    * fix xformer test
    
    * fix style
    
    * doc string
    
    * make style
    
    * fix-copies
    
    * docstring for time_embedding_norm
    
    * make style
    
    * final finishes
    
    * make fix-copies
    
    * fix tests
    
    ---------
    Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan>
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    Co-authored-by: Pedro Cuenca <pedro@huggingface.co>