"vscode:/vscode.git/clone" did not exist on "0d9b6bfd5a1ac04ec9727c14fc662b4942700b24"
Allow `UNet2DModel` to use arbitrary class embeddings (#2080)
* Allow `UNet2DModel` to use arbitrary class embeddings.

  Class conditioning is currently available in `UNet2DConditionModel`, but not in `UNet2DModel`. However, `UNet2DConditionModel` also requires text conditioning, which is unrelated to other kinds of conditioning. This commit makes it possible for `UNet2DModel` to be conditioned on entities other than timesteps, which is useful for training and research purposes: we can currently train models for unconditional or text-to-image generation, but it is not straightforward to train a class-conditioned model when text conditioning is not required. We could potentially use `UNet2DConditionModel` for class conditioning without text embeddings by choosing down/up blocks without cross-attention, however (see the usage sketch after this list):
  - The mid block currently requires cross attention.
  - We are required to provide `encoder_hidden_states` to `forward`.
* Style
* Align class conditioning, add docstring for `num_class_embeds`.
* Copy docstring to versatile_diffusion `UNetFlatConditionModel`
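A minimal usage sketch of class-conditional `UNet2DModel` along the lines this commit enables: the model is built with `num_class_embeds` and `class_labels` are passed to `forward`, with no `encoder_hidden_states` required. The block types, channel sizes, and shapes below are illustrative assumptions, not part of the commit.

```python
import torch
from diffusers import UNet2DModel

# Illustrative configuration; only `num_class_embeds` is the point of this sketch.
model = UNet2DModel(
    sample_size=32,
    in_channels=3,
    out_channels=3,
    layers_per_block=2,
    block_out_channels=(64, 128, 128),
    down_block_types=("DownBlock2D", "DownBlock2D", "AttnDownBlock2D"),
    up_block_types=("AttnUpBlock2D", "UpBlock2D", "UpBlock2D"),
    num_class_embeds=10,  # e.g. 10 classes, as in CIFAR-10
)

sample = torch.randn(4, 3, 32, 32)          # batch of noisy images
timesteps = torch.randint(0, 1000, (4,))    # diffusion timesteps
class_labels = torch.randint(0, 10, (4,))   # class ids in [0, num_class_embeds)

# Unlike `UNet2DConditionModel`, no `encoder_hidden_states` are needed.
noise_pred = model(sample, timesteps, class_labels=class_labels).sample
```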