"vscode:/vscode.git/clone" did not exist on "8e8954bd15d3a3c36fbbaa37c978a8b5b8379f9f"
Unverified commit fa356bd4 authored by Cris, committed by GitHub

[docs] Changed path for ControlNet in docs (#4215)

docs: changed path for ControlNet
parent 3ba36f97
@@ -44,7 +44,7 @@ Unless otherwise mentioned, these are techniques that work with existing models
For convenience, we provide a table to denote which methods are inference-only and which require fine-tuning/training.
| **Method** | **Inference only** | **Requires training /<br> fine-tuning** | **Comments** |
-|:---:|:---:|:---:|:---:|
+| :-------------------------------------------------: | :----------------: | :-------------------------------------: | :---------------------------------------------------------------------------------------------: |
| [Instruct Pix2Pix](#instruct-pix2pix) | ✅ | ❌ | Can additionally be<br>fine-tuned for better <br>performance on specific <br>edit instructions. |
| [Pix2Pix Zero](#pix2pixzero) | ✅ | ❌ | |
| [Attend and Excite](#attend-and-excite) | ✅ | ❌ | |
@@ -79,8 +79,9 @@ See [here](../api/pipelines/stable_diffusion/pix2pix) for more information on ho
The denoising process is guided from one conceptual embedding towards another conceptual embedding. The intermediate latents are optimized during the denoising process to push the attention maps towards reference attention maps. The reference attention maps are from the denoising process of the input image and are used to encourage semantic preservation.
Pix2Pix Zero can be used to edit both synthetic and real images.
- To edit synthetic images, one first generates an image given a caption.
Next, we generate image captions for the concept that shall be edited and for the new target concept. We can use a model like [Flan-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) for this purpose. Then, "mean" prompt embeddings for both the source and target concepts are created via the text encoder. Finally, the pix2pix-zero algorithm is used to edit the synthetic image.
- To edit a real image, one first generates an image caption using a model like [BLIP](https://huggingface.co/docs/transformers/model_doc/blip). Then one applies DDIM inversion on the prompt and image to generate "inverse" latents. Similar to before, "mean" prompt embeddings for both source and target concepts are created, and finally the pix2pix-zero algorithm in combination with the "inverse" latents is used to edit the image. A minimal usage sketch for the synthetic-image case follows this list.
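For the synthetic-image workflow, a minimal sketch with `StableDiffusionPix2PixZeroPipeline` might look as follows (the model ID, prompts, and argument values are illustrative and may vary across `diffusers` versions):

```python
import torch
from diffusers import DDIMScheduler, StableDiffusionPix2PixZeroPipeline

pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

# Captions for the source and target concepts; in practice these can be
# generated with a model like Flan-T5 (the prompts below are illustrative).
source_prompts = ["a cat sitting on the street", "a cat playing in the field"]
target_prompts = ["a dog sitting on the street", "a dog playing in the field"]

# "Mean" prompt embeddings for both concepts via the pipeline's text encoder.
source_embeds = pipeline.get_embeds(source_prompts, batch_size=2)
target_embeds = pipeline.get_embeds(target_prompts, batch_size=2)

image = pipeline(
    "a high resolution painting of a cat in the style of van gogh",
    source_embeds=source_embeds,
    target_embeds=target_embeds,
    num_inference_steps=50,
    cross_attention_guidance_amount=0.15,
).images[0]
image.save("edited_synthetic.png")
```

Here `cross_attention_guidance_amount` controls how strongly the intermediate attention maps are pushed towards the reference attention maps.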
<Tip>
@@ -176,11 +177,12 @@ See [here](../training/text_inversion) for more information on how to use it.
[Paper](https://arxiv.org/abs/2302.05543)
-[ControlNet](../api/pipelines/stable_diffusion/controlnet) is an auxiliary network which adds an extra condition.
+[ControlNet](../api/pipelines/controlnet) is an auxiliary network which adds an extra condition.
There are 8 canonical pre-trained ControlNets trained on different conditionings such as edge detection, scribbles,
depth maps, and semantic segmentations.
-See [here](../api/pipelines/stable_diffusion/controlnet) for more information on how to use it.
+See [here](../api/pipelines/controlnet) for more information on how to use it.
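As an illustration, a minimal sketch of conditioning generation on Canny edges (the checkpoint names and input URL are examples, not the only options):

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

# Build the conditioning image: Canny edges extracted from an input photo.
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
edges = cv2.Canny(np.array(image), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

# Load the auxiliary ControlNet and plug it into a Stable Diffusion pipeline.
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# The edge map conditions the generation alongside the text prompt.
out = pipe("a futuristic-looking woman", image=canny_image, num_inference_steps=20).images[0]
out.save("controlnet_canny.png")
```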
## Prompt Weighting
@@ -217,6 +219,7 @@ To know more details, check out the [official doc](../api/pipelines/stable_diffu
input prompts while preserving the original input images as much as possible.
To know more details, check out the [official doc](../api/pipelines/stable_diffusion/model_editing).
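As a rough sketch, model editing with the corresponding pipeline could look like this (the prompts are illustrative):

```python
from diffusers import StableDiffusionModelEditingPipeline

pipe = StableDiffusionModelEditingPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4"
).to("cuda")

# Edit the model so that "roses" are associated with blue roses.
pipe.edit_model("A pack of roses", "A pack of blue roses")

# Subsequent generations reflect the edit without retraining from scratch.
image = pipe("A field of roses", num_inference_steps=50).images[0]
image.save("edited_model_output.png")
```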
## T2I-Adapter
[Paper](https://arxiv.org/abs/2302.08453)