depth2img.md 2.7 KB
Newer Older
Patrick von Platen's avatar
Patrick von Platen committed
1
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Patrick von Platen's avatar
Patrick von Platen committed
2
3
4
5
6
7
8
9
10
11
12

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

13
# Text-guided depth-to-image generation
Patrick von Platen's avatar
Patrick von Platen committed
14

15
16
17
18
19
[[open-in-colab]]

The [`StableDiffusionDepth2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images. In addition, you can also pass a `depth_map` to preserve the image structure. If no `depth_map` is provided, the pipeline automatically predicts the depth via an integrated [depth-estimation model](https://github.com/isl-org/MiDaS).

Start by creating an instance of the [`StableDiffusionDepth2ImgPipeline`]:
Patrick von Platen's avatar
Patrick von Platen committed
20
21
22
23
24
25
26
27
28
29
30
31

```python
import torch
import requests
from PIL import Image

from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")
32
```
Patrick von Platen's avatar
Patrick von Platen committed
33

34
Now pass your prompt to the pipeline. You can also pass a `negative_prompt` to prevent certain words from guiding how an image is generated:
Patrick von Platen's avatar
Patrick von Platen committed
35

36
```python
Patrick von Platen's avatar
Patrick von Platen committed
37
38
39
40
41
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = Image.open(requests.get(url, stream=True).raw)
prompt = "two tigers"
n_prompt = "bad, deformed, ugly, bad anatomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
42
image
Patrick von Platen's avatar
Patrick von Platen committed
43
```
44
45
46
47
48
49
50
51
52
53
54
55
56

| Input                                                                           | Output                                                                                                                                |
|---------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/coco-cats.png" width="500"/> | <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/depth2img-tigers.png" width="500"/> |

Play around with the Spaces below and see if you notice a difference between generated images with and without a depth map!

<iframe
	src="https://radames-stable-diffusion-depth2img.hf.space"
	frameborder="0"
	width="850"
	height="500"
></iframe>