<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Text-Guided Image-to-Image Generation

[[open-in-colab]]

The [`StableDiffusionImg2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images. This tutorial shows how to use it for text-guided image-to-image generation with a Stable Diffusion model.

Before you begin, make sure you have all the necessary libraries installed:

```bash
!pip install diffusers transformers ftfy accelerate
```

Get started by creating a [`StableDiffusionImg2ImgPipeline`] with a pretrained Stable Diffusion model.

```python
import torch
import requests
from PIL import Image
from io import BytesIO

from diffusers import StableDiffusionImg2ImgPipeline
```

Load the pipeline:

```python
device = "cuda"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to(
    device
)
```
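
Loading the weights in half precision (`torch_dtype=torch.float16`) assumes you're running on a CUDA GPU. If you're short on memory, you can optionally trade a bit of speed for a smaller memory footprint with attention slicing:

```python
# Optional: compute attention in slices to reduce peak VRAM usage
pipe.enable_attention_slicing()
```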

Download an initial image and preprocess it so we can pass it to the pipeline:

```python
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image.thumbnail((768, 768))
init_image
```

![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/image_2_image_using_diffusers_cell_8_output_0.jpeg)
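
If your starting image is stored locally instead of at a URL, you can load it the same way with PIL; the filename below is just a placeholder:

```python
from PIL import Image

# "my_sketch.jpg" is a placeholder path - point it at your own file
init_image = Image.open("my_sketch.jpg").convert("RGB")
init_image.thumbnail((768, 768))  # keep the longest side at 768 pixels, as above
```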

Define the prompt (we'll run the pipeline after a quick note on `strength`):

```python
prompt = "A fantasy landscape, trending on artstation"
```

<Tip>

`strength` is a value between 0.0 and 1.0 that controls the amount of noise added to the input image. Values approaching 1.0 allow for more variation, but also produce images that are less semantically consistent with the input.

</Tip>
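
To build intuition for what `strength` does: the img2img pipeline skips the beginning of the denoising schedule, so only roughly `num_inference_steps * strength` steps actually run. The sketch below is a simplified version of that internal timestep arithmetic, not pipeline code you need to call yourself:

```python
# Approximate number of denoising steps actually executed per strength value
num_inference_steps = 50  # the pipeline default

for strength in (0.25, 0.5, 0.75, 1.0):
    effective_steps = min(int(num_inference_steps * strength), num_inference_steps)
    print(f"strength={strength}: ~{effective_steps}/{num_inference_steps} steps")
```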

Let's generate two images with the same pipeline and seed, but with different values for `strength`:

```python
generator = torch.Generator(device=device).manual_seed(1024)
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0]
```

```python
image
```

![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/image_2_image_using_diffusers_cell_13_output_0.jpeg)


```python
# Re-create the generator so this run starts from the same seed as the previous one
generator = torch.Generator(device=device).manual_seed(1024)
image = pipe(prompt=prompt, image=init_image, strength=0.5, guidance_scale=7.5, generator=generator).images[0]
image
```

![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/image_2_image_using_diffusers_cell_14_output_1.jpeg)


As you can see, when you use a lower value for `strength`, the generated image stays closer to the original `image`.
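
To inspect the two results side by side, keep each output in its own variable and paste them onto a single canvas. The helper below isn't part of diffusers - it's a few lines of PIL - and `image_075`/`image_050` are hypothetical names for the two outputs above:

```python
from PIL import Image

def image_grid(imgs, rows=1, cols=2):
    # Paste the images left-to-right onto one canvas for quick comparison
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

# image_grid([image_075, image_050])
```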

Now let's use a different scheduler - [LMSDiscreteScheduler](https://huggingface.co/docs/diffusers/api/schedulers#diffusers.LMSDiscreteScheduler):

```python
from diffusers import LMSDiscreteScheduler

lms = LMSDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.scheduler = lms
```

```python
generator = torch.Generator(device=device).manual_seed(1024)
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0]
```

```python
image
```

![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/image_2_image_using_diffusers_cell_19_output_0.jpeg)
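
The same swap works for any scheduler that's compatible with the pipeline. If you want to explore other options, the scheduler object lists the classes you can drop in:

```python
# Print the scheduler classes compatible with this pipeline's configuration
print(pipe.scheduler.compatibles)

# Any of them can be swapped in the same way, for example:
from diffusers import EulerDiscreteScheduler

pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
```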