# Diffusers

## Definitions

**Models**: A single neural network that models p_θ(x_{t-1} | x_t) and is trained to “denoise” a noisy sample into an image.
*Examples: UNet, Conditioned UNet, 3D UNet, Transformer UNet*

![model_diff_1_50](https://user-images.githubusercontent.com/23423619/171610307-dab0cd8b-75da-4d4e-9f5a-5922072e2bb5.png)
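
For reference, the reverse transition that such a model parameterizes is usually written as a Gaussian. The notation below follows the DDPM paper; the symbols μ_θ and Σ_θ come from that paper, not from this README:

$$
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)
$$

In the common noise-prediction setup, the network outputs an estimate of the noise contained in x_t, and the mean μ_θ is derived from that estimate.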

**Schedulers**: An algorithm that computes the noise schedule for both *training* and *inference*. It defines the alpha and beta schedules, the timesteps, etc.
*Examples: Gaussian DDPM, DDIM, PLMS, DEIN*

![sampling](https://user-images.githubusercontent.com/23423619/171608981-3ad05953-a684-4c82-89f8-62a459147a07.png)
![training](https://user-images.githubusercontent.com/23423619/171608964-b3260cce-e6b4-4841-959d-7d8ba4b8d1b2.png)
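
To make "alpha and beta schedule" concrete, here is a minimal pure-Python sketch of a linear beta schedule and the forward noising step it implies. The function names are hypothetical and are not part of `diffusers`; the default values (1e-4 to 0.02) are the ones reported in the DDPM paper for 1000 timesteps.

```python
import math


def linear_beta_schedule(num_timesteps, beta_start=1e-4, beta_end=0.02):
    """Return linearly spaced betas, one per timestep."""
    if num_timesteps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_timesteps - 1)
    return [beta_start + i * step for i in range(num_timesteps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) up to each timestep t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, noise, alpha_bar_t):
    """Forward process q(x_t | x_0), applied element-wise to a list of floats."""
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [signal * x + sigma * n for x, n in zip(x0, noise)]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
```

During training, `add_noise` would produce the noisy input for a randomly chosen timestep; during inference, the same `betas` and `alpha_bars` drive the reverse update.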

**Diffusion Pipeline**: An end-to-end pipeline that can combine multiple diffusion models with other components, such as text encoders or CLIP.
*Examples: GLIDE, CompVis/Latent-Diffusion, Imagen, DALL-E*

![imagen](https://user-images.githubusercontent.com/23423619/171609001-c3f2c1c9-f597-4a16-9843-749bf3f9431c.png)
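
The relationship between the three definitions can be sketched with toy stand-ins. Everything below is hypothetical (the class names, the `step` method, and the halving "model" are illustrative only and do not mirror the actual `diffusers` classes); it only shows how a pipeline bundles a model and a scheduler behind a single call.

```python
class ToyModel:
    """Hypothetical stand-in for a denoising network: predicts the residual to remove."""

    def __call__(self, image, t):
        return [0.5 * pixel for pixel in image]


class ToyScheduler:
    """Hypothetical stand-in for a scheduler: owns the timesteps and the update rule."""

    def __init__(self, num_timesteps):
        self.num_timesteps = num_timesteps

    def __len__(self):
        return self.num_timesteps

    def step(self, residual, image, t):
        # remove the predicted residual from the current image
        return [pixel - r for pixel, r in zip(image, residual)]


class ToyPipeline:
    """Bundles a model and a scheduler and runs the full denoising loop."""

    def __init__(self, model, scheduler):
        self.model = model
        self.scheduler = scheduler

    def __call__(self, image):
        for t in reversed(range(len(self.scheduler))):
            residual = self.model(image, t)
            image = self.scheduler.step(residual, image, t)
        return image


pipeline = ToyPipeline(ToyModel(), ToyScheduler(num_timesteps=3))
result = pipeline([8.0, -4.0])
```

Because the model and the scheduler only meet inside the loop, either one can be swapped without touching the other.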

## 1. `diffusers` as a central modular diffusion and sampler library

`diffusers` is more modularized than `transformers`. The idea is that researchers and engineers can easily use only the parts of the library they need for their own use cases.
It could become a central place for all kinds of models, schedulers, training utils and processors that one can mix and match for one's own use case.
Both models and schedulers should be loadable from and saveable to the Hub.
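
As a rough illustration of what "loadable and saveable" can mean for a scheduler, the sketch below round-trips a scheduler's configuration through a JSON file. It is a minimal sketch under stated assumptions: the class `ToyNoiseScheduler`, the method `save_config`, and the file name `scheduler_config.json` are hypothetical, and a temporary local directory stands in for a Hub repository.

```python
import json
import os
import tempfile


class ToyNoiseScheduler:
    """Hypothetical scheduler whose whole state is a small JSON-serializable config."""

    config_name = "scheduler_config.json"  # assumed file name, for illustration only

    def __init__(self, num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
        self.config = {
            "num_timesteps": num_timesteps,
            "beta_start": beta_start,
            "beta_end": beta_end,
        }

    def save_config(self, directory):
        """Write the config to `directory`, creating it if needed."""
        os.makedirs(directory, exist_ok=True)
        with open(os.path.join(directory, self.config_name), "w") as f:
            json.dump(self.config, f, indent=2)

    @classmethod
    def from_config(cls, directory):
        """Rebuild a scheduler from a config previously written by `save_config`."""
        with open(os.path.join(directory, cls.config_name)) as f:
            return cls(**json.load(f))


# a temporary local directory stands in for a Hub repository here
with tempfile.TemporaryDirectory() as repo_dir:
    ToyNoiseScheduler(num_timesteps=50).save_config(repo_dir)
    restored = ToyNoiseScheduler.from_config(repo_dir)
```

Since a scheduler has no trained weights, a small config file is enough to reproduce it; a model would additionally need its weights stored alongside the config.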

Example for [DDPM](https://arxiv.org/abs/2006.11239):

```python
import torch
from diffusers import UNetModel, GaussianDDPMScheduler
import PIL.Image
import tqdm
import numpy as np

generator = torch.manual_seed(0)
torch_device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Load models
noise_scheduler = GaussianDDPMScheduler.from_config("fusing/ddpm-lsun-church")
model = UNetModel.from_pretrained("fusing/ddpm-lsun-church").to(torch_device)

# 2. Sample gaussian noise
image = noise_scheduler.sample_noise((1, model.in_channels, model.resolution, model.resolution), device=torch_device, generator=generator)

# 3. Denoise
num_prediction_steps = len(noise_scheduler)
for t in tqdm.tqdm(reversed(range(num_prediction_steps)), total=num_prediction_steps):
    # predict noise residual
    with torch.no_grad():
        residual = model(image, t)

    # predict previous mean of image x_t-1
    pred_prev_image = noise_scheduler.get_prev_image_step(residual, image, t)

    # optionally sample variance
    variance = 0
    if t > 0:
        noise = noise_scheduler.sample_noise(image.shape, device=image.device, generator=generator)
        variance = noise_scheduler.get_variance(t).sqrt() * noise

    # set current image to prev_image: x_t -> x_t-1
    image = pred_prev_image + variance

# 4. process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# 5. save image
image_pil.save("test.png")
```
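
The image conversion at the end of the example assumes that every value lies in [-1, 1]. Casting a float outside the 0 to 255 range to `np.uint8` does not clamp it, so stray values can show up as wrong pixels. A minimal pure-Python sketch of the same mapping with explicit clamping (the helper name `to_uint8` is hypothetical):

```python
def to_uint8(value):
    """Map a float from [-1, 1] to an integer in [0, 255], clamping out-of-range inputs."""
    scaled = (value + 1.0) * 127.5
    return int(min(255.0, max(0.0, round(scaled))))


pixels = [to_uint8(v) for v in (-1.5, -1.0, 0.0, 1.0, 1.2)]
```

In the tensor code above, a comparable guard would be a `.clamp(0, 255)` call on the scaled tensor before `.numpy()`.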

Example for [DDIM](https://arxiv.org/abs/2010.02502):

```python
import torch
from diffusers import UNetModel, DDIMScheduler
import PIL.Image
import tqdm
import numpy as np

generator = torch.manual_seed(0)
torch_device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Load models
noise_scheduler = DDIMScheduler.from_config("fusing/ddpm-celeba-hq")
model = UNetModel.from_pretrained("fusing/ddpm-celeba-hq").to(torch_device)

# 2. Sample gaussian noise
image = noise_scheduler.sample_noise((1, model.in_channels, model.resolution, model.resolution), device=torch_device, generator=generator)

# 3. Denoise
num_inference_steps = 50
eta = 0.0  # <- deterministic sampling

# map each inference step to a training timestep
# (assumes len(noise_scheduler) is the number of training timesteps, as in the DDPM example)
num_trained_timesteps = len(noise_scheduler)
inference_step_times = range(0, num_trained_timesteps, num_trained_timesteps // num_inference_steps)

for t in tqdm.tqdm(reversed(range(num_inference_steps)), total=num_inference_steps):
    # 1. predict noise residual
    with torch.no_grad():
        residual = model(image, inference_step_times[t])

    # 2. predict previous mean of image x_t-1
    pred_prev_image = noise_scheduler.get_prev_image_step(residual, image, t, num_inference_steps, eta)

    # 3. optionally sample variance
    variance = 0
    if eta > 0:
        noise = noise_scheduler.sample_noise(image.shape, device=image.device, generator=generator)
        variance = noise_scheduler.get_variance(t).sqrt() * eta * noise

    # 4. set current image to prev_image: x_t -> x_t-1
    image = pred_prev_image + variance

# 5. process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# 6. save image
image_pil.save("test.png")
```
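
DDIM can sample in far fewer steps than the model was trained with, because it only visits a subsequence of the training timesteps. A minimal sketch of one common choice, uniform spacing, assuming 1000 training timesteps (the function name is hypothetical and not part of `diffusers`):

```python
def inference_timesteps(num_train_timesteps, num_inference_steps):
    """Pick evenly spaced training timesteps to visit during inference."""
    if not 0 < num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be in [1, num_train_timesteps]")
    stride = num_train_timesteps // num_inference_steps
    # truncate in case the stride does not divide the range evenly
    return list(range(0, num_train_timesteps, stride))[:num_inference_steps]


steps = inference_timesteps(1000, 50)
```

With these numbers the sampler visits every 20th training timestep, and the loop iterates over `steps` in reverse, from the noisiest timestep down to 0.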

## 2. `diffusers` as a collection of the most important diffusion systems (GLIDE, DALL-E, ...)

The `models` directory in the repository hosts the complete code necessary for running a diffusion system as well as for training it. A `DiffusionPipeline` class makes it easy to run the diffusion model in inference:

Example:

```python
from diffusers import DiffusionPipeline
import PIL.Image
import numpy as np

# load model and scheduler
ddpm = DiffusionPipeline.from_pretrained("fusing/ddpm-lsun-bedroom")

# run pipeline in inference (sample random noise and denoise)
image = ddpm()

# process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# save image
image_pil.save("test.png")
```

## Library structure

```
├── models
│   ├── audio
│   │   └── fastdiff
│   │       ├── modeling_fastdiff.py
│   │       ├── README.md
│   │       └── run_fastdiff.py
│   ├── __init__.py
│   └── vision
│       ├── dalle2
│       │   ├── modeling_dalle2.py
│       │   ├── README.md
│       │   └── run_dalle2.py
│       ├── ddpm
│       │   ├── example.py
│       │   ├── modeling_ddpm.py
│       │   ├── README.md
│       │   └── run_ddpm.py
│       ├── glide
│       │   ├── modeling_glide.py
│       │   ├── modeling_vqvae.py
│       │   ├── README.md
│       │   └── run_glide.py
│       ├── imagen
│       │   ├── modeling_imagen.py
│       │   ├── README.md
│       │   └── run_imagen.py
│       ├── __init__.py
│       └── latent_diffusion
│           ├── modeling_latent_diffusion.py
│           ├── README.md
│           └── run_latent_diffusion.py
├── pyproject.toml
├── README.md
├── setup.cfg
├── setup.py
├── src
│   └── diffusers
│       ├── configuration_utils.py
│       ├── __init__.py
│       ├── modeling_utils.py
│       ├── models
│       │   ├── __init__.py
│       │   ├── unet_glide.py
│       │   └── unet.py
│       ├── pipeline_utils.py
│       └── schedulers
│           ├── gaussian_ddpm.py
│           ├── __init__.py
└── tests
    └── test_modeling_utils.py
```