# Diffusers

## Definitions

**Models**: A single neural network that models p_θ(x_{t-1} | x_t) and is trained to “denoise” a noisy input into an image.
*Examples: UNet, Conditioned UNet, 3D UNet, Transformer UNet*

![model_diff_1_50](https://user-images.githubusercontent.com/23423619/171610307-dab0cd8b-75da-4d4e-9f5a-5922072e2bb5.png)
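
For reference, the reverse step that such a model parameterizes can be written as below. This is the standard DDPM formulation, added here as an assumption about what the notation above refers to; the symbols μ_θ, σ_t and ε_θ do not appear elsewhere in this README.

```math
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\big),
\qquad
\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right)
```

Here ε_θ(x_t, t) is the noise residual predicted by the network, α_t = 1 - β_t, and ᾱ_t is the cumulative product of the α values up to step t.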

**Schedulers**: Algorithm that defines the noise schedule used for both *training* and *inference*, including the alpha and beta schedules, the timesteps, etc.
*Examples: Gaussian DDPM, DDIM, PLMS, DEIN*

![sampling](https://user-images.githubusercontent.com/23423619/171608981-3ad05953-a684-4c82-89f8-62a459147a07.png)
![training](https://user-images.githubusercontent.com/23423619/171608964-b3260cce-e6b4-4841-959d-7d8ba4b8d1b2.png)
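
To make the alpha and beta schedule concrete, here is a minimal sketch in plain Python (no `torch`) of a linear beta schedule, the cumulative alpha product, and the training-time noising step. The function names and the default values `beta_start=1e-4` and `beta_end=0.02` are illustrative assumptions, not the API of the schedulers in this library.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return `num_steps` betas spaced linearly from beta_start to beta_end."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_prods[t] = product over s <= t of (1 - betas[s])."""
    alpha_prods, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_prods.append(running)
    return alpha_prods


def add_noise(x0, t, alpha_prods, rng):
    """Forward (training) step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a = alpha_prods[t]
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(a) * x0 + math.sqrt(1.0 - a) * eps, eps


betas = linear_beta_schedule(1000)
alpha_prods = cumulative_alphas(betas)
# Noise a single scalar "pixel" at the last timestep (hypothetical toy input).
x_t, eps = add_noise(0.5, t=999, alpha_prods=alpha_prods, rng=random.Random(0))
```

With this schedule the cumulative product shrinks toward zero, so `x_t` at the last timestep is almost pure noise. That is why inference can start from a Gaussian sample.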

**Diffusion Pipeline**: End-to-end pipeline that includes multiple diffusion models and possibly text encoders, CLIP, etc.
*Examples: GLIDE, CompVis/Latent-Diffusion, Imagen, DALL-E*

![imagen](https://user-images.githubusercontent.com/23423619/171609001-c3f2c1c9-f597-4a16-9843-749bf3f9431c.png)

## 1. `diffusers` as a central modular diffusion and sampler library

`diffusers` is more modularized than `transformers`. The idea is that researchers and engineers can easily use only parts of the library for their own use cases.
It could become a central place for all kinds of models, schedulers, training utils and processors that one can mix and match for one's own use case.
Both models and schedulers should be loadable from and savable to the Hub.

Example:

```python
import torch
from diffusers import UNetModel, GaussianDDPMScheduler
import PIL.Image
import numpy as np

generator = torch.Generator()
generator = generator.manual_seed(6694729458485568)
torch_device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Load models
scheduler = GaussianDDPMScheduler.from_config("fusing/ddpm-lsun-church")
model = UNetModel.from_pretrained("fusing/ddpm-lsun-church").to(torch_device)

# 2. Sample gaussian noise
image = scheduler.sample_noise((1, model.in_channels, model.resolution, model.resolution), device=torch_device, generator=generator)

# 3. Denoise
for t in reversed(range(len(scheduler))):
    # i) define coefficients for time step t
    clipped_image_coeff = 1 / torch.sqrt(scheduler.get_alpha_prod(t))
    clipped_noise_coeff = torch.sqrt(1 / scheduler.get_alpha_prod(t) - 1)
    image_coeff = (1 - scheduler.get_alpha_prod(t - 1)) * torch.sqrt(scheduler.get_alpha(t)) / (1 - scheduler.get_alpha_prod(t))
    clipped_coeff = torch.sqrt(scheduler.get_alpha_prod(t - 1)) * scheduler.get_beta(t) / (1 - scheduler.get_alpha_prod(t))

    # ii) predict noise residual
    with torch.no_grad():
        noise_residual = model(image, t)

    # iii) compute predicted image from residual
    # See 2nd formula at https://github.com/hojonathanho/diffusion/issues/5#issue-896554416 for comparison
    pred_mean = clipped_image_coeff * image - clipped_noise_coeff * noise_residual
    pred_mean = torch.clamp(pred_mean, -1, 1)
    prev_image = clipped_coeff * pred_mean + image_coeff * image

    # iv) sample variance
    prev_variance = scheduler.sample_variance(t, prev_image.shape, device=torch_device, generator=generator)

    # v) sample x_{t-1} ~ N(prev_image, prev_variance)
    sampled_prev_image = prev_image + prev_variance
    image = sampled_prev_image

# process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# save image
image_pil.save("test.png")
```
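
The coefficients in steps i) and iii) of the loop above can be checked in isolation. The following is a minimal sketch on plain floats, using a hand-picked three-step toy schedule. The helper names `step_coefficients` and `predict_prev_mean` are hypothetical, and treating the cumulative alpha product before step 0 as 1 is an assumption about how `get_alpha_prod(t - 1)` behaves at `t = 0`.

```python
import math


def step_coefficients(t, betas, alpha_prods):
    """Compute the four coefficients from step i) of the loop for plain floats."""
    alpha_prod_t = alpha_prods[t]
    # Assumption: the cumulative product "before step 0" is 1 (no noise yet).
    alpha_prod_prev = alpha_prods[t - 1] if t > 0 else 1.0
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    return {
        "clipped_image_coeff": 1.0 / math.sqrt(alpha_prod_t),
        "clipped_noise_coeff": math.sqrt(1.0 / alpha_prod_t - 1.0),
        "image_coeff": (1.0 - alpha_prod_prev) * math.sqrt(alpha_t) / (1.0 - alpha_prod_t),
        "clipped_coeff": math.sqrt(alpha_prod_prev) * beta_t / (1.0 - alpha_prod_t),
    }


def predict_prev_mean(x_t, noise_residual, coeffs):
    """Step iii) of the loop: estimate x_0, clamp it, then blend it with x_t."""
    pred_x0 = coeffs["clipped_image_coeff"] * x_t - coeffs["clipped_noise_coeff"] * noise_residual
    pred_x0 = max(-1.0, min(1.0, pred_x0))
    return coeffs["clipped_coeff"] * pred_x0 + coeffs["image_coeff"] * x_t


# Toy schedule: three steps with hand-picked betas (illustrative values only).
betas = [0.1, 0.2, 0.3]
alpha_prods = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_prods.append(running)

coeffs = step_coefficients(2, betas, alpha_prods)

# If the noise residual is exact, the x_0 estimate recovers the clean value.
x0, eps = 0.3, 0.7
x_t = math.sqrt(alpha_prods[2]) * x0 + math.sqrt(1.0 - alpha_prods[2]) * eps
prev_mean = predict_prev_mean(x_t, eps, coeffs)
```

At `t = 0` the `image_coeff` term vanishes and `clipped_coeff` becomes 1, so the final step returns the clamped estimate of the clean image.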

## 2. `diffusers` as a collection of the most important diffusion systems (GLIDE, DALL-E, ...)

The `models` directory in the repository hosts the complete code necessary for running a diffusion system as well as for training it. A `DiffusionPipeline` class makes it easy to run the diffusion model in inference:

Example:

```python
from diffusers import DiffusionPipeline
import PIL.Image
import numpy as np

# load model and scheduler
ddpm = DiffusionPipeline.from_pretrained("fusing/ddpm-lsun-bedroom")

# run pipeline in inference (sample random noise and denoise)
image = ddpm()

# process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# save image
image_pil.save("test.png")
```
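
To illustrate what a pipeline bundles, here is a minimal sketch of a toy pipeline in plain Python that holds a model and a noise schedule and runs the full denoising loop when called. `ToyPipeline` is a hypothetical stand-in operating on a single float, not the library's `DiffusionPipeline`. The posterior variance used for the added noise is the standard DDPM choice and is an assumption about what `scheduler.sample_variance` computes.

```python
import math
import random


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: bundles a model with a noise schedule."""

    def __init__(self, model, betas):
        self.model = model
        self.betas = list(betas)
        self.alpha_prods = []
        running = 1.0
        for beta in self.betas:
            running *= 1.0 - beta
            self.alpha_prods.append(running)

    def __call__(self, seed=0):
        rng = random.Random(seed)
        x = rng.gauss(0.0, 1.0)  # start from pure Gaussian noise
        for t in reversed(range(len(self.betas))):
            a_t = self.alpha_prods[t]
            a_prev = self.alpha_prods[t - 1] if t > 0 else 1.0
            beta_t = self.betas[t]
            # Estimate x_0 from the predicted noise and clamp it to [-1, 1].
            pred_x0 = (x - math.sqrt(1.0 - a_t) * self.model(x, t)) / math.sqrt(a_t)
            pred_x0 = max(-1.0, min(1.0, pred_x0))
            # Posterior mean of x_{t-1} given x_t and the x_0 estimate.
            mean = (math.sqrt(a_prev) * beta_t * pred_x0
                    + math.sqrt(1.0 - beta_t) * (1.0 - a_prev) * x) / (1.0 - a_t)
            # Assumed posterior variance; no noise is added on the final step.
            var = beta_t * (1.0 - a_prev) / (1.0 - a_t)
            x = mean + (math.sqrt(var) * rng.gauss(0.0, 1.0) if t > 0 else 0.0)
        return x


# Toy "model" that always predicts zero noise; a real model is a trained network.
pipeline = ToyPipeline(model=lambda x, t: 0.0, betas=[0.1, 0.2, 0.3])
sample = pipeline(seed=0)
```

Keeping the loop inside `__call__` means the caller only needs to load the pipeline and invoke it, which mirrors how `ddpm()` is used in the example above.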

## Library structure:

```
├── models
│   ├── audio
│   │   └── fastdiff
│   │       ├── modeling_fastdiff.py
│   │       ├── README.md
│   │       └── run_fastdiff.py
│   ├── __init__.py
│   └── vision
│       ├── dalle2
│       │   ├── modeling_dalle2.py
│       │   ├── README.md
│       │   └── run_dalle2.py
│       ├── ddpm
│       │   ├── example.py
│       │   ├── modeling_ddpm.py
│       │   ├── README.md
│       │   └── run_ddpm.py
│       ├── glide
│       │   ├── modeling_glide.py
│       │   ├── modeling_vqvae.py
│       │   ├── README.md
│       │   └── run_glide.py
│       ├── imagen
│       │   ├── modeling_dalle2.py
│       │   ├── README.md
│       │   └── run_dalle2.py
│       ├── __init__.py
│       └── latent_diffusion
│           ├── modeling_latent_diffusion.py
│           ├── README.md
│           └── run_latent_diffusion.py
├── pyproject.toml
├── README.md
├── setup.cfg
├── setup.py
├── src
│   └── diffusers
│       ├── configuration_utils.py
│       ├── __init__.py
│       ├── modeling_utils.py
│       ├── models
│       │   ├── __init__.py
│       │   ├── unet_glide.py
│       │   └── unet.py
│       ├── pipeline_utils.py
│       └── schedulers
│           ├── gaussian_ddpm.py
│           └── __init__.py
└── tests
│   └── test_modeling_utils.py
```