"vscode:/vscode.git/clone" did not exist on "406524821457fb52123d7b3e433e016b4a2a1d2f"
# Diffusers

## Definitions

**Models**: A single neural network that models p_θ(x_{t-1} | x_t) and is trained to “denoise” a noisy input toward an image.
*Examples: UNet, Conditioned UNet, 3D UNet, Transformer UNet*

![model_diff_1_50](https://user-images.githubusercontent.com/23423619/171610307-dab0cd8b-75da-4d4e-9f5a-5922072e2bb5.png)

**Samplers**: Algorithm to *train* and *sample* from a **Model**. Defines the alpha and beta schedules, the timesteps, etc.
*Examples: Vanilla DDPM, DDIM, PLMS, DEIN*

![sampling](https://user-images.githubusercontent.com/23423619/171608981-3ad05953-a684-4c82-89f8-62a459147a07.png)
![training](https://user-images.githubusercontent.com/23423619/171608964-b3260cce-e6b4-4841-959d-7d8ba4b8d1b2.png)
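
As a rough illustration of what "defines alpha and beta schedule" can mean, here is a minimal sketch of a linear beta schedule and the forward noising step used during training. The function names, the default values (1000 steps, betas from 1e-4 to 0.02) and the list-based representation are assumptions made for this sketch, not the `diffusers` API.

```python
import math


def linear_beta_schedule(num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Return betas linearly spaced from beta_start to beta_end (inclusive)."""
    step = (beta_end - beta_start) / (num_timesteps - 1)
    return [beta_start + i * step for i in range(num_timesteps)]


def cumulative_alpha_products(betas):
    """Return alpha_prod[t] = product of (1 - beta_s) for all s <= t."""
    products, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        products.append(running)
    return products


def add_noise(x0, noise, alpha_prod_t):
    """Forward process: x_t = sqrt(alpha_prod_t) * x0 + sqrt(1 - alpha_prod_t) * noise."""
    signal_scale = math.sqrt(alpha_prod_t)
    noise_scale = math.sqrt(1.0 - alpha_prod_t)
    return [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]


betas = linear_beta_schedule()
alpha_prod = cumulative_alpha_products(betas)
```

With these values `alpha_prod` decays from just below 1 toward 0, so late timesteps are almost pure noise while early ones stay close to the clean input.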

**Diffusion Pipeline**: End-to-end pipeline that combines multiple diffusion models with optional components such as text encoders or CLIP.
*Examples: GLIDE, CompVis/Latent-Diffusion, Imagen, DALL-E*

![imagen](https://user-images.githubusercontent.com/23423619/171609001-c3f2c1c9-f597-4a16-9843-749bf3f9431c.png)

## 1. `diffusers` as a central modular diffusion and sampler library

`diffusers` should be more modularized than `transformers` so that parts of it can be easily used in other libraries.
It could become a central place for all kinds of models, schedulers, training utils and processors required when using diffusion models in audio, vision, and other domains.
One should be able to save both models and samplers as well as load them from the Hub.
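
The save/load round trip described above might look roughly like the following for a sampler whose state is a handful of hyperparameters. This is only a sketch: `ToyScheduler`, the file name `scheduler_config.json` and the `save_config` method are hypothetical, and a local directory stands in for the Hub.

```python
import json
import os
import tempfile


class ToyScheduler:
    """Hypothetical scheduler used only to illustrate config round-tripping."""

    config_name = "scheduler_config.json"  # assumed file name

    def __init__(self, num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
        # Keep the constructor arguments so they can be serialized verbatim.
        self.config = {
            "num_timesteps": num_timesteps,
            "beta_start": beta_start,
            "beta_end": beta_end,
        }

    def save_config(self, directory):
        """Write the hyperparameters as JSON into `directory`."""
        os.makedirs(directory, exist_ok=True)
        with open(os.path.join(directory, self.config_name), "w") as f:
            json.dump(self.config, f, indent=2)

    @classmethod
    def from_config(cls, directory):
        """Rebuild a scheduler from the JSON file stored in `directory`."""
        with open(os.path.join(directory, cls.config_name)) as f:
            return cls(**json.load(f))


with tempfile.TemporaryDirectory() as tmp:
    ToyScheduler(num_timesteps=50).save_config(tmp)
    reloaded = ToyScheduler.from_config(tmp)
```

Keeping the sampler state in a plain JSON file means it can be versioned and shared independently of any model weights.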

Example:

```python
import torch
from diffusers import UNetModel, GaussianDDPMScheduler
import PIL.Image  # the Image submodule must be imported explicitly for PIL.Image.fromarray
import numpy as np

generator = torch.Generator()
generator = generator.manual_seed(6694729458485568)
torch_device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Load models
scheduler = GaussianDDPMScheduler.from_config("fusing/ddpm-lsun-church")
model = UNetModel.from_pretrained("fusing/ddpm-lsun-church").to(torch_device)

# 2. Sample gaussian noise
image = scheduler.sample_noise((1, model.in_channels, model.resolution, model.resolution), device=torch_device, generator=generator)

# 3. Denoise
for t in reversed(range(len(scheduler))):
    # i) define coefficients for time step t
    clipped_image_coeff = 1 / torch.sqrt(scheduler.get_alpha_prod(t))
    clipped_noise_coeff = torch.sqrt(1 / scheduler.get_alpha_prod(t) - 1)
    image_coeff = (1 - scheduler.get_alpha_prod(t - 1)) * torch.sqrt(scheduler.get_alpha(t)) / (1 - scheduler.get_alpha_prod(t))
    clipped_coeff = torch.sqrt(scheduler.get_alpha_prod(t - 1)) * scheduler.get_beta(t) / (1 - scheduler.get_alpha_prod(t))

    # ii) predict noise residual
    with torch.no_grad():
        noise_residual = model(image, t)

    # iii) compute predicted image from residual
    # See 2nd formula at https://github.com/hojonathanho/diffusion/issues/5#issue-896554416 for comparison
    pred_mean = clipped_image_coeff * image - clipped_noise_coeff * noise_residual
    pred_mean = torch.clamp(pred_mean, -1, 1)
    prev_image = clipped_coeff * pred_mean + image_coeff * image

    # iv) sample variance
    prev_variance = scheduler.sample_variance(t, prev_image.shape, device=torch_device, generator=generator)

    # v) sample  x_{t-1} ~ N(prev_image, prev_variance)
    sampled_prev_image = prev_image + prev_variance
    image = sampled_prev_image

# process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# save image
image_pil.save("test.png")
```
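
The coefficients in the denoising loop can be checked in isolation with scalars. The sketch below mirrors steps i) and iii): it recovers the clean sample from a noisy one when the predicted noise is exact, then forms the posterior mean. The helper names and toy values are assumptions for illustration, and treating the cumulative alpha product before the first step as 1 is a common convention that the scheduler above may or may not follow.

```python
import math


def predict_original(x_t, noise, alpha_prod_t):
    """x0 estimate: x_t / sqrt(alpha_prod_t) - sqrt(1 / alpha_prod_t - 1) * noise."""
    return x_t / math.sqrt(alpha_prod_t) - math.sqrt(1.0 / alpha_prod_t - 1.0) * noise


def posterior_mean(x0, x_t, alpha_t, alpha_prod_t, alpha_prod_prev):
    """Mean of q(x_{t-1} | x_t, x0), using the same two coefficients as the loop."""
    beta_t = 1.0 - alpha_t
    x0_coeff = math.sqrt(alpha_prod_prev) * beta_t / (1.0 - alpha_prod_t)
    xt_coeff = (1.0 - alpha_prod_prev) * math.sqrt(alpha_t) / (1.0 - alpha_prod_t)
    return x0_coeff * x0 + xt_coeff * x_t


# Toy values (assumed): one clean scalar "pixel" and one noise draw.
alpha_prod_prev, alpha_t = 0.9, 0.95
alpha_prod_t = alpha_prod_prev * alpha_t
x0, noise = 0.3, -1.2

# Forward-noise x0, then undo it with the exact noise.
x_t = math.sqrt(alpha_prod_t) * x0 + math.sqrt(1.0 - alpha_prod_t) * noise
recovered = predict_original(x_t, noise, alpha_prod_t)

# At the first timestep (previous cumulative product taken as 1) the mean is x0 itself.
first_step_mean = posterior_mean(x0, x_t, alpha_t=0.99, alpha_prod_t=0.99, alpha_prod_prev=1.0)
```

In the loop above the model's noise prediction is only approximate, which is why the `x0` estimate is clamped to [-1, 1] before it is mixed back with the current image.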

## 2. `diffusers` as a collection of the most important diffusion models (GLIDE, DALL-E, ...)
The `models` directory in the repository hosts complete diffusion training code and pipelines, which can easily be loaded from and saved to the Hub. It will be possible to use them with just the pip-installed version of `diffusers`:

Example:

```python
from diffusers import DiffusionPipeline
import PIL.Image
import numpy as np

# load model and scheduler
ddpm = DiffusionPipeline.from_pretrained("fusing/ddpm-lsun-bedroom")

# run pipeline in inference (sample random noise and denoise)
image = ddpm()

# process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# save image
image_pil.save("test.png")
```
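
The post-processing lines in both examples map values from [-1, 1] to [0, 255]. A scalar sketch of that mapping, with an explicit clamp added, is shown below; `to_uint8` is a hypothetical helper, and the clamp is an addition rather than something the examples above do, since casting out-of-range floats to an unsigned 8-bit type may not behave predictably.

```python
def to_uint8(value):
    """Map a float in [-1, 1] to an int in [0, 255], clamping out-of-range inputs."""
    clamped = max(-1.0, min(1.0, value))
    return int((clamped + 1.0) * 127.5)  # truncates toward zero, like an integer cast


pixels = [to_uint8(v) for v in (-1.0, 0.0, 1.0, 2.0)]
# pixels == [0, 127, 255, 255]
```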

## Library structure:

```
├── models
│   ├── audio
│   │   └── fastdiff
│   │       ├── modeling_fastdiff.py
│   │       ├── README.md
│   │       └── run_fastdiff.py
│   ├── __init__.py
│   └── vision
│       ├── dalle2
│       │   ├── modeling_dalle2.py
│       │   ├── README.md
│       │   └── run_dalle2.py
│       ├── ddpm
│       │   ├── example.py
│       │   ├── modeling_ddpm.py
│       │   ├── README.md
│       │   └── run_ddpm.py
│       ├── glide
│       │   ├── modeling_glide.py
│       │   ├── modeling_vqvae.py
│       │   ├── README.md
│       │   └── run_glide.py
│       ├── imagen
│       │   ├── modeling_imagen.py
│       │   ├── README.md
│       │   └── run_imagen.py
│       ├── __init__.py
│       └── latent_diffusion
│           ├── modeling_latent_diffusion.py
│           ├── README.md
│           └── run_latent_diffusion.py
├── pyproject.toml
├── README.md
├── setup.cfg
├── setup.py
├── src
│   └── diffusers
│       ├── configuration_utils.py
│       ├── __init__.py
│       ├── modeling_utils.py
│       ├── models
│       │   ├── __init__.py
│       │   ├── unet_glide.py
│       │   └── unet.py
│       ├── pipeline_utils.py
│       └── schedulers
│           ├── gaussian_ddpm.py
│           └── __init__.py
└── tests
    └── test_modeling_utils.py
```