<p align="center">
    <br>
    <img src="docs/source/imgs/diffusers_library.jpg" width="400"/>
    <br>
</p>
<p align="center">
    <a href="https://github.com/huggingface/diffusers/blob/main/LICENSE">
        <img alt="GitHub" src="https://img.shields.io/github/license/huggingface/diffusers.svg?color=blue">
    </a>
    <a href="https://github.com/huggingface/diffusers/releases">
        <img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/diffusers.svg">
    </a>
    <a href="CODE_OF_CONDUCT.md">
        <img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg">
    </a>
</p>

🤗 Diffusers provides pretrained diffusion models across multiple modalities, such as vision and audio, and serves
as a modular toolbox for inference and training of diffusion models.

More precisely, 🤗 Diffusers offers:

- State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)).
- Various noise schedulers that can be used interchangeably for the preferred speed vs. quality trade-off in inference (see [src/diffusers/schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers)).
- Multiple types of models, such as UNet, that can be used as building blocks in an end-to-end diffusion system (see [src/diffusers/models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models)).
- Training examples to show how to train the most popular diffusion models (see [examples](https://github.com/huggingface/diffusers/tree/main/examples)).

## Definitions

**Models**: Neural network that models $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$ (see image below) and is trained end-to-end to *denoise* a noisy input to an image.
*Examples*: UNet, Conditioned UNet, 3D UNet, Transformer UNet

<p align="center">
    <img src="https://user-images.githubusercontent.com/10695622/174349667-04e9e485-793b-429a-affe-096e8199ad5b.png" width="800"/>
    <br>
    <em> Figure from DDPM paper (https://arxiv.org/abs/2006.11239). </em>
</p>
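
For reference, the DDPM paper parameterizes this reverse step as a Gaussian whose mean is computed from a predicted noise term $\boldsymbol{\epsilon}_\theta$. The notation below follows that paper and is only meant as orientation; it assumes $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, and a fixed variance $\sigma_t^2$, and it does not describe any particular model class in this library.

$$
p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t) = \mathcal{N}\big(\mathbf{x}_{t-1};\ \boldsymbol{\mu}_\theta(\mathbf{x}_t, t),\ \sigma_t^2 \mathbf{I}\big),
\qquad
\boldsymbol{\mu}_\theta(\mathbf{x}_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right)
$$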
    
**Schedulers**: Algorithm class for both **inference** and **training**.
The class provides functionality to compute the previous image according to the alpha and beta schedule, as well as to predict noise for training.
*Examples*: [DDPM](https://arxiv.org/abs/2006.11239), [DDIM](https://arxiv.org/abs/2010.02502), [PNDM](https://arxiv.org/abs/2202.09778), [DEIS](https://arxiv.org/abs/2204.13902)

<p align="center">
    <img src="https://user-images.githubusercontent.com/10695622/174349706-53d58acc-a4d1-4cda-b3e8-432d9dc7ad38.png" width="800"/>
    <br>
    <em> Sampling and training algorithms. Figure from DDPM paper (https://arxiv.org/abs/2006.11239). </em>
</p>
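
To make the two roles of a scheduler concrete, here is a minimal NumPy sketch based on the DDPM paper's equations. `ToyDDPMScheduler`, its method names, and its default values are hypothetical and chosen for illustration; this is not the `diffusers` implementation, and it assumes a linear beta schedule with variance $\sigma_t^2 = \beta_t$.

```python
import numpy as np


class ToyDDPMScheduler:
    """Hypothetical, simplified scheduler (not the diffusers implementation)."""

    def __init__(self, num_train_timesteps=1000, beta_start=1e-4, beta_end=0.02):
        # Linear beta schedule and the derived alpha quantities.
        self.betas = np.linspace(beta_start, beta_end, num_train_timesteps)
        self.alphas = 1.0 - self.betas
        self.alphas_cumprod = np.cumprod(self.alphas)

    def add_noise(self, x0, noise, t):
        # Training side: sample x_t from q(x_t | x_0) in closed form.
        abar = self.alphas_cumprod[t]
        return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise

    def step(self, residual, t, x_t, rng):
        # Inference side: compute x_{t-1} from x_t and the predicted noise.
        beta, alpha, abar = self.betas[t], self.alphas[t], self.alphas_cumprod[t]
        mean = (x_t - beta / np.sqrt(1.0 - abar) * residual) / np.sqrt(alpha)
        if t == 0:
            return mean  # no noise is added at the final step
        return mean + np.sqrt(beta) * rng.standard_normal(x_t.shape)


rng = np.random.default_rng(0)
scheduler = ToyDDPMScheduler()
x0 = rng.standard_normal((1, 3, 8, 8))      # stand-in for a clean image
noise = rng.standard_normal(x0.shape)
x_t = scheduler.add_noise(x0, noise, t=500)  # noisy training input
x_prev = scheduler.step(noise, 500, x_t, rng)  # one reverse step
```

In a real setup the `residual` passed to `step` would come from a trained model rather than from the known noise used here.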
    

**Diffusion Pipeline**: End-to-end pipeline that includes multiple diffusion models, possibly text encoders, ...
*Examples*: Glide, Latent-Diffusion, Imagen, DALL-E 2

<p align="center">
    <img src="https://user-images.githubusercontent.com/10695622/174348898-481bd7c2-5457-4830-89bc-f0907756f64c.jpeg" width="550"/>
    <br>
    <em> Figure from Imagen (https://imagen.research.google/). </em>
</p>
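
As a rough illustration of how a pipeline ties a model and a scheduler together, the sketch below wraps both in a single callable. All names (`ToyPipeline`, `ToyScheduler`, `toy_model`) are hypothetical, and the update rule is a placeholder rather than a real diffusion sampler; only the overall structure (start from noise, loop over timesteps, call the model, then the scheduler) is the point.

```python
import numpy as np


class ToyScheduler:
    """Hypothetical stand-in exposing `timesteps` and `step`."""

    def __init__(self, num_steps=10):
        # Timesteps run from the noisiest step down to 0.
        self.timesteps = list(range(num_steps - 1, -1, -1))

    def step(self, residual, t, sample):
        # Placeholder update: subtract a scaled residual (not a real sampler).
        return sample - residual / (t + 1)


class ToyPipeline:
    """Hypothetical end-to-end wrapper: noise in, sample out."""

    def __init__(self, model, scheduler):
        self.model = model
        self.scheduler = scheduler

    def __call__(self, shape, seed=0):
        rng = np.random.default_rng(seed)
        sample = rng.standard_normal(shape)  # start from Gaussian noise
        for t in self.scheduler.timesteps:
            residual = self.model(sample, t)                   # model call
            sample = self.scheduler.step(residual, t, sample)  # scheduler call
        return sample


def toy_model(sample, t):
    # Placeholder for a trained network that predicts a residual.
    return 0.1 * sample


pipeline = ToyPipeline(toy_model, ToyScheduler(num_steps=10))
image = pipeline(shape=(1, 3, 8, 8))
```

Keeping the model and scheduler as constructor arguments means either one can be swapped without touching the loop.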
    
## Philosophy

- Readability and clarity are preferred over highly optimized code. Strong emphasis is placed on providing readable, intuitive, and elementary code design. *E.g.*, the provided [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) are separated from the provided [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and provide well-commented code that can be read alongside the original paper.
- Diffusers is **modality independent** and focuses on providing pretrained models and tools to build systems that generate **continuous outputs**, *e.g.* vision and audio.
- Diffusion models and schedulers are provided as concise, elementary building blocks, whereas diffusion pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box. Pipelines should stay as close as possible to their original implementation and can include components of other libraries, such as text encoders. Examples of diffusion pipelines are [Glide](https://github.com/openai/glide-text2im) and [Latent Diffusion](https://github.com/CompVis/latent-diffusion).

## Quickstart

**Check out this notebook: https://colab.research.google.com/drive/1nMfF04cIxg6FujxsNYi9kiTRrzj4_eZU?usp=sharing**

### Installation

```
pip install diffusers  # should install diffusers 0.0.4
```

### 1. `diffusers` as a toolbox for schedulers and models

`diffusers` is more modularized than `transformers`. The idea is that researchers and engineers can easily use only parts of the library for their own use cases.
It could become a central place for all kinds of models, schedulers, training utils and processors that one can mix and match for one's own use case.
Both models and schedulers should be loadable from and saveable to the Hub.
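
One way to picture loading and saving is a configuration round trip through a JSON file. The sketch below is an assumption-laden illustration, not the library's mechanism: `ToyConfigMixin`, `ToyScheduler`, `save_config`, and the `config.json` file name are hypothetical, and it writes to a local temporary directory instead of the Hub.

```python
import json
import os
import tempfile


class ToyConfigMixin:
    """Hypothetical mixin: stores constructor arguments as JSON."""

    config_name = "config.json"  # assumed file name

    def save_config(self, save_directory):
        os.makedirs(save_directory, exist_ok=True)
        path = os.path.join(save_directory, self.config_name)
        with open(path, "w") as f:
            json.dump(self.config, f, indent=2)

    @classmethod
    def from_config(cls, directory):
        # Rebuild the object from the stored constructor arguments.
        path = os.path.join(directory, cls.config_name)
        with open(path) as f:
            return cls(**json.load(f))


class ToyScheduler(ToyConfigMixin):
    def __init__(self, num_train_timesteps=1000, beta_start=1e-4, beta_end=0.02):
        self.config = {
            "num_train_timesteps": num_train_timesteps,
            "beta_start": beta_start,
            "beta_end": beta_end,
        }


with tempfile.TemporaryDirectory() as tmp_dir:
    ToyScheduler(num_train_timesteps=50).save_config(tmp_dir)
    reloaded = ToyScheduler.from_config(tmp_dir)
```

Models would additionally need their weights stored next to the configuration, which this sketch leaves out.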

For more examples, see [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) and [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models).

#### **Example for Unconditional Image generation [DDPM](https://arxiv.org/abs/2006.11239):**

```python
import torch
from diffusers import UNetUnconditionalModel, DDIMScheduler
import PIL.Image
import numpy as np
import tqdm

torch_device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Load models
scheduler = DDIMScheduler.from_config("fusing/ddpm-celeba-hq", tensor_format="pt")
unet = UNetUnconditionalModel.from_pretrained("fusing/ddpm-celeba-hq", ddpm=True).to(torch_device)

# 2. Sample gaussian noise
generator = torch.manual_seed(23)
unet.image_size = unet.resolution
image = torch.randn(
   (1, unet.in_channels, unet.image_size, unet.image_size),
   generator=generator,
)
image = image.to(torch_device)

# 3. Denoise
num_inference_steps = 50
eta = 0.0  # <- deterministic sampling
scheduler.set_timesteps(num_inference_steps)

for t in tqdm.tqdm(scheduler.timesteps):
    # 1. predict noise residual
    with torch.no_grad():
        residual = unet(image, t)["sample"]

    # 2. compute the previous image from the predicted residual
    prev_image = scheduler.step(residual, t, image, eta)["prev_sample"]

    # 3. set current image to prev_image: x_t -> x_t-1
    image = prev_image

# 4. process image to PIL
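image_processed = image.cpu().permute(0, 2, 3, 1)  # NCHW -> NHWC
image_processed = (image_processed + 1.0) * 127.5  # [-1, 1] -> [0, 255]
# clamp before casting so out-of-range values do not wrap around in uint8
image_processed = image_processed.clamp(0, 255).numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])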

# 5. save image
image_pil.save("generated_image.png")
``` 
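
The `eta = 0.0` setting in the loop above selects deterministic sampling. As a sketch of what `eta` controls, the function below implements a single update following the DDIM paper's equations in NumPy. The function name `ddim_step` and its arguments are hypothetical, and the code is not taken from the library's scheduler; `abar_t` and `abar_prev` stand for the cumulative alpha products at the current and previous timestep.

```python
import numpy as np


def ddim_step(x_t, eps, abar_t, abar_prev, eta=0.0, noise=None):
    """One DDIM-style update from x_t to the previous sample (illustrative)."""
    # Estimate the clean sample x_0 from x_t and the predicted noise.
    pred_x0 = (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)
    # Standard deviation of the injected noise; eta = 0 gives sigma = 0.
    sigma = (
        eta
        * np.sqrt((1.0 - abar_prev) / (1.0 - abar_t))
        * np.sqrt(1.0 - abar_t / abar_prev)
    )
    # Deterministic part pointing back towards x_t.
    direction = np.sqrt(1.0 - abar_prev - sigma**2) * eps
    x_prev = np.sqrt(abar_prev) * pred_x0 + direction
    if eta > 0.0:
        if noise is None:
            noise = np.random.default_rng().standard_normal(x_t.shape)
        x_prev = x_prev + sigma * noise  # stochastic part
    return x_prev


rng = np.random.default_rng(23)
x0 = rng.standard_normal((1, 3, 8, 8))
eps = rng.standard_normal(x0.shape)
abar_t, abar_prev = 0.5, 0.8
x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps
x_prev = ddim_step(x_t, eps, abar_t, abar_prev, eta=0.0)
```

With `eta=0.0` and the exact noise passed as `eps`, the result equals `sqrt(abar_prev) * x0 + sqrt(1 - abar_prev) * eps`, so no randomness enters the step.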

#### **Example for Unconditional Image generation [LDM](https://github.com/CompVis/latent-diffusion):**

```python
```