<p align="center">
    <br>
    <img src="docs/source/imgs/diffusers_library.jpg" width="400"/>
    <br>
</p>
<p align="center">
    <a href="https://github.com/huggingface/diffusers/blob/main/LICENSE">
        <img alt="GitHub" src="https://img.shields.io/github/license/huggingface/diffusers.svg?color=blue">
    </a>
    <a href="https://github.com/huggingface/diffusers/releases">
        <img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/diffusers.svg">
    </a>
    <a href="CODE_OF_CONDUCT.md">
        <img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg">
    </a>
</p>

🤗 Diffusers provides pretrained diffusion models across multiple modalities, such as vision and audio, and serves
as a modular toolbox for inference and training of diffusion models.

More precisely, 🤗 Diffusers offers:

- State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)).
- Various noise schedulers that can be used interchangeably for the preferred speed vs. quality trade-off in inference (see [src/diffusers/schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers)).
- Multiple types of models, such as UNet, that can be used as building blocks in an end-to-end diffusion system (see [src/diffusers/models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models)).
- Training examples to show how to train the most popular diffusion models (see [examples](https://github.com/huggingface/diffusers/tree/main/examples)).

## Definitions

**Models**: Neural network that models $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$ (see image below) and is trained end-to-end to *denoise* a noisy input to an image.
*Examples*: UNet, Conditioned UNet, 3D UNet, Transformer UNet
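
Concretely, in the DDPM formulation shown in the figure below, training draws a clean image $\mathbf{x}_0$, noises it to a random timestep $t$, and fits the network to predict the added noise:

$$
\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
$$

$$
L_{\text{simple}}(\theta) = \mathbb{E}_{t,\,\mathbf{x}_0,\,\boldsymbol{\epsilon}}\left[\left\lVert \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right\rVert^2\right]
$$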

<p align="center">
    <img src="https://user-images.githubusercontent.com/10695622/174349667-04e9e485-793b-429a-affe-096e8199ad5b.png" width="800"/>
    <br>
    <em> Figure from DDPM paper (https://arxiv.org/abs/2006.11239). </em>
</p>
**Schedulers**: Algorithm classes for both **inference** and **training**.
A scheduler provides the functionality to compute the previous, less noisy image from the model's noise prediction according to an alpha/beta noise schedule, as well as to add noise to a clean image for training.
*Examples*: [DDPM](https://arxiv.org/abs/2006.11239), [DDIM](https://arxiv.org/abs/2010.02502), [PNDM](https://arxiv.org/abs/2202.09778), [DEIS](https://arxiv.org/abs/2204.13902)
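
For the DDPM case, with $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$, the step from $\mathbf{x}_t$ to the previous image uses the model's noise prediction $\boldsymbol{\epsilon}_\theta$:

$$
\mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right) + \sigma_t\,\mathbf{z}, \qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
$$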

<p align="center">
    <img src="https://user-images.githubusercontent.com/10695622/174349706-53d58acc-a4d1-4cda-b3e8-432d9dc7ad38.png" width="800"/>
    <br>
    <em> Sampling and training algorithms. Figure from DDPM paper (https://arxiv.org/abs/2006.11239). </em>
</p>

**Diffusion Pipeline**: End-to-end pipeline that includes multiple diffusion models and, possibly, other components such as text encoders.
*Examples*: Glide, Latent-Diffusion, Imagen, DALL-E 2

<p align="center">
    <img src="https://user-images.githubusercontent.com/10695622/174348898-481bd7c2-5457-4830-89bc-f0907756f64c.jpeg" width="550"/>
    <br>
    <em> Figure from Imagen (https://imagen.research.google/). </em>
</p>
## Philosophy

- Readability and clarity are preferred over highly optimized code. Strong emphasis is placed on readable, intuitive and elementary code design. *E.g.*, the provided [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) are separated from the provided [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and provide well-commented code that can be read alongside the original paper.
- Diffusers is **modality independent** and focuses on providing pretrained models and tools to build systems that generate **continuous outputs**, *e.g.* vision and audio.
- Diffusion models and schedulers are provided as concise, elementary building blocks, whereas diffusion pipelines are collections of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementations, and can include components from other libraries, such as text encoders. Examples of diffusion pipelines are [Glide](https://github.com/openai/glide-text2im) and [Latent Diffusion](https://github.com/CompVis/latent-diffusion).

## Quickstart

In order to get started, we recommend taking a look at two notebooks:

- The [Diffusers](https://colab.research.google.com/drive/1aEFVu0CvcIBzSNIQ7F71ujYYplAX4Bml?usp=sharing#scrollTo=PzW5ublpBuUt) notebook, which showcases an end-to-end example of usage for diffusion models, schedulers and pipelines.
  Take a look at this notebook to learn how to use the pipeline abstraction, which takes care of everything (model, scheduler, noise handling) for you, and to get an understanding of each independent building block in the library.
- The [Training diffusers](https://colab.research.google.com/drive/1qqJmz7JJ04suJzEF4Hn4-Acb8rfL-eA3?usp=sharing) notebook, which summarizes diffusion model training methods. This notebook takes a step-by-step approach to training your diffusion model on an image dataset, with explanatory graphics.

### Installation

```
pip install diffusers  # should install diffusers 0.0.4
```

### 1. `diffusers` as a toolbox for schedulers and models

`diffusers` is more modularized than `transformers`. The idea is that researchers and engineers can easily use only parts of the library for their own use cases.
It could become a central place for all kinds of models, schedulers, training utilities and processors that one can mix and match for one's own use case.
Both models and schedulers should be loadable and saveable from the Hub.

For more examples see [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) and [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models).

#### **Example for Unconditional Image generation [DDPM](https://arxiv.org/abs/2006.11239):**

```python
import torch
from diffusers import UNet2DModel, DDIMScheduler
import PIL.Image
import numpy as np
import tqdm

torch_device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Load models
scheduler = DDIMScheduler.from_config("fusing/ddpm-celeba-hq", tensor_format="pt")
unet = UNet2DModel.from_pretrained("fusing/ddpm-celeba-hq", ddpm=True).to(torch_device)

# 2. Sample gaussian noise
generator = torch.manual_seed(23)
unet.image_size = unet.resolution
image = torch.randn(
    (1, unet.in_channels, unet.image_size, unet.image_size),
    generator=generator,
)
image = image.to(torch_device)

# 3. Denoise
num_inference_steps = 50
eta = 0.0  # <- deterministic sampling
scheduler.set_timesteps(num_inference_steps)

for t in tqdm.tqdm(scheduler.timesteps):
    # 1. predict noise residual
    with torch.no_grad():
        residual = unet(image, t)["sample"]

    # 2. compute previous image: x_t -> x_t-1
    prev_image = scheduler.step(residual, t, image, eta)["prev_sample"]

    # 3. set current image to prev_image: x_t -> x_t-1
    image = prev_image

# 4. process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# 5. save image
image_pil.save("generated_image.png")
```
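
The post-processing in step 4 above simply rescales pixel values from the model's $[-1, 1]$ range to $[0, 255]$ before casting to `uint8`:

```python
import numpy as np

# (x + 1) * 127.5 maps -1 -> 0, 0 -> 127.5, 1 -> 255
x = np.array([-1.0, 0.0, 1.0])
y = (x + 1.0) * 127.5
pixels = y.astype(np.uint8)  # truncates 127.5 down to 127
```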

#### **Example for Unconditional Image generation [LDM](https://github.com/CompVis/latent-diffusion):**

A minimal sketch using the generic `DiffusionPipeline` abstraction; the checkpoint name and keyword arguments below are assumptions for illustration, not verified values:

```python
import torch
from diffusers import DiffusionPipeline

# NOTE: sketch only -- the checkpoint name is an assumption
pipeline = DiffusionPipeline.from_pretrained("CompVis/ldm-celebahq-256")

generator = torch.manual_seed(42)
image = pipeline(generator=generator, num_inference_steps=50)["sample"][0]

image.save("ldm_generated_image.png")
```


## In the works

For the first release, 🤗 Diffusers focuses on text-to-image diffusion techniques. However, diffusers can be used for much more than that! Over the upcoming releases, we'll be focusing on:

- Diffusers for audio
- Diffusers for reinforcement learning (initial work happening in https://github.com/huggingface/diffusers/pull/105).
- Diffusers for video generation
- Diffusers for molecule generation (initial work happening in https://github.com/huggingface/diffusers/pull/54)

A few pipeline components are already being worked on, namely:

- BDDMPipeline for spectrogram-to-sound vocoding
- GLIDEPipeline to support OpenAI's GLIDE model
- Grad-TTS for text to audio generation / conditional audio generation

We want diffusers to be a toolbox useful for diffusion models in general; if you find yourself limited in any way by the current API, or would like to see additional models, schedulers, or techniques, please open a [GitHub issue](https://github.com/huggingface/diffusers/issues) mentioning what you would like to see.

## Credits

This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API would not be as polished today:

- @CompVis' latent diffusion models library, available [here](https://github.com/CompVis/latent-diffusion)
- @hojonathanho's original DDPM implementation, available [here](https://github.com/hojonathanho/diffusion), as well as the extremely useful translation into PyTorch by @pesser, available [here](https://github.com/pesser/pytorch_diffusion)
- @ermongroup's DDIM implementation, available [here](https://github.com/ermongroup/ddim).
- @yang-song's Score-VE and Score-VP implementations, available [here](https://github.com/yang-song/score_sde_pytorch)

We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available [here](https://github.com/heejkoo/Awesome-Diffusion-Models).