<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Reproducibility

Diffusion is a random process that generates a different output every time. In some situations, such as testing and replicating results, you want to generate the same output each time, across releases and platforms, within a certain tolerance range.

This guide will show you how to control sources of randomness and enable deterministic algorithms.

## Generator

Pipelines rely on [torch.randn](https://pytorch.org/docs/stable/generated/torch.randn.html) to create the initial noisy tensors, and it uses a different random seed each time. To generate the same output on a CPU or GPU, use a [Generator](https://docs.pytorch.org/docs/stable/generated/torch.Generator.html) to manage how random values are generated.

> [!TIP]
> If reproducibility is important to your use case, we recommend always using a CPU `Generator`. The performance loss is often negligible and you'll generate more similar values.
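
For instance, two CPU `Generator` objects seeded identically draw exactly the same values. This is a minimal sketch using plain `torch`, independent of any pipeline:

```py
import torch

# Two independent CPU Generators with the same seed...
g1 = torch.Generator(device="cpu").manual_seed(42)
g2 = torch.Generator(device="cpu").manual_seed(42)

# ...produce identical random draws
a = torch.randn(3, generator=g1)
b = torch.randn(3, generator=g2)
print(torch.equal(a, b))  # True
```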

<hfoptions id="generator">
<hfoption id="GPU">

The GPU uses a different random number generator than the CPU. Diffusers solves this issue with the [`~utils.torch_utils.randn_tensor`] function, which creates the random tensor on the CPU and then moves it to the GPU. This function is used everywhere inside the pipeline, so you don't need to call it explicitly.

Use [manual_seed](https://docs.pytorch.org/docs/stable/generated/torch.manual_seed.html) as shown below to set a seed.

```py
import torch
import numpy as np
from diffusers import DDIMPipeline

ddim = DDIMPipeline.from_pretrained("google/ddpm-cifar10-32", device_map="cuda")
generator = torch.manual_seed(0)

image = ddim(num_inference_steps=2, output_type="np", generator=generator).images
print(np.abs(image).sum())
```

</hfoption>
<hfoption id="CPU">

Set `device="cpu"` in the `Generator` and use [manual_seed](https://docs.pytorch.org/docs/stable/generated/torch.manual_seed.html) to set a seed for generating random numbers.

```py
import torch
import numpy as np
from diffusers import DDIMPipeline

ddim = DDIMPipeline.from_pretrained("google/ddpm-cifar10-32")
generator = torch.Generator(device="cpu").manual_seed(0)

image = ddim(num_inference_steps=2, output_type="np", generator=generator).images
print(np.abs(image).sum())
```

</hfoption>
</hfoptions>

Pass the `Generator` object to the pipeline instead of an integer seed. A `Generator` maintains a *random state* that is consumed and modified when used. Once consumed, the same `Generator` object produces different results in subsequent calls, even across different pipelines, because its *state* has changed.

```py
for _ in range(5):
    # seed a fresh Generator on each call to reproduce the same image;
    # reusing a single Generator would yield a different image each iteration
    image = pipeline(prompt, generator=torch.manual_seed(0))
```
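
If you do need to reuse one `Generator`, its state can be snapshotted and restored with [get_state](https://docs.pytorch.org/docs/stable/generated/torch.Generator.html)/`set_state` instead of re-seeding. A small sketch with plain `torch`:

```py
import torch

g = torch.Generator(device="cpu").manual_seed(0)

state = g.get_state()            # snapshot the current random state
a = torch.randn(2, generator=g)  # this draw consumes the state
g.set_state(state)               # restore the snapshot
b = torch.randn(2, generator=g)  # repeats the exact same draw

print(torch.equal(a, b))  # True
```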

## Deterministic algorithms

PyTorch supports [deterministic algorithms](https://docs.pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms), where available, for certain operations so they produce the same results. Keep in mind that deterministic algorithms may be slower and decrease performance.

Use Diffusers' [enable_full_determinism](https://github.com/huggingface/diffusers/blob/142f353e1c638ff1d20bd798402b68f72c1ebbdd/src/diffusers/utils/testing_utils.py#L861) function to enable deterministic algorithms.

```py
from diffusers.utils.testing_utils import enable_full_determinism

enable_full_determinism()
```

Under the hood, `enable_full_determinism` works by:

- Setting the environment variable [CUBLAS_WORKSPACE_CONFIG](https://docs.nvidia.com/cuda/cublas/index.html#results-reproducibility) to `:16:8` to only use one buffer size during runtime. Non-deterministic behavior occurs when operations are used in more than one CUDA stream.
- Disabling benchmarking to find the fastest convolution operation by setting `torch.backends.cudnn.benchmark=False`. Non-deterministic behavior occurs because the benchmark may select different algorithms each time depending on hardware or benchmarking noise.
- Disabling TensorFloat32 (TF32) operations in favor of more precise and consistent full-precision operations.
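
As a rough sketch, these steps correspond to settings like the following in plain PyTorch. What `enable_full_determinism` configures is more complete; see its source for the exact behavior.

```py
import os
import torch

# Restrict cuBLAS to a single workspace buffer size for reproducibility
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"

# Ask PyTorch to use deterministic implementations where available
torch.use_deterministic_algorithms(True)

# Disable cuDNN autotuning so the same convolution algorithm is always picked
torch.backends.cudnn.benchmark = False

# Disable TF32 in favor of full-precision float32 matmuls and convolutions
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```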

## Resources

We strongly recommend reading PyTorch's developer notes on [Reproducibility](https://docs.pytorch.org/docs/stable/notes/randomness.html). You can try to limit sources of randomness, but fully reproducible results are not *guaranteed* even with an identical seed.