Commits · 7482178162b779506a54538f2cf2565c8b88c597 · renzhc / diffusers_dcu

03 Nov, 2022 5 commits

(#1115) · 74821781

Suraj Patil authored Nov 03, 2022



* make accelerate hard dep

* default fast init

* move params to cpu when device map is None

* handle device_map=None

* handle torch < 1.9

* remove device_map="auto"

* style

* add accelerate in torch extra

* remove accelerate from extras["test"]

* raise an error if torch is available but not accelerate

* update installation docs

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* improve default loading speed even further, allow disabling fast loading

* address review comments

* adapt the tests

* fix test_stable_diffusion_fast_load

* fix test_read_init

* temp fix for dummy checks

* Trigger Build

* Apply suggestions from code review
Co-authored-by: Anton Lozhkov <anton@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

74821781
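
The check listed in #1115 above ("raise an error if torch is available but not accelerate") could look roughly like the sketch below. The helper names `is_package_available` and `require_accelerate_with_torch` are hypothetical and are not taken from the commit. Only `importlib.util.find_spec` is a real standard-library call.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if the top-level package `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_accelerate_with_torch(torch_available: bool, accelerate_available: bool) -> None:
    """Hypothetical guard: having torch without accelerate is treated as an error."""
    if torch_available and not accelerate_available:
        raise ImportError(
            "torch is installed but accelerate is not; "
            "install it with `pip install accelerate`."
        )
```

A caller would pass `is_package_available("torch")` and `is_package_available("accelerate")` once at import time, so that a missing dependency fails early rather than during model loading.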

VQ-diffusion (#658) · ef2ea33c

Will Berman authored Nov 03, 2022



* Changes for VQ-diffusion VQVAE

Add an option to specify the dimension of embeddings in VQModel:
`VQModel` will by default set the dimension of embeddings to the number
of latent channels. The VQ-diffusion VQVAE has a smaller
embedding dimension, 128, than the number of latent channels, 256.

Add AttnDownEncoderBlock2D and AttnUpDecoderBlock2D to the up and down
unet block helpers. VQ-diffusion's VQVAE uses those two block types.

* Changes for VQ-diffusion transformer

Modify attention.py so SpatialTransformer can be used for
VQ-diffusion's transformer.

SpatialTransformer:
- Can now operate over discrete inputs (classes of vector embeddings) as well as continuous.
- `in_channels` was made optional in the constructor so two locations where it was passed as a positional arg were moved to kwargs
- modified forward pass to take optional timestep embeddings

ImagePositionalEmbeddings:
- added to provide positional embeddings to discrete inputs for latent pixels

BasicTransformerBlock:
- norm layers were made configurable so that VQ-diffusion could use AdaLayerNorm with timestep embeddings
- modified forward pass to take optional timestep embeddings

CrossAttention:
- now may optionally take a bias parameter for its query, key, and value linear layers

FeedForward:
- Internal layers are now configurable

ApproximateGELU:
- Activation function in VQ-diffusion's feedforward layer

AdaLayerNorm:
- Norm layer modified to incorporate timestep embeddings

* Add VQ-diffusion scheduler

* Add VQ-diffusion pipeline

* Add VQ-diffusion convert script to diffusers

* Add VQ-diffusion dummy objects

* Add VQ-diffusion markdown docs

* Add VQ-diffusion tests

* some renaming

* some fixes

* more renaming

* correct

* fix typo

* correct weights

* finalize

* fix tests

* Apply suggestions from code review
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* finish

* finish

* up
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

ef2ea33c
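
The embedding-dimension default described in the VQ-diffusion commit above can be illustrated with a small, dependency-free sketch. The function names are hypothetical and the parameter name `vq_embed_dim` is an assumption. The nearest-neighbour lookup is the generic vector-quantization step, not code from the commit.

```python
from typing import List, Optional, Sequence


def resolve_embed_dim(latent_channels: int, vq_embed_dim: Optional[int] = None) -> int:
    """Fall back to the number of latent channels when no embedding dim is given."""
    return latent_channels if vq_embed_dim is None else vq_embed_dim


def nearest_code(vector: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `vector` (squared L2 distance)."""

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
```

With the numbers quoted in the commit message, `resolve_embed_dim(256)` gives 256 (the default), while `resolve_embed_dim(256, 128)` gives the smaller VQ-diffusion embedding size.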

Continuation of #1035 (#1120) · 269109db

Pedro Cuenca authored Nov 03, 2022



* remove batch size from repeat

* repeat empty string if uncond_tokens is none

* fix inpaint pipes

* restore whitespace to pass code quality

* Apply suggestions from code review

* Fix typos.
Co-authored-by: Had <had-95@yandex.ru>

269109db
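
One way to read "repeat empty string if uncond_tokens is none" in #1120 above is sketched below. The function name `build_uncond_tokens` and its signature are assumptions made for illustration. The actual pipeline code may be organised differently.

```python
from typing import List, Optional, Union


def build_uncond_tokens(
    prompt: Union[str, List[str]],
    negative_prompt: Optional[Union[str, List[str]]] = None,
) -> List[str]:
    """Return one unconditional prompt string per prompt in the batch."""
    batch_size = 1 if isinstance(prompt, str) else len(prompt)
    if negative_prompt is None:
        # No negative prompt given: use an empty string for every batch element.
        return [""] * batch_size
    if isinstance(negative_prompt, str):
        # A single negative prompt is shared by the whole batch.
        return [negative_prompt] * batch_size
    if len(negative_prompt) != batch_size:
        raise ValueError(
            f"negative_prompt has {len(negative_prompt)} entries, expected {batch_size}."
        )
    return list(negative_prompt)
```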

feat: add repaint (#974) · d38c8043

Revist authored Nov 03, 2022



* feat: add repaint

* fix: fix quality check with `make fix-copies`

* fix: remove old unnecessary arg

* chore: change default to DDPM (looks better in experiments)

* ".to(device)" changed to "device="
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* make generator device-specific
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* make generator device-specific and change shape
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* fix: add preprocessing for image and mask
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* fix: update test
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Update src/diffusers/pipelines/repaint/pipeline_repaint.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Add docs and examples

* Fix toctree
Co-authored-by: fja <fja@zurich.ibm.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Anton Lozhkov <anton@huggingface.co>

d38c8043

Allow saving `None` pipeline components (#1118) · 4a38166a
Anton Lozhkov authored Nov 03, 2022
```
* Allow saving `None` pipeline components

* support flax as well

* style
```
4a38166a

02 Nov, 2022 9 commits

[Loading] Ignore unneeded files (#1107) · c39a511b
Patrick von Platen authored Nov 02, 2022
```
* [Loading] Ignore unneeded files

* up
```
c39a511b

Training to predict x0 in training example (#1031) · cbcd0512

Denis authored Nov 02, 2022



* changed training example to add option to train model that predicts x0 (instead of eps), changed DDPM pipeline accordingly

* Revert "changed training example to add option to train model that predicts x0 (instead of eps), changed DDPM pipeline accordingly"

This reverts commit c5efb525648885f2e7df71f4483a9f248515ad61.

* changed training example to add option to train model that predicts x0 (instead of eps), changed DDPM pipeline accordingly

* fixed code style
Co-authored-by: lukovnikov <lukovnikov@users.noreply.github.com>

cbcd0512
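
For context on #1031 above: a model can be trained to predict either the noise `eps` or the clean sample `x0`, and the two are interchangeable given the noisy sample. The sketch below uses the standard DDPM forward relation on scalars. The function names and the symbol `alpha_bar` (cumulative product of alphas, assumed to lie strictly between 0 and 1) are illustrative and not taken from the training script.

```python
import math


def add_noise(x0: float, eps: float, alpha_bar: float) -> float:
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps


def x0_from_eps(x_t: float, eps: float, alpha_bar: float) -> float:
    """Recover x0 when the model predicts the noise eps."""
    return (x_t - math.sqrt(1.0 - alpha_bar) * eps) / math.sqrt(alpha_bar)


def eps_from_x0(x_t: float, x0: float, alpha_bar: float) -> float:
    """Recover eps when the model predicts the clean sample x0."""
    return (x_t - math.sqrt(alpha_bar) * x0) / math.sqrt(1.0 - alpha_bar)
```

Because each conversion is the algebraic inverse of `add_noise`, a pipeline only needs to know which quantity the model outputs in order to compute the other.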

[Flax] time embedding (#1081) · 0b61cea3

Kashif Rasul authored Nov 02, 2022

* initial get_sinusoidal_embeddings

* added asserts

* better var name

* fix docs

0b61cea3
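
A sinusoidal timestep embedding of the kind named in #1081 above can be sketched in plain Python. This is not the Flax implementation from the commit. The function name, the `max_period` default, and the sin-then-cos ordering are assumptions chosen for illustration.

```python
import math
from typing import List


def sinusoidal_embedding(
    timestep: float, embedding_dim: int, max_period: float = 10000.0
) -> List[float]:
    """Return [sin(t * f_0), ..., sin(t * f_{h-1}), cos(t * f_0), ..., cos(t * f_{h-1})]."""
    if embedding_dim % 2 != 0:
        raise ValueError("embedding_dim must be even")
    half = embedding_dim // 2
    # Geometrically spaced frequencies, from 1 down towards 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    args = [timestep * f for f in freqs]
    return [math.sin(a) for a in args] + [math.cos(a) for a in args]
```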

Fix tests for equivalence of DDIM and DDPM pipelines (#1069) · 5cd29d62

Grigory Sizov authored Nov 02, 2022

* Fix equality test for ddim and ddpm

* add docs for use_clipped_model_output in DDIM

* fix inline comment

* reorder imports in test_pipelines.py

* Ignore use_clipped_model_output if scheduler doesn't take it

5cd29d62
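
"Ignore use_clipped_model_output if scheduler doesn't take it" in #1069 above suggests filtering keyword arguments by what a scheduler's `step` function accepts. A minimal sketch using `inspect.signature` follows. `filter_step_kwargs` and the two dummy step functions are hypothetical stand-ins, not the library's code.

```python
import inspect
from typing import Any, Callable, Dict


def filter_step_kwargs(step_fn: Callable[..., Any], **candidates: Any) -> Dict[str, Any]:
    """Keep only the keyword arguments that `step_fn` declares in its signature."""
    accepted = set(inspect.signature(step_fn).parameters)
    return {name: value for name, value in candidates.items() if name in accepted}


# Hypothetical stand-ins for two schedulers with different `step` signatures.
def ddim_like_step(model_output, timestep, sample, eta=0.0, use_clipped_model_output=False):
    return sample


def ddpm_like_step(model_output, timestep, sample, generator=None):
    return sample
```

Calling `filter_step_kwargs(ddpm_like_step, eta=0.0, use_clipped_model_output=True)` returns an empty dict, so the same pipeline code can drive either scheduler without raising a `TypeError` for an unexpected keyword.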

Fix a small typo of a variable name (#1063) · 1216a3b1
Omiita authored Nov 02, 2022
```
Fix a small typo

fix a typo in `models/attention.py`.
weight -> width
```
1216a3b1

[CI] Framework and hardware-specific CI tests (#997) · 4e59bcc6

Anton Lozhkov authored Nov 02, 2022

* [WIP][CI] Framework and hardware-specific docker images for CI tests

* username

* fix cpu

* try out the image

* push latest

* update workspace

* no root isolation for actions

* add a flax image

* flax and onnx matrix

* fix runners

* add reports

* onnxruntime image

* retry tpu

* fix

* fix

* build onnxruntime

* naming

* onnxruntime-gpu image

* onnxruntime-gpu image, slow tests

* latest jax version

* trigger flax

* run flax tests in one thread

* fast flax tests on cpu

* fast flax tests on cpu

* trigger slow tests

* rebuild torch cuda

* force cuda provider

* fix onnxruntime tests

* trigger slow

* don't specify gpu for tpu

* optimize

* memory limit

* fix flax tests

* disable docker cache

4e59bcc6

Rename latent (#1102) · d53ffbbd
Patrick von Platen authored Nov 02, 2022
```
* Rename latent

* uP
```
d53ffbbd

Integration tests precision improvement for inpainting (#1052) · 8ee21915

Lewington-pitsos authored Nov 02, 2022



* improve test precision

get tests passing with greater precision using lewington images

* make old numpy load function a wrapper around a more flexible numpy loading function

* adhere to black formatting

* add more black formatting

* adhere to isort

* loosen precision and replace path
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

8ee21915

Up to 2x speedup on GPUs using memory efficient attention (#532) · 98c42134

MatthieuTPHR authored Nov 02, 2022



* 2x speedup using memory efficient attention

* remove einops dependency

* Swap K, M in op instantiation

* Simplify code, remove unnecessary maybe_init call and function, remove unused self.scale parameter

* make xformers a soft dependency

* remove one-liner functions

* change one-letter variables to appropriate names

* Remove Env variable dependency, remove MemoryEfficientCrossAttention class and use enable_xformers_memory_efficient_attention method

* Add memory efficient attention toggle to img2img and inpaint pipelines

* Clearer management of xformers' availability

* update optimizations markdown to add info about memory efficient attention

* add benchmarks for TITAN RTX

* More detailed explanation of how the memory-efficient attention benchmarks were run

* Removing autocast from optimization markdown

* import_utils: import torch only if it is available
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>

98c42134

31 Oct, 2022 7 commits

Remove some unused parameter in CrossAttnUpBlock2D (#1034) · 7fb4b882

Laurent Mazare authored Oct 31, 2022

Remove some unused parameter

The `downsample_padding` parameter does not seem to be used in `CrossAttnUpBlock2D` (or by any up block for that matter) so removing it.

7fb4b882

Remove nn sequential (#1086) · 888468dd
Patrick von Platen authored Oct 31, 2022
```
* Remove nn sequential

* up
```
888468dd

[Tests] Fix slow tests (#1087) · 17c2c060
Patrick von Platen authored Oct 31, 2022

17c2c060

[Better scheduler docs] Improve usage examples of schedulers (#890) · c18941b0

Patrick von Platen authored Oct 31, 2022



* [Better scheduler docs] Improve usage examples of schedulers

* finish

* fix warnings and add test

* finish

* more replacements

* adapt fast tests hf token

* correct more

* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Integrate compatibility with euler
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

c18941b0

k-diffusion-euler (#1019) · a1ea8c01

hlky authored Oct 31, 2022



* k-diffusion-euler

* make style make quality

* make fix-copies

* fix tests for euler a

* Update src/diffusers/schedulers/scheduling_euler_ancestral_discrete.py
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Update src/diffusers/schedulers/scheduling_euler_ancestral_discrete.py
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Update src/diffusers/schedulers/scheduling_euler_discrete.py
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Update src/diffusers/schedulers/scheduling_euler_discrete.py
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* remove unused arg and method

* update doc

* quality

* make flake happy

* use logger instead of warn

* raise error instead of deprecation

* don't require scipy

* pass generator in step

* fix tests

* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/test_scheduler.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* remove unused generator

* pass generator as extra_step_kwargs

* update tests

* pass generator as kwarg

* pass generator as kwarg

* quality

* fix test for lms

* fix tests
Co-authored-by: patil-suraj <surajp815@gmail.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

a1ea8c01

Allow `safety_checker` to be `None` when using CPU offload (#1078) · bf7b0bc2
Pedro Cuenca authored Oct 31, 2022
```
Allow None safety_checker when using CPU offload.
```
bf7b0bc2

Fix pipelines user_agent, ignore CI requests (#1058) · 1606eb99
Anton Lozhkov authored Oct 31, 2022
```
* Fix pipelines user_agent, ignore CI requests

* fix circular import

* N/A versions

* N/A versions
```
1606eb99

30 Oct, 2022 1 commit

Move safety detection to model call in Flax safety checker (#1023) · 8e4fd686

Jonatan Kłosko authored Oct 30, 2022

* Move safety detection to model call in Flax safety checker

* Update src/diffusers/pipelines/stable_diffusion/safety_checker_flax.py

8e4fd686

29 Oct, 2022 2 commits

Experimental: allow fp16 in `mps` (#961) · 95414bd6

Pedro Cuenca authored Oct 29, 2022

* Docs: refer to pre-RC version of PyTorch 1.13.0.

* Remove temporary workaround for unavailable op.

* Update comment to make it less ambiguous.

* Remove use of contiguous in mps.

It appears to no longer be necessary.

* Special case: use einsum for much better performance in mps

* Update mps docs.

* MPS: make pipeline work in half precision.

95414bd6

clean incomplete pages (#1008) · 12fd0736
Nathan Lambert authored Oct 29, 2022

12fd0736

28 Oct, 2022 5 commits
- [Tests] no random latents anymore (#1045) · d37f08da
  Patrick von Platen authored Oct 28, 2022
  
  d37f08da
- [Tests] Better prints (#1043) · c4ef1efe
  Patrick von Platen authored Oct 28, 2022
  
  c4ef1efe
- Fix some failing tests (#1041) · 8d6487f3
  Patrick von Platen authored Oct 28, 2022
```
* up

* up

* up

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py

* Apply suggestions from code review
```
  8d6487f3
- [Tests] Improve unet / vae tests (#1018) · a80480f0
  Patrick von Platen authored Oct 28, 2022
```
* improve tests

* up

* finish

* upload

* add init

* up

* finish vae

* finish

* reduce loading time with device_map

* remove device_map from CPU

* uP
```
  a80480f0
- fix `F.interpolate()` for large batch sizes (#1006) · ab079f27
  Nouamane Tazi authored Oct 28, 2022
```
* fix `upsample_nearest_nhwc` for large bsz

* fix `upsample_nearest_nhwc` for large bsz
```
  ab079f27

27 Oct, 2022 5 commits

Support grayscale images in `numpy_to_pil` (#1025) · fb38bb16
Anton Lozhkov authored Oct 27, 2022

fb38bb16

Document sequential CPU offload method on Stable Diffusion pipeline (#1024) · de00c632

Pi Esposito authored Oct 27, 2022



* document cpu offloading method

* address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

de00c632

Deprecate `init_git_repo`, refactor `train_unconditional.py` (#1022) · fbcc3833
Anton Lozhkov authored Oct 27, 2022
```
Deprecate `init_git_repo` and `push_to_hub`, refactor `train_unconditional.py`
```
fbcc3833

[Accelerate model loading] Fix meta device and super low memory usage (#1016) · 3be9fa97
Patrick von Platen authored Oct 27, 2022
```
* [Accelerate model loading] Fix meta device and super low memory usage

* better naming
```
3be9fa97

Continuation of #942: additional float64 failure (#996) · 1d04e1b4

Pedro Cuenca authored Oct 27, 2022

* Add failing test for #940.

* Do not use torch.float64 in mps.

* style

* Temporarily skip add_noise for IPNDMScheduler.

Until #990 is addressed.

* Fix additional float64 error in mps.

* Improve add_noise test

* Slight edit – I think it's clearer this way.

1d04e1b4

26 Oct, 2022 4 commits

[inpaint pipeline] fix bug for multiple prompts inputs (#959) · bd06dd02
Hu Ye authored Oct 26, 2022

bd06dd02

minimal stable diffusion GPU memory usage with accelerate hooks (#850) · b2e2d141

Pi Esposito authored Oct 26, 2022

* add method to enable cuda with minimal gpu usage to stable diffusion

* add test to minimal cuda memory usage

* ensure all models but unet are on torch.float32

* move to cpu_offload along with minor internal changes to make it work

* make it test against accelerate master branch

* coming back, it's official: I don't know how to make it test against the master branch from accelerate

* make it install accelerate from master on tests

* go back to accelerate>=0.11

* undo prettier formatting on yml files

* undo prettier formatting on yml files again

b2e2d141

Fix typos (#978) · cc436087
Yuta Hayashibe authored Oct 26, 2022

cc436087

Do not use torch.float64 on the mps device (#942) · 0343d8f5

Pedro Cuenca authored Oct 26, 2022

* Add failing test for #940.

* Do not use torch.float64 in mps.

* style

* Temporarily skip add_noise for IPNDMScheduler.

Until #990 is addressed.

0343d8f5

25 Oct, 2022 2 commits
- [Dance Diffusion] Better naming (#981) · 59f0ce82
  Patrick von Platen authored Oct 25, 2022
```
uP
```
  59f0ce82
- [Dance Diffusion] FP16 (#980) · 365ff8f7
  Patrick von Platen authored Oct 25, 2022
```
* add in fp16

* up
```
  365ff8f7