Commits · ce1c27adc85916a802db579df789d990fd14e8bc · renzhc / diffusers_dcu

19 Dec, 2022 1 commit
- [Revision] Don't recommend using revision (#1764) · ce1c27ad
  Patrick von Platen authored Dec 19, 2022
  
16 Dec, 2022 1 commit

Docs: recommend xformers (#1724) · acd31781

Pedro Cuenca authored Dec 16, 2022

* Fix links to flash attention.

* Add xformers installation instructions.

* Make link to xformers install more prominent.

* Link to xformers install from training docs.


01 Dec, 2022 1 commit

Add doc for Stable Diffusion on Habana Gaudi (#1496) · 2579d421

regisss authored Dec 01, 2022

* Add doc for Stable Diffusion on Habana Gaudi

* Make style

* Add benchmark

* Center-align columns in the benchmark table


29 Nov, 2022 1 commit

StableDiffusion: Decode latents separately to run larger batches (#1150) · c28d3c82

Ilmari Heikkinen authored Nov 29, 2022



* StableDiffusion: Decode latents separately to run larger batches

* Move VAE sliced decode under enable_vae_sliced_decode and vae.enable_sliced_decode

* Rename sliced_decode to slicing

* fix whitespace

* fix quality check and repository consistency

* VAE slicing tests and documentation

* API doc hooks for VAE slicing

* reformat vae slicing tests

* Skip VAE slicing for one-image batches

* Documentation tweaks for VAE slicing
Co-authored-by: Ilmari Heikkinen <ilmari@fhtr.org>


03 Nov, 2022 1 commit
- Docs: Do not require PyTorch nightlies (#1123) · 118c5be9
  Pedro Cuenca authored Nov 03, 2022
```
Do not require PyTorch nightlies.
```
02 Nov, 2022 1 commit

Up to 2x speedup on GPUs using memory efficient attention (#532) · 98c42134

MatthieuTPHR authored Nov 02, 2022



* 2x speedup using memory efficient attention

* remove einops dependency

* Swap K, M in op instantiation

* Simplify code, remove unnecessary maybe_init call and function, remove unused self.scale parameter

* make xformers a soft dependency

* remove one-liner functions

* change one letter variable to appropriate names

* Remove Env variable dependency, remove MemoryEfficientCrossAttention class and use enable_xformers_memory_efficient_attention method

* Add memory efficient attention toggle to img2img and inpaint pipelines

* Clearer management of xformers' availability

* update optimizations markdown to add info about memory efficient attention

* add benchmarks for TITAN RTX

* More detailed explanation of how the mem eff benchmarks were run

* Removing autocast from optimization markdown

* import_utils: import torch only if is available
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>


29 Oct, 2022 1 commit
- Fix speedup ratio in fp16.mdx (#837) · fc0ca474
  Minwoo Byeon authored Oct 29, 2022
  
27 Oct, 2022 1 commit

Document sequential CPU offload method on Stable Diffusion pipeline (#1024) · de00c632

Pi Esposito authored Oct 27, 2022



* document cpu offloading method

* address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>


25 Oct, 2022 1 commit

mps changes for PyTorch 1.13 (#926) · 3d02c921

Pedro Cuenca authored Oct 25, 2022



* Docs: refer to pre-RC version of PyTorch 1.13.0.

* Remove temporary workaround for unavailable op.

* Update comment to make it less ambiguous.

* Remove use of contiguous in mps.

It appears to no longer be necessary.

* Special case: use einsum for much better performance in mps

* Update mps docs.

* Minor doc update.

* Accept suggestion
Co-authored-by: Anton Lozhkov <anton@huggingface.co>


24 Oct, 2022 1 commit

v1-5 docs updates (#921) · 8aac1f99

apolinario authored Oct 24, 2022



* Update README.md

Additionally add FLAX so the model card can be slimmer and point to this page

* Find and replace all

* v-1-5 -> v1-5

* revert test changes

* Update README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/quicktour.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update README.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/quicktour.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update README.md
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Revert certain references to v1-5

* Docs changes

* Apply suggestions from code review
Co-authored-by: apolinario <joaopaulo.passos+multimodal@gmail.com>
Co-authored-by: anton-l <anton@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>


11 Oct, 2022 1 commit

`mps`: Alternative implementation for `repeat_interleave` (#766) · 24b8b5cf

Pedro Cuenca authored Oct 11, 2022



* mps: alt. implementation for repeat_interleave

* style

* Bump mps version of PyTorch in the documentation.

* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Simplify: do not check for device.

* style

* Fix repeat dimensions:

- The unconditional embeddings are always created from a single prompt.
- I was shadowing the batch_size var.

* Split long lines as suggested by Suraj.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>


05 Oct, 2022 2 commits
- [Docs] Advertise fp16 instead of autocast (#740) · 4deb16e8
  Patrick von Platen authored Oct 05, 2022
```
up
```
- No more use_auth_token=True (#733) · 78744b6a
  Patrick von Platen authored Oct 05, 2022
```
* up

* uP

* uP

* make style

* Apply suggestions from code review

* up

* finish
```
04 Oct, 2022 1 commit

Fix typos (#718) · 7e92c5bc

Yuta Hayashibe authored Oct 04, 2022



* Fix typos

* Update examples/dreambooth/train_dreambooth.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>


30 Sep, 2022 2 commits

[docs] fix table in fp16.mdx (#683) · daa22050
Nouamane Tazi authored Sep 30, 2022


Optimize Stable Diffusion (#371) · 9ebaea54

Nouamane Tazi authored Sep 30, 2022

* initial commit

* make UNet stream capturable

* try to fix noise_pred value

* remove cuda graph and keep NB

* non blocking unet with PNDMScheduler

* make timesteps np arrays for pndm scheduler
because lists don't get formatted to tensors in `self.set_format`

* make max async in pndm

* use channel last format in unet

* avoid moving timesteps device in each unet call

* avoid memcpy op in `get_timestep_embedding`

* add `channels_last` kwarg to `DiffusionPipeline.from_pretrained`

* update TODO

* replace `channels_last` kwarg with `memory_format` for more generality

* revert the channels_last changes to leave it for another PR

* remove non_blocking when moving input ids to device

* remove blocking from all .to() operations at beginning of pipeline

* fix merging

* fix merging

* model can run in other precisions without autocast

* attn refactoring

* Revert "attn refactoring"

This reverts commit 0c70c0e189cd2c4d8768274c9fcf5b940ee310fb.

* remove restriction to run conv_norm in fp32

* use `baddbmm` instead of `matmul` in attention for better perf

* removing all reshapes to test perf

* Revert "removing all reshapes to test perf"

This reverts commit 006ccb8a8c6bc7eb7e512392e692a29d9b1553cd.

* add shapes comments

* hardcode what's needed for jitting

* Revert "hardcode what's needed for jitting"

This reverts commit 2fa9c698eae2890ac5f8e367ca80532ecf94df9a.

* Revert "remove restriction to run conv_norm in fp32"

This reverts commit cec592890c32da3d1b78d38b49e4307aedf459b9.

* revert using `baddbmm` in attention's forward

* cleanup comment

* remove restriction to run conv_norm in fp32. no quality loss was noticed

This reverts commit cc9bc1339c998ebe9e7d733f910c6d72d9792213.

* add more optimizations techniques to docs

* Revert "add shapes comments"

This reverts commit 31c58eadb8892f95478cdf05229adf678678c5f4.

* apply suggestions

* make quality

* apply suggestions

* styling

* `scheduler.timesteps` are now arrays so we don't need .to()

* remove useless .type()

* use mean instead of max in `test_stable_diffusion_inpaint_pipeline_k_lms`

* move scheduler timesteps to correct device if tensors

* add device to `set_timesteps` in LMSD scheduler

* `self.scheduler.set_timesteps` now uses device arg for schedulers that accept it

* quick fix

* styling

* remove kwargs from schedulers `set_timesteps`

* revert to using max in K-LMS inpaint pipeline test

* Revert "`self.scheduler.set_timesteps` now uses device arg for schedulers that accept it"

This reverts commit 00d5a51e5c20d8d445c8664407ef29608106d899.

* move timesteps to correct device before loop in SD pipeline

* apply previous fix to other SD pipelines

* UNet now accepts tensor timesteps even on wrong device, to avoid errors
- it shouldn't affect performance if timesteps are already on correct device
- it does slow down performance if they're on the wrong device

* fix pipeline when timesteps are arrays with strides


08 Sep, 2022 3 commits

Initial ONNX doc (TODO: Installation) (#426) · 1a79969d
Pedro Cuenca authored Sep 08, 2022

[Docs] Minor fixes in optimization section (#420) · 98f34683
Patrick von Platen authored Sep 08, 2022
```
* uP

* more
```

Docs: fp16 page (#404) · c29d81c3

Pedro Cuenca authored Sep 08, 2022



* Initial version of `fp16` page.

* Fix typo in README.

* Change titles of fp16 section in toctree.

* PR suggestion
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* PR suggestion
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Clarify attention slicing is useful even for batches of 1

Explained by @patrickvonplaten after a suggestion by @keturn.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Do not talk about `batches` in `enable_attention_slicing`.

* Use Tip (just for fun), add link to method.

* Comment about fp16 results looking the same as float32 in practice.

* Style: docstring line wrapping.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>


07 Sep, 2022 2 commits
- Docs: optimization / special hardware (#390) · 492f5c9a
  Pedro Cuenca authored Sep 07, 2022
```
Add mps documentation.
```
- [Docs] Let's go (#385) · 5a38033d
  Patrick von Platen authored Sep 07, 2022
  