- 19 Dec, 2022 1 commit
Patrick von Platen authored
- 16 Dec, 2022 1 commit
Pedro Cuenca authored
* Fix links to flash attention.
* Add xformers installation instructions.
* Make link to xformers install more prominent.
* Link to xformers install from training docs.
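Since xformers is a soft dependency, the installation instructions pair naturally with a runtime check. A minimal sketch using the availability helper from `diffusers.utils` (the printed messages are illustrative):

```python
# Sketch: verify an xformers install before relying on memory-efficient attention.
# Assumes xformers was installed separately, e.g. with `pip install xformers`.
from diffusers.utils import is_xformers_available

if is_xformers_available():
    print("xformers found: memory-efficient attention can be enabled")
else:
    print("xformers missing: using the default attention implementation")
```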
- 01 Dec, 2022 1 commit
regisss authored
* Add doc for Stable Diffusion on Habana Gaudi
* Make style
* Add benchmark
* Center-align columns in the benchmark table
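The new doc page covers running Stable Diffusion through optimum-habana. A rough sketch of that usage; the class names follow optimum-habana's API, but the exact arguments (e.g. `gaudi_config`) and the checkpoint ID here are assumptions that may differ between releases:

```python
# Sketch of Stable Diffusion on Habana Gaudi via optimum-habana.
# Requires an HPU machine with optimum-habana installed.
from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_id = "CompVis/stable-diffusion-v1-4"  # illustrative checkpoint
scheduler = GaudiDDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = GaudiStableDiffusionPipeline.from_pretrained(
    model_id,
    scheduler=scheduler,
    use_habana=True,       # run on the HPU
    use_hpu_graphs=True,   # capture HPU graphs to cut host overhead
    gaudi_config="Habana/stable-diffusion",
)
image = pipe("a photo of an astronaut riding a horse").images[0]
```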
- 29 Nov, 2022 1 commit
Ilmari Heikkinen authored
* StableDiffusion: decode latents separately to run larger batches
* Move VAE sliced decode under enable_vae_sliced_decode and vae.enable_sliced_decode
* Rename sliced_decode to slicing
* Fix whitespace
* Fix quality check and repository consistency
* VAE slicing tests and documentation
* API doc hooks for VAE slicing
* Reformat VAE slicing tests
* Skip VAE slicing for one-image batches
* Documentation tweaks for VAE slicing

Co-authored-by: Ilmari Heikkinen <ilmari@fhtr.org>
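After the renames above, the feature is exposed as `enable_vae_slicing()`: latents are decoded one image at a time so decoding no longer limits the batch size, and single-image batches skip slicing entirely. A minimal sketch (checkpoint ID and prompt are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Decode latents image by image instead of in one large VAE call;
# batches of one image bypass slicing, as noted in the commit above.
pipe.enable_vae_slicing()

images = pipe(["a photo of an astronaut riding a horse"] * 8).images
```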
- 03 Nov, 2022 1 commit
Pedro Cuenca authored
Do not require PyTorch nightlies.
- 02 Nov, 2022 1 commit
MatthieuTPHR authored
* 2x speedup using memory-efficient attention
* Remove einops dependency
* Swap K, M in op instantiation
* Simplify code: remove unnecessary maybe_init call and function, remove unused self.scale parameter
* Make xformers a soft dependency
* Remove one-liner functions
* Rename one-letter variables to appropriate names
* Remove env-variable dependency, remove the MemoryEfficientCrossAttention class, and use the enable_xformers_memory_efficient_attention method instead
* Add memory-efficient attention toggle to img2img and inpaint pipelines
* Clearer management of xformers' availability
* Update optimizations markdown to add info about memory-efficient attention
* Add benchmarks for TITAN RTX
* More detailed explanation of how the memory-efficient attention benchmarks were run
* Remove autocast from optimization markdown
* import_utils: import torch only if it is available

Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
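The toggle introduced above is a single method call on the pipeline. A minimal sketch, assuming xformers is installed and a CUDA GPU is available (checkpoint ID is illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Route attention through xformers' memory-efficient kernels;
# raises if the soft dependency is not installed.
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a photo of an astronaut riding a horse").images[0]
```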
- 29 Oct, 2022 1 commit
Minwoo Byeon authored
- 27 Oct, 2022 1 commit
Pi Esposito authored
* Document CPU offloading method
* Address review comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
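The offloading method being documented moves submodules to the GPU only while they run. A minimal sketch, assuming the `enable_sequential_cpu_offload` name used by diffusers (checkpoint ID is illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# Do not call pipe.to("cuda") here: each submodule is moved to the GPU
# right before its forward pass and back to CPU afterwards, trading
# speed for a much smaller peak VRAM footprint.
pipe.enable_sequential_cpu_offload()

image = pipe("a photo of an astronaut riding a horse").images[0]
```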
- 25 Oct, 2022 1 commit
Pedro Cuenca authored
* Docs: refer to pre-RC version of PyTorch 1.13.0.
* Remove temporary workaround for unavailable op.
* Update comment to make it less ambiguous.
* Remove use of contiguous in mps; it appears to no longer be necessary.
* Special case: use einsum for much better performance in mps.
* Update mps docs.
* Minor doc update.
* Accept suggestion.

Co-authored-by: Anton Lozhkov <anton@huggingface.co>
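The einsum special case replaces a batched matmul in the attention score computation, which was much faster on mps at the time. A hedged sketch of the idea; the function and shapes here are illustrative, not the pipeline's actual code:

```python
import torch

def attention_scores(query, key, scale):
    # query, key: (batch, seq_len, head_dim)
    if query.device.type == "mps":
        # On mps, einsum outperformed matmul for this contraction,
        # which is why the commit special-cases it.
        scores = torch.einsum("bid,bjd->bij", query, key)
    else:
        scores = torch.matmul(query, key.transpose(-1, -2))
    return scores * scale
```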
- 24 Oct, 2022 1 commit
apolinario authored
* Update README.md: additionally add FLAX so the model card can be slimmer and point to this page
* Find and replace all
* v-1-5 -> v1-5
* Revert test changes
* Update README.md
* Update docs/source/quicktour.mdx
* Update README.md
* Update docs/source/quicktour.mdx
* Update README.md
* Revert certain references to v1-5
* Docs changes
* Apply suggestions from code review

Co-authored-by: apolinario <joaopaulo.passos+multimodal@gmail.com>
Co-authored-by: anton-l <anton@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
- 11 Oct, 2022 1 commit
Pedro Cuenca authored
* mps: alternative implementation for repeat_interleave
* Style
* Bump mps version of PyTorch in the documentation.
* Apply suggestions from code review
* Simplify: do not check for device.
* Style
* Fix repeat dimensions:
  - the unconditional embeddings are always created from a single prompt
  - I was shadowing the batch_size var
* Split long lines as suggested by Suraj.

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
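`repeat_interleave` was problematic on mps back then, so the commit swaps in an equivalent built from ops that worked. A sketch of one such equivalence; the exact replacement used in the PR may differ:

```python
import torch

def repeat_interleave_dim0(x, repeats):
    # Equivalent to x.repeat_interleave(repeats, dim=0) without calling
    # repeat_interleave: insert a new axis, expand it, then flatten.
    b = x.shape[0]
    return x.unsqueeze(1).expand(b, repeats, *x.shape[1:]).reshape(b * repeats, *x.shape[1:])

x = torch.arange(6).reshape(2, 3)
assert torch.equal(repeat_interleave_dim0(x, 4), x.repeat_interleave(4, dim=0))
```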
- 05 Oct, 2022 2 commits
Patrick von Platen authored
up
Patrick von Platen authored
* up
* uP
* uP
* make style
* Apply suggestions from code review
* up
* finish
- 04 Oct, 2022 1 commit
Yuta Hayashibe authored
* Fix typos
* Update examples/dreambooth/train_dreambooth.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
- 30 Sep, 2022 2 commits
Nouamane Tazi authored
Nouamane Tazi authored
* Initial commit
* Make UNet stream-capturable
* Try to fix noise_pred value
* Remove CUDA graph and keep NB
* Non-blocking UNet with PNDMScheduler
* Make timesteps np arrays for the PNDM scheduler because lists don't get formatted to tensors in `self.set_format`
* Make max async in PNDM
* Use channels-last format in UNet
* Avoid moving the timesteps device in each UNet call
* Avoid memcpy op in `get_timestep_embedding`
* Add `channels_last` kwarg to `DiffusionPipeline.from_pretrained`
* Update TODO
* Replace `channels_last` kwarg with `memory_format` for more generality
* Revert the channels_last changes to leave them for another PR
* Remove non_blocking when moving input ids to device
* Remove blocking from all .to() operations at the beginning of the pipeline
* Fix merging
* Fix merging
* Model can run in other precisions without autocast
* Attn refactoring
* Revert "attn refactoring" (reverts commit 0c70c0e189cd2c4d8768274c9fcf5b940ee310fb)
* Remove restriction to run conv_norm in fp32
* Use `baddbmm` instead of `matmul` in attention for better perf
* Remove all reshapes to test perf
* Revert "removing all reshapes to test perf" (reverts commit 006ccb8a8c6bc7eb7e512392e692a29d9b1553cd)
* Add shapes comments
* Hardcode what's needed for jitting
* Revert "hardcode what's needed for jitting" (reverts commit 2fa9c698eae2890ac5f8e367ca80532ecf94df9a)
* Revert "remove restriction to run conv_norm in fp32" (reverts commit cec592890c32da3d1b78d38b49e4307aedf459b9)
* Revert using baddbmm in attention's forward
* Clean up comment
* Remove restriction to run conv_norm in fp32; no quality loss was noticed (reverts commit cc9bc1339c998ebe9e7d733f910c6d72d9792213)
* Add more optimization techniques to docs
* Revert "add shapes comments" (reverts commit 31c58eadb8892f95478cdf05229adf678678c5f4)
* Apply suggestions
* Make quality
* Apply suggestions
* Styling
* `scheduler.timesteps` are now arrays, so we don't need .to()
* Remove useless .type()
* Use mean instead of max in `test_stable_diffusion_inpaint_pipeline_k_lms`
* Move scheduler timesteps to the correct device if they are tensors
* Add device to `set_timesteps` in the LMSD scheduler
* `self.scheduler.set_timesteps` now uses the device arg for schedulers that accept it
* Quick fix
* Styling
* Remove kwargs from schedulers' `set_timesteps`
* Revert to using max in the K-LMS inpaint pipeline test
* Revert "`self.scheduler.set_timesteps` now uses device arg for schedulers that accept it" (reverts commit 00d5a51e5c20d8d445c8664407ef29608106d899)
* Move timesteps to the correct device before the loop in the SD pipeline
* Apply the previous fix to the other SD pipelines
* UNet now accepts tensor timesteps even on the wrong device, to avoid errors; this shouldn't affect performance if timesteps are already on the correct device, but it does slow things down if they are on the wrong device
* Fix pipeline when timesteps are arrays with strides
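Two of the optimizations explored above are easy to apply from user code: channels-last memory format for the UNet and creating scheduler timesteps directly on the target device. A sketch under those assumptions (the `device` argument on `set_timesteps` applies only to schedulers that accept it; checkpoint ID is illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Channels-last layout can speed up the UNet's convolutions on CUDA.
pipe.unet.to(memory_format=torch.channels_last)

# Place timesteps on the GPU up front instead of moving them inside the loop.
pipe.scheduler.set_timesteps(50, device="cuda")
```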
- 08 Sep, 2022 3 commits
Pedro Cuenca authored
Patrick von Platen authored
* uP
* more
Pedro Cuenca authored
* Initial version of `fp16` page.
* Fix typo in README.
* Change titles of fp16 section in toctree.
* PR suggestion.
* PR suggestion.
* Clarify attention slicing is useful even for batches of 1 (explained by @patrickvonplaten after a suggestion by @keturn).
* Do not talk about `batches` in `enable_attention_slicing`.
* Use Tip (just for fun), add link to method.
* Comment about fp16 results looking the same as float32 in practice.
* Style: docstring line wrapping.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
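The fp16 page boils down to two knobs: load the weights in half precision and optionally slice the attention computation, which as clarified above helps even at batch size 1. A minimal sketch (checkpoint ID is illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

# Half-precision weights: results look the same as float32 in practice.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Compute attention in slices to cut peak memory; useful even for a
# single image, at a small cost in speed.
pipe.enable_attention_slicing()

image = pipe("a photo of an astronaut riding a horse").images[0]
```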
- 07 Sep, 2022 2 commits
Pedro Cuenca authored
Add mps documentation.
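The mps page centers on moving the pipeline to the `mps` device, with a short warm-up pass recommended because the first inference is slow while kernels compile. A sketch under those assumptions (checkpoint ID, prompt, and step count are illustrative):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")  # Apple Silicon GPU via PyTorch's mps backend

# One-step warm-up: the first pass is slow while kernels are compiled.
_ = pipe("warmup prompt", num_inference_steps=1)

image = pipe("a photo of an astronaut riding a horse").images[0]
```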
Patrick von Platen authored