Up to 2x speedup on GPUs using memory efficient attention (#532)
* 2x speedup using memory efficient attention
* remove einops dependency
* Swap K, M in op instantiation
* Simplify code, remove unnecessary maybe_init call and function, remove unused self.scale parameter
* make xformers a soft dependency
* remove one-liner functions
* change one letter variable to appropriate names
* Remove Env variable dependency, remove MemoryEfficientCrossAttention class and use enable_xformers_memory_efficient_attention method
* Add memory efficient attention toggle to img2img and inpaint pipelines
* Clearer management of xformers' availability
* update optimizations markdown to add info about memory efficient attention
* add benchmarks for TITAN RTX
* More detailed explanation of how the mem eff benchmark were ran
* Removing autocast from optimization markdown
* import_utils: import torch only if is available
Co-authored-by:
Nouamane Tazi <nouamane98@gmail.com>
Showing
Please register or sign in to comment