    Up to 2x speedup on GPUs using memory efficient attention (#532) · 98c42134
    MatthieuTPHR authored
    
    
    * 2x speedup using memory efficient attention
    
    * remove einops dependency
    
    * Swap K, M in op instantiation
    
    * Simplify code, remove unnecessary maybe_init call and function, remove unused self.scale parameter
    
    * make xformers a soft dependency
    
    * remove one-liner functions
    
    * change one-letter variables to appropriate names
    
    * Remove Env variable dependency, remove MemoryEfficientCrossAttention class and use the enable_xformers_memory_efficient_attention method (see the usage sketch after this list)
    
    * Add memory efficient attention toggle to img2img and inpaint pipelines
    
    * Clearer management of xformers' availability (see the availability-check sketch after this list)
    
    * update optimizations markdown to add info about memory efficient attention
    
    * add benchmarks for TITAN RTX
    
    * More detailed explanation of how the memory efficient attention benchmarks were run
    
    * Removing autocast from optimization markdown
    
    * import_utils: import torch only if it is available
    Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
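
The user-facing toggle referenced in the list above is the `enable_xformers_memory_efficient_attention` method added to the pipelines. A minimal usage sketch, assuming diffusers and xformers are installed and a CUDA GPU is available (the model id and prompt are only illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pipeline in fp16 on the GPU (model id is illustrative).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Route attention through xformers' memory efficient kernels; the same
# toggle is exposed on the img2img and inpaint pipelines.
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```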
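
The soft-dependency and availability handling noted in the list comes down to importing torch and xformers only when they are present, and failing loudly if the toggle is requested without them. A rough sketch of that pattern, not the exact diffusers `import_utils` code:

```python
import importlib.util

# Soft dependency: report xformers as usable only when both torch and
# xformers can actually be imported.
def is_xformers_available() -> bool:
    return (
        importlib.util.find_spec("torch") is not None
        and importlib.util.find_spec("xformers") is not None
    )

# Callers guard the toggle instead of importing xformers eagerly.
def enable_memory_efficient_attention(pipe) -> None:
    if not is_xformers_available():
        raise ModuleNotFoundError(
            "xformers is not installed; it is required for memory efficient attention"
        )
    pipe.enable_xformers_memory_efficient_attention()
```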