    Optimize Stable Diffusion (#371) · 9ebaea54
    Nouamane Tazi authored
    * initial commit
    
    * make UNet stream capturable
    
    * try to fix noise_pred value
    
    * remove CUDA graph and keep NB (non-blocking)
    
    * non-blocking UNet with PNDMScheduler
    
    * make timesteps np arrays for pndm scheduler
    because lists don't get formatted to tensors in `self.set_format`
    
    * make PNDM as asynchronous as possible
    
    * use channel last format in unet
    
    * avoid moving timesteps device in each unet call
    
    * avoid memcpy op in `get_timestep_embedding`
    
    * add `channels_last` kwarg to `DiffusionPipeline.from_pretrained`
    
    * update TODO
    
    * replace `channels_last` kwarg with `memory_format` for more generality
    
    * revert the channels_last changes to leave it for another PR
    
    * remove non_blocking when moving input ids to device
    
    * remove blocking from all .to() operations at beginning of pipeline
    
    * fix merging
    
    * fix merging
    
    * model can run in other precisions without autocast
    
    * attn refactoring
    
    * Revert "attn refactoring"
    
    This reverts commit 0c70c0e189cd2c4d8768274c9fcf5b940ee310fb.
    
    * remove restriction to run conv_norm in fp32
    
    * use `baddbmm` instead of `matmul` in attention for better perf
    
    * removing all reshapes to test perf
    
    * Revert "removing all reshapes to test perf"
    
    This reverts commit 006ccb8a8c6bc7eb7e512392e692a29d9b1553cd.
    
    * add shapes comments
    
    * hardcode what's needed for jitting
    
    * Revert "hardcode what's needed for jitting"
    
    This reverts commit 2fa9c698eae2890ac5f8e367ca80532ecf94df9a.
    
    * Revert "remove restriction to run conv_norm in fp32"
    
    This reverts commit cec592890c32da3d1b78d38b49e4307aedf459b9.
    
    * revert using `baddbmm` in attention's forward
    
    * cleanup comment
    
    * remove restriction to run conv_norm in fp32; no quality loss was noticed
    
    This reverts commit cc9bc1339c998ebe9e7d733f910c6d72d9792213.
    
    * add more optimization techniques to docs
    
    * Revert "add shapes comments"
    
    This reverts commit 31c58eadb8892f95478cdf05229adf678678c5f4.
    
    * apply suggestions
    
    * make quality
    
    * apply suggestions
    
    * styling
    
    * `scheduler.timesteps` are now arrays, so we don't need .to()
    
    * remove useless .type()
    
    * use mean instead of max in `test_stable_diffusion_inpaint_pipeline_k_lms`
    
    * move scheduler timesteps to the correct device if they are tensors
    
    * add device to `set_timesteps` in LMSD scheduler
    
    * `self.scheduler.set_timesteps` now uses device arg for schedulers that accept it
    
    * quick fix
    
    * styling
    
    * remove kwargs from schedulers' `set_timesteps`
    
    * revert to using max in K-LMS inpaint pipeline test
    
    * Revert "`self.scheduler.set_timesteps` now uses device arg for schedulers that accept it"
    
    This reverts commit 00d5a51e5c20d8d445c8664407ef29608106d899.
    
    * move timesteps to correct device before loop in SD pipeline
    
    * apply previous fix to other SD pipelines
    
    * UNet now accepts tensor timesteps even on the wrong device, to avoid errors
    - it shouldn't affect performance if timesteps are already on the correct device
    - it does slow things down if they're on the wrong device
    
    * fix pipeline when timesteps are arrays with strides
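For context on the "avoid memcpy op in `get_timestep_embedding`" change: the sinusoidal frequency table depends only on the embedding size, not on the timestep values, so a tensor implementation can create it directly on the timesteps' device instead of building it on the CPU and copying it over on every call. The plain-Python sketch below shows only the arithmetic. The function name and the parameters (`max_period=10000`, `downscale_freq_shift=1`, `flip_sin_to_cos`) are assumptions modelled on the common formulation, not a copy of the diffusers function, and device handling is omitted.

```python
import math
from typing import List


def timestep_embedding(
    timesteps: List[float],
    embedding_dim: int,
    max_period: float = 10000.0,
    downscale_freq_shift: float = 1.0,
    flip_sin_to_cos: bool = False,
) -> List[List[float]]:
    """Hypothetical sinusoidal timestep embedding: one row per timestep."""
    half_dim = embedding_dim // 2
    # Frequency table: depends only on embedding_dim, never on the timestep values.
    # In a tensor library this is the part that can be created directly on the
    # timesteps' device, so no host-to-device copy is needed per call.
    freqs = [
        math.exp(-math.log(max_period) * i / (half_dim - downscale_freq_shift))
        for i in range(half_dim)
    ]
    rows = []
    for t in timesteps:
        args = [t * f for f in freqs]
        sin_part = [math.sin(a) for a in args]
        cos_part = [math.cos(a) for a in args]
        row = cos_part + sin_part if flip_sin_to_cos else sin_part + cos_part
        if embedding_dim % 2 == 1:
            row.append(0.0)  # zero-pad odd embedding sizes
        rows.append(row)
    return rows


emb = timestep_embedding([0.0, 10.0], embedding_dim=8)
print(len(emb), len(emb[0]))
```

In PyTorch, the corresponding step would be creating the range with something like `torch.arange(half_dim, dtype=torch.float32, device=timesteps.device)`; that this is exactly how the commit implemented it is an assumption.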
resnet.py 18.3 KB