[feat] implement `record_stream` when using CUDA streams during group offloading (#11081)
* implement record_stream for better performance. * fix * style. * merge #11097 * Update src/diffusers/hooks/group_offloading.py Co-authored-by:Aryan <aryan@huggingface.co> * fixes * docstring. * remaining todos in low_cpu_mem_usage * tests * updates to docs. --------- Co-authored-by:
Aryan <aryan@huggingface.co>
Showing
Please register or sign in to comment