feat: async CPU offloading for Python backend (#624)
* tmp
* update
* update
* finished the offloading impl
* the offloading is buggy
* update utils
* the offloading is still buggy
* update
* correctness and speedup done; need to check the vram overhead
* done
* final debugging
* update
* update
* correct now
* fix
* update
* use per-layer offloading
* fix the offloading on 5090
* support setting the num_blocks_on_gpu
* change the import name
@@ -5,4 +5,4 @@ facexlib
 onnxruntime
 # ip-adapter
 timm
-diffusers>=0.33.1
+diffusers==0.35