Muyang Li authored
* tmp
* update
* update
* finished the offloading impl
* the offloading is buggy
* update utils
* the offloading is still buggy
* update
* correctness and speedup done; need to check the vram overhead
* done
* final debugging
* update
* update
* correct now
* fix
* update
* use per-layer offloading
* fix the offloading on 5090
* support setting the num_blocks_on_gpu
* change the import name
eb901251
This project manages its dependencies using pip.