    feat: async CPU offloading for Python backend (#624) · eb901251
    Muyang Li authored
    * tmp
    
    * update
    
    * update
    
    * finished the offloading impl
    
    * the offloading is buggy
    
    * update utils
    
    * the offloading is still buggy
    
    * update
    
    * correctness and speedup done; need to check the vram overhead
    
    * done
    
    * final debugging
    
    * update
    
    * update
    
    * correct now
    
    * fix
    
    * update
    
    * use per-layer offloading
    
    * fix the offloading on 5090
    
    * support setting the num_blocks_on_gpu
    
    * change the import name
This project manages its dependencies using pip.
requirements.txt 97 Bytes