    Module Group Offloading (#10503) · 9a147b82
    Aryan authored
    
    
    * update
    
    * fix
    
    * non_blocking; handle parameters and buffers
    
    * update
    
    * Group offloading with cuda stream prefetching (#10516)
    
    * cuda stream prefetch
    
    * remove breakpoints
    
    * update
    
    * copy model hook implementation from pab
    
    * update; very workaround-based implementation, but it seems to work as expected; needs cleanup and rewrite
    
    * more workarounds to make it actually work
    
    * cleanup
    
    * rewrite
    
    * update
    
    * make sure to sync current stream before overwriting with pinned params
    
    Not doing so leads to erroneous computations on the GPU and incorrect results.
    
    * better check
    
    * update
    
    * remove hook implementation to not deal with merge conflict
    
    * re-add hook changes
    
    * why use more memory when less memory do trick
    
    * why still use slightly more memory when less memory do trick
    
    * optimise
    
    * add model tests
    
    * add pipeline tests
    
    * update docs
    
    * add layernorm and groupnorm
    
    * address review comments
    
    * improve tests; add docs
    
    * improve docs
    
    * Apply suggestions from code review
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
    
    * apply suggestions from code review
    
    * update tests
    
    * apply suggestions from review
    
    * enable_group_offloading -> enable_group_offload for naming consistency
    
    * raise errors if multiple offloading strategies used; add relevant tests
    
    * handle .to() when group offload applied
    
    * refactor some repeated code
    
    * remove unintentional change from merge conflict
    
    * handle .cuda()
    
    ---------
    Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
vq_model.py 7.72 KB